Big Data

Big Data has already gone through “puberty”

07.12.2021
Big Data has already gone through “puberty” and has found a firm foothold in the business world. That was the message of the Primetime for Big Data conference, held on 10 November 2021 at the National Library of Technology in Prague. The main topic of the conference was making business more efficient through data evaluation, determining the information context and identifying trends in the further development of data processing and data visualization.

After several very interesting lectures, which I summarize at the end of the article, one of the most engaging parts of the conference for me was a panel discussion moderated by Daniel Stach from Czech Television. The connecting thread of the debate was how mature companies are in working with data and what role data plays in strategic decision-making. The debaters agreed that recognizing the role of data in these processes is crucial, and they can no longer imagine “enlightened” managers who do not use real-time information or understand why its availability matters. On the other hand, the use of information obtained through data mining is still very closely tied to the business model of a given segment or industry. Some companies have already passed their big final exams and very efficiently process data generated by their own systems and by open sources; in others, however, data is still not used on a “daily” basis. The participants in the discussion agreed unanimously that “These companies need to be evangelized and promoted to make their business more efficient.”

I had expected more technical discussions from the seventh year of the conference, so I was all the more pleasantly surprised that the format went beyond technical details and led towards more philosophical thinking about the further direction of working with data and using it as such. Data and information have connected humanity since time immemorial. What has fundamentally changed is access to data, the speed of that access and the quantity of data available. That is why we now live in the data age and are trying to make sense of the flood of available data.

I would therefore recommend that you first look around, see what data you already work with and consider how to turn it into an asset. You do not have to look right away at what your neighbours are doing and gather even more data, because sometimes less is more, or, as we say at Cleverbee, “Quality first.”

Pavel Weigner

CTO and data enthusiast

For those of you who are interested in the content of the individual lectures in more detail, I summarize my notes below:

If you are sad, talk to your bot (1). That is how the first talk, presented by Jan Šedivý (2), an expert who left the commercial sector for academia, could be paraphrased. Jan Šedivý has built a very successful team of young scientists who work on the synthesis, recognition and generation of human speech in the form of dialogue. The excellent results of their work were confirmed by first place in Amazon's Alexa Prize competition (3) with the social bot Alquist (4), named after the character Alquist from Karel Čapek's play R.U.R. (5). You can discuss thirty different topics with Alquist. “But it won’t tell you much about skydiving,” he adds with a smile, alluding to the availability of the open data that Alquist draws on when learning.

An indirect parallel with R.U.R. could be found in the second talk, by Jan Romportl (6) from Dataclair (7), who spoke about artificial intelligence getting “out of control”. He framed his story with Goodhart’s law (8): “When a measure becomes a target, it ceases to be a good measure.” He thereby called for constant revision of the defined rules and for optimization of the measured processes. In his impressive and humorous presentation, he illustrated the law with real-life examples, such as the cobra bounty in colonial India, where the colonizers tried to reduce an overgrown cobra population by rewarding locals for every cobra they caught. The result was that people began breeding cobras to sell to the colonizers.

In the talk by Jakub Rajský (9) from TV Nova and Radovan Jirka (10) from BizzTreat, it was palpable how much evangelizing effort had gone into creating and consolidating the database at TV Nova. The trigger for the data revolution at the television company was the takeover of CME by the PPF group. They documented their efforts and successes in increasing the viewership of the now immortal series Ordinace v růžové zahradě (11) on the VOYO streaming platform (12), which is moving to the new “Video on Demand” business model. The young folks do not intend to stop there, however, and their goals are by no means modest: VOYO is meant to be a Czech Netflix (13). Let’s wish them good luck and hope that in Ordinace v růžové zahradě, unlike in Netflix's Squid Game (14) (15), the blood will stay in the blood bags.

Michael Štencl (16) from Cogvio (17), a company full of experienced data business personalities, gave an example of how to monetize open and public data. Cogvio offers its customers in the pharmaceutical business up-to-date and, above all, verified data tailored to the client’s needs, one example being an analysis of the entry of a new drug onto the market.

Marcel Vrobel (18) from Adastra first compared the need for data availability to shipping logistics, noting that users’ demands on the availability and scope of data are rising to a level comparable to air traffic logistics (outside of pandemic times, of course). An important factor in the change to a new type of management is the support of the company’s management and of internal consultants when implementing this type of project.

Petr Hájek (19) from Profinit compared approaches to recording and maintaining process models in relation to a company's technical data tools. The historical approach, which starts with an analysis of business processes and then assigns technical tools to them, has proven to be unsustainable in the long term. Profinit therefore offers the opposite solution. The system it has created can “sniff” the infrastructure and map the resources in use and their connections to users. Here too, however, a human is still needed to identify the business areas and correctly allocate the discovered resources to them. “An undoubted advantage is the speed of obtaining an overview, maps of the current state of the infrastructure, users, groups and roles using these systems,” concludes Petr Hájek.

Martin Gerneš (20), Tribe Lead for Data at Česká spořitelna a.s., explained what successful data governance looks like at a large corporation and how much importance the company places on working with data. Over the course of several years, the company completely changed its attitude to data and project implementation, began building a modern data warehouse and switched to an agile project management methodology. As part of the data revolution, data nests are created and assigned to user groups. In the nests, data stewards work together, communicate with users and maintain common dictionaries. Data engineers prepare the data and data specialists create the necessary reports, all to the satisfaction of users. The path was not easy. Nevertheless, if the data teams in the savings bank have the charisma that Martin Gerneš has, then all this is just the beginning of a successful path to the efficient use of data in banking, and the bank’s clients have something to look forward to.

Much has been written about the fact that waste is a raw material. The importance of recycling and a gentle approach to nature is the daily bread of the media and of their readers and listeners. Cyril Klepek (21), founder and CEO of Cyrkl.com, went from words to deeds and built Cyrkl, the Waste2Resource Marketplace (22). This marketplace is based on a simple idea: “Waste can be a valuable resource for others.” Using the intelligence of his team and the artificial intelligence of computer algorithms, he analyzed waste producers, waste and other open sources. The result is a marketplace that “tailors” the supply of waste to the demand for it. Those who missed the lecture at the Big Data conference can at least read an interview in Forbes magazine: “Waste as a Source” (23).

“I just came in before skiing …,” said Adam Votava, introducing himself. Adam came from Zurich, Switzerland to the National Library of Technology in Prague to acquaint the audience with his view that data is an asset. The co-founder of data:diligence stated that although data as an asset is not part of companies' financial balance sheets and cannot be insured, data is beginning to be seen as a commodity of value. He summed this up in a very concise definition: “Data has value if it becomes information.” The importance of information today cannot be disputed, as evidenced by the fact that a well-known company secured a bank loan with its data, or more precisely with the value of that data. In the next part, Adam Votava introduced the work of Bill Schmarzo (24), who formulated the Economic Digital Asset Valuation Theorem (25): “You cannot fully separate the value of data from society.” With the support of this theorem, he demonstrated the basic principles of the sustainability of data projects.

References

1. Šedivý, Jan. Social Bot. Google Docs. [Online] 9 November 2021. [Cited: 20 November 2021.] https://docs.google.com/file/d/1TKHfglscrIvUNi8j2Ez_8AbkNiogHYz0/view.

2. Šedivý, Jan. Jan Šedivý - LinkedIn. LinkedIn. [Online] 12 December 2021. [Cited: 12 December 2021.] https://www.linkedin.com/in/jasedivy/.

3. Amazon, Inc. Alexa Prize - Teams. Alexa Prize. [Online] 20 November 2021. [Cited: 20 November 2021.] https://developer.amazon.com/alexaprize/challenges/current-challenge/teams/alquist.

4. AlquistAI. YouTube. YouTube. [Online] 29 August 2018. [Cited: 10 November 2021.] https://www.youtube.com/watch?v=AgYHA8FjS70.

5. Čapek, Karel. R.U.R. Wikipedia. [Online] 31 October 2021. [Cited: 20 November 2021.] https://cs.wikipedia.org/wiki/R.U.R..

6. Romportl, Jan. Jan Romportl. LinkedIn. [Online] 20 November 2021. [Cited: 20 November 2021.] https://www.linkedin.com/in/janromportl/.

7. Dataclair. Artificial Intelligence. Company website. [Online] 20 November 2021. [Cited: 20 November 2021.] https://dataclair.ai/.

8. Goodhart, Charles. Goodhartovo pravidlo. Wikipedia. [Online] 6 August 2021. [Cited: 20 November 2021.] https://cs.wikipedia.org/wiki/Goodhartovo_pravidlo.

9. Rajský, Jakub. Jakub Rajský. LinkedIn. [Online] 20 November 2021. [Cited: 20 November 2021.] https://www.linkedin.com/in/jakub-rajsky-a135632a/.

10. Jirka, Radovan. Radovan Jirka. LinkedIn. [Online] 20 November 2021. [Cited: 20 November 2021.] https://www.linkedin.com/in/radovanjirka/.

11. TV Nova. Ordinace v růžové zahradě. Nova Plus. [Online] 20 November 2021. [Cited: 20 November 2021.] https://voyo.nova.cz/serialy/1-ordinace-v-ruzove-zahrade-2.

12. TV Nova. TV Nova - VOYO. VOYO. [Online] 20 November 2021. [Cited: 20 November 2021.] https://voyo.nova.cz/.

13. Netflix. Netflix - Home. Netflix. [Online] 20 November 2021. [Cited: 20 November 2021.] https://www.netflix.com/.

14. Hwang, Dong-hyeok. Hra na oliheň. ČSFD. [Online] 17 September 2021. [Cited: 20 November 2021.] https://www.csfd.cz/film/772224-hra-na-olihen/prehled/.

15. Netflix. Hra na oliheň - trailer. YouTube. [Online] 2 September 2021. [Cited: 20 November 2021.] https://www.youtube.com/watch?v=oqxAJKy0ii4.

16. Štencl, Michael. Michael Štencl. LinkedIn. [Online] 9 November 2021. [Cited: 20 November 2021.]

17. Cogvio. Technology & Data Science meet Pharma & Healthcare. Company website. [Online] 20 November 2021. [Cited: 20 November 2021.] https://www.cogvio.com/.

18. Vrobel, Marcel. Marcel Vrobel. LinkedIn. [Online] 20 November 2021. [Cited: 20 November 2021.] https://www.linkedin.com/in/marcelvrobel/.

19. Hájek, Petr. Petr Hájek. LinkedIn. [Online] 20 November 2021. [Cited: 20 November 2021.]

20. Gerneš, Martin. Martin Gerneš. LinkedIn. [Online] 20 November 2021. [Cited: 20 November 2021.] https://www.linkedin.com/in/martin-gerne%C5%A1-8423864/.

21. Klepek, Cyril. Cyril Klepek. LinkedIn. [Online] 20 November 2021. [Cited: 20 November 2021.] https://www.linkedin.com/in/cyril-klepek/.

22. Klepek, Cyril. Waste2Resource Marketplace. cyrkl.com. [Online] 20 November 2021. [Cited: 20 November 2021.] https://www.cyrkl.com/en/.

23. Mertová, Jana. Odpad zdrojem. Cirkulární ekonomika je cesta z krize, říkají její propagátoři. forbes.cz. [Online] 21 February 2021. [Cited: 20 November 2021.] https://forbes.cz/odpad-zdrojem-cirkularni-ekonomika-je-cesta-z-krize-rikaji-jeji-propagatori/.

24. Schmarzo, Bill and Borne, Kirk. The Economics of Data, Analytics, and Digital Transformation. s.l. : Packt Publishing, 2020. ISBN: 9781800561410.

25. Schmarzo, Bill and Borne, Kirk. The Schmarzo Economic Digital Asset Valuation Theorem. TechTarget. [Online] 4 March 2020. [Cited: 20 November 2021.] https://www.datasciencecentral.com/profiles/blogs/schmarzo-s-economic-digital-asset-valuation-theorem-formulas.

26. Stach, Daniel. Daniel Stach. LinkedIn. [Online] 20 November 2021. [Cited: 20 November 2021.] https://www.linkedin.com/in/danielstach/.