With Striim, when it comes to focus, it’s all about streaming data integration!
Striim
Whatever has happened to discussions about Big Data? Search Google for Big Data and you quickly find that many references remain, yet a good number of them now discuss the history of Big Data, which makes you think twice. Is it fair to say that the IT industry has moved on? It can seem as if Big Data never really happened, but in reality Big Data simply opened the door to discussions today’s enterprises just had to have: how do we take the data we have and turn it to our benefit as we drive towards a customer-centric view of the business?
“In January 2019, perennially unprofitable Hortonworks closed an all-stock $5.2 billion merger with Cloudera. In May 2019, another Hadoop-based provider, MapR, announced that it would shut down if it were unable to find a buyer or a new source of funding. On June 6, 2019, Cloudera’s stock declined 43% after it cut its revenue forecast and announced that its CEO is leaving the company. Valued at $4.1 billion in 2014, Cloudera’s current market cap is $1.4 billion.
“Is this just the end of Hadoop or is it the death of Big Data? Was our fascination with lots and lots of data only a temporary bubble?”
The above was part of an article by Gil Press that appeared July 1, 2019, on the Forbes web site under the tab, Innovation. The reference to MapR is now a moot point as MapR was acquired by HPE just a matter of weeks after this article was published, even as Striim along with many others in the industry remain puzzled by HPE’s decision. On the other hand, as Press asks, “Was our fascination with lots and lots of data only a temporary bubble?” Press headlined his article “Big Data Is Dead. Long Live Big Data AI.” Even setting aside the declining valuations and the rush to merge, enterprises have long been a little skeptical about the costs of big data and the value derived from it. Now the skeptics are out in full force, asking “whether an open-source distributed storage technology which Google invented (and quickly replaced with better tools) could survive as a business proposition at a time when enterprises have moved rapidly to adopting the cloud and “AI”—advanced machine learning or deep learning.”
Almost tongue-in-cheek in its conclusion, Press highlights in the Forbes article the following:
“Digital transformation is finding out what data can do to your business decisions and actions. It’s focusing your company on mining and benefiting from its second-most important resource after its people: Data. While digital-born, Web-native, data-driven companies such as Google and Salesforce have been doing this for twenty years, many other businesses around the world, large and small, are now in full digital transformation mode, exploring the power of data eating the world. In the process, they tap into IT resources and data science tools in the cloud and experiment with advanced machine learning or deep learning. The remarkable and rapid progress in computer vision and natural language processing capabilities over the last 7 years has been enabled by big data—lots of tagged and labeled online data. Deep learning is Big Data AI.”
Then again, Press did headline his article in part with the warning, “Long Live Big Data AI.” Separately, in another article originally published on Forbes, it was Striim CTO, Steve Wilkes, who built on these observations by Press. In his post to the Striim blog of August 19, 2019, “Real-Time Data is for Much More Than Just Analytics,” Wilkes begins by stating:
“As the age of big data fades into the sunset — and many industry folks are even reluctant to use the term — there is much more focus on fast data and obtaining timely insights. The focus of many of these discussions is on real-time analytics (otherwise known as streaming analytics), but this only scratches the surface of what real-time data can be used for.”
Streaming analytics? Big Data AI? Turns out that what may be happening is a deeper understanding among enterprises today that in order to make sense of data you really do need to have captured all the data. Expressed another way, Wilkes simply notes how increasingly, even as enterprises pursue transformation to a digital enterprise, it all comes down to taking baby-steps and tackling one thing at a time:
“There are many reasons why streaming data integration is more common, but the main reason is quite simple: This is a relatively new technology, and you cannot do streaming analytics without first sourcing real-time data. This is known as a “streaming first” data architecture, where the first problem to solve is obtaining real-time data feeds.
“Organizations can be quite pragmatic about this and approach stream-enabling their sources on a need-to-have, use-case-specific basis. This could be because batch ETL systems no longer scale or batch windows have gone away in a 24/7 enterprise.”
Streaming first takes into consideration the need to build out the data warehouse or lake – on-prem or in the cloud – without taking downtime and with little to no disruption to the enterprise. As Wilkes points out in his closing observations:
“A better approach to minimizing or eliminating downtime is an online migration that keeps the application running. To perform this task, source changes from the in-house database, using a technology called change data capture (CDC), as real-time data streams, load the database to the cloud, then apply any changes from the real-time stream that happened while you were doing the loading. The change delivery to the cloud can be kept running while you test the cloud application, and when you cut over, it will be already up to date.”
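The sequence Wilkes describes (take an initial load, then apply the changes that arrived while loading) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Striim's API or implementation: the Change record, its position field (standing in for a log sequence number), and the migrate function are all hypothetical names, and the databases are plain in-memory dictionaries.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    """One captured change event (hypothetical shape, for illustration only)."""
    position: int                 # assumed monotonically increasing log position
    op: str                       # "insert", "update" or "delete"
    key: str
    value: Optional[dict] = None  # new row image; None for deletes


def apply_change(target: dict, change: Change) -> None:
    """Apply a single change event to the target store."""
    if change.op == "delete":
        target.pop(change.key, None)
    else:
        # Inserts and updates are both treated as upserts.
        target[change.key] = change.value


def migrate(snapshot: dict, snapshot_position: int, changes: list) -> dict:
    """Load the snapshot, then replay only the changes made after it was taken."""
    target = dict(snapshot)  # step 1: initial load into the new database
    for change in sorted(changes, key=lambda c: c.position):
        # step 2: skip changes the snapshot already reflects
        if change.position > snapshot_position:
            apply_change(target, change)
    return target


# The snapshot was taken at log position 2; changes 3 to 5 arrived during the load.
snapshot = {"a": {"balance": 10}, "b": {"balance": 5}}
changes = [
    Change(1, "insert", "a", {"balance": 10}),
    Change(2, "insert", "b", {"balance": 5}),
    Change(3, "update", "a", {"balance": 15}),
    Change(4, "delete", "b"),
    Change(5, "insert", "c", {"balance": 7}),
]
result = migrate(snapshot, snapshot_position=2, changes=changes)
```

Recording the log position at which the snapshot was taken is the design choice that matters here: it lets the replay skip changes already reflected in the initial load, so the target converges on the source without the application being taken offline.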
This brings us to a topic familiar to all members of the NonStop community. The principles of CDC are well understood and have provided the foundation for many NonStop vendor products supporting data replication and the implementation of disaster recovery sites. It is also the kind of baby-step approach that enterprises should welcome: CDC holds the key to ensuring minimal to no downtime even as it quickly brings up to operational “speed” the data warehouse / lake enterprises need on their journey to providing the business with real-time analytics!
It would be left to another blogger, Hyoun Park, to provide something close to an obituary for Big Data. In a June 17, 2019, post on the web site Amalgam Insights, “The Death of Big Data and the Emergence of the Multi-Cloud Era,” Park writes:
“Big Data will be remembered for its role in enabling the beginning of social media dominance, its role in fundamentally changing the mindset of enterprises in working with multiple orders of magnitude increases in data volume, and in clarifying the value of analytic data, data quality, and data governance for the ongoing valuation of data as an enterprise asset.”
Striim understands that Big Data may not be as dead as some industry pundits suggest, but its role as once envisioned may be on life support. Enterprises are coming to realize that the real issue is selecting the right streaming data integration product. “While real-time analytics and instant operational insights may get the most publicity and represent the long-term goal of many organizations,” concluded Wilkes, “the real workhorse behind the scenes is streaming data integration.” And that is the business need that Striim is addressing today! If you want to know more about Striim, streaming data integration, real-time analytics and, yes, even Big Data AI, then email or call us. We would be only too happy to discuss the many ways your enterprise could benefit from deploying Striim.