Data Integration For Your Hybrid Cloud
Modern, reliable data integration across your private and public cloud using change data capture and data streams. All in real-time!
Cloud Integration For Hybrid Operations
There is no getting around the topic of hybrid IT. Whether it follows a predetermined plan established by the enterprise or simply appears somewhat randomly, the arrival of hybrid IT really shouldn’t come as a shock to any IT professional. The NonStop community has been well aware of the need for NonStop systems to coexist and cooperate in a hybrid world from the very first time a NonStop system was deployed front-ending mainframe applications. The value of NonStop was recognized based on its most fundamental attribute: unlike the mainframe, it never needed to be taken offline, making it the perfect platform to support any customer-facing application requiring 24 x 7 availability.
Fast forward to today: it’s just as apparent that in the world of edge-to-cloud platforms as a service, as promoted by HPE, no one vendor or service provider can do it all. It may have gone out of fashion, but in all seriousness, best-of-breed still plays an important role in the way IT builds out its enterprise-wide infrastructure, and nowhere is this more visible than when it comes to choosing cloud service providers. The hybrid world of IT, which once reflected the presence of two or more systems, has become more complex: traditional on-prem systems are being complemented by clouds that are oftentimes supported by multiple service providers. In some cases this hybrid cloud environment can evolve to where it is the sole IT resource of the enterprise, as the need for IT comes under close scrutiny in a business environment where the presence of IT has been distributed across the many operational divisions within the enterprise.
While the hybrid world of IT continues to evolve and hybrid clouds have become a reality, there is the bigger question of just how the enterprise is going to manage data that is created and captured in volumes that continue to multiply each year. This is the subject of the November 15, 2021, post to the Striim blog by John Kutay, Data Warehouse vs. Data Lake vs. Data Lakehouse: A Quick Overview. “What good is all that data if companies can’t utilize it quickly?” is the question Kutay poses before exploring the value proposition of the different data storage architectures, especially when the intent of gathering all the data created by the enterprise is to support analytics and even machine learning (ML).
Data warehouses have been with us for some time. You can look back at the 1980s, when data warehouses first appeared as a way to take the data created at that time and transform it from a role supporting just operational transactional systems to one where it could readily support decision support systems. While there was improved data standardization, quality and consistency, Kutay notes that it came at the expense of data flexibility. “Although data warehouses perform well with structured data, they can struggle with semi-structured and unstructured data formats such as log analytics, streaming, and social media data,” said Kutay. “This makes it hard to recommend data warehouses for machine learning and artificial intelligence use cases.”
At the risk of creating yet another silo, many years later the idea of data lakes appeared. In many ways data lakes combined structured and unstructured data, leaving it to the process of Extract, Load and Transform (ELT) to prepare the data for the analysis subsequently performed. However, just as data warehouses had their critics, so too did data lakes. The critics of data lakes often quote Sean Martin, CTO of Cambridge Semantics:
“We see customers creating big data graveyards, dumping everything into Hadoop distributed file system (HDFS) and hoping to do something with it down the road. But then they just lose track of what’s there. The main challenge is not creating a data lake, but taking advantage of the opportunities it presents.”
Data lakes may be good at storing all sorts of data, but even so, as Kutay notes in his post, data lakes do come with “Poor performance for business intelligence and data analytics use cases.” Furthermore, “If not properly managed, data lakes can become disorganized, making it hard to connect them with business intelligence and analytics tools.” This is the background for why the industry, in the hybrid IT world that creates so many of today’s headlines, has begun to talk about data lakehouses. View this new architecture as a potential marriage featuring the best of data warehouses and data lakes. Simply put, “Data lakehouse architecture combines a data warehouse’s data structure and management features with a data lake’s low-cost storage and flexibility.”
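To make the distinction a little more concrete, here is a minimal sketch in Python of the load-first, transform-later pattern that data lakes rely on. It is an illustration only and is not taken from Kutay's post or from any Striim product: the sample events, the field names and both function names are hypothetical, and an in-memory list stands in for low-cost lake storage.

```python
import json

# Hypothetical raw events as they might arrive from source systems.
# The third record is deliberately malformed.
RAW_EVENTS = [
    '{"id": 1, "amount": "19.99", "channel": "web"}',
    '{"id": 2, "amount": "5.00"}',
    'not valid json',
]


def extract_and_load(raw_events):
    """Extract and Load: land every record untouched in the 'lake'."""
    return list(raw_events)


def transform(lake):
    """Transform at analysis time: parse, apply a schema, skip bad records."""
    rows = []
    for record in lake:
        try:
            doc = json.loads(record)
        except json.JSONDecodeError:
            continue  # unparseable data stays in the lake but is not analyzed
        rows.append({
            "id": int(doc["id"]),
            "amount": float(doc["amount"]),
            "channel": doc.get("channel", "unknown"),
        })
    return rows


lake = extract_and_load(RAW_EVENTS)   # everything is kept, good or bad
rows = transform(lake)                # structure is imposed only when needed
total = sum(row["amount"] for row in rows)
```

The lake keeps all three records, including the one nobody can read, which is how a lake can drift toward the "graveyard" its critics describe. A warehouse would have rejected that record at load time, while a lakehouse aims to keep the cheap raw storage and add the schema and management layer on top of it.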
In the chart that is provided in this latest post to Striim’s blog, comparisons can be made between the competing architectures:
Data Warehouse vs. Data Lake vs. Data Lakehouse: A Quick Overview
For the NonStop community, where the topic of data integration has been the subject of numerous presentations of late, Striim can provide a data streaming platform that will ensure NonStop can thrive in the world of hybrid IT. This includes tapping into the world of hybrid cloud as well as supporting the many NonStop users who are beginning to take data created on NonStop and stream it to data storage architectures supported by the major cloud service providers. It is in this marketplace that Striim can prove beneficial to any NonStop user looking to address their data integration requirements. With NonStop users having already deployed Striim, perhaps it is time you gave Striim consideration for your own data integration requirements in an emerging world of hybrid clouds.
Should you have any questions about Striim’s ability to provide a path into data lakehouses, please don’t hesitate to reach out to the Striim team. We would be only too happy to hear from you at any time.
Ferhat Hatay, Ph.D.
Sr. Director of Partnerships and Alliances, Striim, Inc