Data Fabric: What is it and Why Do You Need it?
Striim
It was only a short time ago that we wrote about the benefits of embracing a data mesh if you truly want a completely decentralized data architecture. However, as we continue to read about databases, data stores, data warehouses, data lakes, data meshes, data fabrics and much more, it has become clear that there really isn’t a one-size-fits-all approach. For the NonStop community there is always the added complexity of “fitting in” whenever IT promotes one approach over another.
We also wrote at that time that data mesh shouldn’t be confused with what has been called a data fabric. As we noted, there are fundamental differences both in where data is stored and in who is ultimately responsible for the data:
“For one, data fabric brings data to a unified location, while with data mesh, data sets are stored across multiple domains. Also, data fabric is tech-centric because it primarily focuses on technologies, such as purpose-built APIs and how they can be efficiently used to collect and distribute data. Data mesh, however, goes a step further. It not only requires teams to build data products by copying data into relevant data sets but also introduces organizational changes, including the decentralization of data ownership.”
In the October 11, 2021 post to the Striim blog, Data Fabric: What is it and Why Do You Need it?, John Kutay and Mariana Park revisit the topic of data architectures, and their focus returns to data fabrics. Whether it is data fabrics, data meshes or something not yet visible over the horizon, the challenge remains: what is the preferred manner for managing data end-to-end? For Striim, this latest post represents an opportunity to update the NonStop community on just how flexible and indeed comprehensive Striim’s data streaming platform has become. Whatever data architecture a NonStop user may face, Striim has the implementation best suited to the goal of getting data created on NonStop to where a variety of analytics processes can turn that data into business insights. As Kutay and Park observe:
One of the ways organizations are addressing these data management challenges is by implementing a data fabric. Using a data fabric is a viable strategy to help companies overcome the barriers that previously made it hard to access data and to process it in a distributed data environment. It empowers organizations to manage mounting amounts of data with more efficiency.
Data fabric is one of the more recent additions to the lexicon of data analytics. Gartner listed data fabric as one of the top 10 data and analytics trends for 2021.
According to Gartner, a data fabric should have the following components:
- A data integration backbone that is compatible with a range of data delivery methods (including ETL, streaming, and replication)
- The ability to collect and curate all forms of metadata (the “data about the data”)
- The ability to analyze and make predictions from data and metadata using ML/AI models
- A knowledge graph representing relationships between data
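To make the metadata and knowledge graph components in the list above more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Gartner's definition or any vendor's implementation: the `MetadataGraph` class, the data set names and the relationship labels are all hypothetical.

```python
from collections import defaultdict


class MetadataGraph:
    """Toy knowledge graph: nodes are data sets, edges are named relationships."""

    def __init__(self):
        self.metadata = {}              # data set name -> descriptive attributes
        self.edges = defaultdict(list)  # data set name -> [(relation, target)]

    def register(self, name, **attrs):
        """Record the 'data about the data' for one data set."""
        self.metadata[name] = attrs

    def relate(self, source, relation, target):
        """Add a directed, labeled relationship between two data sets."""
        self.edges[source].append((relation, target))

    def downstream(self, name):
        """Return every data set reachable from `name`, in breadth-first order."""
        seen, queue, order = {name}, [name], []
        while queue:
            current = queue.pop(0)
            for _, target in self.edges.get(current, []):
                if target not in seen:
                    seen.add(target)
                    order.append(target)
                    queue.append(target)
        return order


# Hypothetical data sets, one per delivery method named in the list above.
graph = MetadataGraph()
graph.register("nonstop.transactions", system="NonStop", delivery="replication")
graph.register("stream.payments", system="streaming platform", delivery="streaming")
graph.register("warehouse.daily_totals", system="cloud warehouse", delivery="ETL")
graph.relate("nonstop.transactions", "feeds", "stream.payments")
graph.relate("stream.payments", "aggregated_into", "warehouse.daily_totals")

# Which data sets are affected if the NonStop source changes?
lineage = graph.downstream("nonstop.transactions")
print(lineage)
```

Even a structure this small shows why the components belong together: once metadata and relationships sit in one place, a question such as "what depends on this source?" becomes a simple graph traversal.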
Gartner ranked data fabric third among its top 10 data and analytics trends for 2021, underscoring how much consideration this data architecture deserves. Gartner summarizes the reason for that ranking with a simple observation:
Data fabric reduces time for integration design by 30%, deployment by 30% and maintenance by 70% because the technology designs draw on the ability to use/reuse and combine different data integration styles. Plus, data fabrics can leverage existing skills and technologies from data hubs, data lakes and data warehouses while also introducing new approaches and tools for the future.
With this in mind, Kutay and Park then describe the value Striim provides to a NonStop community faced with the dual challenges of decentralization and of leveraging the skillsets it already possesses. Also significant is an awareness of the role NonStop systems play in supporting mission-critical applications. As the graphic atop this article demonstrates, Striim continuously ingests transaction data and metadata from on-premises and cloud sources and is designed from the ground up for real-time streaming with:
- An in-memory streaming SQL engine that transforms, enriches, and correlates transaction event streams
- Machine learning analysis of event streams to uncover patterns, identify anomalies, and enable predictions
- Real-time dashboards that bring streaming data to life, from live transaction metrics to business-specific metrics (e.g. suspected fraud incidents for a financial institution or live traffic patterns for an airport)
- Hybrid and multi-cloud vault to store passwords, secrets, and keys. Striim’s vault also integrates seamlessly with third-party vaults such as HashiCorp Vault
Continuous movement of data (without data loss or duplication) is essential to mission-critical business processes. Whether a database schema changes, a node fails, or a transaction is larger than expected, Striim’s self-healing pipelines resolve the issue via automated corrective actions. All of this is of the utmost importance for the NonStop community when considering the options available today. With its history of supporting some of the biggest users in the NonStop community, Striim is ideally placed to help implement whatever data architecture you mandate, be that a lake or stream, a warehouse or even a mesh or fabric.
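The requirement to move data without loss or duplication can be illustrated with a small Python sketch. It is not Striim's implementation and does not use any Striim API. It shows one common, generic technique: the target remembers a checkpoint and ignores anything it has already applied. The `Target` class, the `replay` function and the sample events are hypothetical, and the sketch assumes events carry increasing sequence numbers and arrive in order.

```python
class Target:
    """Hypothetical target that applies each event at most once."""

    def __init__(self):
        self.rows = {}       # key -> latest value
        self.checkpoint = 0  # highest sequence number applied so far

    def apply(self, event):
        seq, key, value = event
        if seq <= self.checkpoint:
            return False     # already applied: skip the duplicate
        self.rows[key] = value
        self.checkpoint = seq
        return True


def replay(events, target):
    """Resend events from the source log; return how many were newly applied."""
    return sum(1 for event in events if target.apply(event))


# Hypothetical source log of (sequence, account, balance) events.
log = [(1, "acct-1", 100), (2, "acct-2", 250), (3, "acct-1", 175)]

target = Target()
first_pass = replay(log[:2], target)  # pipeline stops after two events
second_pass = replay(log, target)     # restart resends the whole log

print(first_pass, second_pass, target.rows)
```

Resending the whole log after a failure guards against loss, and the checkpoint comparison guards against duplication. Here the restart applies only the one event the target had not yet seen.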
Should you have any questions about Striim’s ability to provide an entry into data mesh architectures, please don’t hesitate to reach out to us, the Striim team. We would be only too happy to hear from you at any time.
Ferhat Hatay, Ph.D.
Sr. Director of Partnerships and Alliances, Striim, Inc