The Modern Data Stack is dead
Commentary taken from recent LinkedIn social media posts
NonStop Insider
For the NonStop community, the concept of stacks isn’t entirely new. Indeed, from the earliest days of Tandem Computers, presentations on the NonStop “integrated stack” often took center stage, and for good reason. NonStop has seen many changes to its hardware, but throughout its history it is the software components of the stack that have captivated the attention of the NonStop community.
One example of what makes up a modern data stack (and, of course, NonStop SQL/MX is one such source)
However, today data is taking center stage as we move to hybrid IT, driven by a desire to turn data into information that in turn feeds analytics processes. In this shift, the data created in real time on NonStop represents the enterprise’s freshest source of data and, as such, has the potential to provide state-of-the-day insights into how any enterprise is faring.
With the emphasis now directed at data and at improving the paths between data creation and data analysis, it is not surprising to read that data has become the subject of its own stack and its own architecture. As such, the topic may be of more than passing interest to the many NonStop users who want to ensure NonStop doesn’t remain a technology island supporting just a single application.
What follows is one industry analyst’s perspective on this topic, which certainly has us thinking, particularly as it references data being created and then analyzed in real time. But first, some definitions:
Data Stack: A set of technologies and services that organizations use to store, manage, and access data. Typically this is shared as a list of technologies and services, but the work and theory behind a given stack is much more multi-faceted than the simple format lets on.
Historically, the data platform architecture for an enterprise was highly involved and consisted of complex technologies. Over time, organizations broke their data platforms into sections specific to certain aspects – applications, analytics, etc. The term “data stack” came into vogue to describe the set of components or technologies that support the flow and use of data for analytics.
Rapid innovation in cloud data technology and exponential growth in the number of new products and companies in the data and analytics space is what’s making this possible. A subset of these tools is often referred to as the “modern data stack.”
A Modern Data Platform is a future-proof architecture for Business Analytics. It is a functional architecture that has all the components needed to support modern data warehousing, Machine Learning and AI development, and real-time data ingestion and processing.
How to build your first data stack
- Get a data warehouse. A data warehouse is the central nervous system of the business. …
- Choose a data ingestion tool. …
- Define a process for modelling your data. …
- Craft an analytics process that provides value. …
- Find a Reverse-ELT solution.
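To make the five steps above a little more concrete, here is a toy, end-to-end sketch. It assumes Python’s built-in sqlite3 as a stand-in for a real cloud data warehouse; the table and view names, the functions get_warehouse, ingest, model, analyze and reverse_etl, and the sample transactions are all hypothetical and are not taken from Striim, DRNet®, Portable or any other product mentioned in this article.

```python
import sqlite3
from typing import Dict, Iterable, List, Tuple

Row = Tuple[int, str, int]  # (txn_id, account, amount_cents): hypothetical schema


def get_warehouse() -> sqlite3.Connection:
    """Step 1: a stand-in 'warehouse' (in-memory SQLite, an assumption)."""
    return sqlite3.connect(":memory:")


def ingest(conn: sqlite3.Connection, rows: Iterable[Row]) -> None:
    """Step 2: land raw source records as-is in a raw table.
    INSERT OR REPLACE keyed on txn_id makes re-delivered rows harmless."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_txns ("
        "txn_id INTEGER PRIMARY KEY, account TEXT, amount_cents INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO raw_txns VALUES (?, ?, ?)", rows)
    conn.commit()


def model(conn: sqlite3.Connection) -> None:
    """Step 3: model the raw data into an analytics-friendly view."""
    conn.execute(
        "CREATE VIEW IF NOT EXISTS account_totals AS "
        "SELECT account, COUNT(*) AS txn_count, SUM(amount_cents) AS total_cents "
        "FROM raw_txns GROUP BY account"
    )


def analyze(conn: sqlite3.Connection, threshold_cents: int) -> List[Tuple[str, int]]:
    """Step 4: answer a business question: which accounts exceed a threshold?"""
    cur = conn.execute(
        "SELECT account, total_cents FROM account_totals "
        "WHERE total_cents > ? ORDER BY account",
        (threshold_cents,),
    )
    return cur.fetchall()


def reverse_etl(results: Iterable[Tuple[str, int]], operational_store: Dict[str, dict]) -> Dict[str, dict]:
    """Step 5: push the insight back into an operational system
    (a plain dict here, standing in for e.g. a CRM)."""
    for account, total in results:
        operational_store[account] = {"flag": "high_value", "total_cents": total}
    return operational_store


if __name__ == "__main__":
    wh = get_warehouse()
    ingest(wh, [(1, "A-100", 5000), (2, "A-100", 7500), (3, "B-200", 1200)])
    model(wh)
    crm = reverse_etl(analyze(wh, threshold_cents=10000), {})
    print(crm)  # {'A-100': {'flag': 'high_value', 'total_cents': 12500}}
```

The point of the sketch is the shape of the flow (land raw data, model it, query it, push the result back out), not the tooling; in a real stack each function would be replaced by a dedicated product.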
Those members of the NonStop community already familiar with products like Striim, and even with the new additions to DRNet® (following the aggressive and timely move to support Kafka with the DRNet®/Unified product suite), will recognize the need for a data ingestion tool, just as they will recognize much of the language that follows. But delivering the data created on NonStop to where it can be ingested is probably where the rubber first meets the road.
What follows is commentary by Ethan Aaron, CEO and founder of Portable, Inc., focusing on the modern data stack; you can find out more on this topic at his site (https://portable.io/). The way Aaron looks at this topic may intrigue many of our NonStop users:
Everyone I’ve talked to at #snowflakesummit believes it’s a wildly outdated term that doesn’t represent the market we live in today. Here’s why: My takeaways so far –
1. It’s not modern whatsoever
The first real iteration of ‘modern’ data stacks is ~10 years old. That’s not the reality we live in today.
2. Data stacks look different today
It’s no longer just ELT + warehouse + visualization. And there’s no longer one path to manage data within your organization.
3. Analytics, operations, and product development are converging
When people coined the term ‘modern data stack’, it was to build dashboards. Nowadays manual tasks can be automated and entire applications and businesses can be built with off-the-shelf data tooling.
4. There’s a talent shortage for most companies
Everyone wants to ‘do’ data, but they can’t find the talent. We need a way of providing accessibility to data without giving junior analysts the keys to the castle. There’s a balance we need here. And a ton of companies to shepherd into the world of being data-driven organizations.
5. For the most mature companies, the tech stack is riddled with tech debt and risk
When you start your data journey with junior analysts (because you can’t hire senior architects), you end up with a great starter stack, but over time, your needs are different. Security, governance, regulation, etc. have to be top of mind, and things need to scale. Your junior analyst shouldn’t be doing your financial reporting as a public company…
6. The types of ‘data’ buyers are fragmenting
Analysts, data engineers, analytics engineers. Coders vs. Business People. Open source vs. cloud-hosted. Two data tools that look identical are selling to different buyers. That can’t be the same ‘modern data stack’.
7. We have an interface problem
There’s been an explosion in data vendors. Everyone is building custom interfaces and partnerships thinking that will help, but the buyers in the market deserve a seamlessly integrated offering. There are ‘data stack as a service’ solutions and process orchestration tools that are starting to dig in here, but we’re early in providing something usable at scale. No one company solves this.
8. Real-time is real
Legacy ‘modern data stacks’ aren’t built to move data in real-time at scale, but there are big needs for these capabilities. It’s data, movement, and processing, but not the same thing as your analytics environment.
9. Actual logic still isn’t being shared across companies
How do we have an entire ecosystem, but no legitimate way to share SQL logic and patterns yet?
I don’t have a name for what comes next. All I know is it’s coming.
In a follow-up post, however, Aaron was quick to point out:
An infrastructure first approach is a broken approach that ends up with unnecessary costs, tooling bloat, and major interoperability problems.
Teams need to focus on understanding the business problems they face at a point in time, coming up with measurable goals, and matching their data stack investment to the business value they want to create.
It’s significantly more important to get that right vs. the specific tools you use.
And there you have it, and that is why I like it. For a community that is fully on board with stacks, this should strike a positive chord with many NonStop users. I still recall the most memorable observation from a Gartner conference decades ago, when a high-profile analyst suggested that, when it comes to legacy, the day you deploy a new application it has, almost by definition, become legacy.
What comes next has always been of interest. I recall, many years ago as a Tandem Computers product manager, challenging the comms stack folks with the question: How are we preparing for a post-TCP/IP world? And yet, when it comes to massive data transfers of the kind we are now seeing, it’s not so much about TCP as it is about UDP. What comes next for NonStop users will hinge on what they choose to ingest data, and for those in the NonStop community knowledgeable in Change Data Capture (CDC), there are now multiple options from which to choose.
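For readers less familiar with CDC, the core idea is that a capture process reads a database’s change log and emits an ordered stream of insert, update and delete events, which an apply process then replays against a target. The toy sketch below shows only the apply side, assuming a hypothetical ChangeEvent record carrying a log sequence number (lsn); it is not the event format or API of Striim, DRNet® or any other CDC product.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    lsn: int                # position in the source change log (hypothetical field)
    op: str                 # "I" = insert, "U" = update, "D" = delete
    key: str                # primary key of the changed row
    after: Optional[dict]   # row image after the change (None for deletes)


def apply_changes(target: Dict[str, dict], events: Iterable[ChangeEvent],
                  last_applied_lsn: int = 0) -> int:
    """Replay change events in log order against `target`, skipping any
    event at or below the last checkpointed LSN. Returns the new checkpoint."""
    for ev in sorted(events, key=lambda e: e.lsn):
        if ev.lsn <= last_applied_lsn:
            continue  # already applied (e.g. redelivered after a restart)
        if ev.op in ("I", "U"):
            if ev.after is None:
                raise ValueError(f"event {ev.lsn} has no row image")
            target[ev.key] = dict(ev.after)  # upsert the latest row image
        elif ev.op == "D":
            target.pop(ev.key, None)
        else:
            raise ValueError(f"unknown op {ev.op!r}")
        last_applied_lsn = ev.lsn
    return last_applied_lsn


if __name__ == "__main__":
    target: Dict[str, dict] = {}
    events = [
        ChangeEvent(1, "I", "acct-1", {"balance": 100}),
        ChangeEvent(2, "U", "acct-1", {"balance": 80}),
        ChangeEvent(3, "I", "acct-2", {"balance": 50}),
        ChangeEvent(4, "D", "acct-2", None),
    ]
    checkpoint = apply_changes(target, events)
    print(checkpoint, target)  # 4 {'acct-1': {'balance': 80}}
```

Checkpointing the last applied LSN is what lets the apply side tolerate the same events being delivered twice, a common situation when data moves continuously between systems.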