Why UNIX is Faster than NonStop
And How the NonStop Community Can Catch Up
Caleb Enterprise Ltd.Dan
When discussing the performance of NonStop, there are often comparisons to UNIX. With the introduction of NonStop X, those comparisons have lessened somewhat, and no customers have stated that they are unhappy with the performance of their NonStop X systems. But for those of us who pursue perfection, it is undeniable that UNIX is simply faster than NonStop. Need this be the case? Can steps be taken to close this performance gap? Addressing it may be less complex than many of us realize.
Though NonStop users aren’t demanding performance levels equivalent to UNIX today, and have largely expressed their satisfaction with what HPE is now shipping, we suspect that as transactions become richer, as the number of devices multiplies, and as the floodgates open in support of IoT, there will perhaps emerge a demand for better performance from the NonStop systems already deployed.
The solution? Implement UNIX-like IPC across processors on NonStop. That’s it! That’s all! Of course, the quest to level this playing field is not nearly so simple. Until there was RDMA, it was impossible for NonStop to meet this challenge. That is no longer true, at least theoretically. To understand just why shared memory is so much faster, we must first understand the basic tools UNIX programmers use to build fast systems. Let’s start with the I/O latency fundamentals – how fast the I/O gets delivered – as illustrated by the following:
UNIX and NonStop both do context switching in microseconds, so that is a fairly level playing field. But things change very quickly from there. UNIX systems coordinate much of their work in memory, operating at nanosecond speeds. Because everything happens in memory, a set of standardized libraries collectively known as IPC (Inter-Process Communication) was written to facilitate inter-process messaging and to coordinate access to, and the mapping and partitioning of, shared memory.
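To make the shared-memory idea concrete, here is a minimal sketch using Python's standard `multiprocessing.shared_memory` module. It is an illustration under stated assumptions, not code from any NonStop or UNIX product: the function name `shared_counter_demo` is hypothetical, and both handles live in one process purely to keep the example self-contained. In a real system the "reader" handle would be opened by a different process, and a semaphore or lock would guard the update.

```python
import struct
from multiprocessing import shared_memory


def shared_counter_demo(increments: int = 1000) -> int:
    """Update a counter through one mapping and read it through another.

    Hypothetical helper for illustration only.
    """
    # "Writer" side: create a named segment large enough for one 64-bit integer.
    writer = shared_memory.SharedMemory(create=True, size=8)
    try:
        # "Reader" side: attach to the same segment by name, the way a
        # second process would. No message is sent between the two handles.
        reader = shared_memory.SharedMemory(name=writer.name)
        try:
            struct.pack_into("<q", writer.buf, 0, 0)  # initialize counter to 0
            for _ in range(increments):
                (value,) = struct.unpack_from("<q", writer.buf, 0)
                struct.pack_into("<q", writer.buf, 0, value + 1)
            # The reader sees the writer's updates directly in memory.
            (seen,) = struct.unpack_from("<q", reader.buf, 0)
            return seen
        finally:
            reader.close()
    finally:
        writer.close()
        writer.unlink()  # remove the named segment from the system
```

Calling `shared_counter_demo(1000)` returns the value the reader handle observes after the writer's updates. The point of the sketch is that the data never travels as a message: both sides simply address the same bytes.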
NonStop systems coordinate most of their work across the inter-process message (IPM) bus at microsecond speeds, which is orders of magnitude slower than in-memory IPC at nanosecond speeds. Why is that? NonStop is a shared-nothing multiprocessor architecture in which each CPU has its own memory. Shared memory is not an option. The only way for processes in different CPUs to share is across the IPM bus. Share what? Lots! Disk I/O, inter-process messages (IPM), network I/O, operating system indications and more. NonStop systems can coordinate access between processes, but all of the synchronization occurs using files or messaging across the bus on $RECEIVE queues.
One can readily see that NonStop is at a severe latency disadvantage, and until InfiniBand came along, it was impossible to address. ServerNet significantly improved the NonStop bus architecture, using multiple discrete data paths between devices and wormhole routing to reduce latency. Then InfiniBand (IB) came along, operating about 50 times faster than ServerNet. But all inter-process I/O is still constrained by Guardian IPM on $RECEIVE, with context switches that drag throughput down to a small fraction of IB’s potential.
So what is the answer? You need a framework that can use the RDMA capabilities of IB and layer IPC-like services onto them, so that Guardian and OSS processes can completely side-step context switching and the IPM message system. The mainline of a NonStop server program would basically look like this:
Repeat
    While there are messages on $RECEIVE
        Service them (non-blocking)
    While there is IPC work to do
        Service IPC messages (non-blocking)
    Check for semaphore completions (non-blocking) and react to them
Until shutdown requested
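The loop above can be sketched in runnable form. This is only an illustration of the polling structure, assuming in-process stand-ins: a `queue.Queue` plays the role of $RECEIVE, a second queue plays the role of the IPC channel, and `threading.Semaphore` objects stand in for cross-processor semaphores. The names `serve`, `drain`, and `SHUTDOWN` are hypothetical and do not correspond to any Guardian or OSS API.

```python
import queue
import threading

SHUTDOWN = "SHUTDOWN"  # hypothetical sentinel message requesting shutdown


def drain(q):
    """Yield every item currently queued, without ever blocking."""
    while True:
        try:
            yield q.get_nowait()
        except queue.Empty:
            return


def serve(receive_q, ipc_q, semaphores, log):
    """Poll three work sources until a shutdown message is seen.

    receive_q  -- stand-in for the $RECEIVE queue
    ipc_q      -- stand-in for an IPC message channel
    semaphores -- dict of name -> threading.Semaphore signalling completions
    log        -- list that records what was serviced, in order
    """
    shutdown = False
    while not shutdown:
        # Service everything waiting on the $RECEIVE stand-in.
        for msg in drain(receive_q):
            if msg == SHUTDOWN:
                shutdown = True
            else:
                log.append(("receive", msg))
        # Service IPC work.
        for msg in drain(ipc_q):
            log.append(("ipc", msg))
        # Check each semaphore once; a successful non-blocking acquire
        # means a completion was posted, so react to it.
        for name, sem in semaphores.items():
            if sem.acquire(blocking=False):
                log.append(("semaphore", name))
    return log
```

As written, the loop spins when there is no work. A production server would presumably yield or wait on an event when all three sources are idle; that detail is left out to keep the structure visible.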
Guardian and OSS processes spend most of their lives waiting on $RECEIVE for I/O completions. If data resides in memory instead of on disk, this will yield a quantum improvement in performance.
But there is a problem with processing data in memory, isn’t there? How do you apply transactional semantics to it in a world where you can’t afford loss or duplication of data? How do you roll back a memory write? Disk I/O with TMF and audit trails is a tough act to follow. These are the things that have been occupying my thinking for years, and I have viable solutions.
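To show what "rolling back a memory write" can mean in the simplest case, here is a sketch of a generic undo log over a byte buffer. This is a textbook technique offered for illustration only. It is not the author's solution, and it provides none of the durability or fault tolerance that TMF and audit trails deliver. The class name `UndoLogMemory` and its methods are hypothetical.

```python
class UndoLogMemory:
    """A byte buffer whose writes can be rolled back (illustrative only)."""

    def __init__(self, size: int):
        self.buf = bytearray(size)
        self._undo = None  # list of (offset, old_bytes) while a transaction is open

    def begin(self) -> None:
        if self._undo is not None:
            raise RuntimeError("transaction already open")
        self._undo = []

    def write(self, offset: int, data: bytes) -> None:
        if self._undo is None:
            raise RuntimeError("write outside a transaction")
        end = offset + len(data)
        if offset < 0 or end > len(self.buf):
            raise ValueError("write out of bounds")
        # Save the bytes about to be overwritten before changing them.
        self._undo.append((offset, bytes(self.buf[offset:end])))
        self.buf[offset:end] = data

    def commit(self) -> None:
        if self._undo is None:
            raise RuntimeError("no transaction open")
        self._undo = None  # discard the undo records; the writes stand

    def rollback(self) -> None:
        if self._undo is None:
            raise RuntimeError("no transaction open")
        # Restore saved bytes in reverse order so overlapping writes unwind correctly.
        for offset, old in reversed(self._undo):
            self.buf[offset:offset + len(old)] = old
        self._undo = None
```

A committed write stays in the buffer, while writes made after a later `begin()` disappear when `rollback()` is called. The hard problems the article raises, such as surviving a processor failure without loss or duplication, are exactly what this sketch does not address.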
If NED executives are wondering why the NonStop customer base has not embraced the extreme performance capabilities of IB, these fundamental business issues are at the heart of it. How does one do something useful with RDMA?
If there were IPC tools that could work seamlessly across NonStop processors, then every single UNIX application that is predicated on IPC could be ported to NonStop. There are lots of application architects out there who know how to build IPC-based systems. With IPC services on NonStop, they could get the added value of fault tolerance and linear scalability. Would customers consider it a bonus if that shared memory was itself fault tolerant? Would customers consider it beneficial if their NonStops could integrate with UNIX and Windows servers using exactly the same IPC framework?
If you would like to see what my “vision” is for IPC on NonStop and hybrid computing, and how much is already in place, see:
If you want to see what shared memory tools HPE NED has to offer and my thoughts on them, see:
For those of you who have a vision to build a new hybrid application, or who want to leverage an existing application using the fundamentals articulated here and in my white papers, reach out to me at email@example.com and let’s brainstorm! We can take these ideas to NED to demonstrate that there is real interest in this kind of hybrid application framework. Let’s lead – not follow – in this emerging world of exa-scale RDMA/HPC applications that define the new frontier of fault-tolerant shared memory. Let’s build next-generation applications where the data stays at rest in universal shared memory.