Integrating Cloud Computing Services with NonStop
Canam Software and TIC SoftwareDan
Cloud computing is playing an increasingly key role in organizational IT strategies. Amazon, Microsoft, IBM, Google and others continue to improve their offerings, making a compelling case for using cloud-based resources. As providers extend more and more services, new opportunities are presenting themselves. In this article, we will look at some of the products offered by Amazon Web Services (AWS) and how they can be used to introduce big data analytics capabilities for NonStop applications.
To begin from a common starting point, let’s define cloud computing and its advantages.
What is Cloud Computing?
Amazon defines Cloud Computing as “the on-demand delivery of compute power, database storage, applications, and other IT resources through a cloud services platform via the internet with pay-as-you-go pricing.” A quick Google search will yield many definitions for Cloud Computing, but they are essentially the same, with the key concepts being:
- On-demand delivery of servers, storage, databases, networking, software, analytics, and more over the internet.
- Pay-as-you-go pricing.
- Accessing computer services over the internet instead of from your computer or company network.
- Accessing services that are managed for you by someone else.
There are three cloud computing service models, each representing a different level of control. They are: Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS).
Infrastructure as a Service involves renting individual components of computing resources hosted by someone else, such as storage, network endpoints, and virtual machines with a selection of operating systems. One is responsible for assembling and maintaining these components as well as deploying and maintaining one’s software. Examples of this include Microsoft Azure, Amazon Web Services, Google Cloud Platform, and IBM Cloud.
Platform as a Service typically involves renting access to pre-configured virtual machines hosted by someone else. That is, one provides the software that runs in the cloud, and the provider supplies the operating system and virtual hardware to host that software. Examples of this include Heroku, OpenShift, and any Docker-based applications.
Software as a Service involves renting access to a piece of software that someone else hosts on the internet. One pays for access to the service, and the provider takes care of setting up, providing and maintaining that service. Examples of this include Salesforce and Dropbox.
Advantages of Cloud Computing
Advantages of Cloud Computing include:
- Scale services up or down to meet needs.
- Stop guessing capacity.
- Reduced Cost
- Only pay for what you use (i.e. “pay as you go”).
- No major up-front capital expenditure.
- Best security practices followed.
- Deploy the infrastructure you need almost instantly.
- Get applications to market quickly without worrying about underlying infrastructure costs or maintenance.
- Strategic Value
- Access to latest, most innovative technology.
- Replication can reduce the impact of unexpected software outages.
- Most cloud offerings allow geographically dispersed deployments so that your software can be located close to your markets.
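The "stop guessing capacity" and "pay as you go" points above can be illustrated with some simple arithmetic. The following is a minimal Python sketch, not a pricing model: the demand profile and both hourly rates are made-up numbers chosen only for illustration, and the function names are hypothetical.

```python
def provisioned_cost(hourly_demand, rate_per_server_hour):
    """Fixed capacity sized for the peak hour and paid for in every hour."""
    peak = max(hourly_demand)
    return peak * len(hourly_demand) * rate_per_server_hour


def pay_as_you_go_cost(hourly_demand, rate_per_server_hour):
    """Capacity follows demand, so only the server-hours actually used are billed."""
    return sum(hourly_demand) * rate_per_server_hour


# Hypothetical demand: number of servers needed in each hour of one day.
demand = [2] * 8 + [10] * 4 + [4] * 12

# Assumed rates (illustrative only): the on-demand rate is set higher per
# server-hour to show that the comparison depends on how "peaky" the load is.
fixed = provisioned_cost(demand, rate_per_server_hour=1.0)
on_demand = pay_as_you_go_cost(demand, rate_per_server_hour=1.5)

print(f"provisioned for peak: {fixed}, pay as you go: {on_demand}")
```

With a flatter demand profile, the gap narrows or reverses, which is why the comparison is worth running against one's own workload figures.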
Cloud Computing and Big Data Analytics
Big data analytics is the examination of large volumes of data to detect patterns, trends, customer preferences and other information that can help organizations find ways to increase revenue, provide better customer service, improve operational efficiency and more. Typically, this type of analysis is not done against transactional databases, which are designed for high-performance, transaction-based processing. Instead, data is moved to storage areas designed specifically for querying and analysis. Data lakes, data warehouses and data marts are examples of such storage areas.
A data lake is a storage area that holds large volumes of data in its original format. (No processing or formatting of the data is done before it is loaded.) The thinking behind storing data this way is that one never knows how this data will be used in the future. Storing it in its raw format keeps all possibilities open. While a data lake stores data in an unstructured format, a data warehouse stores data in a structured format. Data warehouses are modeled for high-performance querying and reporting. Data marts are subsets of a data warehouse geared to a specific functional area. Data lakes and data warehouses/data marts are sometimes considered mutually exclusive approaches to big data analytics; however, this doesn’t have to be the case. A data lake is an excellent way to source data for use by multiple data warehouses and data marts, and used together they can meet both immediate and future analytic requirements.
Costs for storage, processing power, software, etc. can make implementing a data analytics solution an expensive undertaking. This, plus the fact that scalability is a key requirement as data volumes grow, makes cloud computing a great option for big data analytics.
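To make the distinction between the three storage areas concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a real implementation: the three raw records, the field names, and the `to_warehouse_row` helper are all hypothetical, and plain Python lists stand in for the actual storage services.

```python
import json

# Data lake: each record is kept exactly as it arrived, with no formatting.
# These raw records are hypothetical examples.
raw_lake = [
    '{"txn_id": 1, "amount": "25.00", "branch": "NY"}',
    '{"txn_id": 2, "amount": "40.50", "branch": "ON", "channel": "ATM"}',
    '{"txn_id": 3, "amount": "10.25"}',
]


def to_warehouse_row(raw_record):
    """Apply a fixed schema (txn_id, amount, branch) when loading from the lake."""
    record = json.loads(raw_record)
    return (
        int(record["txn_id"]),
        float(record["amount"]),
        record.get("branch", "UNKNOWN"),  # missing fields get a default value
    )


# Data warehouse: structured rows, ready for querying and reporting.
warehouse_rows = [to_warehouse_row(raw) for raw in raw_lake]

# Data mart: a subset of the warehouse for one functional area
# (here, hypothetically, the New York branch).
ny_mart = [row for row in warehouse_rows if row[2] == "NY"]

print(warehouse_rows)
print(ny_mart)
```

Note that the lake still holds the `channel` field that this particular warehouse schema ignores, so a future warehouse or data mart can pick it up without going back to the source system.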
AWS and NonStop
Transactional data captured by NonStop applications can be a key source for business analytics; however, the NonStop platform may not be ideal for storing and analyzing the information over a long period of time. Developing and hosting an in-house data analytics solution can be a challenging, and expensive, option. An attractive alternative is AWS, which provides all the cloud-based services needed to easily extend a NonStop application with a scalable, flexible, and cost-effective infrastructure. But how do you get the information from the NonStop to AWS, which AWS services do you use, and how do you use them?
The diagram below shows just one possible approach for integrating data from the NonStop platform with AWS.
The AWS services in the above diagram can be split into three categories: Collection, Storage and Analysis.
AWS Direct Connect, which provides a dedicated network connection between on-premises systems and AWS, can be used to deliver NonStop application data to AWS.
AWS provides several storage services. The example above uses Amazon Simple Storage Service (S3) to hold data in its raw form, thus providing an excellent data lake implementation. A concern with data lake implementations is that they can often turn into “data swamps.” This term describes a situation where the stored data cannot be easily queried or used, and it can occur when data is simply stored in a data lake without any information about its context (date, source, identifiers, etc.).
AWS’ data lake solution addresses this by storing data in packages and tagging each package with metadata. One can define the metadata needed to keep the packages organized. Amazon Elasticsearch Service and DynamoDB are used for storing and retrieving these packages.
Redshift is a data warehouse service where data can be stored for sophisticated querying and analysis. Data can be loaded from the data lake into one or more data warehouses.
Lambda is AWS’ serverless function environment. It can be used to develop event-driven code that receives data from the NonStop, loads it into S3 and stores its metadata in DynamoDB.
AWS provides many analysis services. In the above example, Amazon QuickSight is used.
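The Lambda role described above (receive data, load it into S3, store its metadata in DynamoDB) could look roughly like the following Python sketch. It is only an assumption of how such a function might be written, not the implementation used in AWS’ data lake solution: the event shape (`payload`, `source`), the bucket name `nonstop-data-lake`, the table name `nonstop-package-metadata`, the metadata fields, and the `store_package` helper are all hypothetical. The only AWS calls used are boto3's `put_object` on an S3 client and `put_item` on a DynamoDB table resource.

```python
import hashlib
from datetime import datetime, timezone


def store_package(event, s3_client, metadata_table, bucket="nonstop-data-lake"):
    """Write one raw payload to S3 and record its context in DynamoDB.

    The event fields and the bucket name are assumptions made for this sketch.
    """
    payload = event["payload"].encode("utf-8")  # raw data, stored unmodified
    source = event["source"]                    # e.g. the sending application
    package_id = hashlib.sha256(payload).hexdigest()[:16]
    key = f"raw/{source}/{package_id}"

    # Data lake: the object is stored as-is under a predictable key.
    s3_client.put_object(Bucket=bucket, Key=key, Body=payload)

    # Metadata: the context that keeps the lake from becoming a "data swamp".
    metadata_table.put_item(Item={
        "package_id": package_id,
        "source": source,
        "received_at": datetime.now(timezone.utc).isoformat(),
        "s3_key": key,
    })
    return {"package_id": package_id, "s3_key": key}


def lambda_handler(event, context):
    """Lambda entry point; boto3 is imported here so the helper stays testable."""
    import boto3  # provided by the AWS Lambda Python runtime

    s3_client = boto3.client("s3")
    table = boto3.resource("dynamodb").Table("nonstop-package-metadata")
    return store_package(event, s3_client, table)
```

Passing the S3 client and the table into `store_package` is a design choice for this sketch: it lets the storage logic be exercised locally with stand-in objects before anything is deployed to AWS.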
Integrating AWS with NonStop can provide a scalable, flexible and cost-effective platform for big data analytics. In our next article we’ll discuss the steps involved in more detail.
For further information, please feel free to contact Canam Software and/or TIC Software:
110 Matheson Blvd.
www.canamsoftware.com (Canam Company Site)
60 Cuttermill Rd.
Great Neck, NY 11021
www.ticsoftware.com (TIC Company Site)
http://blog.ticsoftware.com (TIC Talk Blog)