
Architectures for Clustering: Shared Nothing and Shared Disk

by Craig S. Mullins

Quarter 1, 2003

An e-business is always online. Regardless of the time or day, customers expect e-businesses to be available to serve their needs. Meeting these expectations places heavy operational demands on the computing infrastructure. Full functionality and non-stop availability are a necessity. System outages, whether planned (for maintenance and tuning) or unplanned (due to hardware failure, bugs, or viruses), are the enemy of the successful e-business.


So e-businesses require highly available systems. But they’re not the only ones seeking out ways to provide additional availability. As companies increasingly become global organizations – that is, they no longer limit themselves to conducting business in their local country – they, too, need systems that operate around the clock.


Many organizations turn to clustering as they look to increase the availability and computing capabilities of their hardware. Clustering basically works on the principle that multiple processors can tackle a problem better, faster, and more reliably than a single computer. That concept seems easy enough to grasp.


But, for many companies considering clustering options, the devil is in the details. How should clustering be accomplished? What technologies and architectures provide the best approach to clustering? I’ll get to those answers after exploring the concept of clustering in more detail.

Why Use Clusters?


Companies generally turn to clustering to improve availability and scalability. Clusters improve availability by providing alternate options in case a system fails. As I mentioned, clustering involves multiple, independent computing systems working together as one. So, if one of the independent systems fails, the cluster software can distribute work from the failing system to the remaining systems in the cluster. Users won’t know the difference – they interact with a cluster as though it were a single server – and the resources they rely on will still be available.


Most companies consider enhanced availability the primary benefit of clustering. In some cases, clustering can help companies achieve “five nines” (99.999 percent) availability.


But clustering offers scalability benefits, too. When the load exceeds the capabilities of the systems that make up the cluster, you can incrementally add more systems to increase the cluster’s computing power and meet processing requirements. As traffic grows or availability requirements tighten, all or some parts of the cluster can be increased in size or number.


Clustering has been available since DEC introduced it for VMS systems in the 1980s, and clustering packages are now offered by major hardware and software companies including IBM, Microsoft, and Sun Microsystems.











Types of Clustering


Shared-disk and shared-nothing architectures are the predominant approaches to clustering. The names are fairly accurate descriptions of each type.


In a shared-nothing environment, each system has its own private (not shared) memory and one or more disks (see Figure 1). The clustered processors communicate by passing messages through a network that interconnects the computers. Client requests are automatically routed to the system that owns the resource. Resources include memory, disks, and any other computing resource at the system’s disposal. Only one of the clustered systems can “own” and access a particular resource at a time. But, in the event of a failure, resource ownership may be dynamically transferred to another system in the cluster.

 










Figure 1. The shared-nothing architecture

Shared-nothing clustering offers excellent scalability. In theory, a shared-nothing multiprocessor can scale up to thousands of processors because the processors do not interfere with one another – no resources are shared. For this reason, shared-nothing is generally preferable to other forms of clustering. Furthermore, the scalability of shared-nothing clustering makes it ideal for the read-intensive analytical processing typical of data warehouses.


In a shared-disk environment, all of the connected systems share the same disk devices (see Figure 2). Each processor still has its own private memory, but all the processors can directly address all the disks.

 










Figure 2. The shared-disk architecture

Typically, shared-disk clustering does not scale as well as shared-nothing clustering on smaller machines. Because all the nodes have access to the same data, they need a controlling facility to direct processing so that all the nodes have a consistent view of the data as it changes. Furthermore, attempts by two (or more) nodes to update the same data at the same time must be prohibited. These management requirements can impose performance and scalability problems on shared-disk systems. But with some optimization techniques, shared-disk clustering is well suited to the large-scale processing you find in mainframe environments. Mainframes are already very large processors capable of handling enormous volumes of work; equaling the computing power of even a few clustered mainframes would require many, many clustered PC and midrange processors.


The specialized technology and software of the Parallel Sysplex capability of IBM’s mainframe family make shared-disk clustering viable for DB2 (and IMS) databases. In particular, the coupling facility and DB2’s robust optimization technology help to enable efficient shared-disk clustering.


Shared-disk clustering usually is viable for applications and services that require only modest shared access to data, as well as for applications or workloads that are very difficult to partition. Applications with heavy data update requirements are probably better implemented as shared-nothing because of the potential for the shared-disk lock management controller to become a bottleneck. Table 1 compares the capabilities of shared-disk and shared-nothing clustering.







Shared-Disk                                                                   |  Shared-Nothing
Quick adaptability to changing workloads                                      |  Can exploit simpler, cheaper hardware
High availability                                                             |  Almost unlimited scalability
Performs best in a heavy read environment (for example, a data warehouse)     |  Works well in a high-volume, read/write environment
Data need not be partitioned                                                  |  Data is partitioned across the cluster

Table 1. Shared-disk vs. shared-nothing clustering architecture.

IBM offers both a shared-nothing approach (for Unix, Linux, and Windows) and a shared-disk approach (on the mainframe only). I believe IBM chose the shared-disk approach on the mainframe because of the in-depth research and technology at its disposal. The Parallel Sysplex technology and the advanced technology of the Coupling Facility make shared-disk clustering viable for the mainframe. The shared-nothing approach for non-mainframe DB2 also makes sense because of the state of clustering research and technology in that market.

No Sharing: DB2 UDB EEE


DB2 Universal Database (UDB) Enterprise-Extended Edition (EEE) takes advantage of shared-nothing clustering. DB2 UDB EEE can optimize its dynamic partitioning either for SMP-style parallelism, using a single, low-overhead copy of EEE, or across a cluster using a shared-nothing architecture with multiple instances of EEE. (Refer to Sidebar: SMP for additional details.)


With shared-nothing clustering, data must be organized into partitions so that the system can rationally assign data to the control of a given node and know which node to use to access the data once it is stored. With dynamic partitioning support, DB2 UDB EEE can automatically stripe data across partitions with no DBA or application involvement required.





Sidebar: SMP


SMP is an acronym for Symmetric Multiprocessing. SMP is an architecture that takes advantage of multiple CPUs to complete individual processes simultaneously (multiprocessing). With SMP any idle processor can be assigned any task, and additional CPUs can be added to improve performance and handle increased loads. Each individual application can benefit from SMP if its code allows multithreading.


Configuring DB2 EEE to execute in a shared-nothing architecture means that each machine has exclusive access to its own disk and memory. There is no competition for the resources because they are not shared. In such an environment, databases must be partitioned across multiple machines. This partitioning enables the DBMS to perform complex parallel data operations.
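On the distributed platforms, the mapping of database partitions to machines is recorded in the db2nodes.cfg file, one line per partition. Below is a minimal sketch assuming two hypothetical hosts, serverA and serverB, each running two logical partitions; the exact file format varies by platform and release:

   0   serverA   0
   1   serverA   1
   2   serverB   0
   3   serverB   1

Each line gives the partition number, the host that owns it, and a logical port that distinguishes multiple partitions on the same host.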

 

This clustering enables customers to run applications on more than one node for increased scalability and high availability.  DB2 UDB EEE supports a diverse set of hardware options including SMP, NUMA servers, and pSeries (RS/6000) clusters with a range of interconnect options. DB2 exploits AIX HACMP (High Availability Cluster Multiprocessing) features on RS/6000 processors (refer to Sidebar: HACMP for additional details). Further, DB2 UDB EEE can run on multiple operating systems including AIX, Linux, HP-UX, Solaris, and Windows NT.

 





Sidebar: HACMP


IBM's High Availability Cluster Multi-Processing (HACMP) software provides a high-availability solution for mission-critical applications on the RS/6000 platform. The HACMP software manages a cluster of RS/6000 systems configured such that there is no single point of failure within the cluster. HACMP provides facilities to automatically detect hardware failures and reconfigure the cluster to provide a reliable application platform. It allows all resources to be fully utilized as long as no hardware malfunctions occur.

HACMP addresses the majority of situations that cause computer downtime. With HACMP/ES (for Enhanced Scalability) IBM takes the next step to make the cluster more aware of software failures. HACMP/ES can detect failures that are not severe enough to crash or hang the operating system, yet are severe enough to interrupt the proper system operations. With HACMP/ES the cluster can detect these problems so they can be corrected before causing a critical failure.


Applications accessing DB2 UDB EEE databases benefit from the robust, cost-based optimizer, which understands the shared-nothing implementation and creates optimal parallel access paths to the data across the cluster. The optimizer supports query rewrite, OLAP, SQL extensions, dynamic bit-mapped index ANDing (DBIA), and star joins. These features are commonly used in data warehousing applications with very large data requirements. No application code changes are required to take advantage of these advanced SQL optimization techniques.
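For example, a typical warehouse-style star join is written as ordinary SQL. In the hedged sketch below, SALES is a hypothetical fact table and CUSTOMER and PRODUCT are hypothetical dimension tables; the optimizer, not the application, decides how to parallelize the query across the partitions and whether to apply techniques such as index ANDing or a star join:

   -- Hypothetical star schema query: no partitioning or parallelism is visible in the SQL
   SELECT   C.REGION, P.CATEGORY, SUM(S.AMOUNT) AS TOTAL_SALES
   FROM     SALES S, CUSTOMER C, PRODUCT P
   WHERE    S.CUST_ID = C.CUST_ID
     AND    S.PROD_ID = P.PROD_ID
     AND    S.SALE_DATE BETWEEN '2002-01-01' AND '2002-12-31'
   GROUP BY C.REGION, P.CATEGORY;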


DB2 UDB EEE uses intelligent data distribution to distribute the data and database functions to multiple hosts. DB2 UDB EEE uses a hashing algorithm that enables it to manage the distribution (and redistribution) of data as required. Initially, the DBA must create the database objects to support partitions across the shared-nothing cluster. This differs somewhat from traditional database DDL. Shared-disk clustering is a little less intrusive from the perspective of implementing database DDL.
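To make the partitioned DDL concrete, here is a minimal sketch in the DB2 UDB EEE style. The nodegroup, table space, table, and container names are hypothetical and the syntax is abbreviated (details vary by release); the point is that the partitioning key drives the hashing of rows across the partitions in the cluster:

   -- A nodegroup spanning four database partitions (0 through 3)
   CREATE NODEGROUP SALES_GROUP ON NODES (0 TO 3);

   -- A table space placed in that nodegroup; ' $N' is replaced by the
   -- partition number so each partition gets its own container path
   CREATE TABLESPACE SALES_TS IN NODEGROUP SALES_GROUP
     MANAGED BY SYSTEM USING ('/db2/sales_ts/node $N');

   -- Rows are hashed on CUST_ID to determine which partition owns them
   CREATE TABLE SALES
     (SALE_ID   INTEGER NOT NULL,
      CUST_ID   INTEGER NOT NULL,
      PROD_ID   INTEGER NOT NULL,
      SALE_DATE DATE,
      AMOUNT    DECIMAL(11,2))
     IN SALES_TS
     PARTITIONING KEY (CUST_ID) USING HASHING;

Applications querying SALES need no knowledge of this layout; the optimizer generates the parallel access paths across the partitions.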

Data Sharing with DB2 for OS/390


Historically, larger organizations with multiple mainframes often installed individual processors for dedicated groups of users. But when DB2 applications needed to span the organization, DBAs had to create a duplicate copy of the application for each DB2 subsystem within the organization. So, IBM developed a system for data sharing for DB2 for OS/390 and Parallel Sysplex (a collection of MVS systems that communicate and exchange data with each other).


DB2 data sharing allows applications running on multiple DB2 subsystems to read from and write to the same data sets concurrently. Simply stated, data sharing enables multiple DB2 subsystems to behave as one.


Data sharing requires a complex combination of hardware and software. To share data, DB2 subsystems must belong to a predefined data sharing group. Each DB2 subsystem that belongs to a particular data sharing group is a member of that group. All members of a data sharing group use the same shared DB2 catalog and directory. The maximum number of members in a data sharing group is 32 (though most data sharing implementations have far fewer members).
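As a point of reference, a DBA can see the members of a data sharing group and their current status by issuing the DB2 DISPLAY GROUP command from any member (output not shown here):

   -DISPLAY GROUP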


Each data sharing group is an MVS Cross-system Coupling Facility (XCF) group. The group services provided by XCF enable the management of the shared resources of the shared-disk implementation of DB2 data sharing. In addition, XCF enables the data sharing environment to track all members contained in the data sharing group. A site may have multiple MVS Sysplexes, each consisting of one or more MVS systems. Each individual Sysplex can consist of multiple data sharing groups.


DB2 data sharing requires a Sysplex environment consisting of:

- One or more coupling facilities, which provide the shared locking, caching, and communication structures for the group
- A Sysplex Timer to keep the clocks of all members synchronized
- Shared disk (DASD) that every member of the data sharing group can access
- Appropriate levels of MVS (OS/390) and DB2 on each participating system





Sidebar: Coupling Facility


DB2 uses the coupling facility to provide intermember communications. The coupling facility ensures data availability while maintaining data integrity across the connected DB2 subsystems. To do so, the coupling facility provides core services (such as data locking and buffering) to the data sharing group. The coupling facility uses three structures to synchronize the activities of the data sharing group members:

- A lock structure, used to control global locking across the members of the group
- A list structure, the shared communications area (SCA), which holds group-wide control information such as database exception status
- Cache structures, the group buffer pools, which keep changed pages consistent across the members



In the long run, most organizations using DB2 for z/OS and OS/390 will implement data sharing instead of relying on a single mainframe or several independent, unclustered mainframes.

As with most clustering technologies, the primary benefit is enhanced data availability. Data is available for direct access across multiple DB2 subsystems, and applications can be run on multiple smaller, more competitively priced microprocessor-based machines, enhancing both data availability and the price/performance ratio. One of the original goals of data sharing was to harness the power of multiple CMOS machines. When data sharing was introduced, IBM mainframes were battling client/server systems and smaller processors; CMOS had moved the mainframe from water-cooled behemoths to sleeker, cheaper air-cooled processors. The idea was that by combining these machines using Parallel Sysplex and DB2 data sharing, an enormous amount of processing power could be delivered more cost-effectively than before.

With data sharing, DB2 applications can run on any member of the data sharing group, which delivers enhanced data availability. One or more members of a group may fail without impacting application programs, because the workload will be spread across the remaining DB2 members.

 An additional benefit is expanded capacity. Capacity is increased because more processors are available to execute the DB2 application programs. Instead of a single DB2 subsystem on a single logical partition, multiple CPCs can be used to execute a program (or even a single query).


Data sharing increases the flexibility of configuring DB2. New members can be added to a data sharing group when it is necessary to increase the processing capacity of the group (for example, at month end or year end to handle additional processing). Members added to boost processing capacity can be removed just as easily when the additional capacity is no longer required.


No special programming is required for applications to access data in a DB2 data sharing environment. Each individual subsystem in the data sharing group uses the coupling facility to communicate with the other subsystems. The inter-system communication supplied by DB2 data sharing presents the application with a system image that resembles a single, standalone DB2 subsystem. No application programming changes are required; the only modification that may be needed for current application programs to run in a data sharing environment is additional error checking for “data sharing” return codes.

Summary


Shared-nothing and shared-disk offer two differing techniques for clustering. Shared-nothing is the predominant clustering architecture used by most computing and database implementations. Scalability and performance are hallmarks of shared-nothing clustering and make it ideal for analytical and data warehousing applications. Shared-disk clustering requires a controlling facility to manage access to the shared data. IBM has built specialized technology and software to create robust shared-disk mainframe clustering.


IBM provides wide and deep support for clustering in its various incarnations of DB2. The shared-disk implementation of DB2 for OS/390 and z/OS provides high availability, load balancing, and failover with no DDL or application changes for the mainframe world. And the shared-nothing implementation of DB2 in the non-mainframe world offers extremely high scalability across a wide variety of simpler hardware options, with no application changes required to access a database that is partitioned across the cluster. DB2 truly offers the best of both worlds when it comes to clustering.



From DB2 Magazine, Q1 2002.

© 2012 Craig S. Mullins


Resources


DB2 UDB EEE clustering
IBM Redbook Managing VLDB Using DB2 UDB EEE (SG24-5105-00)

www.redbooks.ibm.com


DB2 for OS/390 Data Sharing
IBM Redbook DB2 for MVS/ESA Version 4 Data Sharing Implementation (SG24-4791-00)

www.redbooks.ibm.com/