IDUG Solutions Journal: "Data Warehouse Administration: The Challenges Never Stop"

Mixing DB2 and Object Orientation?

By Craig S. Mullins

Many organizations have adopted object-oriented programming standards and languages because of the claimed advantages of the OO development paradigm. The primary advantages claimed by OO proponents include faster program development time and reduced maintenance costs due to usage of reusable objects. By piecing together reusable objects and defining new objects based on similar object classes, development time potentially can be dramatically reduced.

With benefits like these, it is no wonder that object-oriented programming and development is being embraced by some IT organizations. Historically, one of the biggest problems faced by IT is the large project backlog that has accumulated. Some estimates show that more than 70% of the work done by IT organizations is maintenance, further exacerbating the project backlog. In many cases end users are forced to wait a long time for their new applications because the backlog is so great and the talent needed to tackle so many new projects is not available. Sometimes this backlog results in unsavory phenomena, such as business people attempting to build their own applications or purchasing third-party packaged applications (with all of the potential administrative burdens that packages carry). So it is very clear why the siren song of object orientation lures organizations.

But what is object-oriented technology and how does it differ from the relational world of DB2? Let’s examine some of the key problem areas. For definitions of the OO terminology used, please refer to the Sidebar on OO terminology.

OO technology is fundamentally based upon, what else, the object. Objects are defined based on object classes that determine the structure (variables) and behavior (methods) for the object. For this reason, true objects cannot be easily represented using a relational database. In the RDBMS, a logical entity is transformed into a physical representation of that entity solely in terms of its data characteristics. In DB2, you create a table that can store the data elements (in an underlying VSAM data set represented by a table space). The table contains rows that represent the current state of that entity. The table does not store the encapsulated logic necessary to act upon that data. By contrast, an object defines an entity in terms of both its state and its behavior. In other words, an object encapsulates both the data (state) and the valid procedures that can be performed upon the object's data (behavior).
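The difference between a row that holds only state and an object that holds state plus behavior can be sketched in a few lines of Python. This is only an illustration: the `BankAccount` class, the `account_row` dictionary, and the field names are hypothetical and do not come from any DB2 schema.

```python
from dataclasses import dataclass
from decimal import Decimal

# Relational view: a row carries state only (hypothetical columns).
account_row = {"account_id": 1001, "balance": Decimal("500.00")}


def increment_balance(row, amount):
    """A separate procedure that operates on the row from the outside."""
    row["balance"] += amount


# Object view: the state and the valid operations on it travel together.
@dataclass
class BankAccount:
    account_id: int
    balance: Decimal

    def increment_balance(self, amount: Decimal) -> None:
        """Behavior encapsulated with the state it changes."""
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.balance += amount


increment_balance(account_row, Decimal("153.29"))

account = BankAccount(1001, Decimal("500.00"))
account.increment_balance(Decimal("153.29"))

print(account_row["balance"], account.balance)
```

In the first form nothing stops another program from updating the balance in some other way; in the second, the rule about positive amounts lives with the data it protects.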

Increasingly, RDBMSs are adding the capability to store more logic in the database. With triggers, user-defined functions, and stored procedures, more behavior can be encapsulated in relational tables. Of course, this is not the same type of encapsulation espoused by OO purists. One difference is that user-defined functions and stored procedures are not limited to acting upon data in only one table. Methods, in OO parlance, are created to manipulate only the state of the object in which the method is encapsulated. Triggers are closer to encapsulated methods in function, because they are physically defined on a single table. However, triggers too are not limited in the data that they can impact. For example, a trigger defined on TableA can access data in TableB and TableC and modify data in TableD.
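A trigger that reaches beyond the table it is defined on can be demonstrated with a small, simplified sketch. It uses Python's built-in `sqlite3` module as a stand-in because it runs anywhere; the trigger syntax shown is SQLite's, not DB2's, and the tables `table_a`, `table_b`, and `table_d` and their columns are hypothetical (TableC is omitted for brevity).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, qty INTEGER);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, unit_price INTEGER);
    CREATE TABLE table_d (a_id INTEGER, total INTEGER);

    -- Defined on table_a, yet it reads table_b and writes table_d.
    CREATE TRIGGER trg_a_insert AFTER INSERT ON table_a
    BEGIN
        INSERT INTO table_d (a_id, total)
        SELECT NEW.id, NEW.qty * b.unit_price
        FROM table_b AS b
        WHERE b.id = NEW.id;
    END;

    INSERT INTO table_b VALUES (1, 25);
    INSERT INTO table_a VALUES (1, 4);
""")

# The insert into table_a fired the trigger, which populated table_d.
rows = conn.execute("SELECT a_id, total FROM table_d").fetchall()
print(rows)
```

The program never wrote to `table_d` directly; the row appears there only because the trigger on `table_a` put it there, which is exactly what an encapsulated OO method would not be allowed to do.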

Realize, too, that there is more to object-oriented technology than we have discussed so far. This discussion has been purposely simplified to introduce the notion of encapsulation. The definition as stated does, however, introduce the basic difference between OO and relational methodologies.

A New Way of Thinking

Think for a moment in relational terms: many programs and procedures are required to operate upon data in tables. Each procedure must retrieve the data, operate upon it in some way, and possibly replace the data. In the OO paradigm, messages are passed to objects invoking the encapsulated methods. Because each object contains its own operations, or methods, most of the procedural code is eliminated.

To truly be an object-oriented programmer, you must learn to see the world in a different way. A traditional programmer using a 3GL against a relational database sees the world in terms of verbs: my program does this, then executes that iteratively, then writes those, updates these, and exits. An OO programmer sees the world in terms of nouns: object bank account, increment your balance by $153.29. It is a different way of thinking and programming. To further demonstrate, consider the following code fragment:

    for (shape in wind)
        branchOn: typeOf (shape)
            circl:   drawCirc (shape)
            rectang: drawRect (shape)
            triangl: drawTrian (shape)

This pseudocode represents the traditional, procedural way of coding a program to draw geometric shapes. Based on the type of shape, a different procedure is invoked to draw the correct shape. Now let’s look at the same logic written as an object-oriented code fragment:

    wind forEach: shape
        shape drawYourself

You can see that this piece of code is simpler and easier to understand. The code is shorter, more comprehensible, and more stable. It need never change. When a new shape is introduced, the drawYourself method is part of the new object, so this code continues to function. The procedural code shown earlier would have to be changed for each new shape that is introduced.
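The contrast between branching on a shape's type and sending every shape the same drawYourself message can be made runnable. The following Python rendering is a sketch only: the class names, the `draw_yourself` method, and the decision to return a string instead of actually drawing are assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    @abstractmethod
    def draw_yourself(self) -> str:
        """Each shape knows how to draw itself."""


class Circle(Shape):
    def draw_yourself(self) -> str:
        return "circle"


class Rectangle(Shape):
    def draw_yourself(self) -> str:
        return "rectangle"


class Triangle(Shape):
    def draw_yourself(self) -> str:
        return "triangle"


def draw_procedural(shapes):
    """Procedural style: branch on the type; needs a new branch per shape."""
    drawn = []
    for shape in shapes:
        if isinstance(shape, Circle):
            drawn.append("circle")
        elif isinstance(shape, Rectangle):
            drawn.append("rectangle")
        elif isinstance(shape, Triangle):
            drawn.append("triangle")
        else:
            raise TypeError(f"no drawing routine for {type(shape).__name__}")
    return drawn


def draw_object_oriented(shapes):
    """OO style: send the same message to every shape."""
    return [shape.draw_yourself() for shape in shapes]


# A shape introduced later, after both functions were written.
class Hexagon(Shape):
    def draw_yourself(self) -> str:
        return "hexagon"


window = [Circle(), Rectangle(), Triangle()]
print(draw_procedural(window) == draw_object_oriented(window))

window.append(Hexagon())
print(draw_object_oriented(window)[-1])
```

Once `Hexagon` is in the window, `draw_object_oriented` keeps working unchanged, while `draw_procedural` raises a `TypeError` until someone adds a branch for the new shape.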

The biggest problem with this new way of thinking and programming is that it is anathema to developing efficient relational databases and applications. Fundamentally, object-oriented tenets state that the methods that impact the state of the variables of an object are encapsulated within the object. In many cases, organizations try to implement object-oriented programming with a relational, DB2 back-end to store the persistent data. One of the guiding principles of OO is that methods—the procedures that give objects their behavior—should be coded against just one object. Translated, this means that each SQL statement should access one and only one table. This might simplify the design and development process. It also fosters adherence to the principle of encapsulation. However, it is the wrong way to write relational queries.

A relational database has a relational optimizer, and DB2 has the best optimizer technology in the business. The relational optimizer analyzes complex SQL and determines the most efficient way to access the data based on the request, the database objects, and the environment. To best optimize relational applications, you should code the data access using all of the features and functionality available to you in SQL. This includes inner joins, outer joins, unions, and subselects: all features that act upon multiple tables and would not be allowed in most OO-to-DB2 implementations. If your adherence to an OO philosophy, methodology, or language prohibits accessing more than one table per SQL statement, you are building inefficient relational applications.
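To make the cost of the one-table-per-statement rule concrete, here is a minimal sketch that answers the same question two ways. It uses Python's `sqlite3` module purely as a convenient stand-in for DB2, the `customer` and `orders` tables are hypothetical, and it counts statements only; it says nothing about actual DB2 access paths or timings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         cust_id INTEGER, amount INTEGER);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1, 50), (11, 1, 70), (12, 2, 30);
""")


def totals_one_table_per_statement(conn):
    """Each statement touches one table; the program performs the join."""
    statements = 1
    customers = conn.execute(
        "SELECT cust_id, name FROM customer ORDER BY cust_id").fetchall()
    result = {}
    for cust_id, name in customers:
        row = conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE cust_id = ?",
            (cust_id,)).fetchone()
        statements += 1
        result[name] = row[0]
    return result, statements


def totals_with_join(conn):
    """One statement; the database decides how to combine the tables."""
    rows = conn.execute("""
        SELECT c.name, COALESCE(SUM(o.amount), 0)
        FROM customer AS c
        LEFT OUTER JOIN orders AS o ON o.cust_id = c.cust_id
        GROUP BY c.cust_id, c.name
        ORDER BY c.cust_id
    """).fetchall()
    return dict(rows), 1


per_table, per_table_statements = totals_one_table_per_statement(conn)
joined, joined_statements = totals_with_join(conn)
print(per_table == joined, per_table_statements, joined_statements)
```

Both functions return the same totals, but the single-table version issues one statement per customer plus one, so its statement count grows with the data, and the join logic it performs by hand is hidden from the optimizer.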

Synopsis

Object-oriented programming potentially offers some phenomenal benefits in terms of reduced development and maintenance time. However, if you are using DB2 for your databases, before implementing an OO methodology or language at your shop, be sure to analyze the impact of any trade-offs that would cause you to forsake good relational programming and development techniques. Unless your application is simple or performance doesn’t matter (and when is either of these true?), these trade-offs are probably detrimental to the performance of your DB2 applications.

Sidebar: OO Terminology

Abstract Data Type: a data type that is not built into the programming language but is defined by the programmer; ADTs are usually used to build high-level, complex structures that model real-world objects

Behavior: the way that an object functions and changes over time

Class: a template for an object that defines the methods and variables for the object; all objects of a specific class have the same structure and exhibit the same behavior

Class Hierarchy: a tree structure that defines relationships among classes; each class hierarchy has a single top node and potentially many nodes under the top node, along different branches; a child class in the hierarchy inherits the parent class’s variables and methods

Encapsulation: the technique of combining data together with process in a single, common area; this creates an environment in which all of the operations for a given set of data are organized and maintained in one place, thereby reducing confusion, eliminating misuse, and simplifying maintenance

Inheritance: the mechanism whereby classes can make use of the structure and methods defined in all classes above them in the class hierarchy

Message: a request to an object to perform a method

Method: a process encapsulated within an object

Object: a representation of a real-world thing that encapsulates the data and all of its procedures (processes) within itself

Polymorphism: the ability to send the same message to objects of different classes and have each class perform a method in its own way

State: the "make up" of an object at a given point in time; the actual values stored in the variables

Craig S. Mullins
Summer 1999
From IDUG Solutions Journal, Summer 1999. © 1999 Mullins Consulting, Inc. All rights reserved.