LIMITATIONS OF OBJECT DATA MODELS

Andrew Girow

1. INTRODUCTION

"We have seen dramatic changes in the way we are using and programming a computer. The new challenges for the software industry are leading to object technology and many analysts expect this technology to be the main stream at least until the end of this decade"[1]

Objects are becoming the leading software development approach. Many papers describe the advantages of object technology. However, I think that developers want to know not only about the advantages but also about the limitations of this technology. My main purpose is to collect some problems concerning object data models. This paper was written under the influence of the works (referenced below) of Abrial, Senko, Kent, and Booch. I hope it will help the reader to look at current object technology from a more pragmatic point of view.


2. OBJECT DATA MODEL

We recall the OMG object model and the ODMG extensions [2],[3].

2.1. THE OMG OBJECT MODEL

Objects are structures that combine code and data. The OMG object model supports the notion of class of objects with attributes and methods. It also offers inheritance and specialisation. To illustrate this, let us define elementary objects.

example 2.1

class Person
{
public:
    long     socialSecurityNumber;
    d_String name;
    d_Date   birthday;

    Person();   // Constructor
    int age();
};

class Employee: public Person    //  A subclass of Person
{
public:
    long     employeeNumber;
    d_Date   hiredDate;
    float    salary;
};

Now we describe the extensions brought by ODMG to the OMG data model.

2.2. ONE-TO-ONE RELATIONSHIPS

An object refers to another object through a d_Ref. A d_Ref behaves as a C++ pointer, but it is a persistent pointer.

example 2.2

class Project;
class Department
{
public:
    String   name;
    Address  location;
    d_Ref <Project> project;

    Department();
};

2.3. COLLECTIONS

A collection is a container of elements of the same class. As usual, polymorphism is obtained through the class hierarchy. For instance, a d_Set<d_Ref<Person>> may contain Persons as well as Employees if the class Employee is a subclass of Person.

example 2.3

d_Set<d_Ref<Person>> AllPersons; // The Person class extent.
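
The polymorphic behaviour of such an extent can be sketched in standard C++. This is only an illustration under stated assumptions: std::shared_ptr stands in for d_Ref (without persistence), std::set stands in for d_Set, and the names PersonExtent, kind and countEmployees are hypothetical helpers that are not part of the ODMG binding.

```cpp
#include <memory>
#include <set>
#include <string>
#include <utility>

// Assumption: std::shared_ptr plays the role of d_Ref and std::set the role
// of d_Set. Neither is persistent; they only mimic the structure.
struct Person {
    std::string name;
    explicit Person(std::string n) : name(std::move(n)) {}
    virtual ~Person() = default;
    virtual std::string kind() const { return "Person"; }
};

struct Employee : Person {
    long employeeNumber;
    Employee(std::string n, long number)
        : Person(std::move(n)), employeeNumber(number) {}
    std::string kind() const override { return "Employee"; }
};

// One extent for the whole hierarchy.
using PersonExtent = std::set<std::shared_ptr<Person>>;

// Counts the members of the extent that are actually Employees.
inline int countEmployees(const PersonExtent& extent) {
    int n = 0;
    for (const auto& p : extent)
        if (std::dynamic_pointer_cast<Employee>(p)) ++n;
    return n;
}
```

An Employee inserted into a PersonExtent is stored through a reference to Person, yet it keeps its own class, which is why countEmployees can still tell the two apart.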

2.4. MULTIPLE RELATIONSHIPS

An object can be related to more than one object through a relationship.

example 2.4

class Employee;
class Department
{
public:
    String    name;
    Address   location;
    d_Set<d_Ref<Employee>>  employees;

    Department();
    virtual int AddEmployee( d_Ref<Employee>);
};

d_Set<d_Ref<Department>> AllDepartments;  // The Department class extent.


3. BASIC ASSUMPTIONS BEHIND OBJECTS

Object structure presumes a horizontal and vertical homogeneity in data and methods. Horizontally, each object of a given class contains the same fields and methods. Vertically, a given field contains the same kind of information and a given method defines the same "behaviour" of each object of a given class [7].

3.1. HOMOGENEITY OF RELEVANT FACTS

A given class describes a set of things in the real world. Objects fit best when the entire population has the same properties.

In many cases, although a certain set of individuals constitutes a single "kind" of thing, there is considerable variation in the facts relevant to each item in that set. The more that information deviates from the norm of homogeneity, the less appropriate the object configuration is.

There is a technique for accommodating variability among objects in a class: Define the class to include the union of all relevant fields, where not all the fields are expected to have values in every object.

Consider employees:
example 3.1

class Employee
{
public:
    long   employeeNumber;
    long   socialSecurityNumber;
    String name;
    float  salary;

    Employee();
};

Often "maiden name" is defined as part of the class definition for all employees, though it is only relevant to married women. Thus many objects might have null values in many fields. Furthermore, the limited relevance is not declared to the system; only the pattern of usage reflects the limitation. If there is considerable variation over a population, this solution becomes cumbersome and inefficient.
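
The "union of all fields" technique can be sketched as follows. This is a minimal illustration, not part of the ODMG binding: std::optional (C++17) is assumed as a stand-in for a nullable field, and the fields maidenName and the helper unusedMaidenNameShare are hypothetical.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical Employee that carries the union of all relevant fields.
// std::optional stands in for a field that may hold no value.
struct Employee {
    long employeeNumber;
    std::string name;
    std::optional<std::string> maidenName;            // relevant to few employees
    std::optional<long>        socialSecurityNumber;  // not every employee has one
};

// Fraction of objects in which maidenName holds no value.
inline double unusedMaidenNameShare(const std::vector<Employee>& staff) {
    if (staff.empty()) return 0.0;
    std::size_t empty = 0;
    for (const auto& e : staff)
        if (!e.maidenName) ++empty;
    return static_cast<double>(empty) / staff.size();
}
```

Nothing in the class says for which employees maidenName is meaningful. Only a scan of the stored objects, as in unusedMaidenNameShare, reveals how rarely the field is used.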

Employee is not the only such category. Tools, vehicles, furniture and people are just a few other such categories. This situation is fairly common. The employees of a multinational corporation might not all have social security numbers, or employee numbers. Some books do not have ISBNs. Oil companies have their individual conventions for naming their oil wells.

3.2. HOMOGENEITY WITHIN FACT TYPE

There is also a vertical homogeneity assumed. Within a class, a given field is expected to contain the same kind of value in every object.

This is not always true. Suppose that company cars can be assigned either to employees or to departments. Assignment is a simple fact, to which one might naively expect to be able to address a simple inquiry: "to whom is car 97 assigned?". In such a case, we might like a two-part answer: the type of the assignee (employee or department), plus the identification of the individual assignee. We could design a class with the following structure:

example 3.2.1

enum AssignType { employee, department };
class Employee;
class Department;
class Vehicle
{
public:
    long   vehicleNumber;
    d_Ref<Employee>   assignToEmp;
    d_Ref<Department> assignToDep;

    Vehicle();
    AssignType toWhomIsAssign();
};

We assume, of course, that only one of assignToEmp and assignToDep is filled. We have, incidentally, created a horizontal inhomogeneity. The assignToEmp field is relevant for some vehicles, and the assignToDep field for others. They are never both relevant for the same vehicle. If, later on, cars can be assigned to other kinds of things (to divisions, to branch offices), then the classes have to be redesigned with additional fields for the new assignee types. The validation gets more complicated: only one of the last n fields may contain a value.
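
The validation rule can be sketched as an explicit check. This is an illustration only: raw pointers stand in for d_Ref, the third field assignToBranch and the function hasConsistentAssignment are hypothetical, and the empty classes are placeholders.

```cpp
#include <array>

struct Employee {};
struct Department {};
struct BranchOffice {};

// Assumption: a raw pointer stands in for d_Ref; nullptr means "not assigned".
struct Vehicle {
    long vehicleNumber = 0;
    const Employee*     assignToEmp    = nullptr;
    const Department*   assignToDep    = nullptr;
    const BranchOffice* assignToBranch = nullptr;  // hypothetical third assignee type
};

// True when at most one of the assignee fields is filled.
// Every new assignee type adds a field to Vehicle and an entry to this list.
inline bool hasConsistentAssignment(const Vehicle& v) {
    const std::array<bool, 3> filled = {
        v.assignToEmp != nullptr,
        v.assignToDep != nullptr,
        v.assignToBranch != nullptr};
    int count = 0;
    for (bool f : filled)
        if (f) ++count;
    return count <= 1;
}
```

The rule lives in application code, so the system itself accepts a vehicle with two filled fields unless every update path remembers to call the check.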

Another approach is to provide distinct classes, one for each type of assignee.

example 3.2.2

class Vehicle
{
public:
    long  vehicleNumber;

    Vehicle();
};

class VehicleEmployeeAssignment: public Vehicle { public:
    d_Ref<Employee> assignToEmp;
};

class VehicleDepartmentAssignment: public Vehicle { public:
    d_Ref<Department> assignTo;
};

class VehicleBranchOfficeAssignment: public Vehicle { public:
    d_Ref<BranchOffice> assignTo;
};

Each class has its own name. Instead of vehicle assignment being a single kind of fact, we have many kinds: "VehicleEmployeeAssignment", "VehicleDepartmentAssignment", "VehicleBranchOfficeAssignment", etc.

Instead of going to one class (or naming one relationship) to find the assignment of a vehicle, one has to know how many such classes there are, and their names. One has to be prepared to interrogate each one of them. It is even worse if you are interested in some other information about a vehicle. That information might be in any one of the classes.

Validation is a problem: there is no system facility to keep the same vehicle from appearing in more than one class. Extensions are difficult, too. Every new assignee type requires the definition of another class. Changing a vehicle's assignment is cumbersome. If the assignee type is also changing, an object of one class has to be deleted and an object of another class inserted.
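
The cost of the one-class-per-assignee design can be sketched with one extent per assignment class. This is a simplified illustration, not ODMG code: each extent is assumed to be a std::map keyed by vehicle number, assignees are plain strings, and AssignmentExtents, findAssignee and reassignToDepartment are hypothetical names.

```cpp
#include <initializer_list>
#include <map>
#include <string>

// Each assignment class of example 3.2.2 is modelled by its own extent,
// keyed by vehicle number. Assignees are plain strings for brevity.
struct AssignmentExtents {
    std::map<long, std::string> toEmployee;
    std::map<long, std::string> toDepartment;
    std::map<long, std::string> toBranchOffice;
};

// Finding the assignee means probing every extent in turn.
// Returns an empty string when the vehicle is not assigned at all.
inline std::string findAssignee(const AssignmentExtents& db, long vehicle) {
    for (const auto* extent :
         {&db.toEmployee, &db.toDepartment, &db.toBranchOffice}) {
        auto it = extent->find(vehicle);
        if (it != extent->end()) return it->second;
    }
    return "";
}

// Changing the assignee type is a delete in the other extents plus an insert.
inline void reassignToDepartment(AssignmentExtents& db, long vehicle,
                                 const std::string& department) {
    db.toEmployee.erase(vehicle);
    db.toBranchOffice.erase(vehicle);
    db.toDepartment[vehicle] = department;
}
```

Both functions have to name every extent, so adding a fourth assignee type means revisiting each of them.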

These approaches look even worse if there is inhomogeneity on both sides of the relationship. Suppose we want to record more general equipment assignments. Equipment might include vehicles, furniture, tools, buildings, etc., each potentially having its own class.

There is still another way. One can provide a uniform reference to all the entities involved by aggregating them into one "superclass". This permits assignees to be referenced uniquely and uniformly in the assignment classes.

example 3.2.3

class Assignee {};   // Defined up front: a base class must be a complete type.
class Vehicle
{
public:
    long  vehicleNumber;
    d_Ref<Assignee> assignTo;

    Vehicle();
};

class Employee: public Assignee{
};

class Department: public Assignee{
};

class BranchOffice: public Assignee{
};
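
How the superclass gives a uniform, two-part answer can be sketched in standard C++. The sketch rests on assumptions: std::shared_ptr stands in for d_Ref, and the virtual functions kind and id as well as describeAssignment are hypothetical additions that the example above does not define.

```cpp
#include <memory>
#include <string>
#include <utility>

// Common superclass of everything a vehicle may be assigned to.
struct Assignee {
    virtual ~Assignee() = default;
    virtual std::string kind() const = 0;  // the type of the assignee
    virtual std::string id() const = 0;    // the individual assignee
};

struct Employee : Assignee {
    std::string name;
    explicit Employee(std::string n) : name(std::move(n)) {}
    std::string kind() const override { return "employee"; }
    std::string id() const override { return name; }
};

struct Department : Assignee {
    std::string name;
    explicit Department(std::string n) : name(std::move(n)) {}
    std::string kind() const override { return "department"; }
    std::string id() const override { return name; }
};

struct Vehicle {
    long vehicleNumber = 0;
    std::shared_ptr<Assignee> assignTo;  // stand-in for d_Ref<Assignee>
};

// The two-part answer to "to whom is this car assigned?".
inline std::string describeAssignment(const Vehicle& v) {
    if (!v.assignTo) return "unassigned";
    return v.assignTo->kind() + ": " + v.assignTo->id();
}
```

A single field now covers every assignee type, and a new type needs only a new subclass of Assignee.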

Unfortunately, there is a drawback to this "superclass" approach. It has to be reapplied for each different kind of multitype fact. Entities are structured one way for equipment assignment, another way to keep track of who manufactures what, another way for who owns what, another way as "employers", and so on. Each of these is potential grounds for another superclass hierarchy.
To net it all out, the object structure is not well suited to information exhibiting "vertical inhomogeneity".


4. SOME FEATURES OF OBJECT SYSTEMS

Most current object processing systems have the following features.

4.1. CLASS DESCRIPTIONS ARE NOT INFORMATION

Information is obtained from an object by extracting the values of fields or through some methods.

One can answer the question "who manages the Accounting department?" by finding a certain field that contains the manager's name, or some method that returns it. However, it is unlikely that the object system can provide an answer to "how is Billy Jones related to the Accounting department?" There are no fields or methods containing such entries as "is assigned to", "on loan to", "handles personnel matters for", etc.

Depending on how the objects are organised, the answer generally consists of a field name, method name or a class name. They are not contained in the objects.
To a naive seeker of information from the database (through a high-level query interface), it is not at all obvious why one question may be asked and the other may not.

Object data management systems do not provide a way to ask questions whose answers are field names, method names or class names.
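
To make the contrast concrete, the following sketch stores the relationship name as data, so that it can be returned as an answer. It is only an illustration and not a feature of the OMG or ODMG models; the Fact structure and the function relationsBetween are hypothetical.

```cpp
#include <string>
#include <vector>

// One stored fact: subject, relation name and object are all data.
struct Fact {
    std::string subject;
    std::string relation;
    std::string object;
};

// Returns the names of all relations that link subject to object,
// in the order in which the facts are stored.
inline std::vector<std::string> relationsBetween(const std::vector<Fact>& facts,
                                                 const std::string& subject,
                                                 const std::string& object) {
    std::vector<std::string> names;
    for (const auto& f : facts)
        if (f.subject == subject && f.object == object)
            names.push_back(f.relation);
    return names;
}
```

With class definitions the same names would be spread over field, method and class identifiers, which exist in the source code but not in the stored objects.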

If the maximum number of employees in a department is fixed by corporate policy, then an object system enforces it through validation code outside the database itself. As a rule, such constraints are placed inside the methods of objects.

example 4.1

class Department
{
public:
    String  name;
    Address  location;
    d_Set  <d_Ref<Employee>> employees;

    Department();
    virtual int employed( d_Ref<Employee>);  // Adds a new employee
};

class AccountingDepartment : public Department
{
public:
    AccountingDepartment();
    virtual int employed( d_Ref<Employee>); // The method is redefined
};

Our naive seeker of facts will then again find himself unable to ask the following questions:

1) What is the maximum number of employees permitted in any department?
2) How many more employees can be hired into the Accounting department?

Our naive seeker of facts might well observe that other things having the effect of rules or constraints are accessible from the database, such as sales quotas, department budgets, and safety standards. The only difference is that some such limits are intended to be enforced by the system, while others are not. It is not at all obvious to him why he can ask some questions and not others.

This suggests that we should represent such descriptions and constraints in the same format and in the same database as "ordinary" information.
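
A minimal sketch of this idea keeps the limit as an ordinary field instead of burying it in the body of a method. The field maxEmployees and the functions employ and remainingPositions are assumptions made for the illustration, and employees are plain strings for brevity.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Department {
    std::string name;
    std::size_t maxEmployees;            // the rule, stored as ordinary data
    std::vector<std::string> employees;  // employee names, for brevity

    // Adds an employee unless the stored limit would be exceeded.
    bool employ(const std::string& employee) {
        if (employees.size() >= maxEmployees) return false;
        employees.push_back(employee);
        return true;
    }

    // How many more employees can be hired into this department?
    std::size_t remainingPositions() const {
        return employees.size() >= maxEmployees
                   ? 0
                   : maxEmployees - employees.size();
    }
};
```

Because the limit is a value rather than a constant inside employ, both questions above can be answered by reading the object, and the same value still drives the enforcement.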

4.2. FIELD NAMES ARE ONLY PLACE HOLDERS

Field names are used only to designate some space within an object. This allows the system to provide its services. Certainly, one name is adequate for this purpose. However, for information modelling, we want to attach several labels to a field, including, for example, the kind of entity and its relationship to the subject of the object.

In practice, there is no discipline in the usage of field names. Sometimes they name the entity type, sometimes the relationship, sometimes a hybrid of the two, sometimes nothing intelligible.

example 4.2.1

class Employee
{
public:
    long     employeeNumber;
    String   name;
    long     code;
    d_Ref <Department> worksIn;

    Employee();
};

Field and method names are just mnemonic aids to users, rather than anything that can be used by a system service to establish semantic connections.

If the field name specifies the entity type (or the method name specifies a function that returns an entity), it is not likely to be the same as the corresponding class name. While a class might be named "Department", the corresponding field in an "Employee" class might be named "dep".

example 4.2.2

class Department;
class Employee
{
public:
    d_Ref <Department> dep;

    Employee();
};

class Project
{
public:
    d_Ref <Department> department;

    Project();
};

The same entity type might be spelled differently in different field names in different classes. Nothing prevents the same field name from meaning entirely different things in different classes.

4.3. STABILITY OF RELEVANT FACTS

The kinds of facts relevant to an entity are predefined and are expected to remain quite stable. It generally takes a major effort to add fields to the base classes of a class hierarchy (for example, the Person class), because every derived class and every stored object is affected.

example 4.3

class Person {};
class Employee: public Person {};
class Taxpayer: public Person {};
class Customer: public Person {};
class StockHolder: public Person {};

The stability of relevant facts may be acceptable and even desirable in many cases. However, there are situations where all sorts of information must be stored. In these situations we need a more flexible data structure.
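
One possible shape of such a flexible structure is an entity whose facts are name/value pairs added at run time. This is a hypothetical sketch, not something the OMG or ODMG models prescribe; the class Entity and its set and get functions are invented for the illustration.

```cpp
#include <map>
#include <optional>
#include <string>

// A hypothetical open-ended entity: facts are added at run time
// instead of being fixed in a class definition.
class Entity {
public:
    // Records or overwrites one fact about the entity.
    void set(const std::string& fact, const std::string& value) {
        facts_[fact] = value;
    }

    // Returns the value of a fact, or no value if it was never recorded.
    std::optional<std::string> get(const std::string& fact) const {
        auto it = facts_.find(fact);
        if (it == facts_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::map<std::string, std::string> facts_;
};
```

The price of this flexibility is that the compiler no longer checks fact names or value types, which is exactly the stability that a class definition provides.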


5. INFORMATION CONCEPTS

For information modelling purposes, we have to account for such concepts as entities and relationships.

5.1. ENTITIES

It is natural to identify entities with objects. An object is an elementary unit of creation and destruction, as well as of data transmission. Objects are classified into classes just as entities are classified into types. Such a correspondence between entities and objects would be enormously simple. How well do their properties match? Is there a 1:1 correspondence between objects and entities?

5.1.1. Entities Are Not Always Single Typed

There is some difficulty in equating classes and entity types. It seems natural to view a certain person as a single entity. However, such an entity might be an instance of several entity types such as person, employee, taxpayer, customer, stockholder, etc. It is difficult to define a class corresponding to each of these, and then permit a single object to simultaneously be an occurrence of several of the classes.

example 5.1.1

class Person {};
class Employee: public Person {};
class Taxpayer: public Person {};
class Customer: public Person {};
class StockHolder: public Person {};
...
// Permits a single person to simultaneously be an occurrence
// of several of the classes.

class EmployeeAndCustomer: public Employee, public Customer {};
class EmployeeAndTaxpayer: public Employee, public Taxpayer {};
class EmployeeAndStockHolder: public Employee, public StockHolder {};
class CustomerAndTaxpayer: public Customer, public Taxpayer {};
...
class EmployeeAndCustomerAndStockHolder: public Employee,
                                         public Customer,
                                         public StockHolder
{
};
// and so on...

We are not dealing with a simple nesting of types and subtypes: all employees are people, but some customers and stockholders are not. Subtypes are not mutually exclusive: some people are employees, some are stockholders, and some are both.
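
The growth hidden behind "and so on" in example 5.1.1 can be made explicit. Assuming that any combination of two or more of the n subclasses of Person may occur for some individual, one combination class is needed per such subset, which gives 2^n - n - 1 classes. The function name combinationClasses is hypothetical.

```cpp
#include <cstdint>

// Number of combination classes needed when every subset of two or more of
// the n subclasses may occur: all 2^n subsets, minus the empty subset and
// the n single-class subsets. Valid for n < 64.
inline std::uint64_t combinationClasses(unsigned n) {
    return (std::uint64_t{1} << n) - n - 1;
}
```

For the four subclasses Employee, Taxpayer, Customer and StockHolder this gives 11 combination classes, and ten subclasses would already require 1013.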

We need to perceive entity types as though they did not overlap. We should think of customers and employees as always distinct entities, sometimes related by an "is the same person" relationship. We then have to be very careful about the number of entities being modelled. If an employee is a stockholder, there will be two objects for him. Is he two entities? Or is he some special "employeestockholder" entity? And can he become an "employeestockholdercustomer" entity later on?
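
The disjoint-classes view can be sketched with an explicit "is the same person" link between two otherwise unrelated objects. The sketch is an assumption-laden illustration: SamePersonLinks, accountNumber and isAlsoStockHolder are hypothetical, and the link is kept as a plain list of number pairs.

```cpp
#include <string>
#include <utility>
#include <vector>

// Two disjoint classes: nothing in their definitions relates them.
struct Employee    { long employeeNumber; std::string name; };
struct StockHolder { long accountNumber;  std::string name; };

// Explicit "is the same person" links: {employeeNumber, accountNumber}.
using SamePersonLinks = std::vector<std::pair<long, long>>;

// True when some link records that this employee is also a stockholder.
inline bool isAlsoStockHolder(const SamePersonLinks& links, const Employee& e) {
    for (const auto& link : links)
        if (link.first == e.employeeNumber) return true;
    return false;
}
```

The person exists twice, once in each class, and the two copies of the name can drift apart; only the link says that they describe one individual.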

5.1.2. Type Is Not Always Homogeneous

Within a single entity type, there may be facts that are relevant to some instances and not to others. Within current object processing technologies, however, each object of a given class contains the same fields and methods. We covered these points in section 3.1.

5.1.3. Entities With Many Objects

We might have too many classes. As we have already mentioned, a common solution to the problem of overlapping classes is to define them as disjoint classes and to allow an entity (a person) to be represented by an object in each class.

More generally, there is no discipline preventing the definition of several classes corresponding to one entity type. So, we could have several classes, with each class containing different attributes and methods of the subject entity. Regardless of the motivation, such a configuration is permitted in all object systems. Thus these systems do not have a well-defined 1:1 correspondence between entities and objects.

5.1.4. Objects With Many Entities

If there is a 1:1 relationship between entities, then a single object might be perceived as "representing" all of them.

example 5.1.4

class Address {};   // Placeholder definition: a by-value member needs a complete type.
class Building
{
public:
    Address  address;
    Building();
};

d_Set<d_Ref<Building>> AllBuildings; // The Building class extent.

Since each address occurs in exactly one building object, one could view these objects as representing addresses as well as buildings.

5.2. RELATIONSHIPS


5.2.1. Many Representations Of Relationships

A binary relationship is a very simple concept. It is a named link between two entities. However, there are many ways to implement binary relations in current object technology.

Most of these ways involve pairing identifiers (references) of the two entities in one object. It might be in the object representing one entity or the other. It might be in a separate object representing the relationship itself. It might be embedded in an object representing some other entity. These ways correspond to several combinations in which the two entity references might occur in an object.

example 5.2.1.1

class Department
{
public:
    Project project ;   // pair  { this, project }

    Department();
};

class Department
{
public:
    d_Ref <Project> project;  // pair  { this, project }

    Department();
};

class Assignment
{
public:
    d_Ref <Department> department;
    d_Ref <Project>    project;   // pair  { department, project }

    Assignment();
};

In addition, a relationship might be presented indirectly, being implied by other relationships.

Suppose that projects are assigned to single departments. Each employee works on all of his department's projects. Then an employee's assignment to a department and a project's assignment to the department together imply the employee's working on the project.

example 5.2.1.2

class Employee
{
public:
    d_Ref <Department> department;

    Employee();
};

class Department
{
public:
    d_Ref <Project> project;   // pair  { this, project }

    Department();
};
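
The implied relationship of example 5.2.1.2 can be sketched as a derived query. Raw pointers stand in for d_Ref, and the function worksOn is a hypothetical helper; nothing of the kind is stored in either class.

```cpp
#include <string>

// Assumption: raw pointers stand in for d_Ref; nullptr means "no reference".
struct Project    { std::string name; };
struct Department { std::string name; const Project* project = nullptr; };
struct Employee   { std::string name; const Department* department = nullptr; };

// Derived, not stored: an employee works on a project when the project is
// the one assigned to the employee's department.
inline bool worksOn(const Employee& e, const Project& p) {
    return e.department != nullptr && e.department->project == &p;
}
```

The fact "works on" appears in no field or class name. It exists only in the body of worksOn, so a reader of the class definitions alone cannot know that the relationship is there.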

The problem is what to do with this variety of alternatives. Why is it necessary to make choices, and what are the criteria?

5.2.2. Relationships Are Not Described

There is no regular way to reflect the name of the relationship in the description. Sometimes it is a class name, sometimes it is a field name, and sometimes it does not occur at all (for example, in implied relationships). When the relationship name occurs as a field name, there is no discipline in how it is chosen.

5.2.3. Attributes

There is no effective way to define "attributes", or to distinguish them from relationships. We can write:

class Employee
{
 public:
    d_String  Department;

    Employee();
};

or

class Department {};   // Placeholder definition: a by-value member needs a complete type.
class Employee
{
 public:
    Department  department;

    Employee();
};

or

class Department;
class Employee
{
 public:
    d_Ref<Department> department;

    Employee();
};

or

class Department;
class Employee
{
 public:
    d_Set <d_Ref<Department>>  departments;
    Employee();
};

If a field value is a reference or pointer to some object, then it represents a relationship; otherwise it is an attribute. So the need to map things into objects seems to be the main force motivating a distinction between attributes and relationships.


6. CONCLUSIONS

We have outlined a number of ways in which current object data models fail to model the semantics of information accurately and unambiguously. Other models deal with these problems, with varying degrees of success. A discussion of such models is beyond the scope of this paper; see, for example, [8] and [9].


7. REFERENCES

[1] POET Technical Reference. 1996, POET Software.

[2] F. Bancilhon and G. Ferran. ODMG-93: the Object Database Standard. In Proceedings of 13th World Computer Congress 94, Volume 2, IFIP, 1994. Technical Report No 12, O2 Technology.

[3] F. Bancilhon and G. Ferran. Object Databases and the ODMG Standard. Object Magazine, 4(9), 1995. Technical Report No 15, O2 Technology.

[4] J. R. Abrial. Data semantics, in Database Management Systems, J. W. Klimbie and K. L. Koffeman, eds., North Holland, New York, 1974.

[5] W. Kent. Limitations of Record-Based Information Models, in ACM Transactions on Database Systems, Vol. 4, No. 1, 1979.

[6] M. E. Senko. DIAM as a detailed example of the ANSI SPARC architecture, in Modelling in Data Base Management Systems, G. M. Nijssen, ed., North-Holland Publishing, Amsterdam, 1976.

[7] G. Booch. Object Oriented Design with Applications. The Benjamin/Cummings Publishing, 1991.

[8] A. Girow. Objects and Binary Relations, in Object Currents Vol. 1, Issue 6, SIGS Publications, NY, 1996.

[9] A. Girow. Binary Relations Approach to building Object Data Model, in Object Currents Vol. 1, Issue 11, SIGS Publications, NY, 1996.




©1997 SIGS Publications, Inc., New York, NY, USA. All Rights Reserved.