The characteristic that differentiates a DBMS from an RDBMS is that the RDBMS provides a set-oriented database language. For most RDBMSs, this set-oriented database language is SQL. Set oriented means that SQL processes sets of data in groups.
NOTE: Nonprocedural means what rather than how. For example, SQL describes what data to retrieve, delete, or insert, rather than how to perform the operation.
Two standards organizations, the American National Standards Institute (ANSI) and the International Standards Organization (ISO), currently promote SQL standards to industry. The ANSI-92 standard is the standard for the SQL used throughout this book. Although these standard-making bodies prepare standards for database system designers to follow, all database products differ from the ANSI standard to some degree. In addition, most systems provide some proprietary extensions to SQL that extend the language into a true procedural language. We have used various RDBMSs to prepare the examples in this book to give you an idea of what to expect from the common database systems. (We discuss procedural SQL--known as PL/SQL--on Day 18, "PL/SQL: An Introduction," and Transact-SQL on Day 19, "Transact-SQL: An Introduction.")
1. Information rule-- All information in a relational database (including table and column names) is represented explicitly as values in tables.
2. Guaranteed access--Every value in a relational database is guaranteed to be accessible by using a combination of the table name, primary key value, and column name.
3. Systematic null value support--The DBMS provides systematic support for the treatment of null values (unknown or inapplicable data), distinct from default values, and independent of any domain.
4. Active, online relational catalog--The description of the database and its contents is represented at the logical level as tables and can therefore be queried using the database language.
5. Comprehensive data sublanguage--At least one supported language must have a well-defined syntax and be comprehensive. It must support data definition, manipulation, integrity rules, authorization, and transactions.
6. View updating rule--All views that are theoretically updatable can be updated through the system.
7. Set-level insertion, update, and deletion--The DBMS supports not only set-level retrievals but also set-level inserts, updates, and deletes.
8. Physical data independence--Application programs and ad hoc programs are logically unaffected when physical access methods or storage structures are altered.
9. Logical data independence--Application programs and ad hoc programs are logically unaffected, to the extent possible, when changes are made to the table structures.
10. Integrity independence--The database language must be capable of defining integrity rules. They must be stored in the online catalog, and they cannot be bypassed.
11. Distribution independence--Application programs and ad hoc requests are logically unaffected when data is first distributed or when it is redistributed.
12. Nonsubversion--It must not be possible to bypass the integrity rules defined through the database language by using lower-level languages.
Figure 1.1.
Codd's relational database management system.
This method has several advantages and many disadvantages. In its favor is the fact that the physical structure of data on a disk becomes unimportant. The programmer simply stores pointers to the next location, so data can be accessed in this manner. Also, data can be added and deleted easily. However, different groups of information could not be easily joined to form new information. The format of the data on the disk could not be arbitrarily changed after the database was created. Doing so would require the creation of a new database structure.
Codd's idea for an RDBMS uses the mathematical concepts of relational algebra to break down data into sets and related common subsets.
Because information can naturally be grouped into distinct sets, Dr. Codd organized his database system around this concept. Under the relational model, data is separated into sets that resemble a table structure. This table structure consists of individual data elements called columns or fields. A single set of a group of fields is known as a record or row. For instance, to create a relational database consisting of employee data, you might start with a table called EMPLOYEE that contains the following pieces of information: Name, Age, and Occupation. These three pieces of data make up the fields in the EMPLOYEE table, shown in Table 1.1.
Name | Age | Occupation |
Will Williams | 25 | Electrical engineer |
Dave Davidson | 34 | Museum curator |
Jan Janis | 42 | Chef |
Bill Jackson | 19 | Student |
Don DeMarco | 32 | Game programmer |
Becky Boudreaux | 25 | Model |
SELECT * FROM EMPLOYEERemember that the exact syntax is not important at this point. We cover this topic in much greater detail beginning tomorrow.
Because the various data items can be grouped according to obvious relationships (such as the relationship of Employee Name to Employee Age), the relational database model gives the database designer a great deal of flexibility to describe the relationships between the data elements. Through the mathematical concepts of join and union, relational databases can quickly retrieve pieces of data from different sets (tables) and return them to the user or program as one "joined" collection of data. (See Figure 1.2.) The join feature enables the designer to store sets of information in separate tables to reduce repetition.
The join feature.
Figure 1.3 shows a union. The union would return only data common to both sources.
Figure 1.3.
The union feature.
Here's a simple example that shows how data can be logically divided between two tables. Table 1.2 is called RESPONSIBILITIES and contains two fields: NAME and DUTIES.
Name | Duties |
Becky Boudreaux | Smile |
Becky Boudreaux | Walk |
Bill Jackson | Study |
Bill Jackson | Interview for jobs |
Name | Age | Occupation | Duties |
Becky Boudreaux | 25 | Model | Smile |
Becky Boudreaux | 25 | Model | Walk |
In addition to the development of the relational database model, two technologies led to the rapid growth of what are now called client/server database systems. The first important technology was the personal computer. Inexpensive, easy-to-use applications such as Lotus 1-2-3 and Word Perfect enabled employees (and home computer users) to create documents and manage data quickly and accurately. Users became accustomed to continually upgrading systems because the rate of change was so rapid, even as the price of the more advanced systems continued to fall.
The second important technology was the local area network (LAN) and its integration into offices across the world. Although users were accustomed to terminal connections to a corporate mainframe, now word processing files could be stored locally within an office and accessed from any computer attached to the network. After the Apple Macintosh introduced a friendly graphical user interface, computers were not only inexpensive and powerful but also easy to use. In addition, they could be accessed from remote sites, and large amounts of data could be off-loaded to departmental data servers.
During this time of rapid change and advancement, a new type of system appeared. Called client/server development because processing is split between client computers and a database server, this new breed of application was a radical change from mainframe-based application programming. Among the many advantages of this type of architecture are
The most commonly used statement in SQL is the SELECT statement (see Day 2, "Introduction to the Query: The SELECT Statement"), which retrieves data from the database and returns the data to the user. The EMPLOYEE table example illustrates a typical example of a SELECT statement situation. In addition to the SELECT statement, SQL provides statements for creating new databases, tables, fields, and indexes, as well as statements for inserting and deleting records. ANSI SQL also recommends a core group of data manipulation functions. As you will find out, many database systems also have tools for ensuring data integrity and enforcing security (see Day 11, "Controlling Transactions") that enable programmers to stop the execution of a group of commands if a certain condition occurs.
NOTE: The term SQL can be confusing. The S, for Structured, and the L, for Language, are straightforward enough, but the Q is a little misleading. Q, of course, stands for "Query," which--if taken literally--would restrict you to asking the database questions. But SQL does much more than ask questions. With SQL you can also create tables, add data, delete data, splice data together, trigger actions based on changes to the database, and store your queries within your program or database.Unfortunately, there is no good substitute for Query. Obviously, Structured Add Modify Delete Join Store Trigger and Query Language (SAMDJSTQL) is a bit cumbersome. In the interest of harmony, we will stay with SQL. However, you now know that its function is bigger than its name.
NOTE: In addition to serving as an SQL reference, this book also contains many practical software development examples. SQL is useful only when it solves your real-world problems, which occur inside your code.
Most examples are directed toward the beginning programmer or first-time user of SQL. We begin with the simplest of SQL statements and advance to the topics of transaction management and stored procedure programming. The Oracle RDBMS is distributed with a full complement of development tools. It includes a C++ and Visual Basic language library (Oracle Objects for OLE) that can link an application to a Personal Oracle database. It also comes with graphical tools for database, user, and object administration, as well as the SQL*Loader utility, which is used to import and export data to and from Oracle.
We chose the Personal Oracle7 RDBMS for several reasons:
NOTE: Personal Oracle7 is a scaled-down version of the full-blown Oracle7 server product. Personal Oracle7 allows only single-user connections (as the name implies). However, the SQL syntax used on this product is identical to that used on the larger, more expensive versions of Oracle. In addition, the tools used in Personal Oracle7 have much in common with the Oracle7 product.
Figure 1.4.
Oracle's SQL*Plus.
TIP: Keep in mind that nearly all the SQL code given in this book is portable to other database management systems. In cases where syntax differs greatly among different vendors' products, examples are given to illustrate these differences.
Figure 1.5.
Microsoft Query.
Figure 1.6.
ODBC structure.
The unique feature of ODBC (as compared to the Oracle or Sybase libraries) is that none of its functions are database-vendor specific. For instance, you can use the same code to perform queries against a Microsoft Access table or an Informix database with little or no modification. Once again, it should be noted that most vendors add some proprietary extensions to the SQL standard, such as Microsoft's and Sybase's Transact-SQL and Oracle's PL/SQL.
You should always consult the documentation before beginning to work with a new data source. ODBC has developed into a standard adopted into many products, including Visual Basic, Visual C++, FoxPro, Borland Delphi, and PowerBuilder. As always, application developers need to weigh the benefit of using the emerging ODBC standard, which enables you to design code without regard for a specific database, versus the speed gained by using a database specific function library. In other words, using ODBC will be more portable but slower than using the Oracle7 or Sybase libraries.
The ANSI 1992 standard (SQL-92) extended the language and became an international standard. It defines three levels of SQL compliance: entry, intermediate, and full. The new features introduced include the following:
Call-level interfaces should not be a new concept to application programmers. When using ODBC, for instance, you simply fill a variable with your SQL statement and call the function to send the SQL statement to the database. Errors or results can be returned to the program through the use of other function calls designed for those purposes. Results are returned through a process known as the binding of variables.
A Until recently, if you weren't working on a large database system, you probably had only a passing knowledge of SQL. With the advent of client/server development tools (such as Visual Basic, Visual C++, ODBC, Borland's Delphi, and Powersoft's PowerBuilder) and the movement of several large databases (Oracle and Sybase) to the PC platform, most business applications being developed today require a working knowledge of SQL.
Q Why do I need to know anything about relational database theory to use SQL?
A SQL was developed to service relational databases. Without a minimal understanding of relational database theory, you will not be able to use SQL effectively except in the most trivial cases.
Q All the new GUI tools enable me to click a button to write SQL. Why should I spend time learning to write SQL manually?
A GUI tools have their place, and manually writing SQL has its place. Manually written SQL is generally more efficient than GUI-written SQL. Also, a GUI SQL statement is not as easy to read as a manually written SQL statement. Finally, knowing what is going on behind the scenes when you use GUI tools will help you get the most out of them.
Q So, if SQL is standardized, should I be able to program with SQL on any databases?
A No, you will be able to program with SQL only on RDBMS databases that support SQL, such as MS-Access, Oracle, Sybase, and Informix. Although each vendor's implementation will differ slightly from the others, you should be able to use SQL with very few adjustments.
2. How can you tell whether a database is truly relational?
3. What can you do with SQL?
4. Name the process that separates data into distinct, unique sets.