LANG_UPDTAB Package
This package supports management of table data.
This package includes the following routines:
UPDATE_TABLE A generic routine for copying, moving and comparing data in tables.
CONVERT_NUMBER TO_NUMBER which returns NULL instead of VALUE_ERROR
CONVERT_DATE TO_DATE which returns NULL instead of VALUE_ERROR
QUOTE Returns a string enclosed in single-quotes
CONCAT_IF Appends to a string if a condition is true
UPDATE_TABLE is most important and extensive procedure in the package. The other routines are fairly straightforward and provided as a convenience for and support of using UPDATE_TABLE (QUOTE and CONCAT_IF are used to help construct WHERECLAUSE parameters, while the CONVERT.. functions can be used to construct views to map/format a source table to the destination table). The rest of this documentation is dedicated to describing the UPDATE_TABLE procedure in detail.
UPDATE_TABLE
A generic routine for copying, moving and comparing data in tables.
This procedure provides a truly generic routine for copying data from one table to another with "like" structures. For the procedure to work, only "source" and "destination" tables have to be specified. The source and target tables do not have to be identical, only column(s) which have the same names in BOTH tables are considered during the update (warnings will be issued if the columns do not match in data-type or size, or there are NOT NULL columns in the target table which are found the source, etc.) The procedure dynamically generates an anonymous PL/SQL block the perform the update based on the underlying structure of the tables (from the data dictionary), a unique key (which can also be determined from the data dictionary), and the flags/parameters you specify. Because the code is generated on the fly base on the data dictionary, errors caused by lengthy column list in INSERT and UPDATE statements are eliminated--and if one of the tables is modified, the procedure will accommodate it automatically at runtime. Dynamic code generation also allows us to include many flags and parameters (most of which have default values) which provide a great amount of flexibility and features--you will find that you can fit this procedure into almost any situation--without costing performance. Here is a brief feature summary:
INSTALLATION
Installation Requirements
To Install
Connect to schema who tables you want to manage with SQL*Plus.
SQL> @langutab.sql (package spec)
SQL> @langutab.plb (package body)
USAGE NOTES
The UPDATE_TABLE procedure contains 19 parameters (and the FLAGS parameter can contain 19 different flags as well!) However, most situations only require you to use four or five of those parameters, and only three (STAB, DTAB, and FLAGS) are mandatory. The other parameters provide additional control and a great deal of flexibility to adapt to many different situations.
When using, keep in mind the following:
With that in mind, here are some recommendations for using the package:
SQL> SET SERVEROUTPUT ON SIZE 10240
SQL> exec lang_updtab.update_table(dtab=>'EMPLOYEES', stab=>'OLD_EMPLOYEES', FLAGS=>’ie’);
In SQL*Plus, if you don't SET SERVEROUTPUT ON (10240 or larger recommended) before running the procedure, you will not see any output. All output (including an execution report) is sent to DBMS_OUTPUT and can be retrieved using this package from any tool (SQL*Plus outputs it to the screen after the procedure completes.) In other tools (such as Forms or PRO*C), you can use DBMS_OUTPUT.GET_LINE procedure to retrieve and display the output in an appropriate way.
If the procedure submitted is as a job (via the "j" flag) the user must COMMIT before the job will begin executing. This is the normal behavior of the DBMS_JOB package.
A complete description of each parameter and flag is described in the next section.
You only need to know three parameters to start using the procedure
DTAB name of target / destination table
STAB name of source table
FLAGS specify whether to insert and/or update records
Here is how to INSERT all the rows from the OLD_EMPLOYEES table that do not exist in(to) the EMPLOYEES table, based on the PRIMARY KEY of EMPLOYEES (and output any errors such as foreign key violations):
SQL> set serveroutput on size 10240
SQL> exec lang_updtab.update_table(dtab=>’EMPLOYEES’, stab=>’OLD_EMPLOYEES’, flags=>’ie’)
If you wanted to also update existing records you would add a ‘U’ to the flags parameter. To only update rows that have changed, add a ‘C’. If you want to see what would happen without actually doing and INSERT/UPDATEs, add an ‘R’. Its that simple.
HOW IT WORKS
How does it work? Well, the procedure takes the parameters you supply it and generates an anonymous PL/SQL block to perform the operation you request, along with collecting statistics to provide and execution report of what happened. The PL/SQL block is then executed dynamically using the DBMS_SQL package. The basic algorithm for the generated code is quite simple (you actually have two to choose from):
PSEUDO CODE FOR ALGORITHM #1 (default; "row-based" or "for-cursor-loop" based algorithm)
if isflag(‘D’) then
for each row in dtab where not exists in stab
loop
delete where current of dtab;
end loop;
end if;
for each row in STAB
loop
if row_exists_in_dtab then
if isflag(‘U’) then
if not isflag(‘C’) or row_is_changed then
update table
set col1=stab.col1
, col2=stab.col2
...
where current of stab;
end if
end if;
else
if isflag(‘I’) then
insert into table (col1, col2, ...)
values (stab.col1, stab.col2);
end if;
end if;
end loop;
PSEUDO CODE FOR ALGORITHM #2 (via "S" flag; "set-based" or "single-SQL statement" algorithm)
if isflag(‘D’) then
delete from dtab
where not exists (
select 0 from stab where stab.key = dtab.key
);
end if;
if isflag(‘U’) then
update dtab
set (col1, col2, ...) = (
select col1, col2
from stab
where key = dtab.key
)
where exists (
select 0
from stab
where key = dtab.key
);
end if;
if isflag(‘I’) then
insert into dtab
select field1, field2, ...
from stab
where not exists (
select 0
from dtab
where key = stab.key
);
end if;
The actual code generated will be a little more sophisticated but basically follows the same flow. Information about the columns and constraints of each table is retrieved from the data dictionary. The procedure also relies heavily on the DBMS_SQL package for its internal operation. If you want to see the actual code being generated, simply add an ‘L’ to the FLAGS parameter—the entire block will be sent to DBMS_OUTPUT instead of being executed. You can spool the output to a file and view or customize the code as you wish (Note: the code is generated for compactness, not readability—however, it is not difficult to follow or understand.)
Some code optimization does occur during generation. The code is generated to be efficient PL/SQL. Unnecessary code is removed depending on flags specified or not specified at run-time.
The drawback of generating new code each time is the added cost of generating and re-parsing each time the procedure executes. UPDATE_TABLE has been designed to reduce generation time as much as possible, but the actual time depends totally on your server and database. On a medium or high-end server, the time is typically < 2 seconds.
PARAMETERS
This section describes the parameters for the UPDATE_TABLE procedure. UPDATE_TABLE has 19 parameters, and the FLAGS parameter excepts up to 19 flags (which act as additional boolean parameters.) Note that all the parameters have default values. The only required parameters are DTAB, STAB, and FLAGS.
procedure update_table(
-- REQUIRED (EVEN THOUGH DEFAULT)
dtab varchar2 default null -- DESTINATION TABLE
, stab varchar2 default null -- SOURCE TABLE
, flags varchar2 default 'ie' -- CONTROL FLAGS (see below)
-- OPTIONAL
, keycols varchar2 default null -- PRIMARY KEY (comma-delimited)
, whereclause varchar2 default null -- RECORDS TO EXCLUDE FROM SOURCE
, exclcmp varchar2 default null -- EXCLUDE COLUMNS IN UPDATE/COMPARE
, exclupd varchar2 default null -- EXCLUDE COLUMNS FROM UPDATE
, exclins varchar2 default null -- EXCLUDE COLUMNS FROM INSERT
, rbseg varchar2 default null -- SELECT ROLLBACK SEGMENT
, limtot number default 0 -- STOP WHEN n RECORDS PROCESSED
, limupd number default 0 -- STOP WHEN n RECORDS INS/UPD
, limerr number default 0 -- STOP WHEN n ERRORS OCCURRED
, commitpoint number default 0 -- COMMIT EVERY n RECORDS
, oncommit varchar2 default null -- EXECUTE SQL ON COMMIT
, skiprec number default 0 -- SKIP N ROWS
, btab varchar2 default null -- REPORT EXCEPTIONS TO THIS TABLE
, next_date date default null -- SUBMIT DATE
, interval varchar2 default null -- SUBMIT INTERVAL
, procname varchar2 default null -- CREATE OR REPLACE AS PROCEDURE
);
DTAB (destination or "target" table; default=NULL (display HELP page))
Name of the destination / target table or view (which changes will be applied to) in the format "[OWNER.]TABLE" (is not case-sensitive. ) If owner not specified then the current (invoking) user is assumed to be the owner. Synonyms are not followed. The table or view must be a local object to which the owner of the package has access.
dtab=>’employees’
STAB (source table; default=NULL)
Name of the source table or view (which changes will be applied from) in the format "[OWNER.]TABLE[@DB]" (is not case-sensitive. ) If owner not specified then the current (invoking) user is assumed to be the owner. Synonyms are not followed. The table or view may be a local or remote object to which the owner of the package has access.
stab=>’employees_stage’
If STAB is remote, you may see a slow down in the time it takes for UPDATE_TABLE to generate the PL/SQL block. This is because data dictionary information for the source table must be looked up in the remote database. You can alleviate this by creating a local view which encapsulates the remote object. Obviously, if STAB is remote, the update itself will probably be slower than (local) normal.
FLAGS (control general, code-generation, and reporting behavior; default=’ie’)
This string indicates 19 additional flags which control how the procedure will operate. They basically operate as BOOLEAN parameters in that specifying a flag turns the option ON. Flags are specified as a set of single characters concatenated together (i.e 'IUCE'.) Each flag is one character from A-Z, are not case sensitive, can be specified multiple times, and can be in any order. At a minimum one of I, U, D, or W must be set. The most common flags are I, U, C, E, W, and R. D, O, and X are rarely used. Some flag combinations do not make sense or conflict with on another, while others can only be used in conjunction with one another (specified in parentheses below). You will be warned of any illegal flag combinations at runtime.
GENERAL
I Insert new rows from source into destination.
A (I) Append on insert without checking if row exists.
U Update rows from source which exist in destination.
C (U) Update rows from source which exist in destination and have at least one column different; using this option can save redo and performance if many existing rows are unchanged. If most rows are changed then it will be more efficient not to use this option.
D Delete rows from destination not found in source (rarely used)
CODE-GENERATION
O Traverse source in key order (requires a sort on the source table)
S Generate as single SQL statements for DELETE, UPDATE, and INSERT instead (algorithm #2) of the for-loop cursor-based default approach; limits options and reporting features, but will typically execute faster for larger updates; parameters like LIMTOT, COMMITPOINT, and EXCEPTIONS have no meaning.
X Uses discrete transactions; see Oracle documentation for the restrictions / limitations of using discrete transactions (you may need to use COMMITPOINT=1, for example.)
P Execute commit-point after N rows processed rather than N rows updated. This can be helpful in preventing
"ORA-01522: Snapshot too old…" errors which can occur during the following conditions: STAB is large, the number of updates is small, the COMMITPOINT is large, the number and/or size of online rollback segments is small, there is high activity on the database.F (I) Filter source to obey foreign key constraints of target table. This feature can be useful when trying to extract "referentially intact" subsets of data, perhaps when populating a test database. This does not handle the problem of populating multiple tables in the proper order to maintain referential integrity—you have to do this on your own. Note: only affects inserts (via "I" flag), and all foreign key constraints are considered whether they are enabled or not. Code is added to the where clause as shown below:
select *
from table1 stab
where ...
/* added with F flag */
and exists (select 0 from parent_table1 where fk1 = stab.fk1)
and exists (select 0 from parent_table2 where fk2 = stab.fk2)
...
Note: the F flag can cause performance problems when STAB is a remote table. The database can have difficulty optimizing some kinds of distributed queries. Also, using this flag will be wasteful if you expect little or no foreign key violations because UPDATE_TABLE already traps the exceptions for you.
N Ignore NULL values in source table, which two effects on generated code:
- values in source table are treated as
- if used with ‘C’ flag then eliminate code checking if source is null and target is not null
The net effect is that NULL values in the source table are not considered as differences and are not propagated to the destination.
ERROR-HANDLING
E Echo all errors during processing; includes UID of offending row and the Oracle error number and message.
B Break on any error. Similar to setting LIMERR=>1 except that with LIMERR the error is counted and the procedure exits normally—with the B flag, the procedure exits with an exception.
W display warnings about differences or incompatibilities between the two tables which could potentially cause errors during the load. The W flag can be specified by itself just to test for these potential problems. Warnings are generated for the following conditions:
01 Column in destination but not in source (normal but reported anyway)
02 Column NOT NULL in destination but not in source (may cause NOT NULL violation on INSERT)
REPORTING
R Report changes only. This powerful flag lets you run the routine to see what would happen without any changes actually being made. Code is generated to count the number of inserts, updates, etc. that would have occurred without the "R" flag. Note that some counts (such as Errors:) cannot be counted without actually doing the insert or update.
V (R) Verbose reporting. Outputs the action (insert/update) of each row processed.
Y (UCR) Include counts of differences by columns.
MISC
L Output code to DBMS_OUTPUT instead of executing. This useful flag will let see the code that the procedure is generating. It can be spooled to a file to be viewed, edited, and executed. Use this flag to see how the procedure generates code or have it generate a template for creating customized routines.
J Using this flag will cause the procedure to submit itself as a job (without the "j" flag) via the DBMS_JOB package. The NEXT_DATE and/or INTERVAL parameters are passed along the parameters of the same names in the DBMS_JOB.SUBMIT procedure.
-- SOME EXAMPLES
flags=>’ie’ -- INSERT IF NOT EXISTS, SHOW ERRORS
flags=>’iuew’ -- ABOVE + UPDATE IF EXISTS
flags=>’iucew’ -- ABOVE + UPDATE ONLY IF EXISTS AND CHANGED
flags=>’iucr’ -- REPORT ONLY
flags=>’iucl’ -- OUTPUT CODE GENERATED TO DBMS_OUTPUT
G Enforce security rights of the "invoking" user on tables accessed by the procedure. Currently, only access to local objects can be enforced—an exception is raised if the source table is in a remote database (you can use view to work around this). Whether the "g" flag is specified or not, final access to the table ultimately depends on the privileges of the owner of the package (see the previous "USAGE NOTES" section).
Note: to check user security requires the package owner to have access to SYS.DBA_TAB_PRIVS and SYS.DBA_SYS_PRIVS views. You will have to create a wrapper procedure to force the G flag to be used by users.
KEYCOLS (manually specify UID columns; default=NULL (determined from PK))
Specifies an alternate case-insensitive, comma-delimited list of columns to serve as the unique identifier (UID) (instead of the PRIMARY KEY) for the algorithm. If this parameter is left blank, then the procedure will try to determine the key columns from the PRIMARY KEY of the destination table in the data dictionary (an available UNIQUE KEY is randomly chosen if no PRIMARY KEY exists.) If you want to use something other than the PRIMARY KEY of the table, you must specify it explicitly. Every key column MUST exist in both tables or an error is generated. If no key columns can be determined, the procedure will exit immediately.
You may want to explicitly specific the key columns if the table has no primary key or there is another unique key besides the primary key you want to use for comparison.
The procedure does not actually force the columns in KEYCOLS to be unique. The algorithm will function regardless, but you may see duplicates inserted or have "double-updates" occur.
keycols=>’clock_num’ -- EMPOYEES table
keycols=>’clock_num,timestamp,job_num’ -- TIME table
WHERECLAUSE (filter source with custom WHERE clause; default=NULL)
Allows you to specify a WHERE clause to filter rows applied from the source table (i.e. STAB.) Make sure to use double single-quote notation to imbed quote marks in the string as shown in the example. The package also provides the convenience function
whereclause=>'state=''MI'''
whereclause=>’state=’||lang_updtab.quote(‘MI’)
If the clause needs to specify a correlated sub-query (a sub-query which references the outer query), you can use the alias "
whereclause=>’exists (select 0 from some_table where some_column=stab.some_column)’
EXCLCMP (exclude from testing for differences on UPDATE; default=NULL)
Case-insensitive, comma-delimited list of columns to exclude from comparison when determining a rows in the destination table is to be updated (which means it is only valid using the "UC" flags)
exclcmp=>’added_user,added_date,updated_user,updated_date’
-- DO NOT COMPARE THESE
Note: the list can be interpreted as an "include" (vs. "exclude") list by pre-pending an exclamation point "!" to the list. Column names can also include the '%' wildcard to exclude/include groups of columns (i.e.
'added%,updated%'. This functionality works with the EXCLCMP, EXCLUPD, and EXCLINS parameters.
EXCLUPD (exclude from UPDATE; default=NULL)
Case-insensitive, comma-delimited list of columns to exclude from updating (valid with "U" flag)
exclupd=>’added_user,added_date’ -- DO NOT UPDATE THESE
EXCLINS (exclude from INSERT; default=NULL)
Case-insensitive, comma-delimited list of columns to exclude from inserting (valid with "I" flag)
exclins=>’a_field’ -- DO NOT INSERT THIS COLUMN
RBSEG (specify rollback segment; default=NULL)
Specify rollback segment to use; a
rbseg=>’RBIG’ -- SET ROLLBACK SEGMENT TO "RBIG"
LIMTOT (limit total number of rows processed; default=0 (unlimited))
Stops when N rows have been processed from source.
limtot=>1000 -- STOP AFTER 100 ROWS READ FROM SOURCE
LIMUPD (limit total number of inserts/updates; default=0 (unlimited))
Stops when N inserts/updates have occurred.
limupd=>1000 -- STOP AFTER 1000 ROWS INSERTED OR UPDATED
LIMERR (limit total number of errors; default=0 (unlimited))
Stops when N errors have occurred.
limerr=>100 -- STOP AFTER 100 ERRORS
COMMITPOINT (commit every N record processed; default=0 (no commits))
Commits after every N inserts/updates. If the "P" flag is set then a commit will happen after every N rows processed. This can help prevent running of rollback space before the update completes.
commitpoint=>100 -- COMMIT AFTER EVERY 100 INSERTS OR UPDATES
ONCOMMIT (execute custom SQL at each COMMITPOINT; default=NULL)
Executes the specified command prior to every commit (used in conjunction with the COMMITPOINT parameter.) One typical use of this feature is to update a "status" table to monitor the update’s progress. Do not include a trailing semi-colon.
oncommit=>’insert into load_status(timestamp,text) ‘
||’values (sysdate, ’’100 rows inserted’’)’;
SKIPREC (skip a number of records before starting to insert/update; default=0)
Skip this number of rows before attempting to apply any changes. Dependent on the physical order Oracle happens to select from source table at particular time.
skiprec=>999 -- JOB BROKE, RESTART AT ROW 1000
BTAB (insert all bad records (errors) into a exceptions table; default=NULL)
Report exceptions to the named "BAD" table instead of the DBMS_OUTPUT (used in conjuction with "e" flag.) The bad table must include all columns (same name and data-type) of the source table (i.e. STAB) or errors will occur.
btab=>’MYTABLE_BAD’ -- INSERT EXCEPTIONS INTO MTABLE_BAD
NEXTDATE (specify start date when submitted as a job; default=NULL (SYSDATE))
Set submit job start time when executed with "j" flag. This parameter is passed as NEXT_DATE to the DBMS_JOB.SUBMIT() procedure. A null value will default to SYSDATE when the DBMS_JOB.SUBMIT is executed--see DBMS_JOB.SUBMIT for details.
nextdate=>SYSDATE+1/24 -- START 1 HOUR FROM NOW
nextdate=>SYSDATE -- START IMMEDIATELY
nextdate=>null -- START IMMEDIATELY (DEFAULT)
INTERVAL (specify job repeat interval; default=NULL)
Sets submit job interval expression. This parameter is passed as INTERVAL to the DBMS_JOB.SUBMIT() procedure. See DBMS_JOB.SUBMIT for details.
interval=>’TRUNC(SYSDATE+1)+6/24’ -- EVERY DAY AT 6:00AM
interval=>’SYSDATE+6/24’ -- EVERY 6 HOURS
interval=>null -- EXECUTE ONLY ONCE (DEFAULT)
PROCNAME (CREATE OR REPLACE as procedure; default=NULL)
Will cause the anonymous PL/SQL block to be created as a procedure in the database. Basically this just adds
procname=>’LOAD_TABLE1’ -- CREATE OR REPLACE PROCEDRE LOAD_TABLE1
Note: CREATE [ANY] PROCEDURE privilege must be granted directly to package owner or PUBLIC.
EXAMPLES
Example 1.
Add new rows from EMPLOYEES_STAGE into EMPLOYEES.
lang_updtab.update_table(
dtab=>’employees’
, stab=>’employees_stage’
, keys=>’clock_num’
, flags=>’ie’
);
Example 2.
Same as above but update rows which exist and have changed. Also commit after every 100 inserts / updates.
lang_updtab.update_table(
dtab=>’employees’
, stab=>’employees_stage’
, keys=>’clock_num’
, flags=>’iuce’
, commitpoint=>100
);
Example 3.
Production table now contains ADDED/UPDATED_DATE/USER columns so ignore them during compare (Note: if these column don’t exist in stage table then you don’t need to list them in the EXCLCMP parameter.) Also, do not touch the ADDED_DATE/USER columns on update.
lang_updtab.update_table(
dtab=>’employees’
, stab=>’employees_stage’
, keys=>’clock_num’
, flags=>’iuce’
, exclcmp=>’added_date,added_user,updated_date,updated_user’
, exclupd=>’added_date,added_user’
, commitpoint>=100
);
Example 4.
Refresh a "snapshot" in a remote database based on the EMPLOYEES table. The table is created identically in the remote database and the procedure is "submitted" in the schema that owns the table. The snapshot is refreshed every hour.
lang_updtab.update_table(
dtab=>’employees’
, stab=>’employees@master’
, flags=>’iucj’
, rbsseg=>'big_rb'
, interval=>'sysdate+1/24'
);
SAMPLE OUTPUT
Here is a sample summary report (output is sent via DBMS_OUTPUT.)
Version 0.8.10. Tue Dec 16, 1997.
Unique key is (CLOCK_NUM)
Updated EMPLOYEES <== EMPLOYEE_STAGE (flags=IEUCR)
(REPORTING ONLY--NO CHANGES MADE!!!)
0 records skipped
516 records processed
502 records existed
0 records deleted
152 records updated
14 records inserted
2 commits
0 errors updating
0 errors inserting
Started: Tue Aug 26, 1997 11:05:45
Finished: Tue Aug 26, 1997 11:05:47
Elapsed: 0.03 mins
26 recs/s (93,600 recs/hr)
COMMON USES AND TECHNIQUES
Below lists some other common uses for the UPDATE_TABLE procedure.
Table structures are not identical
Need to update changes to existing records
Moving large amounts of data and need to set COMMITPOINT / RBSEG
Exclude some columns on insert and/or update
Take advantage of any of the other features in UPDATE_TABLE
Create a stage table which matches the raw data file
SQL*load the data file into the stage table
Create of view (if necessary) to map the stage table to the production table
Use UPDATE_TABLE to post the changes to the production table
Use UPDATE_TABLE with "IUCR" to count number of different rows
Add "Y" flag to tally differences for each column
Rename the existing table
Create the new table
Use UPDATE_TABLE to move data to the new table
Create copy of table in remote database
Schedule UPDATE_TABLE as job to refresh
Advantages:
Simple to setup
Is primary-key based
Use WHERECLAUSE to include subsets of data
Structure of master can be changed without affecting snapshot (*obviously, there are some restrictions to this, for example: primary key and mandatory columns cannot be removed)
Snapshot is a regular table which can have indexes, constraints, etc.
Snapshot source or target does not require extra views, logs, or triggers
Conveniently scheduled as a job with the "J" flag
Best used when:
Small changes in source table
Source table is small / medium sized or contains indexed UPDATED_DATE-type column
Need to maintain constraints on snapshot
Can create new or update existing with changes only
Can access remote database
Use COMMITPOINT to move large amounts of data
Use WHERECLAUSE to move subsets of data
Future version of the package may include additional support for some or all of the above operation built around the UPDATE_TABLE procedure.
MISCELLANEOUS TIPS
Because UPDATE_TABLE generates a new pl/sql block each time it executes, it is more suited toward batch-type operations (it would probably not be reasonable to invoke from a "FOR EACH ROW " trigger, for example.) However, you can use it to generate template code or a stored procedure as an alternative.
Use triggers on the destination table to implement customized INSERT / UPDATE logic.
Use views on the source table to implement customized formatting / mapping from source to the destination table.
LIMITATIONS