Improve Performance and Data Consistency with Oracle Workspaces

Andrew Simkovsky's picture
articles: 

Managing large DML operations against a live database is one of the most common challenges Oracle professionals face on a regular basis. Such operations often cause serious performance problems and can cause inconsistencies in the data, especially when many referential integrity constraints exist among the tables. This article will explore how Workspaces can be used to solve these problems.

A long-running DML operation can consume vast amounts of database and server resources, and can acquire transaction locks on many pieces of data simultaneously. Under normal circumstances, a large change to the data must be scheduled during periods of low database activity, or worse yet, the database must be made completely unavailable until the operation is complete.

Additionally, there may be a need to generate multiple versions of a set of data for reporting purposes without modifying the actual data. Company management may want to generate "what-if" scenarios based on the data to assist in strategic decisions. For example, a product vendor may want to consider reassigning key sales accounts to different sales representatives, or move accounts from one assigned region to another. Management would need the capability of temporarily making changes to reassign accounts without making changing the actual data, which could potentially affect the accuracy of "true" reports. Additionally, most reporting capabilities are driven off of large data warehouses with hundreds of gigabytes of historical data. Making multiple copies of complete data sets would be extremely cost prohibitive and very time consuming.

To address issues like these, Oracle9i introduced the concept of the database workspace. A workspace is essentially a scratch area where large DML operations can be performed without affecting any transactions that might be running. The figure below illustrates the concept of the database workspace.

concept of the database workspace

In this figure, several a live table exists in the database. A consistent image of a table has been made in a workspace. DML operations can be performed against the table in the workspace to avoid data inconsistencies in the live data, and to reduce the possibility of performance issues caused by DML locks. The changes in the workspace will be transparent to the users, who will only see the live version of the tables. Once the changes to the new version of the data have been completed, the data in the workspace can be either merged back into the live tables, or simply blown away. If the workspace is merged with its previous version, any conflicts in data consistency can be managed by a specific set of criteria that will determine which version of a record will be kept. The new version of the entire data set is then available to the users.

The data in a workspace is completely isolated from data in any other workspace. Also, a user can access only the data that exists in the workspace the user is currently working from. To access the data in another workspace, the user must first move into that workspace. This eliminates the possibility of users accessing versioned data, and also eliminates the possibility of the data from one version of a table intermingling with the data of another version.

From a workspace perspective, the actual database itself that contains the live data is already in a workspace, called the LIVE workspace, and is treated as the highest version of the data. By default, all users start in the LIVE workspace when they first connect to the database. The users will only be allowed to access the data in this workspace, and no other versions of the data will even be visible from this workspace. If a user wants to access another version of the data, he or she must first move into the workspace that contains that version.

For a table that has been versioned-enabled, the live copy of the data is kept in the LIVE workspace. If a row in the table is changed for any reason, instead of being overwritten, a new copy of the row is made to one of the workspaces. The new copy of the row is visible only from within the workspace that modified the row. When sitting in that workspace, a user can query the table as normal. The results from the query will reflect all rows from the table, except in their modified form. If the user moves back to the LIVE workspace and queries the tables again, only the original version will be visible. To illustrate the concept of the workspace, consider the following illustrations.

the LIVE workspace

In this illustration, the database itself, considered the LIVE workspace, contains two tables, ACCOUNTS and SALESREPS. The asterisk indicates that the user is currently sitting in the LIVE workspace.

A user then creates a new workspace, called WORKSPACE1.

WORKSPACE1 created

In order to make a new version of the data the user must then move into the workspace. In this next illustration, the asterisk is now in WORKSPACE1, indicating that the user has moved into the workspace.

User moved into workspace

The user executes an UPDATE statement to modify the ACCOUNTS table. The user reassigns account number 5 to sales rep 200. After this update is complete, the user issues a COMMIT statement.

New copies of all rows affected by the UPDATE statement are created in the WORKSPACE1 workspace

New copies of all rows affected by the UPDATE statement are created in the WORKSPACE1 workspace. These modified rows are only visible from within WORKSPACE1. Any user sitting in WORKSPACE1 can query the ACCOUNTS table as they normally would, and would see all rows in the table, which will reflect any changes made from within that workspace. The user can now run a report joining the ACCOUNTS table with tables containing historical sales data to show the possible performance of the sales reps if the account is reassigned from one rep to another. The key here is that the original version of the data is never modified throughout the entire operation.

Workspaces can be nested, allowing users to create multiple branches of versioned data. When modifying data in a workspace, Oracle will automatically look in the next higher workspace to find any row that needs to be updated. If the row does not exist in the next higher workspace, it will keep moving up through workspaces until it finds it, which may be all the way up in the LIVE workspace. Once found, Oracle will make a copy of that row into the workspace the user is in and modify it as instructed.

If the user decides that the version in WORKSPACE1 reflects the results he is looking for, he can then merge the new version from WORKSPACE1 into LIVE to make it visible to all users. However, if the user decides not to keep the new version, he can just move back to the LIVE workspace, and simply drop WORKSPACE1.

It is even possible to execute a single query across all versions to see the difference. To do this, you would first have to move back to the highest workspace that is common to all the workspaces you want to query from. The query would then be executed across all workspaces, providing a comprehensive report based on the differences across all versions of the data.

So let's see this feature in action using some sample data!

Your company wants to generate some reports to assist them in deciding whether or not to reassign one of their key sales account from one salesrep to another. The three tables that they want to query are SALESREP, ACCOUNTS, and ANNUAL_TRX. The SALESREP table contains the following data:

ORA92> select * from salesrep;

SALESREP_NBR LAST_NAME
------------ ---------------------------
100 SMITH
200 JONES
300 JOHNSON
400 KIRBY

The ACCOUNTS table stores the list of customer accounts, along with the ID numbers of the salesreps that manage them, as shown here:

ORA92> select * from accounts;

ACCOUNT_NBR SALESREP_NBR ACCOUNT_NAME
----------- ------------ -----------------------
5 100 ABC INC
10 200 XYZ INC
20 300 PQR INC
30 400 MNO INC
40 200 GHI INC

The ANNUAL_TRX table provides a summary of all transactions for the year 2003 for each account:

ORA92> select * from annual_trx;

ACCOUNT_NBR YEAR TRX_TOTAL
----------- ---------- ----------
5 2003 15523.04
10 2003 5645.5
20 2003 45328.01
30 2003 5464.04
40 2003 48965.25

By joining these three tables together, we can generate a report to show the overall performance of each salesrep that works for the company:

ORA92> select s.last_name, sum(t.trx_total)
  2  from salesrep s, accounts a, annual_trx t
  3  where s.salesrep_nbr = a.salesrep_nbr and
  4  a.account_nbr = t.account_nbr
  5  group by s.last_name
  6  /

LAST_NAME SUM(T.TRX_TOTAL)
--------------- ----------------
JOHNSON 45328.01
JONES 54610.75
KIRBY 5464.04
SMITH 15523.04

As you can see by these results, Jones is performing very well, while Kirby seems to be struggling. You may have noticed by the output from the ACCOUNTS table that Jones is currently carrying two accounts, XYZ Inc and GHI Inc. However, all other salesreps are carrying only one account.

In an effort to balance out the revenue across all salesreps, management wants to see what the revenue per salesrep would look like if they reassigned XYZ Inc to Kirby. However, the original data must not be modified since other applications are currently using it for reporting purposes. This is where workspaces come in.
By creating a workspace, you can reassign XYZ Inc to Kirby in an isolated environment without affecting the accuracy of other reports. All workspace functionality is implemented through the use of a PL/SQL package supplied by Oracle, called DBMS_WM. You would use this package to enable version management for tables, create and manage workspaces, and navigate from one workspace to another.

Before changing any data in the ACCOUNTS table, you would first enable versioning for that table with the ENABLEVERSIONING procedure in the DBMS_WM package:

ORA92> exec dbms_wm.enableversioning('ACCOUNTS')

PL/SQL procedure successfully completed.

When this procedure is executed, Oracle automatically creates all supporting objects and structures required to support versioned data for this table. These objects include auxilliary tables to store changes, a view over the original table, and an INSTEAD-OF trigger on the view.

Next, you would create a workspace using the CREATEWORKSPACE procedure:

ORA92> exec dbms_wm.createworkspace('WORKSPACE1')

PL/SQL procedure successfully completed.

This procedure creates a new workspace inside the workspace the user is currently sitting in. Since the user currently sitting in the LIVE workspace, which is the default for all new connections, the WORKSPACE1 workspace was created within LIVE.

Just creating the workspace, however, does not automatically move the user into that new workspace. That is done with the GOTOWORKSPACE procedure:

ORA92> exec dbms_wm.gotoworkspace('WORKSPACE1')

PL/SQL procedure successfully completed.

This moves the user's session into the WORKSPACE1 workspace. Any changes to versioned-enabled tables made from within WORKSPACE1 will only be visible from within this workspace. You can determine which workspace you are currently working from with the GETWORKSPACE function:

ORA92> select dbms_wm.getworkspace
  2  from dual;

GETWORKSPACE
----------------------------------------
WORKSPACE1

You can now modify any table you want from within this workspace, just as you would if you were in the LIVE workspace. Modifying a version-enabled table will cause Oracle to generate copies of the modified rows into the workspace for the exclusive use of users in that workspace. The changes will not be visible to any other workspace outside of WORKSPACE1.

You would now modify the ACCOUNTS table, changing the SALESREP number for XYZ Inc from 200 (Jones) to 400 (Kirby):

ORA92> update accounts
  2  set salesrep_nbr = 400
  3  where account_nbr = 10;

1 row updated.

ORA92> commit;

Commit complete.

You can now see the results of your changes in the ACCOUNTS table:

ORA92> select * from accounts;

ACCOUNT_NBR SALESREP_NBR ACCOUNT_NAME
----------- ------------ -----------------
5 100 ABC INC
20 300 PQR INC
30 400 MNO INC
40 200 GHI INC
10 400 XYZ INC

As shown by this output, XYZ Inc has been reassigned from salesrep 200 to salesrep 400. This change is only visible from within WORKSPACE1 and has not affected any other reporting currently happening in the database. We can now run our annual revenue report to show how this change would affect salesrep performance:

ORA92> select s.last_name, sum(t.trx_total)
  2  from salesrep s, accounts a, annual_trx t
  3  where s.salesrep_nbr = a.salesrep_nbr and
  4  a.account_nbr = t.account_nbr
  5  group by s.last_name
  6  /

LAST_NAME SUM(T.TRX_TOTAL)
--------------- ----------------
JOHNSON 45328.01
JONES 48965.25
KIRBY 11109.54
SMITH 15523.04

The annual revenue for Jones has dropped but the revenue for Kirby has increased.

Let's move back to the LIVE workspace to see the original data:

ORA92> exec dbms_wm.gotoworkspace('LIVE')

PL/SQL procedure successfully completed.

ORA92> select s.last_name, sum(t.trx_total)
2 from salesrep s, accounts a, annual_trx t
3 where s.salesrep_nbr = a.salesrep_nbr and
4 a.account_nbr = t.account_nbr
5 group by s.last_name
6 /

LAST_NAME SUM(T.TRX_TOTAL)
--------------- ----------------
JOHNSON 45328.01
JONES 54610.75
KIRBY 5464.04
SMITH 15523.04

As you can see here, the original version of the data has remained untouched!

Oracle even provides the ability to execute a query that spans across multiple workspaces, allowing you to do a side-by-side comparison of multiple versions of data. This is done by first enabling the use of multiple workspaces with the SETMULTIWORKSPACES procedure, passing in the names of all workspaces you want to query from:

ORA92> exec dbms_wm.setmultiworkspaces('LIVE,WORKSPACE1')

PL/SQL procedure successfully completed.

When this produre is executed, Oracle enables the ability to query from a new view for each versioned enabled table that has modified data in the list of workspaces provided. The structure of each view is similar to that of its base table, but also contains some additional columns to indicate which workspace a set of changes had been made from. The name of each multi-workspace view uses the convention TABLENAME_MW (the MW indicating multi-workspace). You can now query from the ACCOUNTS_MW view along with the live version of the ACCOUNTS table to see a side-by-side comparison of each version of the data:

select s.salesrep_nbr, s.last_name, live.trx_total live, ws1.trx_total workspace1
FROM salesrep s,
     (select a.salesrep_nbr, sum(t1.trx_total) trx_total
	 from accounts a, annual_trx t1
	 where a.account_nbr = t1.account_nbr
	 group by a.salesrep_nbr) live,
     (select amw.salesrep_nbr, sum(t2.trx_total) trx_total
	 from accounts_mw amw, annual_trx t2
	 where amw.account_nbr = t2.account_nbr
	 group by amw.salesrep_nbr) ws1
where s.salesrep_nbr = live.salesrep_nbr and
      s.salesrep_nbr = ws1.salesrep_nbr

While this query may look somewhat intimidating, it is actually not that complex. The two inline views in the FROM clause, LIVE and WS1, each generate the revenue per salesrep as it appears in different workspaces. WS1 shows the revenue per salesrep as it can be seen from within WORKSPACE1, while LIVE shows the revenue per salesrep as the rest of the user population sees it. The two inline views are then joined back to the SALESREP table to perform a simple lookup to get the LAST_NAME column using the SALESREP_NBR. Here's the results of this query:

ORA92> select s.salesrep_nbr, s.last_name, live.trx_total live, 
              ws1.trx_total workspace1
  2  FROM salesrep s,
  3       (select a.salesrep_nbr, sum(t1.trx_total) trx_total
  4   from accounts a, annual_trx t1
  5   where a.account_nbr = t1.account_nbr
  6   group by a.salesrep_nbr) live,
  7       (select amw.salesrep_nbr, sum(t2.trx_total) trx_total
  8   from accounts_mw amw, annual_trx t2
  9   where amw.account_nbr = t2.account_nbr
 10   group by amw.salesrep_nbr) ws1
 11  where s.salesrep_nbr = live.salesrep_nbr and
 12        s.salesrep_nbr = ws1.salesrep_nbr
 13  /

SALESREP_NBR LAST_NAME LIVE WORKSPACE1
------------ ------------ ---------- ----------
100 SMITH 15523.04 15523.04
200 JONES 54610.75 48965.25
300 JOHNSON 45328.01 45328.01
400 KIRBY 5464.04 11109.54

The workspaces feature available in Oracle9i and higher provides the powerful ability to generate multiple versions of a set of data without affecting the integrity of the original version. This allows users to generate 'what-if' scenarios based on actual reported performance statistics, which can greatly enhance a company's ability to make better strategic decisions. Workspaces also greatly simplify administration requirements by eliminating the need to generate multiple copies of different sets of data to support various reporting requirements.

Comments

This article is very informative and of great value.

Good effort to make people aware of advanced functionality provided by the Oracle. But i fear this suits more to an OLAP(DSS) system then to any OLTP. Certainly this option adds a lot of value to any DSS system.