Hotsos Symposium Speaker – Mark Farnham

Mark Farnham is a returning Hotsos Symposium presenter.

Biography

Mark FarnhamOracle ACE, Mark W. Farnham, is a native of Dunkirk, NY where he attended public schools and enjoyed the rich musical and farming heritage of Chautauqua County. After graduating with a dual major in physics and biology from Dartmouth College in 1977, Mark became interested in computer science and database technology. Mark began using Oracle at DTSS in the late 1980's and helped Burlington Coat Factory Warehouse move from DTSS to Unix and Oracle from 1989 to 1994. Mark was one of the founders of the OAUG and participated in Oracle's VLDB group and MOSES. In 1994, Mark co-founded Rightsizing, Inc. with Jerry Ireland where Mark continues as president to this day. Rightsizing helps businesses make smart use of Oracle technology, and Mark often is called upon when high throughput and massive migrations are involved. Mark is a frequent contributor to oracle-l and is proud to be a member of the OakTable Network and co-founder of the APPSPERF consortium. Mark continues his service to the OAUG as a board member of the OAUG DBSIG and was awarded the OAUG Lifetime Service Award in 2010. Mark's handle on Twitter and YouTube is pudge1954, and he is pudge on OTN. Mark's written papers on performance began in 1981 when he developed a pruning algorithm to solve best fits of piecewise linear regressions by excluding large ranges of the potential solution space early. (Respiration Physiology Volume 47 Issue 1, January 1982, pp 97-106). Mark's public speaking on Oracle performance began at the 1990 IOUW in Anaheim, CA, speaking on managing many databases on many disk drives. Most recently, Mark presented at MOTS, OOW 2010, and RMOUG Training Days 2011, Hotsos Symposium 2011, Collaborate 2011, KSCOPE11, and OOW 2011.

Presentation Title

Managing Transactions with Disappearing Indexes

Abstract

"If a system receives information that drives transaction processing, it is sometimes necessary to follow the transaction detail through several stages of processing or through processing steps that may vary depending upon the source data. When processing of a given row is "complete", there are more efficient means than deletion to remove it from consideration in continuing passes of transaction processing. One method, using a "disappearing index", is particularly useful and efficient.

This method takes advantage of the fact that index column sets that become completely null are eliminated from the index, so that at a steady state a fast full scan of the index to find candidate rows for further processing remains fast and efficient even for a very large partition or table. When the statuses of interest for various phases of processing or inquiry are non-overlapping values, an even more efficient range scan can be used, and keeping the total index cardinality much less than the number of rows in the table minimizes the btree depth to find range starting and ending points.

The complexity required of the disappearing index column or column set depends on the complexity of processing required. Requirements for "first in, first out" (FIFO) processing, deadline processing, priority processing not based on time values, and parallel candidacy for multiple processes are among the complicating factors that illustrate different requirements for the nullable set of columns used to efficiently manage selection of candidates for further transaction processing. The cases described are by no means claimed to be exhaustive, but they illustrate a methodical way to choose the column set used to manage transactions.

A key point is that columns designed to be nulled to manage transactions sometimes contain initial values such as received date or a monotonically increasing transaction id that must not be discarded. At data model design time, it is important to identify such attributes and include both a column to contain the persistent value and a column to be nulled as part of the transaction processing.

When the time each step in processing must be tracked row by row is of enduring value, it is often useful to define default values meaning that stage of processing has not yet been reached, so that the overhead of managing a changing row length is avoided and each block in the transaction table can have a lower percentage of its space left free without risk of forcing row migration. In this way, it is often possible to design the transaction details such that rows never growth in length, and only change length at all when they shrink a small amount to reflect that the null value has been reached for the column set being used to manage transactions.

Of course, all these columns and physical planning for efficient transaction processing should be subordinated to good initial data modeling. Whether the entire design of the transaction control values should in fact be a separate relation linked by a primary key or some other means to the transaction data is a debate into which Mark does not intend to enter and which this paper and presentation does not address.

Conversely, with using the disappearing indexes to identify candidates for further transaction processing, all the relevant columns being null can be used in the processing of transaction data in bulk into partitions or history tables, so that entire blocks are processed an archived at a time, making the entire life cycle of the transaction more efficient.

In the context of Oracle RDBMS processing, at least from versions 7.0 through 11.2, I hope by means of this paper and presentation to:

  • Raise awareness of the importance that usually all null keys simply do not appear in Oracle indexes
  • Show how a final status of NULL can keep identification of rows needing work fast with a small index
  • Establish that the optimizer definitely makes use of indexes in service of "is not NULL" predicates
  • Show how ordered key ranges can be used to process first-in first-out transactions quickly
  • Show how to this is useful for overall information life cycle management

From a DBA perspective, it is sometimes possible to finesse existing program design relying on a non-NULL status value with a view that translates NULL to the desired final status value.

Presentation Materials

Presentation materials are available online to attendees only.

Schedule

The speaker schedule is as follows: