Databases & Spools and Writing Your Wrongs

Quite a bit of Relational Databases governing principles were developed in the California Bay Area.

IBM has the Santa Theresa Lab in San Jose, Oracle is in Redwood Shores, Sybase in Emeryville before relocating to Pleasanton, Informix in Menlo Park, and Tandem in Cupertino.

A Story was told that in the early days, a bug ticket came in to the IBM Engineers working on System R:

Raises were earmarked for employees earning less than a certain salary, but during the actual running of the job the system kept granting the identified employees salary increases.

The employees traced the problem back to the fact that in cases where the underlying Row Location changes, the system kept granting raises until the employee’s salary passed the said limit.

As always, Wikipedia has a better explanation:

http://en.wikipedia.org/wiki/Halloween_Problem
In computing, the Halloween Problem refers to a phenomenon in databases in which an update operation causes a change in the physical location of a row, potentially allowing the row to be visited more than once during the operation. This could even cause an infinite loop in some cases where updates continually place the updated record ahead of the scan performing the update operation.

The potential for this database error was first discovered by Don Chamberlin, Pat Selinger, and Morton Astrahan in 1976, on Halloween day while working on a query that was supposed to give a ten percent raise to every employee who earned less than $25,000. This query would run successfully, with no errors, but when finished all the employees in the database earned at least $25,000, because it kept giving them a raise until they reached that level. The expectation was that the query would iterate over each of the employee records with a salary less than $25,000 precisely once. In fact, because even updated records were visible to the query execution engine and so continued to match the query’s criteria, salary records were matching multiple times and each time being given a 10% raise until they were all greater than $25,000.

Continuing Wikipedia’s write up:

The name is not descriptive of the nature of the problem but rather was given due to the day it was discovered. As recounted by Don Chamberlin,

Pat and Morton discovered this problem on Halloween …. I remember they came into my office and said, ‘Chamberlin, look at this. We have to make sure that when the optimizer is making a plan for processing an update, it doesn’t use an index that is based on the field that is being updated. How are we going to do that?’ It happened to be on a Friday, and we said, ‘Listen, we are not going to be able to solve this problem this afternoon. Let’s just give it a name. We’ll call it the Halloween Problem and we’ll work on it next week.’ And it turns out it has been called that ever since.

To solve problems such as this, some database engines use Spools to locally cache the data and process data from that cache rather than return back to fetching data from the original data-set.

It is over 35 years later and because these guys were entrusted with systems with zero tolerance for in-inaccuracies, businesses after businesses came to trust and rely upon it.

Enterprise companies such as IBM\Informix, Oracle\MySQL, Microsoft, and SAP\Sybase continue to earn and re-invest billions of dollars each year.

Start-up companies springing from Academia (Berkeley and Stanford) continue to receive millions of dollars from Sand Hill Venture Capitalists.

We have the NoSQL evangelist talk up eventual consistency and treat failure as first-class citizen and thus write out N (>= 3) copies of the same data.  Of course, one has the entire network of data clusters, as well.

And, in the same season, In-Memory databases, are pushing the envelope.  As are memchached systems.

Michael Stonebraker who spent many years lecturing at Berkeley and tutored many Industry luminaries has a few You-Tube videos where he discusses databases (Traditional / NoSQL / NewSQL).

And, along the ways he talks about professional intermediaries, how the Internet has disseminated data acquisition and how databases are being deluded.  And, he touches on the complexity of database systems; the big ones being latching, recovery, buffer pools, and locking.

It goes without saying that just like his database peer, Larry Ellison, Michael Stonebraker says what he thinks.

Here is one of Larry’s many quotations:

(Ellison isn’t dainty about the way he does it. He put up one chart about reliability.) “If you really want 50 machines that break all the time and require lots of labor, go with the guys in Redmond,” he told the audience, referring to Microsoft’s Redmond, Wash., headquarters. Then he alleged that IBM followed Microsoft’s technology path. “It must have been the unreliability that appealed to them,” he said of IBM.

 

Here are some of Michael Stonebraker’s YouTube discussions:

Industry Response

SAP Response

http://en.wikipedia.org/wiki/SAP_HANA
SAP HANA is a synthesis of a set of technologies which includes the MaxDB database and TREX search engine.  It started from a research at the Hasso Plattner Institute in 2006.  In 2008, SAP AG had teams working with Hasso Plattner Institute and Stanford University demonstrate an application architecture for real-time analytics and aggregation, mentioned as “Hasso’s New Architecture” in SAP executive Vishal Sikka’s blog. Before the name HANA settled in, people referred to this product as New Database.

The product was officially announced in May 2010. In November 2010, SAP AG announced the release of SAP HANA 1.0, an in-memory appliance for business applications and business Intelligence allowing real-time response.  The first product shipped in December 2010.

Microsoft

Hekaton Breaks Through
http://research.microsoft.com/en-us/news/features/hekaton-122012.aspx

“In traditional models, the assumption is that data lives on disk and is stored on disk pages,” Larson explains. “This creates a lot of overhead when you try to access records. When data lives totally in memory, we can use much, much simpler data structures. Hekaton’s index data structures and storage structures are optimized on the basis that when a table is declared memory-optimized, all of its records live in memory.” – Paul Larson, principal researcher with the Database Group

Hekaton Breaks Through
http://research.microsoft.com/en-us/news/features/hekaton-122012.aspx

The Hekaton team also found that multiversion concurrency control (MVCC) proved robust in scenarios with higher workloads and higher contention. Databases designed with single-version concurrency control handle updates by overwriting data with each change. MVCC techniques handle updates by marking old data as obsolete, then adding a newer version. At any time, there could be multiple versions of the data, but only one is the latest. The big benefit is that update transactions can add new versions without interfering with concurrent read activity.

Hekaton implements an entirely new, optimistic MVCC approach that enables a transaction to be processed without blocking. The paper describing this work, High-Performance Concurrency Control Mechanisms for Main-Memory Databasesby Larson; Spyros Blanas and Jignesh M. Patel of the University of Wisconsin-Madison; Diacon;, Freedman; and Zwilling—was presented in August during the 38th International Conference on Very Large Databases.

My Take

My read is that it is quite extraordinaire to hear Database back-end guys talk about all the ills/non-performance of their code-lines.  I mean these guys are not supposed to talk to people, yes machines and themselves, but not be involved in a conversation that extend to anyone outside of self.

Jim Gray, still at sea, is probably docking that ship at San Francisco Bay, and making his way back to his Market Street office.

And, that young Rapper a few hundred miles down Highway 101, is quite right in saying that the only way you can right your wrongs is by writing them down.

Though a day early, here is a Friday drink to my hopes that you get to self call-out your ills.

References

Halloween Problem

In-Memory Databases

Hekaton

Memory Caches

Larry Ellison

Jim Gray

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s