Archive for July, 2008
A foreign key can be used to implement table design
patterns that span multiple tables. By choosing how
a foreign key handles a DELETE attempt on the parent
table, you can structure your table designs to
follow two standard patterns.
Welcome to the Database Programmer blog. This series
of essays is for anybody who wants to learn about
databases on their own terms. There is a complete
>Table of Contents, as well as a summary of
>Table Design Patterns. There is a new essay in
this spot each Monday morning.
A Simple Example of Two Foreign Keys
Picture a basic shopping cart, with its two basic tables
of CART and CART_LINES (or ORDERS and ORDER_LINES if you
are more old-fashioned). The table CUSTOMERS is also
in there as a parent to CARTS. Our three tables look
something like this:
CUSTOMERS | | /|\ CART Cart is child of customers | | /|\ CART_LINES Lines is child of Cart
There are two foreign keys here. CART has a foreign key
to CUSTOMERS, and CART_LINES has a foreign key to CART,
but the two foreign keys should behave very differently.
Table Types and Table Design Patterns
>A Sane Approach To Choosing Primary Keys we saw
that table design begins with identifying the basic
kinds of tables: Reference and Small Master Tables,
Large Master Tables, Transactions, and Cross-References.
Just as we picked different kinds of primary keys
for the different tables, so will we pick different
kinds of foreign keys between these tables.
Deleting a Customer
Imagine you have a customer who has made 10 orders
in 2 years. A system administrator, who is allowed
to basically do anything, goes into your admin
screens, looks up the customer, and clicks [DELETE].
What should happen?
The near-universal answer is that the user should
be denied the action. An error should come back that
says “That customer has orders, cannot delete.”
We want it this way because we never want to delete
any parent row and “orphan” the child rows.
Database programmers know from long experience that
if you allow the DELETE, your queries will give incorrect
results, or you will work extremely hard with lots
of weird LEFT JOINS and UNIONS trying to
get them to come back correctly.
This is not an issue of “flexibility”, where a more
robust system would allow the deletion. This is a
basic question of record-keeping. If the customer has
orders on file then the customer must be kept on file.
Enforcing this rule keeps code clean and simple, and
trying to avoid this rule in the name of “flexibility”
just makes heaps of work for everybody.
Going further, the administrator in question, who
supposedly can do anything, may not violate the rule.
An administrator is simply somebody who can do anything
that would not produce bad data. Administrators
should not be given the ability to violate the basic
structure of the data, they simply have full
rights to do anything within the structure of the
The DELETE RESTRICT Foreign Key
The behavior we want here is called DELETE RESTRICT.
On most database servers this is the default
behavior for a foreign key. It means that you cannot
delete a parent table row if there are matching
rows in the child table.
The DELETE RESTRICT pattern is almost universally used
when the child table is a transaction table and the
parent table is a master table or reference table.
The syntax looks something like this:
-- Most database servers implement DELETE RESTRICT -- by default, so this syntax: Create table CART ( customer integer REFERENCES customers ,order integer..... ) -- ...is the same as this explicit syntax: Create table CART ( customer integer REFERENCES customers ON DELETE RESTRICT ,order integer..... )
Deleting An Order and DELETE CASCADE
Now let us say a staff member is on the phone with
a customer, enters an order, enters five lines,
and then the customers says “forget it” and the user
needs to delete the entire order from the CART.
In this case the user wants to go delete the order,
and he expects the computer to also delete the
lines. This makes perfect sense, why keep the
lines if we don’t want the order?
It may seem strange that in the case of deleting
a customer it makes perfect sense to stop the user,
but when deleting an order it makes perfect sense
to delete the lines as well.
The difference is that an entry in the CART table
is a transaction entry. When a user deletes a
transaction they almost always want to automatically
delete all of the relevant rows from all child tables
as well. The two rules basically are:
- The user cannot delete a master entry that
- Deleting a transaction means deleting the
NOTE: By “transaction” here I mean financial transaction
or other interaction between master elements. I do not
mean a database transaction.
The syntax for DELETE CASCADE looks something like this:
-- if the user deletes a row from CART, -- do them the favor of deleting all of the -- lines as well Create table CART_LINES ( order integer REFERENCES CART ON DELETE CASCADE ,order_line integer.... )
Conclusion: Different Tables Types, Different Foreign Key Types
I have said many times in these essays that the foreign key
is the only meaningful way to connect data in different
tables. This week we have seen that the kind of foreign
key you choose depends on what kind of tables you are
connecting together. Children of master tables generally
get DELETE RESTRICT, and children of transaction tables
generally get DELETE CASCADE.
A history table allows you to use one table to track
changes in another table. While the basic idea is
simple, a naive implementation will lead to bloat
and will be difficult to query. A more sophisticated
approach allows easier queries and can produce not
just information about single rows, but can also
support aggregrate company-wide queries.
This week in the Database Programmer Blog we return to
table design patterns with an essay on history tables.
The basic premise of this blog is that good coding skills
do not lead magically to good database skills — you can
only make optimal use of a database by understanding it
on its own terms. There is a new essay each Monday,
and there is a >Complete Table of Contents and a >List of Table Design Patterns.
What to Put Into A History Table
Naive approaches to history tables usually involve making
a complete copy of the original (or new) row when
something changes in the source table. This turns out
to be of little use, for reasons I will explain below.
A much more useful approach is to track only a few
columns and to store any combination of old values,
new values, and differences. A history table designed
this way can be tremendously useful.
We will start with the example of a sales order table,
called ORDERS. The columns we are interested in might
look like this:
ORDER | CUSTOMER | DATE | LINES | TAX | TOTAL | PAID | BALANCE ------+----------+----------+--------+-------+--------+--------+--------- 1234 | 9876 | 5/1/08 | 48.00 | 5.00 | 53.00 | 0 | 53.00 2345 | 9876 | 5/3/08 | 150.00 | 0 | 150.00 | 150.00 | 0 3456 | 5544 | 6/8/08 | 25.00 | 2.60 | 27.60 | 15.00 | 12.60 4567 | 3377 | 7/3/08 | 125.00 | 7.00 | 132.00 | 50.00 | 82.00
We first have to ask which columns must be copied into history
so that we can link the history table back to the ORDERS table.
The only column we need for tracking is ORDER (the order
the history table will always have an ORDER column.
We should also assume that the history table will contain at least
a timestamp and a column to track the user who made the change,
which brings us to a minimum of three columns.
Finally, it tends to be very useful to track what action
caused the history entry, be it an INSERT, UPDATE, or DELETE.
This brings us up to four minimum columns.
Next we ask which columns we will definitely not need. There
are two groups of columns we will not need, which are
1) the columns that never change and 2) the columns we do not
care about. Columns that do not change are likely to be
the CUSTOMER and the DATE column. There is no need to bloat
the history table with these valus because we can just get
them out of the ORDERS table. The second group, columns we
do not care about, are are usually things like ship-to address,
maybe an email, and other information. Naturally there is
no hard-and-fast rule here, it depends entirely upon the
needs of the application.
So now we know what we definitely need and what we definitely
do not need, and we are ready to begin work considering
the columns that will change. Not surprisingly, these are
usually all about the numbers. Next we will see how to
track the numbers.
Tracking Changes to Numbers
While it is certainly useful to store one or both of the
old and new values for a number, it far more useful to
store the change in the value, or the delta.
Having this number in the history table makes for some
really nifty abilities. If you store all three of the
old, new, and delta, then you can more or less
find out anything about the ORDER’s history with very
So we are now ready to consider what the history table
might look like. We will take the case of an order that
was entered by user ‘sax’, updated twice by two other
users, and in the end it was deleted by user ‘anne’.
Our first stab at the history table might look like this:
ORDER | USER_ID | ACTION | DATE | LINES_OLD | LINES_NEW | LINES_DELTA ------+----------+--------+---------+-----------+-----------+------------- 1234 | sax | UPD | 5/1/08 | 0.00 | 48.00 | 48.00 1234 | arkady | UPD | 5/7/08 | 48.00 | 58.00 | 10.00 1234 | ralph | UPD | 6/1/08 | 58.00 | 25.00 | -33.00 1234 | anne | DEL | 6/4/08 | 25.00 | 0.00 | -25.00
I should note that if you keep LINES_OLD and LINES_NEW, then
strictly speaking you do not need the LINES_DELTA columns.
Whether or not you put it in depends on your approach to table
design. If you framework allows you to guarantee that it will
be correct, then your queries will be that much simpler with
the LINES_DELTA column present.
You may wonder why there is no entry for the original INSERT.
This is because you must enter an order before you can
enter the lines, so the original value will always be zero.
Only when lines start going in does the ORDER get any
numbers. This is true for header tables, but it would not
be true for detail tables like ORDER_LINES_HISTORY.
Some of the Obvious Queries
There are few obvious queries that we can pull from the
history table right away. These include the following:
-- Find the value of of the line items of an -- order as of June 1st SELECT LINES_NEW FROM ORDERS_HISTORY WHERE ORDER = 1234 AND DATE <= '2008-06-01' ORDER BY DATE DESC LIMIT 1; -- Find the original value of the line items, -- and the user who entered it. SELECT LINES_NEW, USER_ID FROM ORDERS_HISTORY WHERE ORDER = 1234 ORDER BY date LIMIT 1; -- Find the users who have worked on an order SELECT DISTINCT USER_ID FROM ORDERS_HISTORY WHERE ORDER = 1234;
Most of queries should be pretty obvious, and there
are plenty more that will suggest themselves once you
start working with the history tables.
Queries Involving the Delta
The real power of the DELTA column comes into play
when you are trying to compute back-dated values
such as the company's total open balance on
June 1, 2008. If you have a naive history table that
stores only the old value or only the new value, this
is truly a tortuous query to write, but if you have both
then it is really quite easy.
-- Query to calculate the total open balance of all -- orders as of a given date SELECT SUM(BALANCE_DELTA) FROM ORDERS_HISTORY WHERE DATE <= '2008-06-01';
This magical little query works because paid orders
will "wash out" of the total. Consider an order that
is entered on May 20 for $200.00, and is then paid
on May 23rd. It will have +200 entry in the
BALANCE_DELTA column, and then it will have a -200.00
entry 3 days later. It will contribute the grand sum
of zero to the total.
But an order entered on May 25th that has not been
paid by June 1st will have only a +200 entry in
the BALANCE_DELTA column, so it will contribute the
correct amount of $200.00 to the balance as of
If the company owner wants a report of his total
open balances on each of the past 30 days, you can retrieve
two queries and build his report on the client:
-- Get begin balance at the beginning of the period SELECT SUM(BALANCE_DELTA) as BEGIN_BALANCE FROM ORDERS_HISTORY WHERE DATE < '2008-06-01'; -- Get the total changes for each day. When you -- build the report on the client, add each day's -- change amount to the prior day's balance SELECT SUM(BALANCE_DELTA) AS BALANCE_DELTA FROM ORDERS_HISTORY WHERE DATE BETWEEN '2008-06-01' AND '2008-06-30' GROUP BY DATE;
Keeping History Tables Clean
A clean history table is one that contains no unnecessary
information. You normally do not want entries going
into the history table if nothing relevant changed.
So your history table mechanism should examine the
columns it is tracking, and only make an entry to the
history table if one of the columns of interest actually
Problems With The Naive History Table
A very basic history table will usually
copy the entire original row from the source table
into the history table whenever an INSERT, UPDATE
or DELETE occurs. One simple problem is that you
end up with bloated history tables. Because they are
cluttered with unnecessary repititions, they are difficult
to work with by inspection.
A much more serious technical problem with the naive
approach is that it is horribly difficult to produce
the queries demonstrated above. You must reproduce
the concept of a delta by either running through all
of the rows on the client, or you must make a difficult
(and often impossible) JOIN of the history table
to itself in which you connect each row to the
row that came just before it.
All I can say is, no thanks, I'll go with the delta.
History Table Security
History tables always involve some concept of
auditing, that is, keeping track of user actions.
This means we need to protect against deliberate
falsification of the history tables, which leads
to two rules. First, a user must have no ability
to directly DELETE rows from the history table,
or they could erase the record of changes.
Second, the user must have no ability to
directly INSERT or UPDATE existing rows, because
if they could they can falsify the history. These
rules apply to both regular users and system
administrators, the administrator must have no
privelege to subvert or manipulate the history.
Since history tables have a tendency to become
seriously bloated, there must be some priveleged
group that can DELETE from the history tables,
which they would do as a periodic purge operation.
This group should have no ability to
UPDATE the tables, because such priveleges would open
a potential hole for subverting the history.
Regular system administrators should not be in
this group, this should be a special group whose only
purpose is to DELETE out of the history tables.
If you are making use of DELTA columns, then stricly
speaking you do not want to purge, but compress
history tables. If you want to purge out all entries
in 2005, you must replace them with a single entry
that contains a SUM of the DELTA columns for all
So to sum up, we have the following security rules
for a history table:
- No system user should be able to DELETE from the
- No system user should be able to UPDATE the
- No system user should be able to directly control
the INSERT into the history table.
- A special group must be defined whose only ability
is to DELETE from the history table, so that the
tables can be purged (or compressed)
from time to time.
As always, you have your choice of implementing the
history mechanism in the client code or in the database
The best performing and most secure method is to
implement history tables with triggers on the source
table. This is the best way to implement both
security and the actual business rules in one
encapsulated object (the table).
However, if you have no current practices
for coding server-side routines, or you do not have a
data dictionary system that will generate the code for
you, then it may not be practical to go server-side
for a single feature.
Implementing history tables in code has the usual
benefit of keeping you in the language and habits
you are most familiar with, but it means that you cannot
allow access to your database except through your
application. I cannot of course make a general rule here,
this decision is best made by the design team based
on the situation at hand and anticipated future needs.
History tables have many uses. Beyond the obvious
first use of finding indidivual values at some point
in the past, well crafted tables can produce
company-wide aggregations like total open balances
on a given day, changes in booked orders on a day
or in a range of days, and many other queries along
those lines. Security is very important to prevent
history tables from being subverted.
Lightshow Magic is a Dreamweaver Extension that enables you to instantly create professionally designed interactive presentations—utilizing images, text content. or both. Lightshow Magic is revolutionary in its refinement – going leaps and bounds beyond ordinary presentation tools. The state of the art just got redefined.
When a desktop programmer tries to write database applications
for the browser, he faces a great many challenges, both technical
and cultural. Both sets of challenges appear because the browser
and the web were invented for purposes different than our own.
On the technical side we must reinvent huge amounts of functionality
that we got “for free” with the old desktop systems of Foxpro, Delphi,
VB Classic and so on, and on the cultural side we must wade through
mountains of irrelevant or downright damaging advice that is aimed
at people working on the next version of Facebook or eBay.
In this essay we look at as many of these challenges as I can
Why A Desktop Developer Would Move to The Web
When the browser first appeared, it totally lacked the technical
powers required to replace desktop applications. Nevertheless,
some programmers immediately began to
ponder how to move into the world of the browser. The reasons were
simple then, are simple now, and have not changed:
- Far easier deployment — nothing to install.
- Worldwide access — businesses with multiple locations are
suddenly much easier to take care of.
- You could now create a public website and give customers and
vendors limited access to certain information.
- Operating System independence. This is far more of a reality
now than we dreamed it might be in the darkest days of
Microsoft’s Total World Domination, but there were visionaries
early on who saw the possibilities.
So there are many programmers, and I am one of them, who continue to
work on the same kinds of applications we did before the web existed,
but who now deploy these applications in the browser, for the reasons
listed above. Here now is our tale of woe and sorrow!
The Cultural Divide
While desktop programmers were scratching their heads
and trying to figure out how to fit into
this new world, a new
generation of programmers was growing up who were
perfecting this new platform and developing applications
that were undreamed of before. Unfortunately, some of
the good advice they dreamed up is either irrelevant
or counter-productive to the database programmer who
is deploying to the web.
The driving reality for the database application programmer
is that her users are not surfing. They are using
a dedicated program written for the purposes of the business
they work for. Most of what the browser can do is either
not necessary or positively in their way, and the browser
lacks productivity tools that they took for granted
in “the old system.” This fact is central to the
cultural divide between application programmers and
The Infamous Back Button Problem
If I surf over
to www.osnews.com and click on an article, when
I am finished I click “BACK” once or twice until I’m back
to osnews, and then pick another article. But to the
application user, who is not surfing but is using
a dedicated program, who has clicked “New Patient”, typed in
the info, and clicked “Save”, the back button is a positive
menace. It is misleading and dangerous. This has led
to who-knows-how-many rants from web programmers telling
application programmers, “You don’t understand the web,
you shouldn’t write it that way,” in which the desparate
application programmer replies only, “but you don’t
understand, I must have it work this way.” The simple
fact is that when a user is modifying data in a browser
there is no concept of BACK. There may “UNDO” or “REVERT”,
but once the data is saved it is saved. This is why
application programmers resort to trying to hide or disable
the button, or why they think they should be able to modify
the history (which of course they cannot do because that would
be a huge security hole for public sites).
Ajax only Makes The Back Button Worse
user on the customers screen, who then goes to the menu
and picks the vendors screen. They work for five
minutes on the vendors screen, and their wonderful snappy
AJAX application is fetching search results and navigating
from row to row and saving changes. Then they decide they
made a mistake and hit [BACK] and wham! they’re on the
customer screen! It seems that the better the applications
become, the worse the [BACK] button becomes. In my own
shop we have finally decided to have the login program
pop up a new window which does not have the [BACK]
button or the address bar. This is considered heresy
by web programmers (you don’t understand the web! they
cry) but of course what is true for them is not true
for us, and vice-versa.
This also leads to much work. We must provide for
such features as UNDO with no native support in
the browser, and worse, with whatever native support
the browser does have been intended for something
This is dead wrong for the application programmer.
Application programmers have a power that is
totally outside the experience of a pure web
programmer: we can dictate system requirements to
the customer. This led to many unhappy problems
before the web, but with Firefox (and firebug!) we
now have a platform that is free and robust. We
simply install Firefox (or instruct the IT
department to do so) and we have a platform that
we know will support our application.
Nothing illustrates the divide between the web
and the desktop like keyboard shortcuts.
When Windows 95 swept the
office world (but before the web really came into
its own), programmers developed a new term for
applications that required constant use of the
mouse: we called them “mousetraps”. The worst
kind of mousetrap program requires the user to
constantly lift their hand from the keyboard and
go to the mouse, then back again. This is fatiguing,
confusing, and terribly counter-productive for the
But the real problem is that the browser
was born a mousetrap. From the perspective
of the desktop programmer, keyboard shortcuts are
clearly an afterthought, a “red-headed stepchild”
as they say. Native HTML supports only the
ACCESSKEY attribute, and recently Firefox was changed
so that the default key combination is CTRL+ALT
instead of ALT. This small change led me to finally
realize that these folks, to put it mildly, have
never lived in my world and haven’t the slightest
clue what my users need. I could expect no help
from them on this front.
The solution for the web programmer is to
remember my users are not surfing, they are using
a dedicated program. Therefore it is the
Right Thing for the application programmer to
hijack the CTRL-N key and have it mean “New Patient”
(or New Customer, New Vendor, etc)
instead of opening a new browser window. Moroever,
he must kill the CTRL-N so that it does nothing
on a page where there is no [New] button.
If he does not, then sometimes CTRL-N
will create a new patient, and sometimes it will pop
up a new window with my.yahoo.com! So the application
programmer confidently rewires the “standard” browser
keys and has happier customers for his effort.
Technical Problems In the Browser
No Default Focus
Have you ever gone to a website where the first thing you
must do is log in, but the user id input does not have
focus? That is a sure sign that the page was written
by a web programmer with no desktop experience.
When you put a database program into the browser, you
expect the user to be typing constantly, looking things
up, adding information, and so forth. So the application
programmer must ensure that his first control always
receives focus. Call it petty if you like, but without
it your program becomes a mousetrap. Perfection comes
by concentrating on these small things that either
annoy or please users.
Tabbing Off to Mars
Default browser behavior is to allow the user to TAB
through controls in the order they were created.
This can be modified by explicitly assigning TABINDEX
attributes to the control. However, when you get to
the last control, the browser then Tabs you up to the
menu, or the address bar, or anywhere else.
In a business application, where the user is not
surfing the web, this is wrong. Tabbing out of
the content area is equilivant to exiting the
application, it throws the user into a context that
they do not need and (sad to say) do not understand.
When I first began deploying business apps in the
browser I would get calls saying, “it’s frozen” or
“i’m typing and nothings happening” and other such
mysterious claims. Once I observed the users I realized
they were “tabbing to Mars”, the focus was up on the
menu or in the address bar or somewhere else equally
irrelevant. So we created the idea of the “Tab Loop”,
so that when the user hits TAB on the last control it
loops back to the first. This completely ended
The Tower Of Babel
Desktop programmers have a luxury undreamed of by
web programmers: they can do all or nearly all of
their programming in a single language, like
classic Visual Basic, Foxpro, or Delphi (or heck,
COBOL or 4GL!). Most of
these programmers also know SQL, but it is not seen
as a burden to learn it, it is just part of the
But when the application programmer moves to the web,
he is confronted with at least four systems he must
grasp if he is to perform as a master crafstman:
server languages like
Ruby, PHP, Java, etc. These different technologies
all have syntaxes
and philosophies that are different from each other
and from past experience. All I can say is I’m
glad I made the effort but I sure as heck hope I
never have to make a change that dramatic again.
Let’s Not Talk About State
When an application programmer moves to the web,
he is confronted with the totally alien concept
that he cannot maintain state. This idea has been
discussed much in past years, and I suspect it may
not be the problem it once was, as most of us have
long since gotten past it. I did not want to leave
it out completely, but it is way too large an issue
to discuss in a paragraph and I doubt anyway that I
could add to the wisdom that is already out there.
The HTML SELECT element stinks. Every serious
application programmer either downloads a
replacement or writes his own replacement. In
the old world of the desktop we did not have to
do such things.
The final piece of the puzzle in my shop was
The irony of jQuery is that it seems to me
its core function is DOM traversal and
manipulation, but its elegant simplicity has
drawn creative minds to do things like create
really nice widgets for entering time.
In my shop we finished off our desktop-in-browser
framework by using jQuery extensively, this
solved lots of our lousy widgets problem.
In my case I set out on the task of doing browser-based
applications four years ago, when none of these
technologies existed. So I developed my own simple
browser-side framework. jQuery let me round off the
rough edges, and now I have no need of a
third-party framework. So I am afraid I cannot offer
any experience in the use of these others.
But even so, if I could use somebody else’s dedicated
work and put my efforts elsewhere, that would only
be wise, so I continue to watch them closely.
When Do Application Programmers Accept
Advice From Web Programmers?
Does all of this mean that we application programmers
can learn nothing from web programmers? Of course
not, that we be arrogant in the extreme.
The answer is that when we enter the world
of the public website, we must seek the advice and
guidance of the experts in that world. So when I
write the public portion of the application, the
part that is visible to customers, vendors and other
trading partners, I have to follow the standards of
should be optional, it should work on IE (arg),
and so on and so forth.
Conclusion: When Worlds Collide
The desktop application programmer who decides to
deploy business applications in the browser will
face two broad challenges: techical and culturual,
both of which stem from the origins of the web,
which are so different from the origins of the
desktop. The technical challenges tend to center
around features of the browser that are either
lacking or downright counter-productive, and this
is made worse because the advice a programmer will
receive comes from a culture whose goals are very
different from his own. The database programmer who
wants to deploy applications in the browser must
be prepared to reproduce a lot of features we took
for granted in the desktop, and he must also be
prepared to filter through the received wisdom
and throw out anything that does not meet the
needs of his end-users.
Addendum: After reading a few comments on ycombinator.com I should probably stress that the job of getting the application into the browser is completely doable (in fact I’ve done it myself). The browser can now easily handle the job of desktop applications. The OP lists the hazards I and many other have faced and overcome in getting there. A big part of the conclusion is that it is a lot easier to get done if you recognize why it seems so hard: which is that you may be getting advice from people whose goals are very different form yours.
Many database performance decisions come down to “pay me now or
pay me later.” Some decisions will produce faster inserts and
updates at the cost of slower and more complex reads,
while other decisions will
slow down inserts and updates but provide faster and easier
Welcome to the 30th regular post in the Database Programmer blog, there is a new post each Monday morning. This blog is for anybody who wants to see practical examples of how databases work and how to create lean and efficient database applications. There is a
>Complete Table Of Contents that is updated each week, and a
>Master list of table design patterns
that is updated whenever a new design pattern is presented.
Paying With Indexes
An index speeds up SELECT operations. We are not going to go
into any detail about how indexes work, this week we will
stick to how indexes affect performance.
Imagine a table of 10,000 sales orders. You wish to pull out
a handful of fields for orders placed on 5/1/08,
so you issue this SELECT:
SELECT customer,order_total FROM orders WHERE date = '2008-05-01'
If you do not have an index on the table, then the database
server will have to scan every single row in the table
to find the rows that match the WHERE clause. On the other
hand, if you had an index on the date column, the server would
first read the index to find pointers to the correct rows,
and then read only the rows you needed. The index itself
is optimized by various methods so that only a very few reads are
necessary to find the correct values. Most databases support
the following syntax:
CREATE INDEX some_unique_name ON orders (date)
As far as performance goes, an index will slow down write
operations (INSERT, UPDATE, and DELETE) because the index
must be updated when the write operation occurs. This cost
is on top of the write operation itself.
(Addendum added July 7: Jochen
correctly in his comment that this statement oversimplifies
things. While it is true that the index must be updated
for writes, the index can also dramatically speed up UPDATE
and DELETE operations if those operations use a WHERE
clause that can benefit from the index.)
In terms of “pay me now or pay me later”, when you regularly
add a lot of indexes you are opting to “pay me now.” You pay
the price of slower writes to get faster reads. If you regularly
avoid adding any indexes you are opting to “pay me later.”
You defer the costs of access to read time to get faster writes.
I should note that it is not possible to completely avoid
indexes, nor is there any value in trying to. For instance,
a primary key requires an index because otherwise you have
to scan the entire table every time you do an INSERT, which
is just plain crazy. Foreign keys benefit from indexes as
well for similar reasons.
Paying With Views
A “view” is a stored SQL statement that you can SELECT from
as if were a table. Imagine we have a table of
TEACHERS and a table of COURSES that they are teaching in
a particular year. We often need to display a list of
courses with the names of the teachers. We can do this with
a JOIN, but a view gives us an easier pre-defined way to
CREATE VIEW courses_teachers AS SELECT courses.room,courses.period,courses.teacher ,courses.year ,teacher.first_name,teacher.last_name FROM courses JOIN teachers ON courses.teacher = teachers.teacher
…which now lets you do the easier SELECT:
SELECT * FROM courses_teachers WHERE year='2008';
In terms of “pay me now or pay me later” a view is always a “pay me
later” decision. It makes for easier coding but the server must go
out on every SELECT and gather together the data required.
The “pay me later” nature of a VIEW meets its greatest extreme
when the view contains aggregations. Consider the following view
which gives you easy access to customers and their lifetime history
of orders and payments:
CREATE VIEW customers_extended AS SELECT customers.* ,SUM(orders.order_total) as orders_total ,SUM(invoices.balance) as balance FROM customers JOIN orders ON customers.customer = orders.customer JOIN invoices ON customers.customer = invoices.customer -- Pulling from the VIEW requires a complete read -- of relevant ORDERS and INVOICES tables SELECT * FROM customers_extended WHERE customer = X;
This view is a “pay me later” proposition because every time you
issue a SELECT from the view, it will have to scan many rows from
the ORDERS and INVOICES tables. The contrasting method is to
denormalize which is a “pay me now” approach.
Paying With Denormalization
- FETCH operations, where a value such as an item’s price is
copied from the ITEMS table into the ORDER_LINES table.
- EXTEND operations, where you take the QTY and the PRICE
columns in the ORDER_LINES table and write the EXTENDED_PRICE.
- AGGREGATE operations, such as writing the total of ORDER_LINES
onto the ORDERS table.
All of these operations fall into the “pay me now” category.
When these denormalized columns are put into tables, they
add to the the size of the table and increase the cost of
write operations. However, when it comes time to SELECT out
of the tables the values are all there ready to go, usually
with fewer JOINs and lower overall disk activity.
Extreme Pay Me Now
In my line of work I deal with line-of-business programs that
are commissioned or purchased by businesses to do their daily
work. User counts are low and resources are high, because
often I will have 10 users on a single server, with access
via internet limited to only a few thousand potential customers
of which very few are ever on at the same time.
In this context, I prefer to take the “pay me now” approach
to its fullest realization. This means I tend to design my
systems so that:
- Any column a user is likely to filter on has an index.
- Tables are fully denormalized, containing a wealth
of derived values.
This means that all write operations on my systems are slower
than they might otherwise be. However, this is more than
acceptable within this context because the server is
largely untaxed, and users do not notice the difference between
100ms and 200ms to save a row. So I can pay when the user does
not notice and as reward I have very rich reporting and
The extreme pay-me-now approach has one more advantage.
The wealth of derived values in the database lets end-users
find what they are looking for without calling a programmer
and asking for a special page or report. Generally the more
derived values there are the truer this becomes.
Extreme Pay Me Later
The extreme form of pay-me-later is a fully normalized database
with no derived values and a minimum of indexes. Calculated
values are available either in views, client-side code or both.
This type of database is tuned for lots of fast writes because
the cost of an INSERT or UPDATE has been kept to an absolute
minimum. The database will be slower to perform ad-hoc or
one-off queries because the server will have to do table scans
whenever a user filters on anything except primary keys and
The lack of derived values in fully normalized databases also
leads to more phone calls and emails asking the programmer
to create a report or page that will work out derived values
that are not present in the database.
Conclusion: Know Your Context
This week we have taken common database technologies such as
indexes and views and seen how they affect performance.
All of these technologies can be judged
in terms of the “pay me now or pay me later”
Database programmers normally choose to “pay me later”
when they must support a large number of simultaneous
write operations with a minimum of contention. These
situations call for fewer indexes and strict normalization.
The trade-off is that ad-hoc or one-off queries will
involve more JOINs, more table scans and an increased
likelihood the programmer will be called in for
When read operations are more common than writes, or
when inquiries and reports are likely to be unpredictable,
database programmers will choose to “pay me now” by
doing more work on the write operation. There will
be more indexes and more denormalized values, so that
the user is more likely to quickly locate whatever
they want without programmer intervention.