Pages

Thursday, January 17, 2008

Avoid NULLs in database

Joe Celko said it best: "NULLs confuse people..."

My sentiments exactly. Of course, I don't expect to convince you by flashing a few quotes from reputable authors. Let's talk for a minute about what exactly NULLs do that cause this type of reaction. The first problem is that the definition of NULL is "unknown." So, one problem is determining whether one value is (not) equal to another value, when one or both values are NULL. This trickles down to many problems for a database engine and any associated applications. The following list details some of those problems:
 they are interpreted differently depending on compatibility level and ANSI settings;

For example, let's consider two values, x and y, that are both NULL. Since the definition of NULL is unknown, then you can't say x = y. However, with the ANSI setting ANSI_NULLs, this can be different. When this setting is FALSE, x = y ... however, when TRUE, x <> y. Confusing, no?

 the storage engine has to do extra processing for each row to determine if the NULLable column is in fact NULL -- this extra bit can have performance implications;
 they produce weird results when using calculations, comparisons, sorting and grouping;
 they create problems with aggregates and joins, such as different answers for COUNT(*) vs. COUNT(fieldname);
 they produce unpredictable results in statistics computations, particularly WITH ROLLUP and WITH CUBE;
 applications must add extra logic to handle inserting and retrieving results, which may or may not include NULL values;
 they cause unpredictable results with NOT EXISTS and NOT IN subqueries (working backwards, SQL determines that NULL columns belong or do not belong to the result set, usually for the wrong reasons);
 no language that supports embedded SQL has native support for NULL SQL values.

1 comment:

Anonymous said...

I like this post..the fundametal rule in design is try to avoid nulls...