
What's wrong with nullable columns in composite primary keys?

ORACLE does not permit NULL values in any of the columns that comprise a primary key. It appears that the same is true of most other "enterprise-level" systems.

At the same time, most systems also allow unique constraints on nullable columns.

Why is it that unique constraints can have NULLs but primary keys can not? Is there a fundamental logical reason for this, or is this more of a technical limitation?


Tomalak

Primary keys are for uniquely identifying rows. This is done by comparing all parts of a key to the input.

By definition, NULL cannot take part in a successful comparison. Even a comparison with itself (NULL = NULL) does not succeed: the result is UNKNOWN, not TRUE. This means a key containing NULL would not work.
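This behavior is easy to check. The following sketch uses SQLite (via Python's sqlite3 module) purely for illustration; an expression that evaluates to NULL comes back to Python as None:

```python
import sqlite3

# NULL = NULL does not evaluate to TRUE -- it evaluates to NULL (unknown).
# Neither does NULL <> NULL. Only the comparison of real values succeeds.
conn = sqlite3.connect(":memory:")
row = conn.execute("SELECT NULL = NULL, NULL <> NULL, 1 = 1").fetchone()
print(row)  # (None, None, 1) -- both NULL comparisons are unknown
```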

Additionally, NULL is allowed in a foreign key to mark an optional relationship.(*) Allowing it in the PK as well would break this.

(*) A word of caution: having nullable foreign keys is not clean relational database design.

If there are two entities A and B where A can optionally be related to B, the clean solution is to create a resolution table (let's say AB). That table would link A with B: If there is a relationship then it would contain a record, if there isn't then it would not.
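The resolution-table pattern can be sketched as follows. The table names a, b, and ab are hypothetical, and SQLite is used only for illustration; the point is that the optional relationship is a row (or the absence of a row) in the link table rather than a nullable foreign key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (id INTEGER PRIMARY KEY);
    CREATE TABLE ab (
        a_id INTEGER NOT NULL REFERENCES a,
        b_id INTEGER NOT NULL REFERENCES b,
        PRIMARY KEY (a_id)          -- each A relates to at most one B
    );
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (10);
    INSERT INTO ab VALUES (1, 10);  -- A#1 is related to B#10; A#2 is not
""")
# A LEFT JOIN through the link table recovers the optional relationship
rows = conn.execute("""
    SELECT a.id, ab.b_id FROM a
    LEFT JOIN ab ON ab.a_id = a.id
    ORDER BY a.id
""").fetchall()
print(rows)  # [(1, 10), (2, None)]
```

No column in any base table is nullable; the NULL appears only in the query result, where it legitimately means "no related row exists".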


I've changed the accepted answer to this one. Judging by the votes, this answer is the clearest to more people. I still feel that the answer by Tony Andrews explains the intention behind this design better; do check it out as well!
Q: When do you want a NULL FK instead of a lack of a row? A: Only in a version of a schema denormalized for optimization. In non-trivial schemas, unnormalized quirks like this can cause problems whenever new features are required. On the other hand, the web design crowd doesn't care. I would at least add a note of caution about this instead of making it sound like a good design idea.
"Having nullable foreign keys is not clean relational database design." - a null-free database design (Sixth normal form) invariably adds complexity, the space-savings gained are often outweighed by the extra programmer work needed to realise those gains.
What if it's an ABC resolution table, with an optional C?
I tried to avoid writing "because the standard forbids it", as that really explains nothing.
Tony Andrews

A primary key defines a unique identifier for every row in a table: when a table has a primary key, you have a guaranteed way to select any row from it.

A unique constraint does not necessarily identify every row; it just specifies that if a row has non-NULL values in the constrained columns, then those values must be unique. This is not sufficient to uniquely identify every row, which is what a primary key must do.
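The gap between the two guarantees shows up directly in practice. A minimal sketch using SQLite for illustration (SQLite, like Oracle and PostgreSQL, permits several rows whose constrained column is entirely NULL under a UNIQUE constraint):

```python
import sqlite3

# A UNIQUE constraint tolerates NULLs, so two indistinguishable rows
# can coexist -- neither can be singled out by the constrained column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (code TEXT UNIQUE)")
conn.execute("INSERT INTO t VALUES (NULL)")
conn.execute("INSERT INTO t VALUES (NULL)")  # accepted: NULLs are not equal
count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(count)  # 2
```

A primary key on `code` would have to reject both inserts, because otherwise the "select any row" guarantee is lost.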


In SQL Server, a unique constraint that includes a nullable column allows the value NULL in that column only once (given identical values in the other columns of the constraint). So such a unique constraint essentially behaves like a PK with a nullable column.
I confirm the same for Oracle (11.2)
In Oracle (I don't know about SQL Server), the table can contain many rows where all the columns in a unique constraint are null. However, if some column(s) in the unique constraint are not null and some are null then uniqueness is enforced.
How does this apply to composite UNIQUE?
@Dims As with nearly anything else in SQL databases "it depends on the implementation". In most dbs a "primary key" is actually a UNIQUE constraint underneath. The idea of "primary key" is not really any more special or powerful than the concept of UNIQUE. The real difference is that if you have two independent aspects of a table that can be guaranteed UNIQUE then you don't have a normalized database by definition (you're storing two types of data in the same table).
zxq9

Fundamentally speaking nothing is wrong with a NULL in a multi-column primary key. But having one has implications the designer likely did not intend, which is why many systems throw an error when you try this.

Consider the case of module/package versions stored as a series of fields:

CREATE TABLE module
  (name        varchar(20) PRIMARY KEY,
   description text DEFAULT '' NOT NULL);

CREATE TABLE version
  (module      varchar(20) REFERENCES module,
   major       integer NOT NULL,
   minor       integer DEFAULT 0 NOT NULL,
   patch       integer DEFAULT 0 NOT NULL,
   release     integer DEFAULT 1 NOT NULL,
   ext         varchar(20),
   notes       text DEFAULT '' NOT NULL,
   PRIMARY KEY (module, major, minor, patch, release, ext));

The first 5 elements of the primary key are regularly defined parts of a release version, but some packages have a customized extension that is usually not an integer (like "rc-foo" or "vanilla" or "beta" or whatever else someone for whom four fields is insufficient might dream up). If a package does not have an extension, then it is NULL in the above model, and no harm would be done by leaving things that way.

But what is a NULL? It is supposed to represent a lack of information, an unknown. That said, perhaps this makes more sense:

CREATE TABLE version
  (module      varchar(20) REFERENCES module,
   major       integer NOT NULL,
   minor       integer DEFAULT 0 NOT NULL,
   patch       integer DEFAULT 0 NOT NULL,
   release     integer DEFAULT 1 NOT NULL,
   ext         varchar(20) DEFAULT '' NOT NULL,
   notes       text DEFAULT '' NOT NULL,
   PRIMARY KEY (module, major, minor, patch, release, ext));

In this version the "ext" part of the tuple is NOT NULL but defaults to an empty string -- which is semantically (and practically) different from a NULL. A NULL is an unknown, whereas an empty string is a deliberate record of "something not being present". In other words, "empty" and "null" are different things. It's the difference between "I don't have a value here" and "I don't know what the value here is."

When you register a package that lacks a version extension you know it lacks an extension, so an empty string is actually the correct value. A NULL would only be correct if you didn't know whether it had an extension or not, or you knew that it did but didn't know what it was. This situation is easier to deal with in systems where string values are the norm, because there is no way to represent an "empty integer" other than inserting 0 or 1, which will wind up being rolled up in any comparisons made later (which has its own implications)*.

Incidentally, both ways are valid in Postgres (since we're discussing "enterprise" RDBMSs), but comparison results can vary quite a bit when you throw a NULL into the mix -- because NULL == "don't know", so the result of any comparison involving a NULL winds up being NULL, since you can't know something that is unknown. DANGER! Think carefully about that: NULL comparison results propagate through a series of comparisons. This can be a source of subtle bugs when sorting, comparing, etc.
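The propagation danger is concrete. A sketch using SQLite for illustration (the table and values are hypothetical, modeled on the version example above): a WHERE clause keeps a row only when its condition is TRUE, so a row whose condition evaluates to NULL silently vanishes from both a test and its negation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE v (ext TEXT)")
conn.executemany("INSERT INTO v VALUES (?)", [("rc-foo",), (None,)])

# The NULL row satisfies neither the comparison nor its negation.
matched   = conn.execute("SELECT COUNT(*) FROM v WHERE ext =  'rc-foo'").fetchone()[0]
unmatched = conn.execute("SELECT COUNT(*) FROM v WHERE ext <> 'rc-foo'").fetchone()[0]
print(matched, unmatched)  # 1 0 -- the NULL row appears in neither result set
```

With the `DEFAULT '' NOT NULL` design, the second query would instead return the extension-less rows, which is usually what was intended.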

Postgres assumes you're an adult and can make this decision for yourself. Oracle and DB2 assume you didn't realize you were doing something silly and throw an error. This is usually the right thing, but not always -- you might actually not know and have a NULL in some cases and therefore leaving a row with an unknown element against which meaningful comparisons are impossible is correct behavior.

In any case you should strive to minimize the number of NULL fields you permit across the entire schema, and doubly so when it comes to fields that are part of a primary key. In the vast majority of cases the presence of NULL columns is an indication of un-normalized (as opposed to deliberately de-normalized) schema design and should be thought about very hard before being accepted.

[* NOTE: It is possible to create a custom type that is the union of integers and a "bottom" type that would semantically mean "empty" as opposed to "unknown". Unfortunately this introduces a bit of complexity in comparison operations and usually being truly type correct isn't worth the effort in practice as you shouldn't be permitted many NULL values at all in the first place. That said, it would be wonderful if RDBMSs would include a default BOTTOM type in addition to NULL to prevent the habit of casually conflating the semantics of "no value" with "unknown value".]


This is a VERY NICE answer and explains a lot about NULL values and their implications in many situations. You, sir, now have my respect! Not even in college did I get such a good explanation of NULL values inside databases. Thank you!
I support the main idea of this answer. But writing like 'supposed to represent a lack of information, an unknown', 'semantically (and practically) different from a NULL', 'A NULL is an unknown', 'an empty string is a deliberate record of "something not being present"', 'NULL == "don't know"', etc are vague & misleading & really only mnemonics for absent statements re how NULL or any value is or can or was intended to be used--per the rest of the post. (Including in inspiring the (bad) design of SQL NULL features.) They don't justify or explain anything; they should be explained & debunked.
Cogsy

NULL == NULL -> false (at least in DBMSs)

So you wouldn't be able to retrieve any relationships using a NULL value even with additional columns with real values.


This sounds like the best answer, but I still don't understand why this is prohibited upon primary key creation. If this was just a retrieval problem, you could use where pk_1 = 'a' and pk_2 = 'b' with normal values, and switch to where pk_1 is null and pk_2 = 'b' when there are nulls.
Or even more reliably, where (a.pk1 = b.pk1 or (a.pk1 is null and b.pk1 is null)) and (a.pk2 = b.pk2 or (a.pk2 is null and b.pk2 is null)).
Wrong answer. NULL==NULL -> UNKNOWN. Not false. The catch is that a constraint is not considered violated if the outcome of the test is UNKNOWN. This often makes it SEEM as if the comparison yields false, but it really doesn't.
Rami Ojares

The answer by Tony Andrews is a decent one. But the real answer is that this has been a convention used by the relational database community and is NOT a necessity. Maybe it is a good convention, maybe not.

Comparing anything to NULL results in UNKNOWN (the third truth value). So, as has been suggested, with NULLs all traditional wisdom concerning equality goes out the window. At least that's how it seems at first glance.

But I don't think this is necessarily so and even SQL databases don't think that NULL destroys all possibility for comparison.

Run this query in your database: SELECT NULL UNION SELECT NULL (some systems spell it SELECT * FROM (VALUES (NULL)) UNION SELECT * FROM (VALUES (NULL))).

What you see is just one tuple with one attribute that has the value NULL. So the union recognized here the two NULL values as equal.

When comparing a composite key with 3 components to a tuple with 3 attributes: (1, 3, NULL) = (1, 3, NULL) <=> 1 = 1 AND 3 = 3 AND NULL = NULL. The result of this is UNKNOWN.

But we could define a new kind of comparison operator, e.g. ==, such that: X == Y <=> X = Y OR (X IS NULL AND Y IS NULL).

Having this kind of equality operator would make composite keys with NULL components, or non-composite keys with a NULL value, unproblematic.
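As the comments below note, standard SQL already offers this null-safe comparison as IS NOT DISTINCT FROM; SQLite spells it simply IS. A sketch using SQLite for illustration, contrasting it with plain equality:

```python
import sqlite3

# Plain "=" yields NULL (unknown) for NULL operands; the null-safe
# operator ("IS" in SQLite) treats two NULLs as matching and always
# yields TRUE (1) or FALSE (0).
conn = sqlite3.connect(":memory:")
plain = conn.execute("SELECT NULL = NULL").fetchone()[0]   # NULL -> None
safe  = conn.execute("SELECT NULL IS NULL").fetchone()[0]  # TRUE -> 1
mixed = conn.execute("SELECT 1 IS NULL").fetchone()[0]     # FALSE -> 0
print(plain, safe, mixed)  # None 1 0
```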


No, the UNION has recognized the two NULLs as non-distinct. Which is not the same thing as "equal". Try UNION ALL instead and you'll get two rows. And as for the "new kind of comparison operator", SQL already has it. IS NOT DISTINCT FROM. But that by itself is not sufficient. Using this in SQL constructs such as NATURAL JOIN, or the REFERENCES clause of a foreign key, will require yet additional options on those constructs.
Aha, Erwin Smout. Truly a pleasure to meet you on this forum as well! I was not aware of SQL's "IS NOT DISTINCT FROM". Very interesting! But it seems to be exactly what I meant with my made-up == operator. Could you explain why you say that "that by itself is not sufficient"?
The REFERENCES clause builds on equality, by definition. A kind of REFERENCES that matches a child tuple/row with a parent tuple/row, based on the corresponding attribute values being NOT DISTINCT instead of (the stricter) EQUAL, would require the ability to specify this option, but the syntax doesn't allow it. Ditto for NATURAL JOIN.
In order for a foreign key to work, the referenced key must be unique (i.e. all values must be distinct), which means it could contain at most a single NULL value. All the NULL values could then refer to that single NULL if the REFERENCES were defined with the NOT DISTINCT operator. I think that would be better (in the sense of more useful). With JOINs (both outer and inner) I think strict equals is better, because the "NULL matches" would multiply when NULLs on the left side matched all NULLs on the right side.
Adriaan Davel

I still believe this is a fundamental / functional flaw brought about by a technicality. If you have an optional field by which you can identify a customer, you now have to hack a dummy value into it, just because NULL != NULL. Not particularly elegant, yet it is an "industry standard".

