Is a GUID unique 100% of the time?

Community

While each generated GUID is not guaranteed to be unique, the total number of unique keys (2^128 or 3.4×10^38) is so large that the probability of the same number being generated twice is very small. For example, consider the observable universe, which contains about 5×10^22 stars; every star could then have 6.8×10^15 universally unique GUIDs.

From Wikipedia.

These are some good articles on how a GUID is made (for .NET) and how you could get the same guid in the right situation.

https://ericlippert.com/2012/04/24/guid-guide-part-one/

https://ericlippert.com/2012/04/30/guid-guide-part-two/

https://ericlippert.com/2012/05/07/guid-guide-part-three/

Wouldn't they be called a UUID, then? ;)

A GUID is Microsoft's specific implementation of the UUID standard. So, it's both. Globally unique ID vs. universally unique ID.

Technically, it is not 2^128, because in a v4 GUID one hex digit is always a 4 (effectively removing 4 bits), and two bits further on are also reserved. However, 2^122 valid v4 GUIDs still leaves about 5×10^36, which will do for me, and for you too. Each star will have to accept just about 1.1×10^14 GUIDs apiece.
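The fixed bits mentioned in this comment can be seen directly. The following is a minimal Python sketch (Python's `uuid.uuid4()` is used here as a stand-in for `Guid.NewGuid()`, which is an assumption about equivalence of layout, not a statement about the .NET implementation):

```python
import uuid

# Generate one random (version 4) UUID with the standard library.
u = uuid.uuid4()
text = str(u)  # layout: xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx

# The 13th hex digit (index 14 in the hyphenated string) is the version: always 4.
version_digit = text[14]

# The 17th hex digit (index 19) carries the two fixed variant bits "10",
# so it can only be 8, 9, a or b.
variant_digit = text[19]

# 128 bits minus 4 version bits minus 2 variant bits leaves 122 random bits.
random_bits = 128 - 4 - 2
v4_count = 2 ** random_bits

print(text, version_digit, variant_digit)
print(f"{v4_count:.2e} possible v4 values")
```

Only the remaining 122 bits vary, which is where the figure of roughly 5×10^36 comes from.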

If you're like me, then you'll want to know that 2^128 written out is: 340,282,366,920,938,463,463,374,607,431,768,211,456. Statistically, if you generated 1000 GUIDs every second, it would still take roughly a hundred million years before a duplicate became likely.

I just thought it's funny to read it out, so here, have fun guys :) Three hundred forty undecillion two hundred eighty-two decillion three hundred sixty-six nonillion nine hundred twenty octillion nine hundred thirty-eight septillion four hundred sixty-three sextillion four hundred sixty-three quintillion three hundred seventy-four quadrillion six hundred seven trillion four hundred thirty-one billion seven hundred sixty-eight million two hundred eleven thousand four hundred fifty-six

Bura Chuhadar

If you are scared of the same GUID values then put two of them next to each other.

Guid.NewGuid().ToString() + Guid.NewGuid().ToString();

If you are too paranoid then put three.

You have to be very, very, very, very paranoid to append 3 GUIDs.

@harsimranb No... very, very, very, very paranoid is 6 GUIDs. Paranoid is one appended, very paranoid is two appended, etc.

@Suamere I have created a website for calculating your paranoid level jogge.github.io/HowParanoidAmI

@Jogge xD That is amazing, lol. After 9 9's 999999999 in your form, I think Paranoia will a-splode my Browser.

@Jogge your website crashed after I put that I am level 10,000 paranoid. Now I am even more paranoid

Tomalak

The simple answer is yes.

Raymond Chen wrote a great article on GUIDs and why substrings of GUIDs are not guaranteed unique. The article goes into some depth on the way GUIDs are generated and the data they use to ensure uniqueness, which should go some way toward explaining why they are :-)

I think Chen's article is referring to V1 of the GUID generation algorithm, which uses a MAC address & timestamp -- the current V4 uses a pseudo-random number instead: en.wikipedia.org/wiki/Globally_Unique_Identifier#Algorithm

Alex

As a side note, I was playing around with Volume GUIDs in Windows XP. This is a very obscure partition layout with three disks and fourteen volumes.

\\?\Volume{23005604-eb1b-11de-85ba-806d6172696f}\ (F:)
\\?\Volume{23005605-eb1b-11de-85ba-806d6172696f}\ (G:)
\\?\Volume{23005606-eb1b-11de-85ba-806d6172696f}\ (H:)
\\?\Volume{23005607-eb1b-11de-85ba-806d6172696f}\ (J:)
\\?\Volume{23005608-eb1b-11de-85ba-806d6172696f}\ (D:)
\\?\Volume{23005609-eb1b-11de-85ba-806d6172696f}\ (P:)
\\?\Volume{2300560b-eb1b-11de-85ba-806d6172696f}\ (K:)
\\?\Volume{2300560c-eb1b-11de-85ba-806d6172696f}\ (L:)
\\?\Volume{2300560d-eb1b-11de-85ba-806d6172696f}\ (M:)
\\?\Volume{2300560e-eb1b-11de-85ba-806d6172696f}\ (N:)
\\?\Volume{2300560f-eb1b-11de-85ba-806d6172696f}\ (O:)
\\?\Volume{23005610-eb1b-11de-85ba-806d6172696f}\ (E:)
\\?\Volume{23005611-eb1b-11de-85ba-806d6172696f}\ (R:)
                                     | | | | |
                                     | | | | +-- 6f = o
                                     | | | +---- 69 = i
                                     | | +------ 72 = r
                                     | +-------- 61 = a
                                     +---------- 6d = m

It's not that the GUIDs are very similar but the fact that all GUIDs have the string "mario" in them. Is that a coincidence or is there an explanation behind this?

Now, when googling for part 4 in the GUID I found approx 125.000 hits with volume GUIDs.

Conclusion: When it comes to Volume GUIDs they aren't as unique as other GUIDs.

Remember that Super Mario Bros 3 ad from the 80's? All those people yelling "Mario! Mario! Mario!" around the world upset the randomness of the universe a bit.

If you manually un-install Office 2010 with msiexec, it lists all the MSI GUID's of the office program. They all spell 0FF1CE. Seems like Microsoft have a fairly... loose... interpretation of how to generate a GUID ;)

These partition GUIDs were all created together at 2009-12-17 @ 2:47:45 PM UTC. They are unique to your machine, but putting "mario" as the node identifier is incorrect - it means they're not RFC-4122-compliant. Likewise, the 0FF1CE GUIDs fall under the "NCS backwards compatibility" section of RFC-4122, but it's unlikely that Microsoft is following the NCS rules for those values.
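The timestamp and node observations in this comment can be checked with a short sketch. It uses Python's standard `uuid` module to decode the first volume GUID from the listing, assuming it follows the RFC 4122 version 1 layout (60-bit timestamp in 100-nanosecond units since 1582-10-15, 48-bit node at the end):

```python
import uuid
from datetime import datetime, timedelta, timezone

# One of the volume GUIDs listed above.
u = uuid.UUID("23005604-eb1b-11de-85ba-806d6172696f")

# Version 1 timestamps count 100-nanosecond intervals since 1582-10-15 (RFC 4122).
gregorian_epoch = datetime(1582, 10, 15, tzinfo=timezone.utc)
created = gregorian_epoch + timedelta(microseconds=u.time // 10)

# The last 48 bits are the "node" field, normally a MAC address.
node_bytes = u.node.to_bytes(6, "big")

print(u.version)                              # version field of this GUID
print(created.strftime("%Y-%m-%d %H:%M:%S"))  # decoded creation time (UTC)
print(node_bytes[1:].decode("ascii"))         # the five bytes after the leading 0x80
```

For this input the decoded time agrees with the 2009-12-17 14:47:45 UTC quoted above, and the node bytes after the leading 0x80 spell "mario". The other GUIDs in the listing differ only in the low bits of the timestamp.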

I knew it, the Nintendo Security Administration has compromised the random number generators.

Maybe it's in the same ballpark as the name of the company that makes a mineral water (I heard they lead the market): Evian. Spelled backwards, it gives Naive :-)

Tim

It should not happen. However, when .NET is under a heavy load, it is possible to get duplicate guids. I have two different web servers using two different sql servers. I went to merge the data and found I had 15 million guids and 7 duplicates.

This would only be true for v1 GUIDs, which use MAC addresses (not the machine name) as part of the GUID generation. v4, which is the de facto standard, no longer uses MAC addresses but a pseudo-random number.

Guid.NewGuid always generates v4 GUIDs (and always has). Tim must have had extremely poor entropy sources.

Has that ever been replicated? That's a huge problem if it's the case.

Same here while importing very large datasets. From about 10-100 million GUIDs onward you get duplicates from Guid.NewGuid.

@StephanBaltzer No, that’s simply impossible. If this actually happened to you there was either a bug in your code which e.g. truncated GUIDs or which confused rows of data. In fact, it would be more likely that there’s a bug in the NewGuid implementation than that you’d really observe this collision without a bug. But so far no such bug has been reported so I’d bet a nontrivial amount of money that issue was in your code.

Jogge

Yes, a GUID should always be unique. It is based on both hardware and time, plus a few extra bits to make sure it's unique. I'm sure it's theoretically possible to end up with two identical ones, but extremely unlikely in a real-world scenario.

Here's a great article by Raymond Chen on Guids:

https://blogs.msdn.com/oldnewthing/archive/2008/06/27/8659071.aspx

This article is rather old and referring to v1 of GUIDs. v4 does not use hardware/time but a random number algorithm instead. en.wikipedia.org/wiki/Globally_unique_identifier#Algorithm

This link is broken

Here is the link: devblogs.microsoft.com/oldnewthing/20080627-00/?p=21823

Rob Walker

Guids are statistically unique. The odds of two different clients generating the same Guid are infinitesimally small (assuming no bugs in the Guid generating code). You may as well worry about your processor glitching due to a cosmic ray and deciding that 2+2=5 today.

Multiple threads allocating new GUIDs will get unique values, but you should check that the function you are calling is thread-safe. Which environment is this in?

It depends on which GUID version you're using, based on the specs. Some GUIDs are time- and MAC-address-based, meaning that for the time-based versions a duplicate would have to be generated on the same machine within the same timestamp tick (100 nanoseconds for V1). This is like throwing a bag of 1000 pennies into the air and having them all land heads up in a stack on their sides. It is possible, but unlikely to the point that it doesn't bear mentioning as a risk unless lives are at stake.

Paolo Moretti

Eric Lippert has written a very interesting series of articles about GUIDs.

There are on the order of 2^30 personal computers in the world (and of course lots of hand-held devices or non-PC computing devices that have more or less the same levels of computing power, but let's ignore those). Let's assume that we put all those PCs in the world to the task of generating GUIDs; if each one can generate, say, 2^20 GUIDs per second then after only about 2^72 seconds -- one hundred and fifty trillion years -- you'll have a very high chance of generating a collision with your specific GUID. And the odds of collision get pretty good after only thirty trillion years.

GUID Guide, part one

GUID Guide, part two

GUID Guide, part three

...and he continues in the next paragraph: "But that's looking for a collision with a specific GUID. [...] So if we put those billion PCs to work generating 122-bits-of-randomness GUIDs, the probability that two of them somewhere in there would collide gets really high after about 2^61 GUIDs are generated. Since we're assuming that about 2^30 machines are doing 2^20 GUIDs per second, we'd expect a collision after about 2^11 seconds, which is about an hour." (And finally he explains that, of course, not that many GUIDs are generated.)
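The arithmetic in the two quoted passages can be reproduced with exact integers. This is a rough sketch under the quote's own assumptions (2^30 machines, 2^20 GUIDs per second each, 122 random bits); the variable names are illustrative only:

```python
# Assumptions taken from the quoted passages: 2^30 machines, 2^20 GUIDs per
# second each, and 122 random bits per GUID.
machines = 2 ** 30
rate_per_machine = 2 ** 20
total_rate = machines * rate_per_machine          # 2^50 GUIDs per second

space = 2 ** 122
seconds_per_year = 365.25 * 24 * 3600

# Hitting one specific GUID: on the order of 2^122 attempts.
specific_seconds = space // total_rate            # 2^72 seconds
specific_years = specific_seconds / seconds_per_year

# Any two GUIDs colliding: becomes likely near sqrt(2^122) = 2^61 GUIDs.
birthday_count = 2 ** 61
birthday_seconds = birthday_count // total_rate   # 2^11 seconds

print(f"specific GUID: about {specific_years:.2e} years")
print(f"any collision: about {birthday_seconds} seconds")
```

The gap between the two results is the birthday effect: matching one particular GUID scales with the size of the space, while any pair matching scales with its square root.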

Michael Haren

Theoretically, no, they are not unique. It's possible to generate an identical guid over and over. However, the chances of it happening are so low that you can assume they are unique.

I've read before that the chances are so low that you really should stress about something else--like your server spontaneously combusting or other bugs in your code. That is, assume it's unique and don't build in any code to "catch" duplicates--spend your time on something more likely to happen (i.e. anything else).

I made an attempt to describe the usefulness of GUIDs to my blog audience (non-technical family members). From there (via Wikipedia), the odds of generating a duplicate GUID:

1 in 2^128

1 in 340 undecillion (don’t worry, undecillion is not on the quiz)

1 in 3.4 × 10^38

1 in 340,000,000,000,000,000,000,000,000,000,000,000,000

Actually, I disagree about 'not worrying about it', although from a different stance: if you do detect a GUID collision, then something has gone wrong with your application. I've used GUIDs, for instance, for idempotency, and have got a collision when a command has been sent twice (with the same GUID).

Cine

No one seems to mention the actual math of the probability of it occurring.

First, let's assume we can use the entire 128 bit space (Guid v4 only uses 122 bits).

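We know that the general probability of NOT getting a duplicate in n picks is:

$$\left(1-\frac{1}{2^{128}}\right)\left(1-\frac{2}{2^{128}}\right)\cdots\left(1-\frac{n-1}{2^{128}}\right)$$

Because 2^128 is much, much larger than n, we can approximate this to:

$$\left(1-\frac{1}{2^{128}}\right)^{n(n-1)/2}$$

And because we can assume n is much, much larger than 1, we can approximate that to:

$$\left(1-\frac{1}{2^{128}}\right)^{n^2/2}$$

Now we can equate this to a chosen threshold, let's say 1%. Note that the expression is the probability of NOT getting a duplicate, so setting it to 0.01 corresponds to a 99% chance of at least one duplicate:

$$\left(1-\frac{1}{2^{128}}\right)^{n^2/2} = 0.01$$

Which we solve for n and get:

$$n = \sqrt{\frac{2\log 0.01}{\log\left(1-\frac{1}{2^{128}}\right)}}$$

Which Wolfram Alpha gets to be 5.598318 × 10^19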

To put that number into perspective, let's take 10000 machines, each having a 4-core CPU running at 4 GHz, spending 10000 cycles to generate a GUID and doing nothing else. It would then take ~111 years to generate that many GUIDs.
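The calculation above can be reproduced numerically. The sketch below is a minimal Python version of the same formula; the function name is hypothetical. Because the expression being solved is the probability of NOT getting a duplicate, setting it to 0.01 gives the pick count for a 99% chance of a duplicate, so the sketch also evaluates 0.99, which corresponds to a 1% chance of a duplicate:

```python
import math

def picks_for_no_duplicate_probability(p_no_dup: float, bits: int = 128) -> float:
    """Solve (1 - 1/2^bits)^(n^2/2) = p_no_dup for n."""
    # log1p keeps precision: 1 - 2**-128 rounds to exactly 1.0 in a float,
    # so math.log(1 - 2**-128) would return 0 and the division would fail.
    log_term = math.log1p(-(2.0 ** -bits))
    return math.sqrt(2 * math.log(p_no_dup) / log_term)

# Fleet described in the answer: 10000 machines, 4 cores each, 4 GHz,
# 10000 cycles per GUID.
guids_per_second = 10_000 * 4 * 4e9 / 10_000
seconds_per_year = 365.25 * 24 * 3600

n_answer = picks_for_no_duplicate_probability(0.01)       # value used in the answer
n_one_percent = picks_for_no_duplicate_probability(0.99)  # 1% chance of a duplicate

years_answer = n_answer / guids_per_second / seconds_per_year
years_one_percent = n_one_percent / guids_per_second / seconds_per_year

print(f"n = {n_answer:.6e}, about {years_answer:.0f} years")
print(f"n = {n_one_percent:.3e}, about {years_one_percent:.1f} years")
```

Both results assume a perfectly uniform generator over the full 128-bit space, as the answer does; using 122 bits shrinks each pick count by a factor of 8.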

I've edited your post following this post - please edit if I made a mistake ;).

Hi @Cine, I have the power to edit your response but have opted not to because I want to give you a chance to rebut it first; I'll probably come by in a month-ish to formally change it if I don't hear from you. I'm fairly certain your math is wrong, though. The real equation for determining a 1% chance is this: ((2^128 - 1) / 2^128) ^ ((n (n-1)) / 2) = .01. Your exponent is wrong. It isn't just n. You need C(n,2) (aka (n*(n-1))/2) to calculate all the combinations when you generate "n" guids. See here for more information

Thanks Cine, I too ended up approximating n^2/2 since it's so huge :)

It would take 10000 machines 111 years to generate every single possible GUID, and then generate a duplicate. A duplicate would however occur long before all possible GUIDs have been generated. I think the approximate time-frame would depend on how 'random' the GUID generation process is.

@GeorgeK I think you misunderstood... It would take 10000 machines 111 years to reach the collision probability used in the answer, not to exhaust every possible GUID. But yes, this math of course assumes that the random generator is totally random.

Community

From http://www.guidgenerator.com/online-guid-generator.aspx

What is a GUID?

GUID (or UUID) is an acronym for 'Globally Unique Identifier' (or 'Universally Unique Identifier'). It is a 128-bit integer number used to identify resources. The term GUID is generally used by developers working with Microsoft technologies, while UUID is used everywhere else.

How unique is a GUID?

128-bits is big enough and the generation algorithm is unique enough that if 1,000,000,000 GUIDs per second were generated for 1 year the probability of a duplicate would be only 50%. Or if every human on Earth generated 600,000,000 GUIDs there would only be a 50% probability of a duplicate.
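As a rough cross-check of the quoted figures, the usual birthday approximation can be evaluated directly. This sketch assumes 122 random bits (a v4 GUID) and a perfectly uniform generator; the function name is hypothetical:

```python
import math

def guids_for_collision_probability(p: float, random_bits: int = 122) -> float:
    """Birthday approximation: n ~ sqrt(2 * N * ln(1 / (1 - p))), N = 2^random_bits."""
    space = 2.0 ** random_bits
    return math.sqrt(2 * space * math.log(1 / (1 - p)))

# Number of GUIDs needed for a 50% chance of at least one duplicate.
n_half = guids_for_collision_probability(0.5)

# Time to generate that many at one billion GUIDs per second.
seconds_per_year = 365.25 * 24 * 3600
years_at_billion_per_second = n_half / 1e9 / seconds_per_year

print(f"{n_half:.2e} GUIDs, about {years_at_billion_per_second:.0f} years")
```

Under these assumptions the 50% point is around 2.7×10^18 GUIDs, which at a billion per second works out to decades rather than a single year, so the quoted "1 year" figure looks conservative and is best treated as illustrative.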

isn't a 50% chance of a duplicate high enough to cause fear?

@disklosr yeah it's enough to cause fear if your systems are generating 1 billion GUIDs per second. In the extremely unlikely event you are generating that amount then just chain two GUIDs together...