
biggest integer that can be stored in a double

What is the biggest "no-floating" integer that can be stored in an IEEE 754 double type without losing precision?


Steve Jessop

The biggest/largest integer that can be stored in a double without losing precision is the same as the largest possible value of a double. That is, DBL_MAX or approximately 1.8 × 10^308 (if your double is an IEEE 754 64-bit double). It's an integer. It's represented exactly. What more do you want?

Go on, ask me what the largest integer is, such that it and all smaller integers can be stored in IEEE 64-bit doubles without losing precision. An IEEE 64-bit double has 52 bits of mantissa, so I think it's 2^53:

2^53 + 1 cannot be stored, because the 1 at the start and the 1 at the end have too many zeros in between.

Anything less than 2^53 can be stored, with 52 bits explicitly stored in the mantissa, and then the exponent in effect giving you another one.

2^53 obviously can be stored, since it's a small power of 2.

Or another way of looking at it: once the bias has been taken off the exponent, and ignoring the sign bit as irrelevant to the question, the value stored by a double is a power of 2, plus a 52-bit integer multiplied by 2^(exponent − 52). So with exponent 52 you can store all values from 2^52 through to 2^53 − 1. Then with exponent 53, the next number you can store after 2^53 is 2^53 + 1 × 2^(53 − 52) = 2^53 + 2. So loss of precision first occurs with 2^53 + 1.


+1 Good job noticing that the question did not really mean what the asker probably intended and providing both answers ("technically correct" and "probably expected").
Or "messing about" and "trying to help" as I tend to call them :-)
I bow to Tony the Pony, and no other.
You don't mean "all smaller integers", you mean all integers of equal or lesser magnitude. Because there are a lot of negative integers below 2^53 that cannot be represented exactly in a double.
I do mean smaller, and that's exactly what I mean when I say smaller :-) -1,000,000 is less than 1, but it is not smaller.
Glenjamin

9007199254740992 (that's 9,007,199,254,740,992 or 2^53) with no guarantees :)

Program

#include <stdio.h>

int main(void) {
  /* Start a little below 2^53; counting up from 0 would take far too long. */
  double dbl = 9007199254000000.0;
  while (dbl + 1 != dbl) dbl++;
  printf("%.0f\n", dbl - 1);
  printf("%.0f\n", dbl);
  printf("%.0f\n", dbl + 1);
  return 0;
}

Result

9007199254740991
9007199254740992
9007199254740992

Assuming it will be 'close' but less than a 2^N, then a faster test is double dbl = 1; while (dbl + 1 != dbl) dbl *= 2; while (dbl == --dbl); which yields the same result
@Seph what the...? No? while (dbl == --dbl) will loop forever or not at all. :) (in this case, not at all, since it is a 2^N). You'll have to approach it from below. It will indeed also result in one less than the expected result (since the one check in the while loop decrements dbl). And it depends on order of execution, if the decrement is done before or after evaluating the left side (which is undefined as far as I know). If it's the former, it'll always be true and loop forever.
Maybe indicate that 2^53=9,007,199,254,740,992 somewhere.
It's hard to argue with this! Nice experiment
A weakness of using while (dbl + 1 != dbl) dbl++; is that dbl + 1 != dbl may be evaluated using long double math (consider FLT_EVAL_METHOD == 2). This could end in an infinite loop.
Simon Biber

The largest integer that can be represented in IEEE 754 double (64-bit) is the same as the largest value that the type can represent, since that value is itself an integer.

This is represented as 0x7FEFFFFFFFFFFFFF, which is made up of:

The sign bit 0 (positive) rather than 1 (negative)

The maximum exponent 0x7FE (2046 which represents 1023 after the bias is subtracted) rather than 0x7FF (2047 which indicates a NaN or infinity).

The maximum mantissa 0xFFFFFFFFFFFFF which is 52 bits all 1.

In binary, the value is the implicit 1 followed by another 52 ones from the mantissa, then 971 zeros (1023 - 52 = 971) from the exponent.

The exact decimal value is:

179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368

This is approximately 1.8 × 10^308.


What about the largest value that it can represent with all values between it and zero contiguously representable?
@AaronFranke The question didn't ask about contiguous representation, but the answer to that different question has been included in most other answers here, or even wrongly given as the actual answer. It's 2⁵³ (2 to the power of 53).
Carl Smotricz

Wikipedia has this to say in the same context with a link to IEEE 754:

On a typical computer system, a 'double precision' (64-bit) binary floating-point number has a coefficient of 53 bits (one of which is implied), an exponent of 11 bits, and one sign bit.

2^53 is just over 9 * 10^15.
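Spelling out the arithmetic behind "just over 9 × 10^15", and what a 53-bit coefficient means in decimal digits:

```latex
2^{53} = 9\,007\,199\,254\,740\,992 \approx 9.007 \times 10^{15},
\qquad
53 \log_{10} 2 \approx 53 \times 0.30103 \approx 15.95
```

So a 53-bit coefficient carries roughly 15 to 16 significant decimal digits, consistent with DBL_DIG being 15 on IEEE 754 systems.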


@Steve Jessop more or less, that is indeed what I am saying. I have also encountered hardware systems that don't have an FPU that still need to be IEEE-compliant, so that "typical system" stuff doesn't really help me if I come back here 8 months later and need the same info for my 68K-based microcontroller (assuming it doesn't have an FPU... I can't remember).
@San Jacinto - "This is useless" is unduly harsh. The answer is quite useful, just not as useful as it would have been if it included the comment that typical computer systems do indeed use the IEEE 754 representation.
@Stephen C. Steel, actually you are correct. Under my scenario, coming back to this at a later time and looking for the IEEE max, it is impossibly ambiguous as to what a 'typical system' is, but there is still merit in the answer besides this complaint.
Dolphin

You need to look at the size of the mantissa. An IEEE 754 64-bit floating-point number (which has a 52-bit mantissa, plus 1 implied bit) can exactly represent all integers with an absolute value less than or equal to 2^53.


It can exactly represent 2^53, too :-)
Jay

1.7976931348623157 × 10^308

http://en.wikipedia.org/wiki/Double_precision_floating-point_format


this answer would be much better with a citation.
@Carl well, if the integer has zeros beyond to the left, then it is precisely stored.
@all you downvoters: 1.7976931348623157 × 10^308 is an exact integer. Do you all need to attend remedial math classes or something??
We're down to semantics here in the discussion of this hopelessly sunk answer. True, that number can be represented exactly and thereby fulfills the letter of the question. But we all know it's a tiny island of exactitude in an ocean of near misses, and most of us correctly interpolated the question to mean "the largest number beyond which precision goes down the drain." Ah, isn't it wonderful that CompSci is an exact science? :)
@DanMoulding 1.7976931348623157 × 10^308 is an exact integer, but I am pretty sure this particular integer cannot be stored exactly in a double.
Jan Heldal

It is true that, for 64-bit IEEE754 double, all integers up to 9007199254740992 == 2^53 can be exactly represented.

However, it is also worth mentioning that all representable numbers beyond 4503599627370496 == 2^52 are integers. Beyond 2^52 it becomes meaningless to test whether or not they are integers, because they are all implicitly rounded to a nearby representable value.

In the range 2^51 to 2^52, the only non-integer values are the midpoints ending with ".5", meaning any integer test after a calculation must be expected to yield at least 50% false answers.

Below 2^51 we also have ".25" and ".75", so comparing a number with its rounded counterpart in order to determine if it may be integer or not starts making some sense.

TLDR: If you want to test whether a calculated result may be integer, avoid numbers larger than 2251799813685248 == 2^51


Jay Lee

As others have noted, I will assume that the OP asked for the largest floating-point value such that all whole numbers smaller than it are precisely representable.

You can use FLT_MANT_DIG and DBL_MANT_DIG defined in float.h to not rely on the explicit values (e.g., 53):

#include <stdio.h>
#include <float.h>

int main(void)
{
    /* 1ULL: a 32-bit long (e.g. on Windows) cannot hold 1 << 53 */
    printf("%d, %.1f\n", FLT_MANT_DIG, (float)(1ULL << FLT_MANT_DIG));
    printf("%d, %.1f\n", DBL_MANT_DIG, (double)(1ULL << DBL_MANT_DIG));
}

outputs:

24, 16777216.0
53, 9007199254740992.0