
Why does integer overflow on x86 with GCC cause an infinite loop?

The following code goes into an infinite loop on GCC:

#include <iostream>
using namespace std;

int main(){
    int i = 0x10000000;

    int c = 0;
    do{
        c++;
        i += i;
        cout << i << endl;
    }while (i > 0);

    cout << c << endl;
    return 0;
}

So here's the deal: Signed integer overflow is technically undefined behavior. But GCC on x86 implements integer arithmetic using x86 integer instructions - which wrap on overflow.

Therefore, I would have expected it to wrap on overflow - despite the fact that it is undefined behavior. But that's clearly not the case. So what did I miss?

I compiled this using:

~/Desktop$ g++ main.cpp -O2

GCC Output:

~/Desktop$ ./a.out
536870912
1073741824
-2147483648
0
0
0

... (infinite loop)

With optimizations disabled, there is no infinite loop and the output is correct. Visual Studio also correctly compiles this and gives the following result:

Correct Output:

~/Desktop$ g++ main.cpp
~/Desktop$ ./a.out
536870912
1073741824
-2147483648
3

Here are some other variations:

i *= 2;   //  Also fails and goes into infinite loop.
i <<= 1;  //  This seems okay. It does not enter infinite loop.

Here's all the relevant version information:

~/Desktop$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/x86_64-linux-gnu/gcc/x86_64-linux-gnu/4.5.2/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ..

...

Thread model: posix
gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-8ubuntu4) 
~/Desktop$ 

So the question is: Is this a bug in GCC? Or did I misunderstand something about how GCC handles integer arithmetic?

I'm tagging this C as well, because I assume this bug will reproduce in C. (I haven't verified it yet.)

EDIT:

Here's the assembly of the loop (if I'm reading it correctly):

.L5:
addl    %ebp, %ebp
movl    $_ZSt4cout, %edi
movl    %ebp, %esi
.cfi_offset 3, -40
call    _ZNSolsEi
movq    %rax, %rbx
movq    (%rax), %rax
movq    -24(%rax), %rax
movq    240(%rbx,%rax), %r13
testq   %r13, %r13
je  .L10
cmpb    $0, 56(%r13)
je  .L3
movzbl  67(%r13), %eax
.L4:
movsbl  %al, %esi
movq    %rbx, %rdi
addl    $1, %r12d
call    _ZNSo3putEc
movq    %rax, %rdi
call    _ZNSo5flushEv
cmpl    $3, %r12d
jne .L5

Signed integer overflow in C/C++ is undefined behaviour (unsigned integer operations are well-defined: they wrap modulo 2^w, where w is the word size in bits).
-1. You say that this is strictly speaking undefined behavior and then ask whether it is undefined behavior, so this is not a real question for me.
@JohannesSchaub-litb Thanks for commenting. Probably bad wording on my part. I'll try my best to clarify in a way to earn your undownvote (and I'll edit the question accordingly). Basically, I know it's UB. But I also know that GCC on x86 uses x86 integer instructions - which wrap on overflow. Therefore, I expected it to wrap despite it being UB. However, it didn't and that confused me. Hence the question.

bdonlan

When the standard says it's undefined behavior, it means it. Anything can happen. "Anything" includes "usually integers wrap around, but on occasion weird stuff happens".

Yes, on x86 CPUs, integers usually wrap the way you expect. This is one of those exceptions. The compiler assumes you won't cause undefined behavior, and optimizes away the loop test. If you really want wraparound, pass -fwrapv to g++ or gcc when compiling; this gives you well-defined (two's-complement) overflow semantics, but can hurt performance.


@Mysticial, or just use unsigned ints, where overflow behavior is perfectly well-defined always. Using <<= may happen to work now, but there is no guarantee that will be the case in future versions of gcc.
Is there a warning option that attempts to notice accidental infinite loops?
I found -Wunsafe-loop-optimizations mentioned here: stackoverflow.com/questions/2982507/…
-1 "Yes, on x86 CPUs, integers usually wrap the way you expect." That's wrong, but it's subtle. As I recall, it's possible to make them trap on overflow, but that's not what we're talking about here, and I've never seen it done. Other than that, and disregarding x86 BCD operations (not a permitted representation in C++), x86 integer ops always wrap, because they're two's complement. You're mistaking g++'s faulty (or extremely impractical and nonsensical) optimization for a property of x86 integer ops.
@Cheersandhth.-Alf, by 'on x86 CPUs' I mean 'when you're developing for x86 CPUs using a C compiler'. Do I really need to spell it out? Obviously all my talk about compilers and GCC is irrelevant if you're developing in assembler, in which case the semantics for integer overflow are very well-defined indeed.
Dennis

It's simple: Undefined behaviour - especially with optimization (-O2) turned on - means anything can happen.

Your code behaves as (you) expected without the -O2 switch.

It works quite fine with icl and tcc, by the way, but you can't rely on stuff like that...

According to this, gcc optimization actually exploits signed integer overflow. This would mean that the "bug" is by design.


It kinda sucks that a compiler would opt for an infinite loop of all things for undefined behavior.
@Inverse: I disagree. If you have coded something with undefined behavior, pray for an infinite loop. It makes it easier to detect...
I mean if the compiler is actively looking for UB, why not insert an exception instead of trying to hyper-optimize broken code?
@Inverse: The compiler isn't actively looking for undefined behavior, it assumes that it doesn't occur. This allows the compiler to optimize the code. For example, instead of computing for (j = i; j < i + 10; ++j) ++k;, it will just set k = 10, since this will always be true if no signed overflow occurs.
@Inverse The compiler didn't "opt" for anything. You wrote the loop in your code. The compiler didn't invent it.
Mankarse

The important thing to note here is that C++ programs are written for the C++ abstract machine (which is usually emulated through hardware instructions). The fact that you are compiling for x86 is totally irrelevant to the fact that this has undefined behaviour.

The compiler is free to use the existence of undefined behaviour to improve its optimisations (by removing a conditional from a loop, as in this example). There is no guaranteed, or even useful, mapping between C++ level constructs and x86 level machine code constructs apart from the requirement that the machine code will, when executed, produce the result demanded by the C++ abstract machine.


lostyzd
i += i;  // the overflow is undefined

With -fwrapv it is correct.


vonbrand

Please people, undefined behaviour is exactly that: undefined. It means that anything could happen. In practice (as in this case), the compiler is free to assume it won't be invoked, and to do whatever it pleases if that could make the code faster/smaller. What happens with code that shouldn't run is anybody's guess. It will depend on the surrounding code (depending on that, the compiler could well generate different code), the variables/constants used, compiler flags, ... Oh, and the compiler could get updated and translate the same code differently, or you could get another compiler with a different view on code generation. Or just get a different machine; even another model in the same architecture line could very well have its own undefined behaviour (look up undefined opcodes; some enterprising programmers found out that on some of those early machines they sometimes did useful stuff...). There is no "the compiler gives a definite behaviour on undefined behaviour". There are areas that are implementation-defined, and there you should be able to count on the compiler behaving consistently.


Yes, I know very well what undefined behavior is. But when you know how certain aspects of the language are implemented for a particular environment, you can expect to see certain types of UB and not others. I know that GCC implements integer arithmetic as x86 integer arithmetic - which wraps on overflow. So I assumed the behavior would be as such. What I didn't expect was GCC to do something else, as bdonlan has answered.
Wrong. What happens is that GCC is allowed to assume you won't invoke undefined behaviour, so it just emits code as if it couldn't happen. If it does happen, the instructions that do what you asked for get executed, and the result is whatever the CPU does. I.e., on x86 it does x86 stuff. If it is another processor, it could do something totally different. Or the compiler could be smart enough to figure out that you are invoking undefined behaviour and start nethack (yes, some ancient versions of gcc did exactly that).
I believe you misread my comment. I said: "What I didn't expect" - which is why I asked the question in the first place. I didn't expect GCC to pull any tricks.
supercat

Even if a compiler were to specify that integer overflow must be considered a "non-critical" form of Undefined Behavior (as defined in Annex L), the result of an integer overflow should, absent a specific platform promise of more specific behavior, be at minimum regarded as a "partially-indeterminate value". Under such rules, adding 1073741824+1073741824 could arbitrarily be regarded as yielding 2147483648 or -2147483648 or any other value which was congruent to 2147483648 mod 4294967296, and values obtained by additions could arbitrarily be regarded as any value which was congruent to 0 mod 4294967296.

Rules allowing overflow to yield "partially-indeterminate values" would be sufficiently well-defined to abide by the letter and spirit of Annex L, but would not prevent a compiler from making the same generally-useful inferences as would be justified if overflows were unconstrained Undefined Behavior. It would prevent a compiler from making some phony "optimizations" whose primary effect in many cases is to require that programmers add extra clutter to the code whose sole purpose is to prevent such "optimizations"; whether that would be a good thing or not depends on one's point of view.