
Why does the enhanced GCC 6 optimizer break practical C++ code?

GCC 6 has a new optimizer feature: it assumes that this is always non-null and optimizes based on that.

Value range propagation now assumes that the this pointer of C++ member functions is non-null. This eliminates common null pointer checks but also breaks some non-conforming code-bases (such as Qt-5, Chromium, KDevelop). As a temporary work-around -fno-delete-null-pointer-checks can be used. Wrong code can be identified by using -fsanitize=undefined.
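For concreteness, here is a minimal sketch (hypothetical names, not taken from the change document) of the kind of check the new assumption lets GCC 6 fold away:

struct Widget {
    int id = 0;
    int get_id() const {
        if (this == nullptr)   // GCC 6 assumes this is never null,
            return -1;         // so this branch can be removed entirely
        return id;
    }
};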

The change document clearly calls this out as dangerous because it breaks a surprising amount of frequently used code.

Why would this new assumption break practical C++ code? Are there particular patterns where careless or uninformed programmers rely on this particular undefined behavior? I cannot imagine anyone writing if (this == NULL) because that is so unnatural.

@Ben Hopefully you mean it in a good way. Code with UB should be rewritten not to invoke UB. It's as simple as that. Heck, there are often FAQs that tell you how to achieve it. So, not a real issue IMHO. All good.
I am amazed to see people defending dereferencing null pointers in the code. Simply amazing.
@Ben, exploiting undefined behavior has been a very effective optimization tactic for a very long time. I love it, because I love optimizations which make my code run faster.
I agree with SergeyA. The whole brouhaha started because people seem to dwell on the fact that this is passed as an implicit parameter, so they then start using it just as if it was an explicit parameter. It's not. When you dereference a null this, you're invoking UB just as if you dereferenced any other null pointer. That's all there is to it. If you want to pass nullptrs around, use an explicit parameter, DUH. It won't be any slower, it won't be any clunkier, and the code that has such an API is deep in the internals anyway, so it has very limited scope. End of story I think.
Kudos to GCC for breaking the cycle of bad code -> inefficient compiler to support bad code -> more bad code -> more inefficient compilation -> ...

αλεχολυτ

I guess the question that needs to be answered is why well-intentioned people would write the checks in the first place.

The most common case is probably if you have a class that is part of a naturally occurring recursive call.

If you had:

struct Node
{
    Node* left;
    Node* right;
};

in C, you might write:

void traverse_in_order(Node* n) {
    if(!n) return;
    traverse_in_order(n->left);
    process(n);
    traverse_in_order(n->right);
}

In C++, it's nice to make this a member function:

void Node::traverse_in_order() {
    // <--- What check should be put here?
    left->traverse_in_order();
    process();
    right->traverse_in_order();
}

In the early days of C++ (prior to standardization), it was emphasized that member functions were syntactic sugar for a function where the this parameter is implicit. Code was written in C++, converted to equivalent C and compiled. There were even explicit examples that comparing this to null was meaningful, and the original Cfront compiler took advantage of this too. So coming from a C background, the obvious choice for the check is:

if(this == nullptr) return;      

Note: Bjarne Stroustrup even mentions that the rules for this have changed over the years here.

And this worked on many compilers for many years. When standardization happened, this changed. And more recently, compilers started taking advantage of the fact that calling a member function where this is nullptr is undefined behavior, which means that the condition is always false, and the compiler is free to omit it.
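Put back into the member function above, the now-broken pattern looks like this (the very check GCC 6 is now free to delete):

void Node::traverse_in_order() {
    if(this == nullptr) return;   // undefined behaviour: the call already assumed this was non-null
    left->traverse_in_order();
    process();
    right->traverse_in_order();
}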

That means that to do any traversal of this tree, you need to either:

Do all of the checks before calling traverse_in_order:

void Node::traverse_in_order() {
    if(left) left->traverse_in_order();
    process();
    if(right) right->traverse_in_order();
}

This means also checking at EVERY call site if you could have a null root.

Don't use a member function. This means that you're writing the old C-style code (perhaps as a static method) and calling it with the object explicitly as a parameter, e.g. you're back to writing Node::traverse_in_order(node); rather than node->traverse_in_order(); at the call site.
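A minimal sketch of that shape, here written as a static member function (an assumption; a free function taking a Node* works the same way):

struct Node
{
    Node* left;
    Node* right;
    void process();

    static void traverse_in_order(Node* n) {
        if(!n) return;
        traverse_in_order(n->left);
        n->process();
        traverse_in_order(n->right);
    }
};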

I believe the easiest/neatest way to fix this particular example in a way that is standards compliant is to actually use a sentinel node rather than a nullptr:

// static class, or global variable
Node sentinel;

void Node::traverse_in_order() {
    if(this == &sentinel) return;
    ...
}

Neither of the first two options seems that appealing, and while code could get away with it, people wrote bad code with this == nullptr checks instead of using a proper fix.

I'm guessing that's how some of these code bases evolved to have this == nullptr checks in them.


How can 1 == 0 be undefined behavior? It's simply false.
The check itself is not an undefined behaviour. It's just always false, and thus eliminated by the compiler.
Hmm.. the this == nullptr idiom is undefined behavior because you've already called a member function through a null pointer, which is undefined. And the compiler is free to omit the check
@Joshua, the first standard was published in 1998. Whatever happened before that was whatever every implementation wanted. Dark ages.
Heh, wow, I can't believe anyone ever wrote code that relied on calling instance functions... without an instance. I would have instinctively used the excerpt marked "Do all of the checks before calling traverse_in_order", without even thinking about this ever being nullable. I guess maybe this is the benefit of learning C++ in an age where SO exists to entrench the perils of UB in my brain and dissuade me from doing bizarre hacks like this.
Community

It does so because the "practical" code was broken and involved undefined behavior to begin with. There's no reason to use a null this, other than as a micro-optimization, usually a very premature one.

It's a dangerous practice, since adjustment of pointers due to class hierarchy traversal can turn a null this into a non-null one. So, at the very least, the class whose methods are supposed to work with a null this must be a final class with no base class: it can't derive from anything, and it can't be derived from. We're quickly departing from practical to ugly-hack-land.
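To make that hazard concrete, here is a hedged sketch with hypothetical classes; the whole call is undefined behaviour, which is exactly why compilers are allowed to skip the null check on the derived-to-base adjustment:

struct Base1 {
    int x = 0;
};

struct Base2 {
    int handle = -1;
    int get_handle() {
        if (this == nullptr) return 0;   // the "defensive" check being discussed
        return handle;
    }
};

struct Derived : Base1, Base2 { };

int demo() {
    Derived* d = nullptr;
    // The implicit Derived* -> Base2* conversion in this call adds the offset of
    // the Base2 subobject. Because calling through a null pointer is UB anyway,
    // the compiler may apply that adjustment without a null check, so inside
    // get_handle() this can end up as a small non-null value and the defensive
    // check above silently stops protecting anything.
    return d->get_handle();   // undefined behaviour
}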

In practical terms, the code doesn't have to be ugly:

struct Node
{
  Node* left;
  Node* right;
  void process();
  void traverse_in_order() {
    traverse_in_order_impl(this);
  }
private:
  static void traverse_in_order_impl(Node * n) {
    if (!n) return;
    traverse_in_order_impl(n->left);
    n->process();
    traverse_in_order_impl(n->right);
  }
};

If you had an empty tree (eg. root is nullptr), this solution is still relying on undefined behavior by calling traverse_in_order with a nullptr.

If the tree is empty, a.k.a. a null Node* root, you aren't supposed to be calling any non-static methods on it. Period. It's perfectly fine to have C-like tree code that takes an instance pointer by an explicit parameter.

The argument here seems to boil down to somehow needing to write non-static methods that could be called through a null instance pointer. There's no such need. The C-with-objects way of writing such code is still way nicer in the C++ world, because it can be type safe at the very least. Basically, the null this is such a micro-optimization, with such a narrow field of use, that disallowing it is IMHO perfectly fine. No public API should depend on a null this.


@Ben, Whoever wrote this code was wrong in the first place. It is funny that you are naming such terribly broken projects as MFC, Qt and Chromium. Good riddance with them.
@Ben, terrible coding styles in Google are well known to me. Google code (at least publicly available) is often badly written, despite multiple people believing Google code to be the shining example. May be this will make them to revisit their coding styles (and guidelines while they are on it).
@Ben No one is retroactively replacing Chromium on these devices with Chromium compiled using gcc 6. Before Chromium can be compiled using gcc 6 and other modern compilers, it will need to be fixed. It's not a huge task either; the this checks are picked out by various static code analyzers, so it's not as if anyone has to manually hunt them all down. The patch would probably be a couple hundred lines of trivial changes.
@Ben In practical terms, a null this dereference is an instant crash. These problems will be found out very quickly even if nobody cares to run a static analyzer over the code. C/C++ follows the "pay only for features you use" mantra. If you want checks, you must be explicit about them and that means not doing them on this, when it's too late, since the compiler assumes this isn't null. Otherwise it'd have to check this, and for 99.9999% of code out there such checks are a waste of time.
my advice for anyone who thinks the standard is broken: use a different language. There is no shortage of C++-like languages that don't have the possibility of undefined behaviour.
eerorika

The change document clearly calls this out as dangerous because it breaks a surprising amount of frequently used code.

The document doesn't call it dangerous. Nor does it claim that it breaks a surprising amount of code. It simply points out a few popular code bases which it claims to be known to rely on this undefined behaviour and would break due to the change unless the workaround option is used.

Why would this new assumption break practical C++ code?

If practical c++ code relies on undefined behaviour, then changes to that undefined behaviour can break it. This is why UB is to be avoided, even when a program relying on it appears to work as intended.

Are there particular patterns where careless or uninformed programmers rely on this particular undefined behavior?

I don't know if it's a widespread anti-pattern, but an uninformed programmer might think that they can stop their program from crashing by doing:

if (this)
    member_variable = 42;

When the actual bug is dereferencing a null pointer somewhere else.

I'm sure that if a programmer is uninformed enough, they will be able to come up with more advanced (anti-)patterns that rely on this UB.

I cannot imagine anyone writing if (this == NULL) because that is so unnatural.

I can.


"If practical c++ code relies on undefined behaviour, then changes to that undefined behaviour can break it. This is why UB is to be avoided" this * 1000
if(this == null) PrintSomeHelpfulDebugInformationAboutHowWeGotHere(); Such as a nice easy-to-read log of a sequence of events that a debugger can't easily tell you about. Have fun debugging this now without spending hours placing checks everywhere when there's a sudden random null in a large data set, in code you haven't written... And the UB rule about this was made later, after C++ was created. It used to be valid.
@StephaneHockenhull That's what -fsanitize=null is for.
@user2079303 Issues: Is that going to slow down production code to the point where you can't leave the check in while running, costing the company a lot of money? Is that going to increase size and not fit in flash? Does that work on all target platforms including Atmel? Can -fsanitize=null log the errors to the SD/MMC card on Pins #5,6,10,11 using SPI? That's not a universal solution. Some have argued that it goes against object-oriented principles to access a null object yet some OOP languages have a null object that can be operated on so it's not a universal rule of OOP. 1/2
...a regular expression which matches such files? Saying that e.g. if an lvalue is accessed twice, a compiler may consolidate the accesses unless code between them does any of several specific things would be much easier than trying to define the precise situations in which code is allowed to access storage.
Jonathan Wakely

Some of the "practical" (funny way to spell "buggy") code that was broken looked like this:

void foo(X* p) {
  p->bar()->baz();
}

and it forgot to account for the fact that p->bar() sometimes returns a null pointer, which means that dereferencing it to call baz() is undefined.
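A hedged sketch of the obvious repair, assuming a null return from bar() is a legitimate state that the caller should tolerate:

void foo(X* p) {
  if (auto* b = p->bar())   // guard the possibly-null result before using it
    b->baz();
}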

Not all the code that was broken contained explicit if (this == nullptr) or if (!p) return; checks. Some cases were simply functions that didn't access any member variables, and so appeared to work OK. For example:

struct DummyImpl {
  bool valid() const { return false; }
  int m_data;
};
struct RealImpl {
  bool valid() const { return m_valid; }
  bool m_valid;
  int m_data;
};

template<typename T>
void do_something_else(T* p) {
  if (p) {
    use(p->m_data);
  }
}

template<typename T>
void func(T* p) {
  if (p->valid())
    do_something(p);
  else 
    do_something_else(p);
}

In this code when you call func<DummyImpl>(DummyImpl*) with a null pointer there is a "conceptual" dereference of the pointer to call p->DummyImpl::valid(), but in fact that member function just returns false without accessing *this. That return false can be inlined and so in practice the pointer doesn't need to be accessed at all. So with some compilers it appears to work OK: there's no segfault for dereferencing null, p->valid() is false, so the code calls do_something_else(p), which checks for null pointers, and so does nothing. No crash or unexpected behaviour is observed.

With GCC 6 you still get the call to p->valid(), but the compiler now infers from that expression that p must be non-null (otherwise p->valid() would be undefined behaviour) and makes a note of that information. That inferred information is used by the optimizer so that if the call to do_something_else(p) gets inlined, the if (p) check is now considered redundant, because the compiler remembers that it is not null, and so inlines the code to:

template<typename T>
void func(T* p) {
  if (p->valid())
    do_something(p);
  else {
    // inlined body of do_something_else(p) with value propagation
    // optimization performed to remove null check.
    use(p->m_data);
  }
}

This now really does dereference a null pointer, and so code that previously appeared to work stops working.

In this example the bug is in func, which should have checked for null first (or the callers should never have called it with null):

template<typename T>
void func(T* p) {
  if (p && p->valid())
    do_something(p);
  else 
    do_something_else(p);
}

An important point to remember is that most optimizations like this are not a case of the compiler saying "ah, the programmer tested this pointer against null, I will remove it just to be annoying". What happens is that various run-of-the-mill optimizations like inlining and value range propagation combine to make those checks redundant, because they come after an earlier check, or a dereference. If the compiler knows that a pointer is non-null at point A in a function, and the pointer isn't changed before a later point B in the same function, then it knows it is also non-null at B. When inlining happens points A and B might actually be pieces of code that were originally in separate functions, but are now combined into one piece of code, and the compiler is able to apply its knowledge that the pointer is non-null in more places. This is a basic, but very important optimization, and if compilers didn't do that everyday code would be considerably slower and people would complain about unnecessary branches to re-test the same conditions repeatedly.
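Stripped of the template machinery, the interaction described above looks roughly like this (a deliberately tiny sketch, not code from any of the affected projects):

int read_twice(int* p) {
  int a = *p;   // point A: the dereference lets the compiler record that p is non-null
  if (p)        // point B: now provably redundant, so the branch is folded away
    a += *p;
  return a;
}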


Is it possible to instrument GCC 6 to output compile-time warnings when it encounters such usages of this?
@jotik, ^^^ what T.C said. It would be possible, but you would get that warning FOR ALL CODE, ALL THE TIME. Value range propagation is one of the most common optimizations, that affects almost all code, everywhere. The optimizers just see code, which can be simplified. They don't see "a piece of code written by an idiot that wants to be warned if their dumb UB gets optimized away". It's not easy for the compiler to tell the difference between "redundant check that the programmer wants to be optimized" and "redundant check that the programmer thinks will help, but is redundant".
If you want to instrument your code to give runtime errors for various types of UB, including invalid uses of this, then just use -fsanitize=undefined
Ben

The C++ standard is broken in important ways. Unfortunately, rather than protect the users from these problems, the GCC developers have chosen to use undefined behaviour as an excuse to implement marginal optimisations, even when it has been clearly explained to them how harmful it is.

Here a much cleverer person than I explains it in great detail. (He's talking about C, but the situation is the same in C++.)

https://groups.google.com/forum/m/#!msg/boring-crypto/48qa1kWignU/o8GGp2K1DAAJ

Why is it harmful?

Simply recompiling previously working, secure code with a newer version of the compiler can introduce security vulnerabilities. While the new behaviour can be disabled with a flag, existing makefiles do not have that flag set, obviously. And since no warning is produced, it is not obvious to the developer that the previously reasonable behaviour has changed.

In this example, the developer has included a check for integer overflow, using assert, which will terminate the program if an invalid length is supplied. The GCC team removed the check on the basis that signed integer overflow is undefined behaviour, and therefore the compiler may assume it never happens and delete the check. This resulted in real in-the-wild instances of this codebase being made vulnerable again after the issue had been fixed.
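The pattern at the heart of that bug report is roughly the following (a paraphrase, not the exact code from the report):

#include <cassert>

int grow(int len) {
    // Signed integer overflow is undefined behaviour, so the compiler may assume
    // len + 100 cannot wrap around. That makes the assertion provably true, and
    // therefore removable, silently deleting the very overflow check the
    // developer added on purpose.
    assert(len + 100 > len);
    return len + 100;
}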

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30475

Read the whole thing. It's enough to make you weep.

OK, but what about this one?

Way back when, there was a fairly common idiom which went something like this:

OPAQUEHANDLE ObjectType::GetHandle() {
    if (this == NULL) return DEFAULTHANDLE;
    return mHandle;
}

void DoThing(ObjectType* pObj) {
    osfunction(pObj->GetHandle(), "BLAH");
}

So the idiom is: If pObj is not null, you use the handle it contains, otherwise you use a default handle. This is encapsulated in the GetHandle function.

The trick is that calling a non-virtual member function doesn't, by itself, dereference the this pointer, and the early return means no member is ever touched when this is null, so in practice there was no access violation.

I still don't get it

A lot of code exists which is written like that. If someone simply recompiles it, without changing a line, every call to DoThing(NULL) is a crashing bug - if you are lucky.

If you are not lucky, calls to crashing bugs become remote execution vulnerabilities.

This can occur even automatically. You've got an automated build system, right? Upgrading it to the latest compiler is harmless, right? But now it's not - not if your compiler is GCC.

OK so tell them!

They've been told. They are doing this in the full knowledge of the consequences.

but... why?

Who can say? Perhaps:

They value the ideal purity of the C++ language over actual code

They believe people should be punished for not following the standard

They have no understanding of the reality of the world

They are ... introducing bugs on purpose. Perhaps for a foreign government. Where do you live? All governments are foreign to most of the world, and most are hostile to some of the world.

Or perhaps something else. Who can say?


Disagree with every and single line of the answer. Same comments were made for strict aliasing optimizations, and those hopefully are dismissed now. The solution is to educate developers, not prevent optimizations based on bad development habits.
I did go and read the whole thing like you said, and indeed I wept, but mainly at the stupidity of Felix which I don't think was what you were trying to get across...
Downvoted for the useless rant. "They are ... introducing bugs on purpose. Perhaps for a foreign government." Really? This isn't /r/conspiracy.
Decent programmers over and over again repeat the mantra do not invoke undefined behaviour, yet these nonks have gone ahead and done it anyway. And look what happened. I have no sympathy whatsoever. This is the developers' fault, simple as that. They need to take responsibility. Remember that? Personal responsibility? People relying on your mantra "but what about in practice!" is precisely how this situation arose in the first place. Avoiding nonsense like this is precisely why standards exist in the first place. Code to standards, and you won't have a problem. Period.
"Simply recompiling previously working, secure code with a newer version of the compiler can introduce security vulnerabilities" - that always happens. Unless you want to mandate that one version of one compiler is the only compiler that will be allowed for the rest of eternity. Do you remember back when the linux kernel could only be compiled with exactly gcc 2.7.2.1 ? The gcc project even got forked because people were fed up with bullcrap . It took a long time to get past that.