Recently I stumbled over a comparison between Rust and C that uses the following code:
bool f(int* a, const int* b) {
    *a = 2;
    int ret = *b;
    *a = 3;
    return ret != 0;
}
In Rust (same code, but with Rust syntax), it produces the following assembly code:
cmp dword ptr [rsi], 0
mov dword ptr [rdi], 3
setne al
ret
While with gcc it produces the following:
mov DWORD PTR [rdi], 2
mov eax, DWORD PTR [rsi]
mov DWORD PTR [rdi], 3
test eax, eax
setne al
ret
The text claims that the C compiler can't optimize the first line away, because a and b could point to the same number. In Rust this is not allowed, so the compiler can optimize it away.
Now to my question:
The function takes a const int*, which is a pointer to a const int. I read this question and it states that modifying a const int through a pointer should result in a compiler warning and, in the worst case, in UB.
Could this function result in UB if I call it with two pointers to the same integer?
Why can't the C compiler optimize the first line away, under the assumption that two pointers to the same variable would be illegal/UB?
int foo = 0; f(&foo, &foo); This is perfectly legal C and it works as expected, with your function returning 1.
Why can't the C compiler optimize the first line away, under the assumption that two pointers to the same variable would be illegal/UB?
Because you haven't told the C compiler that it is allowed to make that assumption.
C has a type qualifier for exactly this called restrict, which roughly means: this pointer does not overlap with other pointers (not exactly, but play along).
The assembly output for
bool f(int* restrict a, const int* b) {
    *a = 2;
    int ret = *b;
    *a = 3;
    return ret != 0;
}
is
mov eax, DWORD PTR [rsi]
mov DWORD PTR [rdi], 3
test eax, eax
setne al
ret
... which optimizes away the assignment *a = 2.
From https://en.wikipedia.org/wiki/Restrict:
In the C programming language, restrict is a keyword that can be used in pointer declarations. By adding this type qualifier, a programmer hints to the compiler that for the lifetime of the pointer, only the pointer itself or a value directly derived from it (such as pointer + 1) will be used to access the object to which it points.
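To make the restrict contract concrete, here is a minimal sketch of a call that honours it. The names f_restrict and call_without_aliasing are made up for illustration; the body is the function from this answer. The caller passes two distinct objects, so the promise that *a is only accessed through a holds. Passing the same address for both arguments would break that promise, and the behaviour would be undefined.

```c
#include <stdbool.h>

/* Same body as above. restrict on a promises that, during the call,
   the object *a is only accessed through a. */
bool f_restrict(int *restrict a, const int *b) {
    *a = 2;
    int ret = *b;
    *a = 3;
    return ret != 0;
}

/* Hypothetical helper: calls f_restrict with two distinct objects,
   which satisfies the restrict promise. */
bool call_without_aliasing(void) {
    int x = 0;
    int y = 5;
    return f_restrict(&x, &y);
}
```

Since the caller, not the compiler, is responsible for keeping the promise, restrict is best reserved for functions whose callers are known never to pass overlapping pointers.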
The function int f(int *a, const int *b); promises not to change the contents of b through that pointer. It makes no promises regarding access to variables through the a pointer.
If a and b point to the same object, changing it through a is legal (provided the underlying object is modifiable, of course).
Example:
int val = 0;
f(&val, &val);
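The call above can be fleshed out into a small self-contained sketch. The helper call_with_aliasing is a made-up name for illustration. Because both parameters point at val, the read of *b sees the 2 stored through a, so the function returns true and val ends up as 3. This is exactly the case the compiler has to keep working when restrict is absent.

```c
#include <stdbool.h>

/* The function from the question, without restrict. */
bool f(int *a, const int *b) {
    *a = 2;
    int ret = *b;   /* reads 2 when b aliases a */
    *a = 3;
    return ret != 0;
}

/* Hypothetical helper: passes the same object for both parameters
   and reports the final value of that object. */
bool call_with_aliasing(int *final_value) {
    int val = 0;
    bool result = f(&val, &val);
    *final_value = val;
    return result;
}
```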
Modifying *b through b (after casting away constness) would also be legal C. const is essentially a lint for the programmer, not a hint for the compiler -- C is not allowed to optimize calls to f by assuming that it does not change *b. restrict-qualified pointers are another matter (see Morten Jensen's answer, and my comment there).
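As a rough illustration of the comment above, the following sketch writes through a pointer-to-const after casting the qualifier away. The function name is hypothetical. The write is only defined when the object being pointed to was not itself declared const.

```c
/* Writes through a pointer-to-const after casting const away.
   Defined only if the pointed-to object is not itself const. */
void write_through_const_pointer(const int *b, int value) {
    *(int *)b = value;
}
```

Called as int x = 1; write_through_const_pointer(&x, 42); this leaves x equal to 42, which is why a compiler cannot treat const int* as a guarantee that the value stays unchanged.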
While the other answers mention the C side, it is still worth taking a look at the Rust side. With Rust the code you have is probably this:
fn f(a: &mut i32, b: &i32) -> bool {
    *a = 2;
    let ret = *b;
    *a = 3;
    return ret != 0;
}
The function takes in two references, one mutable, one not. References are pointers that are guaranteed to be valid for reads, and mutable references are also guaranteed to be unique, so it gets optimized to
cmp dword ptr [rsi], 0
mov dword ptr [rdi], 3
setne al
ret
However, Rust also has raw pointers that are equivalent to C's pointers and make no such guarantees. The following function, which takes in raw pointers:
unsafe fn g(a: *mut i32, b: *const i32) -> bool {
    *a = 2;
    let ret = *b;
    *a = 3;
    return ret != 0;
}
misses out on the optimization and compiles to this:
mov dword ptr [rdi], 2
cmp dword ptr [rsi], 0
mov dword ptr [rdi], 3
setne al
ret
const pointer) and it must refer to an object (as opposed to a pointer set to nullptr) but under the hood it's just a pointer that you don't dereference to access what's pointed to.
The function takes a const int* which is a pointer to a const int.
No, const int* is not a pointer to a const int. Anyone who says that is deluded.
int* is a pointer to an int that definitely isn't const.
const int* is a pointer to an int of unknown constness.
There is no way to express the notion of a pointer to an int that definitely is const.
If C were a better-designed language, then const int * would be a pointer to a const int, mutable int * (borrowing a keyword from C++) would be a pointer to a non-const int, and int * would be a pointer to an int of unknown constness. Dropping the qualifiers (i.e., forgetting something about the pointed-to type) would be safe – the opposite of real C, in which adding the const qualifier is safe. I haven't used Rust, but it appears from examples in another answer that it uses a syntax like that.
Bjarne Stroustrup, who introduced const, originally named it readonly, which is much closer to its actual meaning. int readonly* would have made it clearer that it's the pointer that's read-only, not the pointed-to object. The renaming to const has confused generations of programmers.
When I have the choice, I always write foo const*, not const foo*, as the next best thing to readonly*.
Neither const int * nor int * says anything about the constness of the underlying object. C does not keep track of such a thing. It is perfectly legal to have an int * that points to an int declared const.
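A small sketch of what the comment above describes, with a made-up function name. An explicit cast is needed to store the address of a const object in a plain int *, and reading through that pointer is fine. Writing through it would be undefined behaviour, because the object itself was declared const.

```c
/* Holds the address of a const object in a plain int *.
   The explicit cast is required; reading through p is well defined. */
int read_const_via_plain_pointer(void) {
    const int c = 5;
    int *p = (int *)&c;
    return *p;   /* a write such as *p = 6 would be undefined behaviour */
}
```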
It should be noted that this question is about optimisation at -Ofast, and that the first store is kept even there.
Essentially, when compiling the function, the C compiler does not know the full set of addresses that might be passed to it. That isn't known until link time or run time, because the function can be called from multiple translation units. The compiler therefore has to generate code that handles any legal addresses that a and b might point to, and of course that includes the case where they overlap.
Therefore, you need to use restrict to tell it that updating a (which the function allows because a is not a pointer-to-const, though even then the function could cast away const) doesn't update the value b is pointing to. Without it, the value of *b that is compared to 0 may depend on the store to a that precedes the read, so that store has to go ahead. In Rust, by contrast, the default assumption for references is the equivalent of restrict. The compiler does, however, know that *a is the same as *(a+1-1) and therefore will not produce two separate stores for those, but it does not know whether a and b overlap.
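The last point can be sketched as follows, using a hypothetical function name. Both stores provably target the same int, so an optimising compiler would typically be free to drop the first one without any restrict qualifier, since no other pointer is read in between.

```c
/* Both stores target the same int: a + 1 - 1 is the same address as a.
   Forming a + 1 is allowed even when a points to a single object. */
void store_twice(int *a) {
    *a = 2;
    *(a + 1 - 1) = 3;
}
```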
restrict, even the shared ones -- so the equivalent of Aiden4's Rust code would be bool f(int * restrict a, const int* restrict b). Wikipedia omits to mention two facts about restrict that make this work: first, that the single pointer access rule only applies if the object behind the pointer is modified (so b can be both restrict and aliased, because *b is not modified); and second, that a const-qualified restrict pointer must not point to something that is modified (unlike normal const pointers). (C99 6.7.3.1p4)
const restrict pointer may point to objects that are modified, if it is not used to access any of the modified portions of the object. For example, given void test(int * restrict p, int const * restrict q), code could write p[0] and read both p[1] and q[1], provided it never reads q[0], nor (for clang or gcc) tests any restrict-qualified pointer for equality with anything not derived from it.
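The access pattern described in the last comment could look roughly like the sketch below. The function name sum_second and its return value are assumptions made for illustration (the comment's test returns void). Element 0 is written, but only through p. Element 1 is read through both pointers and never written. Under the reading of C99 6.7.3.1 given in that comment, passing the same array for both parameters is therefore defined.

```c
/* Hypothetical variant of the comment's test(): writes p[0], then reads
   element 1 through both restrict-qualified pointers. q[0] is never read
   and no pointers are compared. */
int sum_second(int *restrict p, const int *restrict q) {
    p[0] = 1;             /* the only modified element, accessed only via p */
    return p[1] + q[1];   /* element 1 is read via both, never modified */
}
```

With int arr[2] = {0, 5};, the call sum_second(arr, arr) sets arr[0] to 1 and returns 10.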