Is there a performance difference between i++ and ++i in C++?

z

zar

[Executive Summary: Use ++i if you don't have a specific reason to use i++.]

For C++, the answer is a bit more complicated.

If i is a simple type (not an instance of a C++ class), then the answer given for C ("No there is no performance difference") holds, since the compiler is generating the code.

However, if i is an instance of a C++ class, then i++ and ++i are making calls to one of the operator++ functions. Here's a standard pair of these functions:

Foo& Foo::operator++()   // called for ++i
{
    this->data += 1;
    return *this;
}

Foo Foo::operator++(int ignored_dummy_value)   // called for i++
{
    Foo tmp(*this);   // variable "tmp" cannot be optimized away by the compiler
    ++(*this);
    return tmp;
}

Since the compiler isn't generating code, but just calling an operator++ function, there is no way to optimize away the tmp variable and its associated copy constructor. If the copy constructor is expensive, then this can have a significant performance impact.

What the compiler can avoid is the second copy to return tmp, by allocating tmp in the caller, through NRVO, as mentioned by another comment.

Can't the compiler avoid this altogether if operator++ is inlined?

Yes if operator++ is inlined and tmp is never used it can be removed unless the tmp object's constructor or destructor has side effects.

@kriss: the difference between C and C++ is that in C you have a guarantee that the operator will be inlined, and at that point a decent optimizer will be able to remove the difference; instead in C++ you can't assume inlining - not always.

I'd +1 IF the answer mentioned something about classes that hold pointers (whether auto, smart, or primitive) to dynamically-allocated (heap) memory, where the copy constructor necessarily performs deep copies. In such cases, there is no argument, ++i is perhaps an order of magnitude more efficient than i++. They key is to get in the habit of using pre-increment whenever post-increment semantics are not actually required by your algorithm, and you'll then be in the habit of writing code that by nature lends itself to greater efficiency, regardless of how well your compiler can optimize.

l

lc.

Yes. There is.

The ++ operator may or may not be defined as a function. For primitive types (int, double, ...) the operators are built in, so the compiler will probably be able to optimize your code. But in the case of an object that defines the ++ operator things are different.

The operator++(int) function must create a copy. That is because postfix ++ is expected to return a different value than what it holds: it must hold its value in a temp variable, increment its value and return the temp. In the case of operator++(), prefix ++, there is no need to create a copy: the object can increment itself and then simply return itself.

Here is an illustration of the point:

struct C
{
    C& operator++();      // prefix
    C  operator++(int);   // postfix

private:

    int i_;
};

C& C::operator++()
{
    ++i_;
    return *this;   // self, no copy created
}

C C::operator++(int ignored_dummy_value)
{
    C t(*this);
    ++(*this);
    return t;   // return a copy
}

Every time you call operator++(int) you must create a copy, and the compiler can't do anything about it. When given the choice, use operator++(); this way you don't save a copy. It might be significant in the case of many increments (large loop?) and/or large objects.

"The pre increment operator introduces a data dependency in the code: the CPU must wait for the increment operation to be completed before its value can be used in the expression. On a deeply pipelined CPU, this introduces a stall. There is no data dependency for the post increment operator." (Game Engine Architecture (2nd edition)) So if the copy of a post increment is not computationally intensive, it can still beat the pre increment.

In the postfix code, how does this work C t(*this); ++(*this); return t; In the second line, you're incrementing the this pointer right, so how does t get updated if you're incrementing this. Weren't the values of this already copied into t?

The operator++(int) function must create a copy. no, it is not. No more copies than operator++()

2

2 revs

Here's a benchmark for the case when increment operators are in different translation units. Compiler with g++ 4.5.

Ignore the style issues for now

// a.cc
#include <ctime>
#include <array>
class Something {
public:
    Something& operator++();
    Something operator++(int);
private:
    std::array<int,PACKET_SIZE> data;
};

int main () {
    Something s;

    for (int i=0; i<1024*1024*30; ++i) ++s; // warm up
    std::clock_t a = clock();
    for (int i=0; i<1024*1024*30; ++i) ++s;
    a = clock() - a;

    for (int i=0; i<1024*1024*30; ++i) s++; // warm up
    std::clock_t b = clock();
    for (int i=0; i<1024*1024*30; ++i) s++;
    b = clock() - b;

    std::cout << "a=" << (a/double(CLOCKS_PER_SEC))
              << ", b=" << (b/double(CLOCKS_PER_SEC)) << '\n';
    return 0;
}

O(n) increment

Test

// b.cc
#include <array>
class Something {
public:
    Something& operator++();
    Something operator++(int);
private:
    std::array<int,PACKET_SIZE> data;
};


Something& Something::operator++()
{
    for (auto it=data.begin(), end=data.end(); it!=end; ++it)
        ++*it;
    return *this;
}

Something Something::operator++(int)
{
    Something ret = *this;
    ++*this;
    return ret;
}

Results

Results (timings are in seconds) with g++ 4.5 on a virtual machine:

Flags (--std=c++0x)       ++i   i++
-DPACKET_SIZE=50 -O1      1.70  2.39
-DPACKET_SIZE=50 -O3      0.59  1.00
-DPACKET_SIZE=500 -O1    10.51 13.28
-DPACKET_SIZE=500 -O3     4.28  6.82

O(1) increment

Test

Let us now take the following file:

// c.cc
#include <array>
class Something {
public:
    Something& operator++();
    Something operator++(int);
private:
    std::array<int,PACKET_SIZE> data;
};


Something& Something::operator++()
{
    return *this;
}

Something Something::operator++(int)
{
    Something ret = *this;
    ++*this;
    return ret;
}

It does nothing in the incrementation. This simulates the case when incrementation has constant complexity.

Results

Results now vary extremely:

Flags (--std=c++0x)       ++i   i++
-DPACKET_SIZE=50 -O1      0.05   0.74
-DPACKET_SIZE=50 -O3      0.08   0.97
-DPACKET_SIZE=500 -O1     0.05   2.79
-DPACKET_SIZE=500 -O3     0.08   2.18
-DPACKET_SIZE=5000 -O3    0.07  21.90

Conclusion

Performance-wise

If you do not need the previous value, make it a habit to use pre-increment. Be consistent even with builtin types, you'll get used to it and do not run risk of suffering unecessary performance loss if you ever replace a builtin type with a custom type.

Semantic-wise

i++ says increment i, I am interested in the previous value, though.

++i says increment i, I am interested in the current value or increment i, no interest in the previous value. Again, you'll get used to it, even if you are not right now.

Knuth.

Premature optimization is the root of all evil. As is premature pessimization.

Interesting test. Now, almost two and a half years later, gcc 4.9 and Clang 3.4 show a similar trend. Clang is a bit faster with both, but the disparity between pre and postfix is worse than gcc.

What I would really like to see is a real-world example where ++i / i++ makes a difference. For instance, does it make a difference on any of the std iterators?

@JakobSchouJensen: These were pretty intended to be real world examples. Consider a large application, with complex tree structures (e.g. kd-trees, quad-trees) or large containers used in expression templates (in order to maximise data throughput on SIMD hardware). If it makes a difference there, I am not really sure why one would fallback to post-increment for specific cases if that's not needed semantic-wise.

@phresnel: I don't think operator++ is in your everyday an expression template - do you have an actual example of this? The typical use of operator++ is on integers and iterators. Thats were I think it would be interesting to know if there is any difference (there's no difference on integers of course - but iterators).

@JakobSchouJensen: No actual business example, but some number crunching applications where you count stuff. Wrt iterators, consider a ray tracer that is written in idiomatic C++ style, and you have an iterator for depth-first traversal, such that for (it=nearest(ray.origin); it!=end(); ++it) { if (auto i = intersect(ray, *it)) return i; }, never mind the actual tree structur (BSP, kd, Quadtree, Octree Grid, etc.). Such an iterator would need to maintain some state, e.g. parent node, child node, index and stuff like that. All in all, my stance is, even if only few examples exist, ...

J

James Sutherland

It's not entirely correct to say that the compiler can't optimize away the temporary variable copy in the postfix case. A quick test with VC shows that it, at least, can do that in certain cases.

In the following example, the code generated is identical for prefix and postfix, for instance:

#include <stdio.h>

class Foo
{
public:

    Foo() { myData=0; }
    Foo(const Foo &rhs) { myData=rhs.myData; }

    const Foo& operator++()
    {
        this->myData++;
        return *this;
    }

    const Foo operator++(int)
    {
        Foo tmp(*this);
        this->myData++;
        return tmp;
    }

    int GetData() { return myData; }

private:

    int myData;
};

int main(int argc, char* argv[])
{
    Foo testFoo;

    int count;
    printf("Enter loop count: ");
    scanf("%d", &count);

    for(int i=0; i<count; i++)
    {
        testFoo++;
    }

    printf("Value: %d\n", testFoo.GetData());
}

Whether you do ++testFoo or testFoo++, you'll still get the same resulting code. In fact, without reading the count in from the user, the optimizer got the whole thing down to a constant. So this:

for(int i=0; i<10; i++)
{
    testFoo++;
}

printf("Value: %d\n", testFoo.GetData());

Resulted in the following:

00401000  push        0Ah  
00401002  push        offset string "Value: %d\n" (402104h) 
00401007  call        dword ptr [__imp__printf (4020A0h)]

So while it's certainly the case that the postfix version could be slower, it may well be that the optimizer will be good enough to get rid of the temporary copy if you're not using it.

You forgot to note the important point that here everything is inlined. If the definitions of the operators is not available, the copy done in the out-of-line code cannot be avoided; with inlining the optim is quite obvious, so any compiler will do it.

H

Harshil Modi

The Google C++ Style Guide says:

Preincrement and Predecrement Use prefix form (++i) of the increment and decrement operators with iterators and other template objects. Definition: When a variable is incremented (++i or i++) or decremented (--i or i--) and the value of the expression is not used, one must decide whether to preincrement (decrement) or postincrement (decrement). Pros: When the return value is ignored, the "pre" form (++i) is never less efficient than the "post" form (i++), and is often more efficient. This is because post-increment (or decrement) requires a copy of i to be made, which is the value of the expression. If i is an iterator or other non-scalar type, copying i could be expensive. Since the two types of increment behave the same when the value is ignored, why not just always pre-increment? Cons: The tradition developed, in C, of using post-increment when the expression value is not used, especially in for loops. Some find post-increment easier to read, since the "subject" (i) precedes the "verb" (++), just like in English. Decision: For simple scalar (non-object) values there is no reason to prefer one form and we allow either. For iterators and other template types, use pre-increment.

"Decision: For simple scalar (non-object) values there is no reason to prefer one form and we allow either. For iterators and other template types, use pre-increment."

Eh, ..., and what is that something?

The mentioned link in the answer is currently broken

佚

佚名

I would like to point out an excellent post by Andrew Koenig on Code Talk very recently.

http://dobbscodetalk.com/index.php?option=com_myblog&show=Efficiency-versus-intent.html&Itemid=29

At our company also we use convention of ++iter for consistency and performance where applicable. But Andrew raises over-looked detail regarding intent vs performance. There are times when we want to use iter++ instead of ++iter.

So, first decide your intent and if pre or post does not matter then go with pre as it will have some performance benefit by avoiding creation of extra object and throwing it.

Updated link: drdobbs.com/architecture-and-design/efficiency-versus-intent/…

M

Motti

@Ketan

...raises over-looked detail regarding intent vs performance. There are times when we want to use iter++ instead of ++iter.

Obviously post and pre-increment have different semantics and I'm sure everyone agrees that when the result is used you should use the appropriate operator. I think the question is what should one do when the result is discarded (as in for loops). The answer to this question (IMHO) is that, since the performance considerations are negligible at best, you should do what is more natural. For myself ++i is more natural but my experience tells me that I'm in a minority and using i++ will cause less metal overhead for most people reading your code.

After all that's the reason the language is not called "++C".[*]

[*] Insert obligatory discussion about ++C being a more logical name.

@Motti: (joking) The C++ name is logical if you recall Bjarne Stroustrup C++ initially coded it as a pre-compiler generating a C program. Hence C++ returned an old C value. Or it may be to enhance that C++ is somewhat conceptually flawed from the beginning.

H

Hans Malherbe

++i - faster not using the return value i++ - faster using the return value

When not using the return value the compiler is guaranteed not to use a temporary in the case of ++i. Not guaranteed to be faster, but guaranteed not to be slower.

When using the return value i++ allows the processor to push both the increment and the left side into the pipeline since they don't depend on each other. ++i may stall the pipeline because the processor cannot start the left side until the pre-increment operation has meandered all the way through. Again, a pipeline stall is not guaranteed, since the processor may find other useful things to stick in.

0

0124816

Mark: Just wanted to point out that operator++'s are good candidates to be inlined, and if the compiler elects to do so, the redundant copy will be eliminated in most cases. (e.g. POD types, which iterators usually are.)

That said, it's still better style to use ++iter in most cases. :-)

2

2 revs

The performance difference between ++i and i++ will be more apparent when you think of operators as value-returning functions and how they are implemented. To make it easier to understand what's happening, the following code examples will use int as if it were a struct.

++i increments the variable, then returns the result. This can be done in-place and with minimal CPU time, requiring only one line of code in many cases:

int& int::operator++() { 
     return *this += 1;
}

But the same cannot be said of i++.

Post-incrementing, i++, is often seen as returning the original value before incrementing. However, a function can only return a result when it is finished. As a result, it becomes necessary to create a copy of the variable containing the original value, increment the variable, then return the copy holding the original value:

int int::operator++(int& _Val) {
    int _Original = _Val;
    _Val += 1;
    return _Original;
}

When there is no functional difference between pre-increment and post-increment, the compiler can perform optimization such that there is no performance difference between the two. However, if a composite data type such as a struct or class is involved, the copy constructor will be called on post-increment, and it will not be possible to perform this optimization if a deep copy is needed. As such, pre-increment generally is faster and requires less memory than post-increment.

M

Mike Dunlavey

@Mark: I deleted my previous answer because it was a bit flip, and deserved a downvote for that alone. I actually think it's a good question in the sense that it asks what's on the minds of a lot of people.

The usual answer is that ++i is faster than i++, and no doubt it is, but the bigger question is "when should you care?"

If the fraction of CPU time spent in incrementing iterators is less than 10%, then you may not care.

If the fraction of CPU time spent in incrementing iterators is greater than 10%, you can look at which statements are doing that iterating. See if you could just increment integers rather than using iterators. Chances are you could, and while it may be in some sense less desirable, chances are pretty good you will save essentially all the time spent in those iterators.

I've seen an example where the iterator-incrementing was consuming well over 90% of the time. In that case, going to integer-incrementing reduced execution time by essentially that amount. (i.e. better than 10x speedup)

T

Tim Cooper

@wilhelmtell

The compiler can elide the temporary. Verbatim from the other thread:

The C++ compiler is allowed to eliminate stack based temporaries even if doing so changes program behavior. MSDN link for VC 8:

http://msdn.microsoft.com/en-us/library/ms364057(VS.80).aspx

That's not relevant. NRVO avoids the need to copy t in "C C::operator++(int)" back to the caller, but i++ will still copy the old value on the stack of the caller. Without NRVO, i++ creates 2 copies, one to t and one back to the caller.

J

Josh

An the reason why you ought to use ++i even on built-in types where there's no performance advantage is to create a good habit for yourself.

Sorry, but that bothers me. Who says it's a "good habit", when it almost never matters? If people want to make it part of their discipline, that's fine, but let's distinguish significant reasons from matters of personal taste.

@MikeDunlavey ok, so which side do you normally use when it doesn't matter? xD it's either one or the other ain't it! the post++ (if you're using it with the general meaning. update it, return the old) is completely inferior to ++pre (update it, return) there's never any reason you'd want to have less performance. in the case where you'd want to update it after, the programmer won't even do the post++ at all then. no wasting time copying when we already have it. update it after we use it. then the compilers having the common sense you wanted it to have.

@Puddle: When I hear this: "there's never any reason you'd want to have less performance" I know I'm hearing "penny wise - pound foolish". You need to have an appreciation of the magnitudes involved. Only if this accounts for more than 1% of the time involved should you even give it a thought. Usually, if you're thinking about this, there are million-times larger problems you're not considering, and this is what makes software much much slower than it could be.

@MikeDunlavey regurgitated nonsense to satisfy your ego. you're trying to sound like some all wise monk, yet you're saying nothing. the magnitudes involved... if only over 1% of the time you should care... xD absolute dribble. if it's inefficient, it's worth knowing about and fixing. we're here pondering this for that exact reason! we're not concerned about how much we may gain from this knowledge. and when i said you'd not want less performance, go ahead, explain one damn scenario then. MR WISE!

2

2 revs, 2 users 95%

Both are as fast ;) If you want it is the same calculation for the processor, it's just the order in which it is done that differ.

For example, the following code :

#include <stdio.h>

int main()
{
    int a = 0;
    a++;
    int b = 0;
    ++b;
    return 0;
}

Produce the following assembly :

0x0000000100000f24 : push %rbp 0x0000000100000f25 : mov %rsp,%rbp 0x0000000100000f28 : movl $0x0,-0x4(%rbp) 0x0000000100000f2f : incl -0x4(%rbp) 0x0000000100000f32 : movl $0x0,-0x8(%rbp) 0x0000000100000f39 : incl -0x8(%rbp) 0x0000000100000f3c : mov $0x0,%eax 0x0000000100000f41 : leaveq 0x0000000100000f42 : retq

You see that for a++ and b++ it's an incl mnemonic, so it's the same operation ;)

It's C, while OP asked C++. In C it is the same. In C++ the faster is ++i; due to its object. However some compilers may optimize the post-increment operator.

2

2 revs

The intended question was about when the result is unused (that's clear from the question for C). Can somebody fix this since the question is "community wiki"?

About premature optimizations, Knuth is often quoted. That's right. but Donald Knuth would never defend with that the horrible code which you can see in these days. Ever seen a = b + c among Java Integers (not int)? That amounts to 3 boxing/unboxing conversions. Avoiding stuff like that is important. And uselessly writing i++ instead of ++i is the same mistake. EDIT: As phresnel nicely puts it in a comment, this can be summed up as "premature optimization is evil, as is premature pessimization".

Even the fact that people are more used to i++ is an unfortunate C legacy, caused by a conceptual mistake by K&R (if you follow the intent argument, that's a logical conclusion; and defending K&R because they're K&R is meaningless, they're great, but they aren't great as language designers; countless mistakes in the C design exist, ranging from gets() to strcpy(), to the strncpy() API (it should have had the strlcpy() API since day 1)).

Btw, I'm one of those not used enough to C++ to find ++i annoying to read. Still, I use that since I acknowledge that it's right.

I see you're working on a Ph.D. with interest in compiler optimization and things of that sort. That's great, but don't forget academia is an echo chamber, and common sense often gets left outside the door, at least in C.S. You might be interested in this: stackoverflow.com/questions/1303899/…

I never found ++i more annoying than i++ (in fact, I found it cooler), but the rest of your post gets my full acknowledgement. Maybe add a point "premature optimization is evil, as is premature pessimization"

strncpy served a purpose in the filesystems they were using at the time; the filename was an 8-character buffer and it did not have to be null-terminated. You can't blame them for not seeing 40 years into the future of language evolution.

@MattMcNabb: weren't 8 characters filename a MS-DOS exclusive? C was invented with Unix. Anyway, even if strncpy had a point, the lack of strlcpy wasn't fully justified: even original C had arrays which you shouldn't overflow, which needed strlcpy; at most, they were only missing attackers intent on exploiting the bugs. But one can't say that forecasting this problem was trivial, so if I rewrote my post I wouldn't use the same tone.

@Blaisorblade: As I recall, early UNIX file names were limited to 14 characters. The lack of strlcpy() was justified by the fact that it hadn't been invented yet.

T

Tristan

Since you asked for C++ too, here is a benchmark for java (made with jmh) :

private static final int LIMIT = 100000;

@Benchmark
public void postIncrement() {
    long a = 0;
    long b = 0;
    for (int i = 0; i < LIMIT; i++) {
        b = 3;
        a += i * (b++);
    }
    doNothing(a, b);
}

@Benchmark
public void preIncrement() {
    long a = 0;
    long b = 0;
    for (int i = 0; i < LIMIT; i++) {
        b = 3;
        a += i * (++b);
    }
    doNothing(a, b);
}

The result shows, even when the value of the incremented variable (b) is actually used in some computation, forcing the need to store an additional value in case of post-increment, the time per operation is exactly the same :

Benchmark                         Mode  Cnt  Score   Error  Units
IncrementBenchmark.postIncrement  avgt   10  0,039   0,001  ms/op
IncrementBenchmark.preIncrement   avgt   10  0,039   0,001  ms/op

S

Siddhant Jain

++i is faster than i = i +1 because in i = i + 1 two operation are taking place, first increment and second assigning it to a variable. But in i++ only increment operation is taking place.

S

Severin Pappadeux

Time to provide folks with gems of wisdom ;) - there is simple trick to make C++ postfix increment behave pretty much the same as prefix increment (Invented this for myself, but the saw it as well in other people code, so I'm not alone).

Basically, trick is to use helper class to postpone increment after the return, and RAII comes to rescue

#include <iostream>

class Data {
    private: class DataIncrementer {
        private: Data& _dref;

        public: DataIncrementer(Data& d) : _dref(d) {}

        public: ~DataIncrementer() {
            ++_dref;
        }
    };

    private: int _data;

    public: Data() : _data{0} {}

    public: Data(int d) : _data{d} {}

    public: Data(const Data& d) : _data{ d._data } {}

    public: Data& operator=(const Data& d) {
        _data = d._data;
        return *this;
    }

    public: ~Data() {}

    public: Data& operator++() { // prefix
        ++_data;
        return *this;
    }

    public: Data operator++(int) { // postfix
        DataIncrementer t(*this);
        return *this;
    }

    public: operator int() {
        return _data;
    }
};

int
main() {
    Data d(1);

    std::cout <<   d << '\n';
    std::cout << ++d << '\n';
    std::cout <<   d++ << '\n';
    std::cout << d << '\n';

    return 0;
}

Invented is for some heavy custom iterators code, and it cuts down run-time. Cost of prefix vs postfix is one reference now, and if this is custom operator doing heavy moving around, prefix and postfix yielded the same run-time for me.

4

4 revs, 3 users 73%

++i is faster than i++ because it doesn't return an old copy of the value.

It's also more intuitive:

x = i++;  // x contains the old value of i
y = ++i;  // y contains the new value of i

This C example prints "02" instead of the "12" you might expect:

#include <stdio.h>

int main(){
    int a = 0;
    printf("%d", a++);
    printf("%d", ++a);
    return 0;
}

Same for C++:

#include <iostream>
using namespace std;

int main(){
    int a = 0;
    cout << a++;
    cout << ++a;
    return 0;
}

I don't think the answer(er) has any clue of either what the op wants or what word faster means..

Is there a performance difference between i++ and ++i in C++?

Follow WeChat

Want to stay one step ahead of the latest teleworks?

相似问题

Platform

Support

Links

Contact US