The yield keyword is one of those keywords in C# that continues to mystify me, and I've never been confident that I'm using it correctly.
Of the following two pieces of code, which is preferred, and why?
Version 1: Using yield return
public static IEnumerable<Product> GetAllProducts()
{
    using (AdventureWorksEntities db = new AdventureWorksEntities())
    {
        var products = from product in db.Product
                       select product;
        foreach (Product product in products)
        {
            yield return product;
        }
    }
}
Version 2: Return the list
public static IEnumerable<Product> GetAllProducts()
{
    using (AdventureWorksEntities db = new AdventureWorksEntities())
    {
        var products = from product in db.Product
                       select product;
        return products.ToList<Product>();
    }
}
yield is tied to IEnumerable<T> and its kind. It is in some way a form of lazy evaluation.
yield return makes sense if the code that iterates through the results of GetAllProducts() allows the user a chance to cancel the processing prematurely.
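To illustrate the cancellation point, here is a minimal sketch of what happens when the caller stops iterating early. FakeContext and GetItems are hypothetical stand-ins (not the AdventureWorks types from the question); the sketch only assumes that the resource is wrapped in a using block inside the iterator, as in Version 1.

```csharp
using System;
using System.Collections.Generic;

public static class EarlyExitDemo
{
    public static readonly List<string> Log = new List<string>();

    // Hypothetical resource standing in for a database context.
    private sealed class FakeContext : IDisposable
    {
        public void Dispose() { Log.Add("context disposed"); }
    }

    public static IEnumerable<int> GetItems()
    {
        using (new FakeContext())
        {
            for (int i = 1; i <= 1000; i++)
            {
                Log.Add("produced " + i);
                yield return i;
            }
        }
    }

    public static void Main()
    {
        foreach (int item in GetItems())
        {
            if (item == 2) break; // the caller cancels early
        }
        Console.WriteLine(string.Join(", ", Log));
        // prints: produced 1, produced 2, context disposed
    }
}
```

Only two of the 1000 items are ever produced, and leaving the foreach disposes the enumerator, which in turn runs the using block's cleanup.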
I tend to use yield-return when I calculate the next item in the list (or even the next group of items).
Using your Version 2, you must have the complete list before returning. By using yield-return, you really only need to have the next item before returning.
Among other things, this helps spread the computational cost of complex calculations over a larger time-frame. For example, if the list is hooked up to a GUI and the user never goes to the last page, you never calculate the final items in the list.
Another case where yield-return is preferable is if the IEnumerable represents an infinite set. Consider the list of Prime Numbers, or an infinite list of random numbers. You can never return the full IEnumerable at once, so you use yield-return to return the list incrementally.
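As a rough sketch of the prime numbers case: the Primes method below is my own illustration, using naive trial division. It is conceptually unbounded, although this version stops before int.MaxValue to avoid overflow.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PrimeDemo
{
    // Yields primes one at a time; nothing is computed until the caller asks.
    public static IEnumerable<int> Primes()
    {
        for (int candidate = 2; candidate < int.MaxValue; candidate++)
        {
            bool isPrime = true;
            // long arithmetic keeps d * d from overflowing for large candidates
            for (long d = 2; d * d <= candidate; d++)
            {
                if (candidate % d == 0) { isPrime = false; break; }
            }
            if (isPrime) yield return candidate;
        }
    }

    public static void Main()
    {
        // Take(10) stops the enumeration after ten primes have been produced.
        Console.WriteLine(string.Join(" ", Primes().Take(10)));
        // prints: 2 3 5 7 11 13 17 19 23 29
    }
}
```

Calling Primes().ToList() here would be a mistake, because it would try to materialize the whole sequence before returning.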
In your particular example, you have the full list of products, so I'd use Version 2.
Populating a temporary list is like downloading the whole video, whereas using yield is like streaming that video.
As a conceptual example for understanding when you ought to use yield, let's say the method ConsumeLoop() processes the items returned/yielded by ProduceList():
void ConsumeLoop() {
    foreach (Consumable item in ProduceList()) // might have to wait here
        item.Consume();
}
IEnumerable<Consumable> ProduceList() {
    while (KeepProducing())
        yield return ProduceExpensiveConsumable(); // expensive
}
Without yield, the call to ProduceList() might take a long time because you have to complete the list before returning:
//pseudo-assembly
Produce consumable[0] // expensive operation, e.g. disk I/O
Produce consumable[1] // waiting...
Produce consumable[2] // waiting...
Produce consumable[3] // completed the consumable list
Consume consumable[0] // start consuming
Consume consumable[1]
Consume consumable[2]
Consume consumable[3]
Using yield, it becomes rearranged, sort of interleaved:
//pseudo-assembly
Produce consumable[0]
Consume consumable[0] // immediately yield & Consume
Produce consumable[1] // ConsumeLoop iterates, requesting next item
Consume consumable[1] // consume next
Produce consumable[2]
Consume consumable[2] // consume next
Produce consumable[3]
Consume consumable[3] // consume next
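The two orderings above can be reproduced with a small runnable sketch. The names ProduceEagerly, ProduceLazily and Consume are made up for this demo, and plain integers stand in for the Consumable type.

```csharp
using System;
using System.Collections.Generic;

public static class InterleaveDemo
{
    public static readonly List<string> Log = new List<string>();

    // Builds the complete list before returning it.
    public static List<int> ProduceEagerly(int count)
    {
        var list = new List<int>();
        for (int i = 0; i < count; i++)
        {
            Log.Add("Produce " + i);
            list.Add(i);
        }
        return list;
    }

    // Produces each item only when the consumer asks for it.
    public static IEnumerable<int> ProduceLazily(int count)
    {
        for (int i = 0; i < count; i++)
        {
            Log.Add("Produce " + i);
            yield return i;
        }
    }

    public static void Consume(IEnumerable<int> items)
    {
        foreach (int item in items)
            Log.Add("Consume " + item);
    }

    public static void Main()
    {
        Consume(ProduceEagerly(2));
        Console.WriteLine(string.Join(", ", Log));
        // prints: Produce 0, Produce 1, Consume 0, Consume 1

        Log.Clear();
        Consume(ProduceLazily(2));
        Console.WriteLine(string.Join(", ", Log));
        // prints: Produce 0, Consume 0, Produce 1, Consume 1
    }
}
```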
And lastly, as many before have already suggested, you should use Version 2 because you already have the completed list anyway.
I know this is an old question, but I'd like to offer one example of how the yield keyword can be creatively used. I have really benefited from this technique. Hopefully this will be of assistance to anyone else who stumbles upon this question.
Note: Don't think about the yield keyword as merely being another way to build a collection. A big part of the power of yield comes in the fact that execution is paused in your method or property until the calling code iterates over the next value. Here's my example:
Using the yield keyword (alongside Rob Eisenberg's Caliburn.Micro coroutines implementation) allows me to express an asynchronous call to a web service like this:
public IEnumerable<IResult> HandleButtonClick() {
    yield return Show.Busy();
    var loginCall = new LoginResult(wsClient, Username, Password);
    yield return loginCall;
    this.IsLoggedIn = loginCall.Success;
    yield return Show.NotBusy();
}
What this will do is turn my BusyIndicator on, call the Login method on my web service, set my IsLoggedIn flag to the return value, and then turn the BusyIndicator back off.
Here's how this works: IResult has an Execute method and a Completed event. Caliburn.Micro grabs the IEnumerator from the call to HandleButtonClick() and passes it into a Coroutine.BeginExecute method. The BeginExecute method starts iterating through the IResults. When the first IResult is returned, execution is paused inside HandleButtonClick(), and BeginExecute() attaches an event handler to the Completed event and calls Execute(). IResult.Execute() can perform either a synchronous or an asynchronous task and fires the Completed event when it's done.
LoginResult looks something like this:
public class LoginResult : IResult {
    // Constructor to set private members...
    public void Execute(ActionExecutionContext context) {
        wsClient.LoginCompleted += (sender, e) => {
            this.Success = e.Result;
            Completed(this, new ResultCompletionEventArgs());
        };
        wsClient.Login(username, password);
    }
    public event EventHandler<ResultCompletionEventArgs> Completed = delegate { };
    public bool Success { get; private set; }
}
It may help to set up something like this and step through the execution to watch what's going on.
Hope this helps someone out! I've really enjoyed exploring the different ways yield can be used.
yield in this way. It seems like an elegant way to emulate the async/await pattern (which I assume would be used instead of yield if this were rewritten today). Have you found that these creative uses of yield have yielded (no pun intended) diminishing returns over the years as C# has evolved since you answered this question? Or are you still coming up with modernized clever use-cases such as this? And if so, would you mind sharing another interesting scenario for us?
Yield return can be very powerful for algorithms where you need to iterate through millions of objects. Consider the following example where you need to calculate possible trips for ride sharing. First we generate possible trips:
static IEnumerable<Trip> CreatePossibleTrips()
{
    for (int i = 0; i < 1000000; i++)
    {
        yield return new Trip
        {
            Id = i.ToString(),
            Driver = new Driver { Id = i.ToString() }
        };
    }
}
Then iterate through each trip:
static void Main(string[] args)
{
    foreach (var trip in CreatePossibleTrips())
    {
        // possible trip is actually calculated only at this point, because of yield
        if (IsTripGood(trip))
        {
            // match good trip
        }
    }
}
If you use List instead of yield, you will need to allocate 1 million objects in memory (~190 MB), and this simple example will take ~1400 ms to run. However, if you use yield, you don't need to keep all these temporary objects in memory at once, and you will get significantly faster algorithm speed: this example will take only ~400 ms to run, with almost no additional memory consumption.
yield works under the covers by implementing a state machine internally. Here's an SO answer with 3 detailed MSDN blog posts that explain the implementation in great detail. Written by Raymond Chen @ MSFT
This is what Chris Sells says about those statements in The C# Programming Language:
I sometimes forget that yield return is not the same as return, in that the code after a yield return can be executed. For example, the code after the first return here can never be executed:
int F() {
    return 1;
    return 2; // Can never be executed
}
In contrast, the code after the first yield return here can be executed: IEnumerable
F().Any() - this will return after trying to enumerate the first result only. In general, you shouldn't rely on an IEnumerable yield to change program state, because it may not actually get triggered.
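Here is a small sketch of that caveat. It is my own illustration, not code from the book; the Steps counter is a hypothetical piece of program state that the iterator changes as a side effect.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SideEffectDemo
{
    public static int Steps;

    public static IEnumerable<int> F()
    {
        Steps++;        // runs when the first item is requested
        yield return 1;
        Steps++;        // runs only if the caller asks for a second item
        yield return 2;
    }

    public static void Main()
    {
        IEnumerable<int> sequence = F(); // no code inside F has run yet
        Console.WriteLine(Steps);        // prints: 0

        bool any = sequence.Any();       // stops after the first item
        Console.WriteLine(Steps);        // prints: 1

        int count = sequence.Count();    // a fresh, complete enumeration
        Console.WriteLine(Steps);        // prints: 3
    }
}
```

The code after the first yield return does run, but only when the consumer keeps enumerating, which is why relying on it for state changes is fragile.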
The two pieces of code are really doing two different things. The first version will pull members as you need them. The second version will load all the results into memory before you start to do anything with it.
There's no right or wrong answer to this one. Which one is preferable just depends on the situation. For example, if there's a time limit for completing your query and you need to do something semi-complicated with the results, the second version could be preferable. But beware of large result sets, especially if you're running this code in 32-bit mode. I've been bitten by OutOfMemory exceptions several times when using this approach.
The key thing to keep in mind is this though: the differences are in efficiency. Thus, you should probably go with whichever one makes your code simpler and change it only after profiling.
Yield has two great uses:
It helps to provide custom iteration without creating temporary collections (loading all the data and then looping).
It helps to do stateful iteration (streaming).
Below is a simple video which I have created, with a full demonstration, to support the above two points:
http://www.youtube.com/watch?v=4fju3xcm21M
Assuming your products LINQ class uses a similar yield for enumerating/iterating, the first version is more efficient because it's only yielding one value each time it's iterated over.
The second example is converting the enumerator/iterator to a list with the ToList() method. This means it manually iterates over all the items in the enumerator and then returns a flat list.
This is kinda beside the point, but since the question is tagged best-practices I'll go ahead and throw in my two cents. For this type of thing I greatly prefer to make it into a property:
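public static IEnumerable<Product> AllProducts
{
    get {
        using (AdventureWorksEntities db = new AdventureWorksEntities()) {
            var products = from product in db.Product
                           select product;
            // Materialize here: the deferred query would otherwise be
            // enumerated only after db has been disposed.
            return products.ToList();
        }
    }
}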
Sure, it's a little more boiler-plate, but the code that uses this will look much cleaner:
prices = Whatever.AllProducts.Select (product => product.price);
vs
prices = Whatever.GetAllProducts().Select (product => product.price);
Note: I wouldn't do this for any methods that may take a while to do their work.
And what about this?
public static IEnumerable<Product> GetAllProducts()
{
    using (AdventureWorksEntities db = new AdventureWorksEntities())
    {
        var products = from product in db.Product
                       select product;
        return products.ToList();
    }
}
I guess this is much cleaner. I do not have VS2008 at hand to check, though. In any case, if Products implements IEnumerable (as it seems to - it is used in a foreach statement), I'd return it directly.
I would have used Version 2 of the code in this case. Since you have the full list of products available and that is what the "consumer" of this method call expects, you need to send the complete information back to the caller.
If the caller of this method requires one piece of information at a time and consumes the next piece on demand, then it would be beneficial to use yield return, which makes sure control is returned to the caller as soon as a unit of information is available.
Some examples where one could use yield return are:
Complex, step-by-step calculation where the caller is waiting for the data of one step at a time
Paging in a GUI, where the user might never reach the last page and only a subset of the information needs to be shown on the current page
To answer your question, I would have used Version 2.
Return the list directly. Benefits:
It's more clear
The list is reusable (the iterator is not). Edit: not actually true, thanks Jon.
You should use the iterator (yield) when you think you probably won't have to iterate all the way to the end of the list, or when it has no end. For example, if the calling client is going to be searching for the first product that satisfies some predicate, you might consider using the iterator, although that's a contrived example, and there are probably better ways to accomplish it. Basically, if you know in advance that the whole list will need to be calculated, just do it up front. If you think that it won't, then consider using the iterator version.
Given the exact two code snippets, I think Version 1 is the better one as it can be more efficient. Let's say there are a lot of products and the caller wants to convert to DTOs.
var dtos = GetAllProducts().Select(ConvertToDto).ToList();
With Version 2 first a list of Product objects would be created, and then another list of ProductDto objects. With Version 1 there is no list of Product objects, only the list of the required ProductDto objects gets built.
Even without converting, Version 2 has a problem in my opinion: The list is returned as IEnumerable. The caller of GetAllProducts() does not know how expensive the enumeration of the result is. And if the caller needs to iterate more than once, she will probably materialize once by using ToList() (tools like ReSharper also suggest this). Which results in an unnecessary copy of the list already created in GetAllProducts(). So if Version 2 should be used, the return type should be List and not IEnumerable.
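To make the multiple-enumeration concern concrete, here is a small sketch. GetNumbers and the Produced counter are hypothetical; the counter stands in for whatever expensive work happens per item.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MultipleEnumerationDemo
{
    public static int Produced;

    public static IEnumerable<int> GetNumbers()
    {
        for (int i = 0; i < 3; i++)
        {
            Produced++; // counts how often an item is actually produced
            yield return i;
        }
    }

    public static void Main()
    {
        IEnumerable<int> lazy = GetNumbers();
        int sum = lazy.Sum(); // first pass over the iterator
        int max = lazy.Max(); // second pass runs the method body again
        Console.WriteLine(Produced); // prints: 6

        Produced = 0;
        List<int> materialized = GetNumbers().ToList(); // single pass
        sum = materialized.Sum();
        max = materialized.Max();
        Console.WriteLine(Produced); // prints: 3
    }
}
```

A caller who only sees IEnumerable cannot tell which of the two situations applies, which is the argument for declaring List as the return type when a list is what gets returned.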
The usage of yield is similar to the keyword return, except that it returns a generator (a lazily evaluated sequence). Each enumerator obtained from the generator traverses the sequence only once, in the forward direction.
yield has two benefits:
You do not need to read these values twice.
You can get many child nodes but do not have to put them all in memory.
There is another clear explanation that may help you.
Yield return seems to be shorthand for writing your own custom iterator class (implement IEnumerator). Hence, the mentioned benefits also apply to custom iterator classes. Anyway, both constructs keep intermediate state. In its most simple form it's about holding a reference to the current object.
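For comparison, here is a sketch of the same sequence written both ways. The Countdown class is a hand-written illustration of the idea; it is not the code the compiler actually generates for an iterator method, which is a more elaborate state machine.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class CountdownDemo
{
    // Iterator version: the compiler generates the state-keeping class.
    public static IEnumerable<int> CountdownWithYield(int start)
    {
        for (int i = start; i > 0; i--)
            yield return i;
    }

    // Hand-written version: the intermediate state is kept in fields.
    public sealed class Countdown : IEnumerable<int>
    {
        private readonly int _start;
        public Countdown(int start) { _start = start; }

        public IEnumerator<int> GetEnumerator() { return new Enumerator(_start); }
        IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }

        private sealed class Enumerator : IEnumerator<int>
        {
            private int _next; // the value to hand out on the next MoveNext
            public Enumerator(int start) { _next = start; }

            public int Current { get; private set; }
            object IEnumerator.Current { get { return Current; } }

            public bool MoveNext()
            {
                if (_next <= 0) return false;
                Current = _next--;
                return true;
            }

            public void Reset() { throw new NotSupportedException(); }
            public void Dispose() { }
        }
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", CountdownWithYield(3))); // prints: 3 2 1
        Console.WriteLine(string.Join(" ", new Countdown(3)));      // prints: 3 2 1
    }
}
```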