
LINQ's Distinct() on a particular property

I am playing with LINQ to learn about it, but I can't figure out how to use Distinct when I do not have a simple list (a simple list of integers is pretty easy to do, this is not the question). What if I want to use Distinct on a list of objects, based on one or more properties of the object?

Example: If an object is Person, with Property Id. How can I get all Person and use Distinct on them with the property Id of the object?

Person1: Id=1, Name="Test1"
Person2: Id=1, Name="Test1"
Person3: Id=2, Name="Test2"

How can I get just Person1 and Person3? Is that possible?

If it's not possible with LINQ, what would be the best way to have a list of Person depending on some of its properties in .NET 3.5?


Frederik Struck-Schøning

What if I want to obtain a distinct list based on one or more properties?

Simple! You want to group them and pick a winner out of the group.

List<Person> distinctPeople = allPeople
  .GroupBy(p => p.PersonId)
  .Select(g => g.First())
  .ToList();

If you want to define groups on multiple properties, here's how:

List<Person> distinctPeople = allPeople
  .GroupBy(p => new {p.PersonId, p.FavoriteColor} )
  .Select(g => g.First())
  .ToList();

Note: Certain query providers are unable to resolve that each group must have at least one element, and that First is the appropriate method to call in that situation. If you find yourself working with such a query provider, FirstOrDefault may help get your query through the query provider.

Note2: Consider this answer for an EF Core (prior to EF Core 6) compatible approach. https://stackoverflow.com/a/66529949/8155
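As a minimal sketch of the GroupBy approach applied to the three people from the question: the Person class below (with Id and Name rather than PersonId) and the DistinctById helper are assumptions made for illustration, not part of the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Person type modelled on the question's example data.
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class GroupByDemo
{
    // Keeps the first person encountered for each Id.
    public static List<Person> DistinctById(IEnumerable<Person> people)
    {
        return people
            .GroupBy(p => p.Id)
            .Select(g => g.First())
            .ToList();
    }

    public static void Main()
    {
        var allPeople = new List<Person>
        {
            new Person { Id = 1, Name = "Test1" },
            new Person { Id = 1, Name = "Test1" },
            new Person { Id = 2, Name = "Test2" },
        };

        foreach (var p in DistinctById(allPeople))
        {
            Console.WriteLine("{0}: {1}", p.Id, p.Name);
        }
        // Prints:
        // 1: Test1
        // 2: Test2
    }
}
```

In LINQ to Objects, GroupBy yields groups in the order their keys first appear, so First() here picks Person1 and Person3.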


@ErenErsonmez sure. With my posted code, if deferred execution is desired, leave off the ToList call.
Very nice answer! Realllllly helped me in Linq-to-Entities driven from a sql view where I couldn't modify the view. I needed to use FirstOrDefault() rather than First() - all is good.
I tried it and it should change to Select(g => g.FirstOrDefault())
@ChocapicSz Nope. Both Single() and SingleOrDefault() each throw when the source has more than one item. In this operation, we expect the possibility that each group may have more than one item. For that matter, First() is preferred over FirstOrDefault() because each group must have at least one member.... unless you're using EntityFramework, which can't figure out that each group has at least one member and demands FirstOrDefault().
Seems to not be currently supported in EF Core, even using FirstOrDefault() github.com/dotnet/efcore/issues/12088 I am on 3.1, and I get "unable to translate" errors.
Thijs

EDIT: This is now part of MoreLINQ.

What you need is a "distinct-by" effectively. I don't believe it's part of LINQ as it stands, although it's fairly easy to write:

public static IEnumerable<TSource> DistinctBy<TSource, TKey>
    (this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
    HashSet<TKey> seenKeys = new HashSet<TKey>();
    foreach (TSource element in source)
    {
        if (seenKeys.Add(keySelector(element)))
        {
            yield return element;
        }
    }
}

So to find the distinct values using just the Id property, you could use:

var query = people.DistinctBy(p => p.Id);

And to use multiple properties, you can use anonymous types, which implement equality appropriately:

var query = people.DistinctBy(p => new { p.Id, p.Name });

Untested, but it should work (and it now at least compiles).

It assumes the default comparer for the keys though - if you want to pass in an equality comparer, just pass it on to the HashSet constructor.
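A sketch of that comparer-accepting variant follows. The method is named DistinctByKey here, an assumed name chosen so it does not clash with the DistinctBy that ships in System.Linq from .NET 6 onward; the sample data is also made up.

```csharp
using System;
using System.Collections.Generic;

public static class DistinctByKeyExtensions
{
    // Same iterator as in the answer, plus a comparer for the keys.
    public static IEnumerable<TSource> DistinctByKey<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey> comparer)
    {
        // Passing null makes HashSet fall back to EqualityComparer<TKey>.Default.
        var seenKeys = new HashSet<TKey>(comparer);
        foreach (TSource element in source)
        {
            // Add returns false when the key was already seen.
            if (seenKeys.Add(keySelector(element)))
            {
                yield return element;
            }
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var names = new[] { "alice", "ALICE", "bob" };
        var distinct = names.DistinctByKey(n => n, StringComparer.OrdinalIgnoreCase);
        Console.WriteLine(string.Join(", ", distinct));
        // Prints: alice, bob
    }
}
```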


@ashes999: I'm not sure what you mean. The code is present in the answer and in the library - depending on whether you're happy to take on a dependency.
@ashes999: If you're only doing this in a single place, ever, then sure, using GroupBy is simpler. If you need it in more than one place, it's much cleaner (IMO) to encapsulate the intention.
@MatthewWhited: Given that there's no mention of IQueryable<T> here, I don't see how it's relevant. I agree that this wouldn't be suitable for EF etc, but within LINQ to Objects I think it's more suitable than GroupBy. The context of the question is always important.
The project moved on github, here's the code of DistinctBy: github.com/morelinq/MoreLINQ/blob/master/MoreLinq/DistinctBy.cs
I think this is a superior solution to the numerous GroupBy()/group by/ToLookup() answers because, like Distinct(), this is able to yield an element as soon as it's encountered (the first time), whereas those other methods can't return anything until the entire input sequence has been consumed. I think that's an important, er, distinction worth pointing out in the answer. Also, as far as memory, by the final element this HashSet<> will be storing only unique elements, whereas the other methods will somewhere be storing unique groups with unique + duplicates elements.
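The streaming behaviour described in this comment can be illustrated with an endless source. This is only a sketch: Cycle and DistinctByKey are hypothetical names, and the iterator is the same HashSet-based one as in the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StreamingDemo
{
    // Endless sequence 0, 1, 2, 0, 1, 2, ...
    public static IEnumerable<int> Cycle()
    {
        while (true)
        {
            for (int i = 0; i < 3; i++)
            {
                yield return i;
            }
        }
    }

    // HashSet-based iterator: yields each element the first time its key appears.
    public static IEnumerable<T> DistinctByKey<T, TKey>(
        this IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        var seenKeys = new HashSet<TKey>();
        foreach (T element in source)
        {
            if (seenKeys.Add(keySelector(element)))
            {
                yield return element;
            }
        }
    }

    public static void Main()
    {
        // Completes: Take(2) stops pulling once two distinct values have arrived.
        List<int> firstTwo = Cycle().DistinctByKey(x => x).Take(2).ToList();
        Console.WriteLine(string.Join(", ", firstTwo));
        // Prints: 0, 1

        // By contrast, Cycle().GroupBy(x => x).Select(g => g.First()).Take(2)
        // would never return, because GroupBy has to read the whole source
        // before it can yield its first group.
    }
}
```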
Sheridan

Use:

List<Person> pList = new List<Person>();
/* Fill list */

var result = pList.Where(p => p.Name != null).GroupBy(p => p.Id)
    .Select(grp => grp.FirstOrDefault());

The Where filters the entries (the condition could be more complex), and the GroupBy and Select together perform the distinct function.


Perfect, and works without extending Linq or using another dependency.
burnttoast11

You could also use query syntax if you want it to look all LINQ-like:

var uniquePeople = from p in people
                   group p by new {p.ID} //or group by new {p.ID, p.Name, p.Whatever}
                   into mygroup
                   select mygroup.FirstOrDefault();

Hmm, my thoughts are that both the query syntax and the fluent API syntax are just as LINQ-like as each other, and it's just preference which one people use. I myself prefer the fluent API, so I would consider that more LINQ-like, but then I guess that's subjective.
LINQ-Like has nothing to do with preference, being "LINQ-like" has to do with looking like a different query language being embedded into C#, I prefer the fluent interface, coming from java streams, but it is NOT LINQ-Like.
Excellent!! You are my hero!
Ivan

I think it is enough:

list.Select(s => s.MyField).Distinct();

What if he needs back his full object, not just that particular field?
Which object exactly, out of the several objects that have the same property value?
Himanshu

Solution: first group by your fields, then select the first item of each group with FirstOrDefault.

List<Person> distinctPeople = allPeople
.GroupBy(p => p.PersonId)
.Select(g => g.FirstOrDefault())
.ToList();

Peter Mortensen

You can do this with the standard LINQ ToLookup() method. This will create a collection of values for each unique key. Just select the first item in each collection:

Persons.ToLookup(p => p.Id).Select(coll => coll.First());

Theodor Zoulias

Starting with .NET 6, there is a new solution using the new DistinctBy() extension in LINQ, so we can do:

var distinctPersonsById = personList.DistinctBy(x => x.Id);

The signature of the DistinctBy method:

// Returns distinct elements from a sequence according to a specified
// key selector function.
public static IEnumerable<TSource> DistinctBy<TSource, TKey> (
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector);
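A short usage sketch, assuming .NET 6 or later. The Person class and sample data are made up for illustration. It also uses the second built-in overload, which takes an IEqualityComparer<TKey>, and a tuple key for multiple properties.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Person type for the example.
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var personList = new List<Person>
        {
            new Person { Id = 1, Name = "Test1" },
            new Person { Id = 1, Name = "test1" },
            new Person { Id = 2, Name = "Test2" },
        };

        // One person per Id.
        var byId = personList.DistinctBy(x => x.Id).ToList();

        // One person per Name, ignoring case (overload with a key comparer).
        var byName = personList
            .DistinctBy(x => x.Name, StringComparer.OrdinalIgnoreCase)
            .ToList();

        // Several properties at once: value tuples compare member by member.
        var byIdAndName = personList.DistinctBy(x => (x.Id, x.Name)).ToList();

        Console.WriteLine("{0} {1} {2}", byId.Count, byName.Count, byIdAndName.Count);
        // Prints: 2 2 3
    }
}
```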

Contango

The following code is functionally equivalent to Jon Skeet's answer.

Tested on .NET 4.5, should work on any earlier version of LINQ.

public static IEnumerable<TSource> DistinctBy<TSource, TKey>(
  this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
  HashSet<TKey> seenKeys = new HashSet<TKey>();
  return source.Where(element => seenKeys.Add(keySelector(element)));
}

Incidentally, check out Jon Skeet's latest version of DistinctBy.cs on Google Code.

Update 2022-04-03

Based on a comment by Andrew McClement, it is best to take Jon Skeet's answer over this one.


This gave me a "sequence has no values error", but Skeet's answer produced the correct result.
To clarify why this is not equivalent to Jon Skeet's answer - the difference only happens if you reuse the same enumerable. If you reuse the enumerable from this answer, the HashSet is already filled, so no elements are returned (all keys have been seen). For Skeet's answer, since it uses yield return, it creates a new HashSet every time the enumerable is iterated.
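The difference described in this comment can be reproduced with a small sketch. The method names DistinctByWhere and DistinctByIterator are hypothetical labels for the two implementations being compared.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReenumerationDemo
{
    // Where-based version: the HashSet is created once, when the method is
    // called, and is shared by every later enumeration of the result.
    public static IEnumerable<T> DistinctByWhere<T, TKey>(
        this IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        var seenKeys = new HashSet<TKey>();
        return source.Where(element => seenKeys.Add(keySelector(element)));
    }

    // Iterator version: the body runs from the top on each enumeration,
    // so every pass starts with a fresh, empty HashSet.
    public static IEnumerable<T> DistinctByIterator<T, TKey>(
        this IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        var seenKeys = new HashSet<TKey>();
        foreach (T element in source)
        {
            if (seenKeys.Add(keySelector(element)))
            {
                yield return element;
            }
        }
    }

    public static void Main()
    {
        int[] ids = { 1, 1, 2 };

        var viaWhere = ids.DistinctByWhere(x => x);
        Console.WriteLine(viaWhere.Count());    // 2
        Console.WriteLine(viaWhere.Count());    // 0: every key is already in the set

        var viaIterator = ids.DistinctByIterator(x => x);
        Console.WriteLine(viaIterator.Count()); // 2
        Console.WriteLine(viaIterator.Count()); // 2
    }
}
```

An empty second enumeration like this is one plausible cause of errors such as the one reported in the first comment, for example when Any() is called before First() on the same query.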
@AndrewMcClement Agree. Updated answer.
Nowhere Man

I've written an article that explains how to extend the Distinct function so that you can do as follows:

var people = new List<Person>();

people.Add(new Person(1, "a", "b"));
people.Add(new Person(2, "c", "d"));
people.Add(new Person(1, "a", "b"));

foreach (var person in people.Distinct(p => p.ID))
{
    // Do stuff with each unique person here.
}

Here's the article (now in the Web Archive): Extending LINQ - Specifying a Property in the Distinct Function


Your article has an error, there should be a <T> after Distinct: public static IEnumerable<T> Distinct<T>(this... Also it does not look like it will work (nicely) on more than one property, i.e. a combination of first and last names.
Please don't post the relevant information in an external link; an answer must stand on its own. It's OK to post the link, but please copy the relevant info to the answer itself. You only posted a usage example, but without the external resource it's useless.
Joel

Personally I use the following class:

public class LambdaEqualityComparer<TSource, TDest> : 
    IEqualityComparer<TSource>
{
    private Func<TSource, TDest> _selector;

    public LambdaEqualityComparer(Func<TSource, TDest> selector)
    {
        _selector = selector;
    }

    public bool Equals(TSource obj, TSource other)
    {
        return _selector(obj).Equals(_selector(other));
    }

    public int GetHashCode(TSource obj)
    {
        return _selector(obj).GetHashCode();
    }
}

Then, an extension method:

public static IEnumerable<TSource> Distinct<TSource, TCompare>(
    this IEnumerable<TSource> source, Func<TSource, TCompare> selector)
{
    return source.Distinct(new LambdaEqualityComparer<TSource, TCompare>(selector));
}

Finally, the intended usage:

var dates = new List<DateTime>() { /* ... */ };
var distinctYears = dates.Distinct(date => date.Year);

The advantage I found using this approach is the re-usage of LambdaEqualityComparer class for other methods that accept an IEqualityComparer. (Oh, and I leave the yield stuff to the original LINQ implementation...)
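To illustrate that reuse, here is a sketch in which one comparer instance is passed to Distinct, Except and a HashSet. KeyComparer is a hypothetical variation on the class above that compares keys through EqualityComparer<TKey>.Default, so a null key does not throw. The sample dates are invented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed variation of the key-based comparer shown in the answer.
public class KeyComparer<TSource, TKey> : IEqualityComparer<TSource>
{
    private readonly Func<TSource, TKey> _selector;

    public KeyComparer(Func<TSource, TKey> selector)
    {
        _selector = selector;
    }

    public bool Equals(TSource x, TSource y)
    {
        return EqualityComparer<TKey>.Default.Equals(_selector(x), _selector(y));
    }

    public int GetHashCode(TSource obj)
    {
        TKey key = _selector(obj);
        return key == null ? 0 : EqualityComparer<TKey>.Default.GetHashCode(key);
    }
}

public static class Program
{
    public static void Main()
    {
        var dates = new List<DateTime>
        {
            new DateTime(2020, 1, 15),
            new DateTime(2020, 6, 30),
            new DateTime(2021, 3, 1),
        };
        var byYear = new KeyComparer<DateTime, int>(d => d.Year);

        // Distinct: one date per year.
        Console.WriteLine(dates.Distinct(byYear).Count());          // 2

        // Except: drop every date whose year appears in the second sequence.
        var exclude = new[] { new DateTime(2020, 12, 31) };
        Console.WriteLine(dates.Except(exclude, byYear).Count());   // 1

        // HashSet: membership test by year only.
        var set = new HashSet<DateTime>(dates, byYear);
        Console.WriteLine(set.Contains(new DateTime(2021, 12, 25))); // True
    }
}
```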


Harry .Naeem

You can use DistinctBy() for getting Distinct records by an object property. Just add the following statement before using it:

using Microsoft.Ajax.Utilities;

and then use it like following:

var listToReturn = responseList.DistinctBy(x => x.Index).ToList();

where 'Index' is the property on which I want the data to be distinct.


mqp

You can do it (albeit not lightning-quickly) like so:

people.Where(p => !people.Any(q => (p != q && p.Id == q.Id)));

That is, "select all people where there isn't another different person in the list with the same ID."

Mind you, in your example, that would just select person 3. I'm not sure how to tell which you want, out of the previous two.


Peter Mortensen

In case you need a Distinct method on multiple properties, you can check out my PowerfulExtensions library. It is currently at a very early stage, but you can already use methods like Distinct, Union, Intersect and Except on any number of properties.

This is how you use it:

using PowerfulExtensions.Linq;
...
var distinct = myArray.Distinct(x => x.A, x => x.B);

Peter Mortensen

When we faced such a task in our project we defined a small API to compose comparators.

So, the use case was like this:

var wordComparer = KeyEqualityComparer.Null<Word>().
    ThenBy(item => item.Text).
    ThenBy(item => item.LangID);
...
source.Select(...).Distinct(wordComparer);

And the API itself looks like this:

using System;
using System.Collections;
using System.Collections.Generic;

public static class KeyEqualityComparer
{
    public static IEqualityComparer<T> Null<T>()
    {
        return null;
    }

    public static IEqualityComparer<T> EqualityComparerBy<T, K>(
        this IEnumerable<T> source,
        Func<T, K> keyFunc)
    {
        return new KeyEqualityComparer<T, K>(keyFunc);
    }

    public static KeyEqualityComparer<T, K> ThenBy<T, K>(
        this IEqualityComparer<T> equalityComparer,
        Func<T, K> keyFunc)
    {
        return new KeyEqualityComparer<T, K>(keyFunc, equalityComparer);
    }
}

public struct KeyEqualityComparer<T, K>: IEqualityComparer<T>
{
    public KeyEqualityComparer(
        Func<T, K> keyFunc,
        IEqualityComparer<T> equalityComparer = null)
    {
        KeyFunc = keyFunc;
        EqualityComparer = equalityComparer;
    }

    public bool Equals(T x, T y)
    {
        return ((EqualityComparer == null) || EqualityComparer.Equals(x, y)) &&
                EqualityComparer<K>.Default.Equals(KeyFunc(x), KeyFunc(y));
    }

    public int GetHashCode(T obj)
    {
        var hash = EqualityComparer<K>.Default.GetHashCode(KeyFunc(obj));

        if (EqualityComparer != null)
        {
            var hash2 = EqualityComparer.GetHashCode(obj);

            hash ^= (hash2 << 5) + hash2;
        }

        return hash;
    }

    public readonly Func<T, K> KeyFunc;
    public readonly IEqualityComparer<T> EqualityComparer;
}

More details are on our site: IEqualityComparer in LINQ.


Caspian Canuck

If you don't want to add the MoreLinq library to your project just to get the DistinctBy functionality then you can get the same end result using the overload of Linq's Distinct method that takes in an IEqualityComparer argument.

You begin by creating a generic custom equality comparer class that uses lambda syntax to perform custom comparison of two instances of a generic class:

public class CustomEqualityComparer<T> : IEqualityComparer<T>
{
    Func<T, T, bool> _comparison;
    Func<T, int> _hashCodeFactory;

    public CustomEqualityComparer(Func<T, T, bool> comparison, Func<T, int> hashCodeFactory)
    {
        _comparison = comparison;
        _hashCodeFactory = hashCodeFactory;
    }

    public bool Equals(T x, T y)
    {
        return _comparison(x, y);
    }

    public int GetHashCode(T obj)
    {
        return _hashCodeFactory(obj);
    }
}

Then in your main code you use it like so:

Func<Person, Person, bool> areEqual = (p1, p2) => int.Equals(p1.Id, p2.Id);

Func<Person, int> getHashCode = (p) => p.Id.GetHashCode();

var query = people.Distinct(new CustomEqualityComparer<Person>(areEqual, getHashCode));

Voila! :)

The above assumes the following:

Property Person.Id is of type int

The people collection does not contain any null elements

If the collection could contain nulls then simply rewrite the lambdas to check for null, e.g.:

Func<Person, Person, bool> areEqual = (p1, p2) => 
{
    return (p1 != null && p2 != null) ? int.Equals(p1.Id, p2.Id) : false;
};

EDIT

This approach is similar to the one in Vladimir Nesterovsky's answer but simpler.

It is also similar to the one in Joel's answer but allows for complex comparison logic involving multiple properties.

However, if your objects can only ever differ by Id then another user gave the correct answer that all you need to do is override the default implementations of GetHashCode() and Equals() in your Person class and then just use the out-of-the-box Distinct() method of Linq to filter out any duplicates.


I want to get only unique items in a dictionary. Can you please help? I am using this code: If TempDT IsNot Nothing Then m_ConcurrentScriptDictionary = TempDT.AsEnumerable.ToDictionary(Function(x) x.SafeField(fldClusterId, NULL_ID_VALUE), Function(y) y.SafeField(fldParamValue11, NULL_ID_VALUE))
Waldemar Gałęzinowski

Override Equals(object obj) and GetHashCode() methods:

class Person
{
    public int Id { get; set; }
    public string Name { get; set; }

    public override bool Equals(object obj)
    {
        return ((Person)obj).Id == Id;
        // or: 
        // var o = (Person)obj;
        // return o.Id == Id && o.Name == Name;
    }
    public override int GetHashCode()
    {
        return Id.GetHashCode();
    }
}

and then just call:

List<Person> distinctList = new[] { person1, person2, person3 }.Distinct().ToList();

However, GetHashCode() should be more advanced (to also take the Name into account); this answer is probably the best, in my opinion. Actually, to achieve the target logic, there is no need to override GetHashCode(), Equals() is enough, but if we need performance, we have to override it. All comparison algorithms first check the hash, and if the hashes are equal they then call Equals().
Also, in Equals() the first line should be "if (!(obj is Person)) return false". But best practice is to use a separate object cast to the type, like "var o = obj as Person; if (o == null) return false;" and then check equality with o without casting again.
Overriding Equals like this is not a good idea as it could have unintended consequences for other programmers expecting the Person's Equality to be determined on more than a single property.
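The safer Equals suggested in the comments might look like the sketch below. Implementing IEquatable<Person> is an addition of this sketch, not something the answer proposes. Note that Distinct() is hash-based, so GetHashCode() has to agree with Equals() for duplicates to be found reliably.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person : IEquatable<Person>
{
    public int Id { get; set; }
    public string Name { get; set; }

    // Typed comparison: two people are equal when their Ids match.
    public bool Equals(Person other)
    {
        return !ReferenceEquals(other, null) && other.Id == Id;
    }

    // "as" yields null for null or non-Person arguments instead of throwing.
    public override bool Equals(object obj)
    {
        return Equals(obj as Person);
    }

    // Must be consistent with Equals: equal Ids give equal hash codes.
    public override int GetHashCode()
    {
        return Id.GetHashCode();
    }
}

public static class Program
{
    public static void Main()
    {
        var people = new[]
        {
            new Person { Id = 1, Name = "Test1" },
            new Person { Id = 1, Name = "Test1" },
            new Person { Id = 2, Name = "Test2" },
        };

        List<Person> distinctList = people.Distinct().ToList();
        Console.WriteLine(distinctList.Count);                      // 2
        Console.WriteLine(people[0].Equals("not a person"));        // False
    }
}
```

Because the hash code depends on a settable Id, changing Id while a Person sits in a HashSet or Dictionary would break lookups. That is one of the side effects the comment above warns about.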
Community

The best way to do this that will be compatible with other .NET versions is to override Equals and GetHashCode to handle this (see Stack Overflow question This code returns distinct values. However, what I want is to return a strongly typed collection as opposed to an anonymous type), but if you need something that is generic throughout your code, the solutions in this article are great.


Arindam
List<Person> lst = new List<Person>();
var result1 = lst.OrderByDescending(a => a.ID).Select(a => new Player { ID = a.ID, Name = a.Name }).Distinct();

Did you mean to Select() new Person instead of new Player? The fact that you are ordering by ID doesn't somehow inform Distinct() to use that property in determining uniqueness, though, so this won't work.
GWLlosa

You should be able to override Equals on person to actually do Equals on Person.id. This ought to result in the behavior you're after.


I wouldn't recommend this approach. While it might work in this specific case, it's simply bad practice. What if he wants to distinct by a different property somewhere else? For sure he can't override Equals twice, can he? :-) Apart from that, it's fundamentally wrong to override Equals for this purpose, since it's meant to tell whether two objects are equal or not. If the class's condition for equality changes for any reason, you will burn your fingers for sure...
TOL

If you use an old .NET version, where the extension method is not built in, then you may define your own extension method:

public static class EnumerableExtensions
{
    public static IEnumerable<T> DistinctBy<T, TKey>(this IEnumerable<T> enumerable, Func<T, TKey> keySelector)
    {
        return enumerable.GroupBy(keySelector).Select(grp => grp.First());
    }
}

Example of usage:

var personsDist = persons.DistinctBy(item => item.Name);

How does this improve the accepted answer that offers the same extension method, slightly differently implemented?
It's shorter at least. And it's not slightly, it's differently implemented.
And not better. The accepted answer is much better. Why offer an inferior solution? New answers to old questions are supposed to be significant improvements to what's already there.
Bose_geek

Maybe this could help. Using a HashSet performs better.

public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
    var known = new HashSet<TKey>();
    return source.Where(element => known.Add(keySelector(element)));
}

Alien

Please try the code below.

var Item = GetAll().GroupBy(x => x.Id).ToList();

A short answer is welcome, but it won't provide much value to later users who are trying to understand what's going on behind the problem. Please spare some time to explain what the real issue is and how to solve it. Thank you ~