
The JPA hashCode() / equals() dilemma

There have been some discussions here about JPA entities and which hashCode()/equals() implementation should be used for JPA entity classes. Most (if not all) of them depend on Hibernate, but I'd like to discuss them JPA-implementation-neutrally (I am using EclipseLink, by the way).

All possible implementations have their own advantages and disadvantages regarding:

hashCode()/equals() contract conformity (immutability) for List/Set operations

Whether identical objects (e.g. from different sessions, dynamic proxies from lazily-loaded data structures) can be detected

Whether entities behave correctly in detached (or non-persisted) state

As far as I can see, there are three options:

1. Do not override them; rely on Object.equals() and Object.hashCode()

hashCode()/equals() work

cannot identify identical objects, problems with dynamic proxies

no problems with detached entities

2. Override them, based on the primary key

hashCode()/equals() are broken

correct identity (for all managed entities)

problems with detached entities

3. Override them, based on the Business-Id (non-primary key fields; what about foreign keys?)

hashCode()/equals() are broken

correct identity (for all managed entities)

no problems with detached entities

My questions are:

Did I miss an option and/or pro/con point? What option did you choose and why?

UPDATE 1:

By "hashCode()/equals() are broken", I mean that successive hashCode() invocations may return differing values, which is (when correctly implemented) not broken in the sense of the Object API documentation, but which causes problems when trying to retrieve a changed entity from a Map, Set or other hash-based Collection. Consequently, JPA implementations (at least EclipseLink) will not work correctly in some cases.
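To make the problem described in this update concrete, here is a minimal sketch in plain Java (no JPA involved). The Customer class and its email field are hypothetical stand-ins for an entity whose equals()/hashCode() use a field that later changes.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical entity-like class; no JPA annotations so it runs standalone.
final class Customer {
    private String email; // mutable field used in equals()/hashCode()

    Customer(String email) { this.email = email; }

    void setEmail(String email) { this.email = email; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Customer)) return false;
        return Objects.equals(email, ((Customer) o).email);
    }

    @Override
    public int hashCode() { return Objects.hashCode(email); }
}

final class MutableKeyDemo {
    // Reports whether the set still finds the object after a field used in hashCode() changed.
    static boolean foundAfterChange() {
        Set<Customer> customers = new HashSet<>();
        Customer c = new Customer("old@example.org");
        customers.add(c);
        c.setEmail("new@example.org"); // the hash changes, but c is still filed under the old hash
        return customers.contains(c);
    }

    public static void main(String[] args) {
        System.out.println("found after change: " + foundAfterChange());
    }
}
```

The equals()/hashCode() pair is consistent at every moment, yet the lookup fails because the set stored the object under the hash it had when it was added.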

UPDATE 2:

Thank you for your answers -- most of them have remarkable quality. Unfortunately, I am still unsure which approach will be the best for a real-life application, or how to determine the best approach for my application. So, I'll keep the question open and hope for some more discussions and/or opinions.

I don't understand what you mean by "hashCode()/equals() broken"
They wouldn't be "broken" in that sense then, as in option 2 and 3 you would be implementing both equals() and hashCode() using the same strategy.
That is not true of option 3. hashCode() and equals() should be using the same criteria, therefore if one of your fields changes, yes, the hashCode() method will return a different value for the same instance than it previously did, but so will equals(). You've left off the second part of the sentence from the hashCode() javadoc: Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified.
Actually that part of the sentence means the opposite - calling hashcode() on the same object instance should return the same value, unless any fields used in the equals() implementation change. In other words, if you have three fields in your class and your equals() method uses only two of them to determine equality of instances, then you can expect the hashcode() return value to change if you change one of those field's values - which makes sense when you consider that this object instance is no longer "equal" to the value that the old instance represented.
"problems when trying to retrieve a changed entity from a Map, Set or other hash-based Collections"... this should be "problems when trying to retrieve a changed entity from a HashMap, HashSet or other hash-based Collections"

andref

Read this very nice article on the subject: Don't Let Hibernate Steal Your Identity.

The conclusion of the article goes like this:

Object identity is deceptively hard to implement correctly when objects are persisted to a database. However, the problems stem entirely from allowing objects to exist without an id before they are saved. We can solve these problems by taking the responsibility of assigning object IDs away from object-relational mapping frameworks such as Hibernate. Instead, object IDs can be assigned as soon as the object is instantiated. This makes object identity simple and error-free, and reduces the amount of code needed in the domain model.


No, that is not a nice article. That is a freaking great article on the subject, and it should be required reading for every JPA programmer! +1!
Yup I am using the same solution. Not letting the DB generate the ID has other advantages too, such as being able to create an object and already create other objects that reference it before persisting it. This can remove latency and multiple request/response cycles in client-server apps. If you need inspiration for such a solution, check out my projects: suid.js and suid-server-java. Basically suid.js fetches ID blocks from suid-server-java which you can then get and use client-side.
This is simply insane. I'm new to Hibernate's workings under the hood, was writing unit tests, and found out that I can't delete an object from a set after modifying it. I concluded that it is because of the hashcode change, but was unable to understand how to solve it. The article is simply gorgeous!
It is a great article. However, for people who see the link for the first time, I would suggest that it might be overkill for most applications. The other 3 options listed on this page should more or less solve the issue in multiple ways.
Does Hibernate/JPA use the equals and hashcode method of an entity to check if the record already exists in the database?
nanda

I always override equals/hashcode and implement it based on the business id. Seems the most reasonable solution for me. See the following link.

To sum all this stuff up, here is a listing of what will work or won't work with the different ways to handle equals/hashCode:

EDIT:

To explain why this works for me:

I don't usually use hash-based collections (HashMap/HashSet) in my JPA applications. If I must, I prefer to create a UniqueList solution.

I think changing the business id at runtime is not a best practice for any database application. In the rare cases where there is no other solution, I'd apply special treatment, such as removing the element and putting it back into the hash-based collection.

For my model, I set the business id in the constructor and don't provide setters for it. I let the JPA implementation change the field instead of the property.

The UUID solution seems to be overkill. Why use a UUID if you have a natural business id? I would, after all, set the uniqueness of the business id in the database. Why have THREE indexes for each table in the database then?


But this table is lacking a fifth line "works with List/Sets" (if you think of removing an entity which is part of a Set from a OneToMany mapping) which would be answered "No" on the last two options because its hashCode() changes which violates its contract.
See the comment on the question. You seem to misunderstand the equals/hashCode contract.
@MRalwasser: I think you mean the right thing, it's just not the equals/hashCode() contract itself which is violated. But a mutable equals/hashCode does create problems with the Set contract.
@MRalwasser: The hashcode can only change if the business ID changes, and the point is that the business ID does not change. So the hashcode doesn't change, and this works perfectly with hashed collections.
What if you don't have a natural business key? For example in case of a two dimensional point, Point(X,Y), in a graph-drawing application? How would you store that point as an Entity?
Balder

I have personally used all three of these strategies in different projects, and I must say that option 1 is, in my opinion, the most practicable in a real-life app. In my experience, breaking hashCode()/equals() conformity leads to many crazy bugs, as you will invariably end up in situations where the result of equality changes after an entity has been added to a collection.

But there are further options (also with their pros and cons):

a) hashCode/equals based on a set of immutable, not null, constructor assigned, fields

(+) all three criteria are guaranteed

(-) field values must be available to create a new instance

(-) complicates handling if you must change one of them

b) hashCode/equals based on a primary key that is assigned by the application (in the constructor) instead of JPA

(+) all three criteria are guaranteed

(-) you cannot take advantage of simple reliable ID generation strategies like DB sequences

(-) complicated if new entities are created in a distributed environment (client/server) or app server cluster

c) hashCode/equals based on a UUID assigned by the constructor of the entity

(+) all three criteria are guaranteed

(-) overhead of UUID generation

(-) a small risk that the same UUID is generated twice, depending on the algorithm used (can be detected by a unique index in the DB)
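As a rough sketch of variant a), assuming a hypothetical Country class whose isoCode is the immutable, constructor-assigned field: this is plain Java so it runs standalone. In a real JPA entity the field could not be declared final (the spec forbids final persistent fields), so you would drop final, omit the setter, and add the no-arg constructor the provider needs.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical class illustrating variant a); not a mapped entity.
class Country {
    private final String isoCode; // immutable, non-null, assigned in the constructor
    private String displayName;   // mutable, deliberately excluded from equality

    Country(String isoCode, String displayName) {
        this.isoCode = Objects.requireNonNull(isoCode, "isoCode");
        this.displayName = displayName;
    }

    void setDisplayName(String displayName) { this.displayName = displayName; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Country)) return false;
        return isoCode.equals(((Country) o).isoCode);
    }

    @Override
    public int hashCode() { return isoCode.hashCode(); }
}

final class ImmutableKeyDemo {
    // Changing a field that is not part of equality leaves the set lookup intact.
    static boolean foundAfterRename() {
        Set<Country> countries = new HashSet<>();
        Country c = new Country("DE", "Germany");
        countries.add(c);
        c.setDisplayName("Deutschland");
        return countries.contains(c);
    }
}
```

The cost named in the list shows up in the constructor: you cannot create a Country without already knowing its isoCode.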


I'm fan of Option 1 and Approach C also. Do nothing till you absolutely need it is the more agile approach.
+1 for option (b). IMHO, if an entity has a natural business ID, then that should also be its database primary key. That's simple, straightforward, good database design. If it doesn't have such an ID, then a surrogate key is needed. If you set that at object creation, then everything else is simple. It's when people don't use a natural key, and don't generate a surrogate key early that they get into trouble. As for complexity in implementation - yes, there is some. But really not a lot, and it can be done a very generic way that solves it once for all entities.
I also prefer option 1, but then how to write unit test to assert the full equality is a big problem, because we have to implement the equals method for Collection.
Overhead of UUID generation is a minus? How does that compare to actually storing the data in a database?
Chris Lercher

If you want to use equals()/hashCode() for your Sets, in the sense that the same entity can only be in there once, then there is only one option: Option 2. That's because a primary key for an entity by definition never changes (if somebody indeed updates it, it's not the same entity anymore).

You should take that literally: Since your equals()/hashCode() are based on the primary key, you must not use these methods, until the primary key is set. So you shouldn't put entities in the set, until they're assigned a primary key. (Yes, UUIDs and similar concepts may help to assign primary keys early.)

Now, it's theoretically also possible to achieve that with Option 3, even though so-called "business-keys" have the nasty drawback that they can change: "All you'll have to do is delete the already inserted entities from the set(s), and re-insert them." That is true - but it also means, that in a distributed system, you'll have to make sure, that this is done absolutely everywhere the data has been inserted to (and you'll have to make sure, that the update is performed, before other things occur). You'll need a sophisticated update mechanism, especially if some remote systems aren't currently reachable...
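The delete-and-re-insert step quoted above could look roughly like this for a single, locally known set. Product, its sku field, and the Rekey helper are hypothetical names for illustration; the sketch does nothing about the harder part, which is finding every set (possibly on other machines) that holds the object.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical class whose business key (sku) can change.
final class Product {
    private String sku;

    Product(String sku) { this.sku = sku; }

    void setSku(String sku) { this.sku = sku; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Product)) return false;
        return Objects.equals(sku, ((Product) o).sku);
    }

    @Override
    public int hashCode() { return Objects.hashCode(sku); }
}

final class Rekey {
    // Removes the element while its old hash is still valid, applies the change,
    // then adds it back so it is stored under the new hash.
    static <T> void update(Set<T> set, T element, Consumer<T> change) {
        boolean wasPresent = set.remove(element); // must happen BEFORE the key changes
        change.accept(element);
        if (wasPresent) {
            set.add(element);
        }
    }
}
```

Usage would be something like Rekey.update(products, p, x -> x.setSku("B-2")). Forgetting the helper even once, or changing the key first, silently leaves a stale entry behind.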

Option 1 can only be used, if all the objects in your sets are from the same Hibernate session. The Hibernate documentation makes this very clear in chapter 13.1.3. Considering object identity:

Within a Session the application can safely use == to compare objects. However, an application that uses == outside of a Session might produce unexpected results. This might occur even in some unexpected places. For example, if you put two detached instances into the same Set, both might have the same database identity (i.e., they represent the same row). JVM identity, however, is by definition not guaranteed for instances in a detached state. The developer has to override the equals() and hashCode() methods in persistent classes and implement their own notion of object equality.
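A small plain-Java illustration of the Set scenario from that quote, assuming a hypothetical Invoice class that inherits equals()/hashCode() from Object. The two constructor calls stand in for loading the same row in two different sessions.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class using option 1: equals()/hashCode() are inherited from Object.
final class Invoice {
    private final long id;

    Invoice(long id) { this.id = id; }

    long getId() { return id; }
}

final class DetachedDuplicatesDemo {
    // Counts how many elements the set holds after adding two objects for the same row.
    static int distinctCount() {
        Invoice fromSessionA = new Invoice(42L); // stand-in for a load in session A
        Invoice fromSessionB = new Invoice(42L); // stand-in for a load in session B
        Set<Invoice> invoices = new HashSet<>();
        invoices.add(fromSessionA);
        invoices.add(fromSessionB);
        return invoices.size();
    }
}
```

Both objects end up in the set, because identity-based equality sees two different instances even though they represent one row.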

It continues to argue in favor of Option 3:

There is one caveat: never use the database identifier to implement equality. Use a business key that is a combination of unique, usually immutable, attributes. The database identifier will change if a transient object is made persistent. If the transient instance (usually together with detached instances) is held in a Set, changing the hashcode breaks the contract of the Set.

This is true, if you

cannot assign the id early (e.g. by using UUIDs)

and yet you absolutely want to put your objects in sets while they're in transient state.

Otherwise, you're free to choose Option 2.

Then it mentions the need for a relative stability:

Attributes for business keys do not have to be as stable as database primary keys; you only have to guarantee stability as long as the objects are in the same Set.

This is correct. The practical problem I see with this is: If you can't guarantee absolute stability, how will you be able to guarantee stability "as long as the objects are in the same Set". I can imagine some special cases (like using sets only for a conversation and then throwing it away), but I would question the general practicability of this.

Short version:

Option 1 can only be used with objects within a single session.

If you can, use Option 2. (Assign PK as early as possible, because you can't use the objects in sets until the PK is assigned.)

If you can guarantee relative stability, you can use Option 3. But be careful with this.


Your assumption that the primary key never changes is false. Eg, Hibernate only allocates the primary key when the session is saved. So, if you use the primary key as your hashCode the result of hashCode() before you first save the object and after you first save the object will be different. Worse, before you save the session, two newly created objects will have the same hashCode and can overwrite each other when added to collections. You may find yourself having to force a save/flush immediately on object creation to use that approach.
@William: The primary key of an entity doesn't change. The id property of the mapped object may change. This occurs, as you explained, especially when a transient object is made persistent. Please read the part of my answer carefully, where I said about the equals/hashCode methods: "you must not use these methods, until the primary key is set."
Totally agree. With option 2, you're also able to factor out equals/hashCode in a superclass and have it reused by all your entities.
+1 I'm new to JPA, but some of the comments and answers here imply that people do not understand the meaning of the term "primary key".
Christian Conti-Vock

We usually have two IDs in our entities:

One is for the persistence layer only (so that the persistence provider and database can figure out relationships between objects).

The other is for our application needs (equals() and hashCode() in particular).

Take a look:

@Entity
public class User {

    @Id
    private int id;  // Persistence ID
    private UUID uuid; // Business ID

    // assuming all fields are subject to change
    // If we forbid users change their email or screenName we can use these
    // fields for business ID instead, but generally that's not the case
    private String screenName;
    private String email;

    // I don't put UUID generation in constructor for performance reasons. 
    // I call setUuid() when I create a new entity
    public User() {
    }

    // This method is only called when a brand new entity is added to 
    // persistence context - I add it as a safety net only but it might work 
    // for you. In some cases (say, when I add this entity to some set before 
    // calling em.persist()) setting a UUID might be too late. If I get a log 
    // output it means that I forgot to call setUuid() somewhere.
    @PrePersist
    public void ensureUuid() {
        if (getUuid() == null) {
            log.warn(format("User's UUID wasn't set on time. " 
                + "uuid: %s, name: %s, email: %s",
                getUuid(), getScreenName(), getEmail()));
            setUuid(UUID.randomUUID());
        }
    }

    // equals() and hashCode() rely on non-changing data only. Thus we 
    // guarantee that no matter how field values are changed we won't 
    // lose our entity in hash-based Sets.
    @Override
    public int hashCode() {
        return getUuid().hashCode();
    }

    // Note that I don't use direct field access inside my entity classes and
    // call getters instead. That's because Persistence provider (PP) might
    // want to load entity data lazily. And I don't use 
    //    this.getClass() == other.getClass() 
    // for the same reason. In order to support laziness PP might need to wrap
    // my entity object in some kind of proxy, i.e. subclassing it.
    @Override
    public boolean equals(final Object obj) {
        if (this == obj)
            return true;
        if (!(obj instanceof User))
            return false;
        return getUuid().equals(((User) obj).getUuid());
    }

    // Getters and setters follow
}

EDIT: to clarify my point regarding calls to setUuid() method. Here's a typical scenario:

User user = new User();
// user.setUuid(UUID.randomUUID()); // I should have called it here
user.setScreenName("Master Yoda");
user.setEmail("yoda@jedicouncil.org");

jediSet.add(user); // here's bug - we forgot to set UUID and 
                   //we won't find Yoda in Jedi set

em.persist(user); // ensureUuid() was called and printed the log for me.

jediCouncilSet.add(user); // Ok, we got a UUID now

When I run my tests and see the log output I fix the problem:

User user = new User();
user.setUuid(UUID.randomUUID());

Alternatively, one can provide a separate constructor:

@Entity
public class User {

    @Id
    private int id;  // Persistence ID
    private UUID uuid; // Business ID

    ... // fields

    // Constructor for Persistence provider to use
    public User() {
    }

    // Constructor I use when creating new entities
    public User(UUID uuid) {
        setUuid(uuid);
    }

    ... // rest of the entity.
}

So my example would look like this:

User user = new User(UUID.randomUUID());
...
jediSet.add(user); // no bug this time

em.persist(user); // and no log output

I use a default constructor and a setter, but you may find two-constructors approach more suitable for you.


I believe, that this is a correct and good solution. It may also have a little performance advantage, because integers usually perform better in database indexes than uuids. But apart from that, you could probably eliminate the current integer id property, and replace it with the (application assigned) uuid?
How is this different to using the default hashCode/equals methods for JVM equality and id for persistence equality? This doesn't make sense to me at all.
It works in cases when you have several entity objects pointing to the same row in a database. Object's equals() would return false in this case. UUID-based equals() returns true.
-1 - I don't see any reason to have two IDs, and so two kinds of identity. This seems completely pointless and potentially harmful to me.
Sorry for criticising your solution without pointing to one I would prefer. In short, I would give objects a single ID field, I would implement equals and hashCode based on it, and I would generate its value on object creation, rather than when saving to the database. That way, all forms of the object work the same way: non-persistent, persistent, and detached. Hibernate proxies (or similar) should also work correctly, and I think would not even need to be hydrated to handle equals and hashCode calls.
Vlad Mihalcea

If you have a business key, then you should use that for equals and hashCode.

If you don't have a business key, you should not leave it with the default Object equals and hashCode implementations because that does not work after you merge an entity.

You can use the entity identifier in the equals method only if the hashCode implementation returns a constant value, like this:

@Entity
public class Book implements Identifiable {

    @Id
    @GeneratedValue
    private Long id;

    private String title;

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Book)) return false;
        Book book = (Book) o;
        return getId() != null && Objects.equals(getId(), book.getId());
    }

    @Override
    public int hashCode() {
        return getClass().hashCode();
    }

    // Getters and setters omitted for brevity
}

Check out this test case on GitHub that proves this solution works like a charm.
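For readers who want to see the effect without a persistence provider, here is a plain-Java sketch of the same equals()/hashCode() pair. BookSketch is a hypothetical stand-in for the Book entity above, and the setId() call merely simulates what the provider would do when the entity is persisted.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Plain-Java stand-in for the Book entity; not a mapped class.
class BookSketch {
    private Long id; // null while the object is "transient"

    Long getId() { return id; }

    void setId(Long id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof BookSketch)) return false;
        BookSketch other = (BookSketch) o;
        return getId() != null && Objects.equals(getId(), other.getId());
    }

    @Override
    public int hashCode() {
        return getClass().hashCode(); // constant per class, so it never changes with the id
    }
}

final class ConstantHashDemo {
    // Adds a transient object, simulates persist, then looks it up in two ways.
    static boolean survivesIdAssignment() {
        Set<BookSketch> books = new HashSet<>();
        BookSketch book = new BookSketch();
        books.add(book);   // id is still null here
        book.setId(1L);    // simulated persist: the provider assigns the id

        BookSketch reloaded = new BookSketch(); // stand-in for the same row loaded elsewhere
        reloaded.setId(1L);
        return books.contains(book) && books.contains(reloaded);
    }
}
```

The trade-off is that every instance of the class hashes to the same value, so a large HashSet of such objects degrades toward a linear search within one bucket.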


Which is better: (1) onjava.com/pub/a/onjava/2006/09/13/… or (2) vladmihalcea.com/…? Solution (2) is easier than (1). So why should I use (1). Are the effects of both the same? Do both guarantee the same solution?
And with your solution: "the hashCode value does not change" between the same instances. This has the same behaviour as if it were the "same" uuid (from solution (1)) being compared. Am I right?
If (2) works under every state, why should I bother with "business key" at all?
And store the UUID in the database and increase the footprint of the record and in the buffer pool? I think this can lead to more performance issues on the long run than the unique hashCode. As for the other solution, you can check it out to see if it provides consistency across all entity state transitions. You can find the test that checks that on GitHub.
If you have an immutable business key, the hashCode can use it and it's going to benefit from multiple buckets, so it's worth using if you have one. Otherwise, just use the entity identifier as explained in my article.
jbyler

Although using a business key (option 3) is the most commonly recommended approach (Hibernate community wiki, "Java Persistence with Hibernate" p. 398), and this is what we mostly use, there's a Hibernate bug which breaks this for eager-fetched sets: HHH-3799. In this case, Hibernate can add an entity to a set before its fields are initialized. I'm not sure why this bug hasn't gotten more attention, as it really makes the recommended business-key approach problematic.

I think the heart of the matter is that equals and hashCode should be based on immutable state (reference Odersky et al.), and a Hibernate entity with Hibernate-managed primary key has no such immutable state. The primary key is modified by Hibernate when a transient object becomes persistent. The business key is also modified by Hibernate, when it hydrates an object in the process of being initialized.

That leaves only option 1, inheriting the java.lang.Object implementations based on object identity, or using an application-managed primary key as suggested by James Brundege in "Don't Let Hibernate Steal Your Identity" (already referenced by Stijn Geukens's answer) and by Lance Arlaus in "Object Generation: A Better Approach to Hibernate Integration".

The biggest problem with option 1 is that detached instances can't be compared with persistent instances using .equals(). But that's OK; the contract of equals and hashCode leaves it up to the developer to decide what equality means for each class. So just let equals and hashCode inherit from Object. If you need to compare a detached instance to a persistent instance, you can create a new method explicitly for that purpose, perhaps boolean sameEntity or boolean dbEquivalent or boolean businessEquals.
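A minimal sketch of that idea, assuming a hypothetical Account class: equals() and hashCode() stay inherited from Object, and the database-level comparison lives in an explicitly named method.

```java
// Hypothetical class; equals()/hashCode() are deliberately NOT overridden.
class Account {
    private final Long id; // null until the object has been persisted

    Account(Long id) { this.id = id; }

    Long getId() { return id; }

    // True only if both objects carry the same non-null database identifier.
    boolean sameEntity(Account other) {
        return other != null && id != null && id.equals(other.id);
    }
}
```

Collections keep using JVM identity, while code that really wants to ask whether two objects represent the same row has to say so by calling sameEntity().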


Drew

I agree with Andrew's answer. We do the same thing in our application but instead of storing UUIDs as VARCHAR/CHAR, we split it into two long values. See UUID.getLeastSignificantBits() and UUID.getMostSignificantBits().

One more thing to consider is that calls to UUID.randomUUID() are pretty slow, so you might want to look into lazily generating the UUID only when needed, such as during persistence or calls to equals()/hashCode().

@MappedSuperclass
public abstract class AbstractJpaEntity extends AbstractMutable implements Identifiable, Modifiable {

    private static final long   serialVersionUID    = 1L;

    @Version
    @Column(name = "version", nullable = false)
    private int                 version             = 0;

    @Column(name = "uuid_least_sig_bits")
    private long                uuidLeastSigBits    = 0;

    @Column(name = "uuid_most_sig_bits")
    private long                uuidMostSigBits     = 0;

    private transient int       hashCode            = 0;

    public AbstractJpaEntity() {
        //
    }

    public abstract Integer getId();

    public abstract void setId(final Integer id);

    public boolean isPersisted() {
        return getId() != null;
    }

    public int getVersion() {
        return version;
    }

    //calling UUID.randomUUID() is pretty expensive, 
    //so this is to lazily initialize uuid bits.
    private void initUUID() {
        final UUID uuid = UUID.randomUUID();
        uuidLeastSigBits = uuid.getLeastSignificantBits();
        uuidMostSigBits = uuid.getMostSignificantBits();
    }

    public long getUuidLeastSigBits() {
        //it's safe to assume uuidMostSigBits of a valid UUID is never zero
        if (uuidMostSigBits == 0) {
            initUUID();
        }
        return uuidLeastSigBits;
    }

    public long getUuidMostSigBits() {
        //it's safe to assume uuidMostSigBits of a valid UUID is never zero
        if (uuidMostSigBits == 0) {
            initUUID();
        }
        return uuidMostSigBits;
    }

    public UUID getUuid() {
        return new UUID(getUuidMostSigBits(), getUuidLeastSigBits());
    }

    @Override
    public int hashCode() {
        if (hashCode == 0) {
            hashCode = (int) (getUuidMostSigBits() >> 32 ^ getUuidMostSigBits() ^ getUuidLeastSigBits() >> 32 ^ getUuidLeastSigBits());
        }
        return hashCode;
    }

    @Override
    public boolean equals(final Object obj) {
        if (obj == null) {
            return false;
        }
        if (!(obj instanceof AbstractJpaEntity)) {
            return false;
        }
        //UUID guarantees a pretty good uniqueness factor across distributed systems, so we can safely
        //dismiss getClass().equals(obj.getClass()) here since the chance of two different objects (even 
        //if they have different types) having the same UUID is astronomical
        final AbstractJpaEntity entity = (AbstractJpaEntity) obj;
        return getUuidMostSigBits() == entity.getUuidMostSigBits() && getUuidLeastSigBits() == entity.getUuidLeastSigBits();
    }

    @PrePersist
    public void prePersist() {
        // make sure the uuid is set before persisting
        getUuidLeastSigBits();
    }

}

Well, actually if you override equals()/hashCode() then you have to generate a UUID for every entity anyway (I assume that you want to persist every entity you create in your code). You do it only once - before storing it to a database for the first time. After that the UUID is just loaded by the Persistence Provider. Thus I don't see the point of doing it lazily.
I up-voted your answer because I really like your other ideas: storing UUID as a pair of numbers in the database and not casting to a particular type inside the equals() method - that one is really neat! I'll definitely use these two tricks in future.
Thanks for the up-vote. The reason for lazily initializing the UUID was in our app we create a lot of Entities that never get put in a HashMap or persisted. So we saw 100x drop in performance when we were creating the object (100,000s of them). So we only init the UUID if it's needed. I just wish there was good support in MySql for 128 bit numbers so we could just use the UUID for id too and don't care about the auto_increment.
Oh, I see. In my case we don't even declare a UUID field if the corresponding entity is not going to be put in collections. The drawback is that sometimes we have to add it because later it turns out that we actually need to put them in collections. This happens sometimes during development but fortunately never happened to us after initial deployment to a customer, so it wasn't a big deal. If that happened after the system went live, we would need a db migration. Lazy UUIDs are very helpful in such situations.
Maybe you should also try the faster UUID generator Adam suggested in his answer if performance is a critical issue in your situation.
Martin Andersson

Jakarta Persistence 3.0, section 4.12 writes:

Two entities of the same abstract schema type are equal if and only if they have the same primary key value.

I see no reason why Java code should behave differently.

If the entity class is in a so called "transient" state, i.e. it's not yet persisted and it has no identifier, then the hashCode/equals methods can not return a value, they ought to blow up, ideally implicitly with a NullPointerException when the method attempts to traverse the ID. Either way, this will effectively stop application code from putting a non-managed entity into a hash-based data structure. In fact, why not go one step further and blow up if the class and identifier are equal, but other important attributes such as the version are unequal (IllegalStateException)! Fail-fast in a deterministic way is always the preferred option.
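A rough sketch of what such fail-fast methods might look like, using a hypothetical Ticket class in plain Java (the id and version fields would be mapped with @Id and @Version in a real entity). Note that in this sketch even equals(null) throws while the id is unset, which is the behavior argued for above rather than what most equals() implementations do.

```java
import java.util.Objects;

// Hypothetical class illustrating fail-fast equals()/hashCode(); not a mapped entity.
class Ticket {
    private Long id;      // null while transient
    private int version;

    Ticket(Long id, int version) {
        this.id = id;
        this.version = version;
    }

    // Throws NullPointerException if the identifier has not been assigned yet.
    private Long requireId() {
        return Objects.requireNonNull(id, "entity has no identifier yet");
    }

    /**
     * @throws NullPointerException  if this or the other entity has no id
     * @throws IllegalStateException if class and id match but the versions differ
     */
    @Override
    public boolean equals(Object o) {
        Long myId = requireId();
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Ticket other = (Ticket) o;
        if (!myId.equals(other.requireId())) return false;
        if (version != other.version) {
            throw new IllegalStateException("same id " + myId + " but versions differ");
        }
        return true;
    }

    /** @throws NullPointerException if the entity has no id */
    @Override
    public int hashCode() {
        return requireId().hashCode();
    }
}
```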

Word of caution: Also document the blowing-up behavior. Documentation is important in and by itself, but it will hopefully also stop junior developers in the future from doing something stupid with your code (they have this tendency to suppress NullPointerException where it happened and the last thing on their mind is side-effects lol).

Oh, and always use getClass() instead of instanceof. The equals-method requires symmetry. If b is equal to a, then a must be equal to b. With subclasses, instanceof breaks this relationship (a is not instance of b).
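The asymmetry can be reproduced with two small hypothetical classes. It arises here because the subclass overrides equals() with its own instanceof check; if the base class declared equals() final and compared only the id, the instanceof version would stay symmetric.

```java
// Hypothetical base class using instanceof in equals().
class Shape {
    private final long id;

    Shape(long id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        return o instanceof Shape && id == ((Shape) o).id;
    }

    @Override
    public int hashCode() { return Long.hashCode(id); }
}

// Hypothetical subclass that adds state and tightens equals() with its own instanceof check.
class Circle extends Shape {
    private final int radius;

    Circle(long id, int radius) {
        super(id);
        this.radius = radius;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Circle && super.equals(o) && radius == ((Circle) o).radius;
    }
    // hashCode() is inherited and stays consistent because equal circles share an id
}
```

With Shape s = new Shape(1) and Circle c = new Circle(1, 5), s.equals(c) is true because c is a Shape with the same id, while c.equals(s) is false because s is not a Circle.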

Although I personally always use getClass() even when implementing non-entity classes (the type is state, and so a subclass adds state even if the subclass is empty or only contains behavior), instanceof would've been fine only if the class is final. But entity classes must not be final (§2.1) so we're really out of options here.

Some folks may not like getClass(), because of the persistence provider's proxy wrapping the object. This might have been a problem in the past, but it really shouldn't be. A provider not returning different proxy classes for different entities, well, I'd say that's not a very smart provider lol. Generally, we shouldn't solve a problem until there is a problem. And, it seems like Hibernate's own documentation doesn't even see it worthwhile mentioning. In fact, they elegantly use getClass() in their own examples (see this).

Lastly, if one has an entity subclass that is an entity, and the inheritance mapping strategy used is not the default ("single table"), but configured to be a "joined subtype", then the primary key in that subclass table will be the same as the superclass table. If the mapping strategy is "table per concrete class", then the primary key may be the same as in the superclass. An entity subclass is very likely to be adding state and therefore just as likely to be logically a different thing. But an equals implementation using instanceof can not necessarily and secondarily rely on the ID only, as we saw may be the same for different entities.

In my opinion, instanceof has no place at all in a non-final Java class, ever. This is especially true for persistent entities.


Even with a DB missing sequences (like Mysql), it's possible to simulate them (e.g., table hibernate_sequence). So you may always get an ID unique across tables. +++ But you don't need it. Calling Object#getClass() is bad because of H. proxies. Calling Hibernate.getClass(o) helps, but the problem of equality of entities of different kinds remains. There's a solution using canEqual, a bit complicated, but usable. Agreed that usually it's not needed. +++ Throwing in eq/hc on null ID violates the contract, but it's very pragmatic.
Thank you for your comment. I updated the answer. The one thing I wish to add here is that the statement "throwing in eq/hc on null ID violates the contract" is wrong. It's objectively wrong because, well, it's simply not part of the contract. Not that it matters for the truthfulness, but I'd just like to add that others agree.
Adam Gent

There are obviously already very informative answers here but I will tell you what we do.

We do nothing (ie do not override).

If we do need equals/hashCode to work for collections, we use UUIDs. You just create the UUID in the constructor. We use http://wiki.fasterxml.com/JugHome for UUIDs. A UUID is a little more expensive CPU-wise but is cheap compared to serialization and DB access.
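A minimal sketch of that idea follows, under a few assumptions: it uses java.util.UUID from the standard library rather than the JUG library mentioned above, the class name UuidBackedItem is hypothetical, and JPA annotations are left out so the code runs standalone. In a real entity the UUID would be a mapped, non-final column.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical entity; in real JPA code the uuid field would be a persistent column.
class UuidBackedItem {
    // Assigned once at construction, so equals/hashCode never change afterwards.
    private final UUID uuid = UUID.randomUUID();
    private Long id; // database-generated key, still null before persist

    void setId(Long id) { this.id = id; }
    Long getId() { return id; }
    UUID getUuid() { return uuid; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof UuidBackedItem)) return false;
        return uuid.equals(((UuidBackedItem) o).uuid);
    }

    @Override
    public int hashCode() { return uuid.hashCode(); }
}

class UuidBackedItemDemo {
    public static void main(String[] args) {
        Set<UuidBackedItem> items = new HashSet<>();
        UuidBackedItem item = new UuidBackedItem();
        items.add(item);
        item.setId(42L); // simulates the provider assigning the key on persist
        System.out.println(items.contains(item)); // true: the hash did not move
    }
}
```

Because the database ID plays no part in equality, the entity can sit in a hash-based collection before and after it is persisted.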


aux

Please consider the following approach based on predefined type identifier and the ID.

The specific assumptions for JPA:

entities of the same "type" and the same non-null ID are considered equal

non-persisted entities (assuming no ID) are never equal to other entities

The abstract entity:

@MappedSuperclass
public abstract class AbstractPersistable<K extends Serializable> {

  @Id @GeneratedValue
  private K id;

  @Transient
  private final String kind;

  public AbstractPersistable(final String kind) {
    this.kind = requireNonNull(kind, "Entity kind cannot be null");
  }

  @Override
  public final boolean equals(final Object obj) {
    if (this == obj) return true;
    if (!(obj instanceof AbstractPersistable)) return false;
    final AbstractPersistable<?> that = (AbstractPersistable<?>) obj;
    return null != this.id
        && Objects.equals(this.id, that.id)
        && Objects.equals(this.kind, that.kind);
  }

  @Override
  public final int hashCode() {
    return Objects.hash(kind, id);
  }

  public K getId() {
    return id;
  }

  protected void setId(final K id) {
    this.id = id;
  }
}

Concrete entity example:

static class Foo extends AbstractPersistable<Long> {
  public Foo() {
    super("Foo");
  }
}

Test example:

@Test
public void test_EqualsAndHashcode_GivenSubclass() {
  // Check contract
  EqualsVerifier.forClass(Foo.class)
    .suppress(Warning.NONFINAL_FIELDS, Warning.TRANSIENT_FIELDS)
    .withOnlyTheseFields("id", "kind")
    .withNonnullFields("id", "kind")
    .verify();
  // Ensure new objects are not equal
  assertNotEquals(new Foo(), new Foo());
}

Main advantages here:

simplicity

ensures subclasses provide type identity

predictable behavior with proxied classes

Disadvantages:

Requires each entity to call super()

Notes:

Needs attention when using inheritance. E.g. instance equality of class A and class B extends A may depend on concrete details of the application.

Ideally, use a business key as the ID

Looking forward to your comments.


Neil Stevens

I have always used option 1 in the past because I was aware of these discussions and thought it was better to do nothing until I knew the right thing to do. Those systems are all still running successfully.

However, next time I may try option 2 - using the database generated Id.

Hashcode and equals will throw IllegalStateException if the id is not set.

This will prevent subtle errors involving unsaved entities from appearing unexpectedly.

What do people think of this approach?
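A minimal sketch of what such a fail-fast implementation might look like. The class name StrictIdEntity and the helper requireId() are hypothetical, and JPA annotations are omitted so that the code runs on its own.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical entity whose equals/hashCode refuse to work without an id.
class StrictIdEntity {
    private Long id; // would be @Id @GeneratedValue in a real entity

    void setId(Long id) { this.id = id; }

    // Fails fast instead of silently returning an unstable value.
    private Long requireId() {
        if (id == null) {
            throw new IllegalStateException("id not set; entity has not been persisted yet");
        }
        return id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true; // identity check needs no id
        if (o == null || getClass() != o.getClass()) return false;
        return requireId().equals(((StrictIdEntity) o).requireId());
    }

    @Override
    public int hashCode() { return requireId().hashCode(); }
}

class StrictIdEntityDemo {
    public static void main(String[] args) {
        Set<StrictIdEntity> set = new HashSet<>();
        StrictIdEntity unsaved = new StrictIdEntity();
        try {
            set.add(unsaved); // HashSet calls hashCode(), which throws
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        unsaved.setId(5L); // simulates persist assigning the key
        set.add(unsaved);  // now fine
        System.out.println(set.contains(unsaved)); // true
    }
}
```

The trade-off is that any code path touching hash-based collections with unsaved entities fails immediately, including collections the persistence provider itself may manage.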


Demel

The business key approach doesn't suit us. We use a DB-generated ID plus a transient tempId, and override equals()/hashCode() to solve the dilemma. All entities are descendants of Entity. Pros:

No extra fields in the DB

No extra coding in descendant entities; one approach for all

No performance issues (as with UUIDs); the DB generates the ID

No problems with hash maps (no need to keep equals & co. in mind)

The hash code of a new entity doesn't change over time, even after persisting

Cons:

There may be problems with serializing and deserializing non-persisted entities

The hash code of a saved entity may change after reloading it from the DB

Non-persisted objects are always considered different (maybe this is right?)

What else?

Look at our code:

@MappedSuperclass
abstract public class Entity implements Serializable {

    @Id
    @GeneratedValue
    @Column(nullable = false, updatable = false)
    protected Long id;

    @Transient
    private Long tempId;

    public void setId(Long id) {
        this.id = id;
    }

    public Long getId() {
        return id;
    }

    private void setTempId(Long tempId) {
        this.tempId = tempId;
    }

    // Fix the id on the first call from equals() or hashCode()
    private Long getTempId() {
        if (tempId == null)
            // if we have id already, use it, else use 0
            setTempId(getId() == null ? 0 : getId());
        return tempId;
    }

    @Override
    public boolean equals(Object obj) {
        if (super.equals(obj))
            return true;
        // take proxied object into account
        if (obj == null || !Hibernate.getClass(obj).equals(this.getClass()))
            return false;
        Entity o = (Entity) obj;
        return getTempId() != 0 && o.getTempId() != 0 && getTempId().equals(o.getTempId());
    }

    // hash doesn't change in time
    @Override
    public int hashCode() {
        return getTempId() == 0 ? super.hashCode() : getTempId().hashCode();
    }
}

Christian Beikov

IMO you have 3 options for implementing equals/hashCode

Use an application generated identity i.e. a UUID

Implement it based on a business key

Implement it based on the primary key

Using an application generated identity is the easiest approach, but comes with a few downsides

Joins are slower when using it as PK because 128 Bit is simply bigger than 32 or 64 Bit

"Debugging is harder" because checking with your own eyes whether some data is correct is pretty hard

If you can work with these downsides, just use this approach.

To overcome the join issue one could be using the UUID as natural key and a sequence value as primary key, but then you might still run into the equals/hashCode implementation problems in compositional child entities that have embedded ids since you will want to join based on the primary key. Using the natural key in child entities id and the primary key for referring to the parent is a good compromise.

@Entity class Parent {
  @Id @GeneratedValue Long id;
  @NaturalId UUID uuid;
  @OneToMany(mappedBy = "parent") Set<Child> children;
  // equals/hashCode based on uuid
}

@Entity class Child {
  @EmbeddedId ChildId id;
  @ManyToOne Parent parent;

  @Embeddable class ChildId {
    UUID parentUuid;
    UUID childUuid;
    // equals/hashCode based on parentUuid and childUuid
  }
  // equals/hashCode based on id
}

IMO this is the cleanest approach as it will avoid all downsides and at the same time provide you a value(the UUID) that you can share with external systems without exposing system internals.

Implementing it based on a business key, if you can expect one from the user, is a nice idea, but it comes with a few downsides as well

Most of the time this business key will be some kind of code that the user provides and less often a composite of multiple attributes.

Joins are slower because joining based on variable length text is simply slow. Some DBMS might even have problems creating an index if the key exceeds a certain length.

In my experience, business keys tend to change, which requires cascading updates to the objects referring to them. This is impossible if external systems refer to them.

IMO you shouldn't implement or work with a business key exclusively. It's a nice add-on i.e. users can quickly search by that business key, but the system shouldn't rely on it for operating.

Implementing it based on the primary key has its problems, but maybe it's not such a big deal

If you need to expose ids to external system, use the UUID approach I suggested. If you don't, you could still use the UUID approach but you don't have to. The problem of using a DBMS generated id in equals/hashCode stems from the fact that the object might have been added to hash based collections before assigning the id.

The obvious way to get around this is to simply not add the object to hash based collections before assigning the id. I understand that this is not always possible because you might want deduplication before assigning the id already. To still be able to use the hash based collections, you simply have to rebuild the collections after assigning the id.

You could do something like this:

@Entity class Parent {
  @Id @GeneratedValue Long id;
  @OneToMany(mappedBy = "parent") Set<Child> children;
  // equals/hashCode based on id
}

@Entity class Child {
  @EmbeddedId ChildId id;
  @ManyToOne Parent parent;

  @PrePersist void prePersist() {
    parent.children.remove(this);
  }
  @PostPersist void postPersist() {
    parent.children.add(this);
  }

  @Embeddable class ChildId {
    Long parentId;
    @GeneratedValue Long childId;
    // equals/hashCode based on parentId and childId
  }
  // equals/hashCode based on id
}

I haven't tested the exact approach myself, so I'm not sure how changing collections in pre- and post-persist events works but the idea is:

Temporarily remove the object from hash based collections

Persist it

Re-add the object to the hash based collections

Another way of solving this is to simply rebuild all your hash based models after an update/persist.
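A standalone sketch of why the rebuild is needed and what it can look like. The class IdHashedChild and the helper rebuild() are hypothetical, JPA annotations are omitted, and assigning the id by hand stands in for what the provider does on persist.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical entity with equals/hashCode based on a generated id.
class IdHashedChild {
    Long id; // null until "persisted"

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        return id != null && id.equals(((IdHashedChild) o).id);
    }

    @Override
    public int hashCode() { return Objects.hashCode(id); } // 0 while id is null
}

class RebuildDemo {
    // Copying re-inserts every element under its current hash code.
    static <T> Set<T> rebuild(Set<T> stale) {
        return new HashSet<>(stale);
    }

    public static void main(String[] args) {
        Set<IdHashedChild> children = new HashSet<>();
        IdHashedChild child = new IdHashedChild();
        children.add(child);   // stored under hash 0
        child.id = 7L;         // simulates id assignment on persist

        System.out.println(children.contains(child)); // false: looked up under the new hash

        children = rebuild(children);
        System.out.println(children.contains(child)); // true
    }
}
```

For a collection owned by an entity, replacing the Set instance may not be appropriate with every provider; clearing and re-adding the elements in place is an alternative with the same effect.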

In the end, it's up to you. I personally use the sequence based approach most of the time and only use the UUID approach if I need to expose an identifier to external systems.


user2083808

I tried to answer this question myself and was never totally happy with the solutions I found until I read this post, and especially DREW's answer. I liked the way he lazily created the UUID and stored it optimally.

But I wanted to add even more flexibility, i.e. lazily create the UUID ONLY when hashCode()/equals() is accessed before the entity is first persisted, keeping each solution's advantages:

equals() means "object refers to the same logical entity"

use database ID as much as possible because why would I do the work twice (performance concern)

prevent problem while accessing hashCode()/equals() on not yet persisted entity and keep the same behaviour after it is indeed persisted

I would really appreciate feedback on my mixed solution below

public class MyEntity {

    @Id()
    @Column(name = "ID", length = 20, nullable = false, unique = true)
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id = null;

    @Transient
    private UUID uuid = null;

    @Column(name = "UUID_MOST", nullable = true, unique = false, updatable = false)
    private Long uuidMostSignificantBits = null;

    @Column(name = "UUID_LEAST", nullable = true, unique = false, updatable = false)
    private Long uuidLeastSignificantBits = null;

    @Override
    public final int hashCode() {
        return this.getUuid().hashCode();
    }

    @Override
    public final boolean equals(Object toBeCompared) {
        if (this == toBeCompared) {
            return true;
        }
        if (toBeCompared == null) {
            return false;
        }
        if (!this.getClass().isInstance(toBeCompared)) {
            return false;
        }
        return this.getUuid().equals(((MyEntity) toBeCompared).getUuid());
    }

    public final UUID getUuid() {
        // UUID already accessed on this physical object
        if (this.uuid != null) {
            return this.uuid;
        }
        if (this.uuidMostSignificantBits != null) {
            // UUID one day generated on this entity before it was persisted
            this.uuid = new UUID(this.uuidMostSignificantBits, this.uuidLeastSignificantBits);
        } else if (this.getId() != null) {
            // UUID never generated on this entity before it was persisted
            this.uuid = new UUID(this.getId(), this.getId());
        } else {
            // UUID never accessed on this not yet persisted entity
            this.setUuid(UUID.randomUUID());
        }
        return this.uuid;
    }

    private void setUuid(UUID uuid) {
        if (uuid == null) {
            return;
        }
        // For the one hypothetical case where a generated UUID could collide with a UUID built from IDs
        if (uuid.getMostSignificantBits() == uuid.getLeastSignificantBits()) {
            // Unchecked exception, so the method needs no throws clause
            throw new IllegalStateException("UUID: " + uuid + " format is only for internal use");
        }
        this.uuidMostSignificantBits = uuid.getMostSignificantBits();
        this.uuidLeastSignificantBits = uuid.getLeastSignificantBits();
        this.uuid = uuid;
    }

    // Getter used by getUuid(); not shown in the original post
    public Long getId() {
        return this.id;
    }
}


What do you mean by "UUID one day generated on this entity before i was persisted"? Could you please give an example for this case?
Could you use the assigned generation type? Why is the identity generation type needed? Does it have some advantage over assigned?
What happens if you 1) make a new MyEntity, 2) put it into a list, 3) then save it to the database, then 4) load that entity back from the DB and 5) try to see if the loaded instance is in the list? My guess is that it won't be, even though it should be.
Thanks for your first comments, which showed me I wasn't as clear as I should have been. Firstly, "UUID one day generated on this entity before i was persisted" was a typo; it should have read "before IT was persisted". For the other remarks, I'll edit my post soon to try to explain my solution better.
Christopher Yang

This is a common problem in every IT system that uses Java and JPA. The pain point extends beyond implementing equals() and hashCode(); it affects how an organization refers to an entity and how its clients refer to the same entity. I've seen enough pain from not having a business key that I wrote my own blog post to express my view.

In short: use a short, human-readable, sequential ID with meaningful prefixes as the business key, generated without any dependency on any storage other than RAM. Twitter's Snowflake is a very good example.


Marcos Oliveira

I use an EntityBase class that all my JPA entities inherit from, and this works very well for me.

/**
 * @author marcos.oliveira
 */
@MappedSuperclass
public abstract class EntityBase<TId extends Serializable> implements Serializable{
    /**
     *
     */
    private static final long serialVersionUID = 1L;

    @Id
    @Column(name = "id", unique = true, nullable = false)
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    protected TId id;



    public TId getId() {
        return this.id;
    }

    public void setId(TId id) {
        this.id = id;
    }

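    @Override
    public int hashCode() {
        // Must not involve super.hashCode(): the identity hash differs between two
        // instances that equals() treats as equal, which breaks the contract.
        // A per-class constant also stays stable when the id is assigned on persist.
        return getClass().hashCode();
    }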

    @Override
    public String toString() {
        return super.toString() + " [Id=" + id + "]";
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (obj == null || getClass() != obj.getClass()) {
            return false;
        }
        EntityBase<?> entity = (EntityBase<?>) obj;
        if (entity.id == null || id == null) {
            return false;
        }
        return Objects.equals(id, entity.id);
    }
}

Reference: https://thorben-janssen.com/ultimate-guide-to-implementing-equals-and-hashcode-with-hibernate/


illEatYourPuppies

If UUID is the answer for many people, why don't we just use factory methods from business layer to create the entities and assign primary key at creation time?

for example:

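@ManagedBean
public class MyCarFacade {
  // Injected by the container; this field was missing from the original snippet
  @PersistenceContext
  private EntityManager em;

  public Car createCar(){
    Car car = new Car();
    em.persist(car);
    return car;
  }
}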

This way we would get a default primary key for the entity from the persistence provider, and our hashCode() and equals() functions could rely on that.

We could also declare Car's constructors protected and then use reflection in our business method to access them. This way developers would not be tempted to instantiate Car with new, but would go through the factory method.

How'bout that?


An approach that works great if you're willing to take the performance hit of both generating the GUID and doing a database lookup.
What about unit testing Car? In this case, do you need a database connection for testing? Also, your domain objects should not depend on persistence.
Grigory Kislin

In practice it seems that Option 2 (primary key) is the most frequently used. A natural and IMMUTABLE business key is rare, and creating and supporting synthetic keys is too heavy a solution for situations that will probably never happen. Have a look at the spring-data-jpa AbstractPersistable implementation (the only thing: for a Hibernate implementation use Hibernate.getClass).

public boolean equals(Object obj) {
    if (null == obj) {
        return false;
    }
    if (this == obj) {
        return true;
    }
    if (!getClass().equals(ClassUtils.getUserClass(obj))) {
        return false;
    }
    AbstractPersistable<?> that = (AbstractPersistable<?>) obj;
    return null == this.getId() ? false : this.getId().equals(that.getId());
}

@Override
public int hashCode() {
    int hashCode = 17;
    hashCode += null == getId() ? 0 : getId().hashCode() * 31;
    return hashCode;
}

Just be careful when manipulating new objects in a HashSet/HashMap. Conversely, Option 1 (keeping the Object implementation) breaks right after a merge, which is a very common situation.

If you have no business key and have a REAL need to manipulate new entities in a hash structure, override hashCode to return a constant, as Vlad Mihalcea advises below.
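A minimal sketch of the constant-hashCode variant, assuming a hypothetical ConstantHashBook class with JPA annotations omitted so it runs standalone. The literal 31 is an arbitrary choice; any fixed value works.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical entity: equality by id, hash code fixed for the entity's lifetime.
class ConstantHashBook {
    private Long id; // would be @Id @GeneratedValue in a real entity

    void setId(Long id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ConstantHashBook)) return false;
        // Entities without an id are equal only to themselves.
        return id != null && id.equals(((ConstantHashBook) o).id);
    }

    @Override
    public int hashCode() {
        return 31; // constant, so it cannot change when the id is assigned
    }
}

class ConstantHashDemo {
    public static void main(String[] args) {
        Set<ConstantHashBook> books = new HashSet<>();
        ConstantHashBook book = new ConstantHashBook();
        books.add(book);
        book.setId(1L); // simulates persist assigning the key

        ConstantHashBook reloaded = new ConstantHashBook();
        reloaded.setId(1L); // simulates the same row loaded in another session

        System.out.println(books.contains(book));     // true
        System.out.println(books.contains(reloaded)); // true
    }
}
```

The price is that all instances land in the same hash bucket, so lookups in large hash-based collections degrade; for the small collections typical of entity associations this is usually acceptable.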


jhegedus

Below is a simple (and tested) solution for Scala.

Note that this solution does not fit into any of the 3 categories given in the question.

All my Entities are subclasses of the UUIDEntity so I follow the don't-repeat-yourself (DRY) principle.

If needed the UUID generation can be made more precise (by using more pseudo-random numbers).

Scala Code:

import javax.persistence._
import scala.util.Random

@Entity
@Inheritance(strategy = InheritanceType.TABLE_PER_CLASS)
abstract class UUIDEntity {
  @Id  @GeneratedValue(strategy = GenerationType.TABLE)
  var id:java.lang.Long=null
  var uuid:java.lang.Long=Random.nextLong()
  override def equals(o:Any):Boolean= 
    o match{
      case o : UUIDEntity => o.uuid==uuid
      case _ => false
    }
  override def hashCode() = uuid.hashCode()
}