ChatGPT解决这个技术问题 Extra ChatGPT

Why there is no ConcurrentHashSet against ConcurrentHashMap

HashSet is based on HashMap.

If we look at HashSet<E> implementation, everything is been managed under HashMap<E,Object>.

<E> is used as a key of HashMap.

And we know that HashMap is not thread safe. That is why we have ConcurrentHashMap in Java.

Based on this, I am confused that why we don't have a ConcurrentHashSet which should be based on the ConcurrentHashMap?

Is there anything else that I am missing? I need to use Set in a multi-threaded environment.

Also, If I want to create my own ConcurrentHashSet can I achieve it by just replacing the HashMap to ConcurrentHashMap and leaving the rest as is?

After looking at the API, if I were to guess I would say that it seems to come down to 2 factors, (1) avoiding having to create a class in Java API for every little bit of functionality needed (2) Providing convenience classes for more frequently used objects. I personally prefer LinkedHashMap and LinkedHashSet since they guarantee order is the same as insertion order, the only reason for using a set is to avoid duplication, often I still want to maintain insertion order.
@Ali, I personally prefer LinkedHashMap and LinkedHashSet you will go far :)
A bit old question, but as it is the first result in Google, may be useful to know that ConcurrentSkipListSet already has the implementation of ConcurrentHashMap. See docs.oracle.com/javase/7/docs/api/java/util/concurrent/…
What I saw from Java source ConcurrentSkipListSet is built on ConcurrentSkipListMap, which implements ConcurrentNavigableMap and ConcurrentMap.

R
Ray Toal

There's no built in type for ConcurrentHashSet because you can always derive a set from a map. Since there are many types of maps, you use a method to produce a set from a given map (or map class).

Prior to Java 8, you produce a concurrent hash set backed by a concurrent hash map, by using Collections.newSetFromMap(map)

In Java 8 (pointed out by @Matt), you can get a concurrent hash set view via ConcurrentHashMap.newKeySet(). This is a bit simpler than the old newSetFromMap which required you to pass in an empty map object. But it is specific to ConcurrentHashMap.

Anyway, the Java designers could have created a new set interface every time a new map interface was created, but that pattern would be impossible to enforce when third parties create their own maps. It is better to have the static methods that derive new sets; that approach always works, even when you create your own map implementations.


Am I right to say that if you create the set this way from ConcurrentHashMap, you lose the benefits you'd get from ConcurrentHashMap ?
There are no benefits to lose. newSetFromMap's implementation is found starting on line 3841 in docjar.com/html/api/java/util/Collections.java.html. It's just a wrapper....
@Andrew, I think the motivation behind using a "ConcurrentSet" stems from not the API but rather the implementation - thread safety but without a universal lock - multiple concurrent reads for instance.
ConcurrentSkipList has lots of (size) overhead and the lookups are slower.
take care when using this approach, since some methods are not implemented correctly. Just follow the links: Collections.newSetFromMap creates a SetFromMap. e.g. the SetFromMap.removeAll method delegates to the KeySetView.removeAll, that inherits from ConcurrentHashMap$CollectionView.removeAll. This method is highly inefficient in bulk removing elements. imagine removeAll(Collections.emptySet()) traverses all elements in the Map without doing anything. Having a ConcurrentHashSet that is corretly implemented will be better in most cases.
P
Paresh Mayani
Set<String> mySet = Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());

As Ray Toal points out in his answer, the preferred way to this in 2021 is definitely ConcurrentHashMap.newKeySet() (which does a similar thing behind the scenes), as it's more succinct.
this is thread-safe right?
k
kichik

With Guava 15 you can also simply use:

Set s = Sets.newConcurrentHashSet();

This is always a nightmare. If you have a set or a map that does not indicate whether or not something is thread safe you find all kind of hazards and desasters happen in maintaince. I always would want a type that indicates thread safety for collections (or not).
The method description is literally "Creates a thread-safe set backed by a hash map"
As I said, there is a ConcurrentSet missing. ConcurrentHashMap comes along with a ConcurrentMap interface to indicate this. This is the very same reason I always add this ConcurrentSet interface as well.
C
Community

Like Ray Toal mentioned it is as easy as:

Set<String> myConcurrentSet = ConcurrentHashMap.newKeySet();

This seems to require Java 8. Looking at implementation, this also seems to be just a wrapper of ConcurrentHashMap.
M
Mike Pone

It looks like Java provides a concurrent Set implementation with its ConcurrentSkipListSet. A SkipList Set is just a special kind of set implementation. It still implements the Serializable, Cloneable, Iterable, Collection, NavigableSet, Set, SortedSet interfaces. This might work for you if you only need the Set interface.


Note that ConcurrentSkipListSet's elements should be Comparable
If you need to extend from a Set that is concurrent, this is the only solution here that will work.
ConcurrentSkipListMap adds unnecessary performance penalty of having tree as base data structure, instead of using HashTable, even when you do not need sorting/navigation functionality.
don't use ConcurrentSkipListSet unless you want a SortedSet. A usual operation like add or remove should be O(1) for a HashSet, but O(log(n)) for a SortedSet.
N
Nirro

As pointed by this the best way to obtain a concurrency-able HashSet is by means of Collections.synchronizedSet()

Set s = Collections.synchronizedSet(new HashSet(...));

This worked for me and I haven't seen anybody really pointing to it.

EDIT This is less efficient than the currently aproved solution, as Eugene points out, since it just wraps your set into a synchronized decorator, while a ConcurrentHashMap actually implements low-level concurrency and it can back your Set just as fine. So thanks to Mr. Stepanenkov for making that clear.

http://docs.oracle.com/javase/8/docs/api/java/util/Collections.html#synchronizedSet-java.util.Set-


the synchronizedSet method just creates the decorator under Collection to wrap methods that could be thread-safe by synchronization the whole collection. But ConcurrentHashMap is implemented using non-blocking algorithms and "low-level" synchronisations without any locks of the whole collection. So wrapers from Collections.synchronized... is worse in multi-threads environments for performance reasons.
B
Bozho

You can use guava's Sets.newSetFromMap(map) to get one. Java 6 also has that method in java.util.Collections


it's available in java.utll.Collections and set of CHM is usually a bad thing anyways.
yeah, I noticed it is added in Java 6, so added it to the answer
The main this is that if it is ThreadSafe, and I really doubt that.
@Talha, it's thread safe, however thread safety alone means nothing
Sometimes it means everything. Its shaldom a performance problem unless it is part of an algorithm which are usually implemented in a way that the need for concurrent mapping is minimized.
A
Ankush soni
import java.util.AbstractSet;
import java.util.Iterator;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;


public class ConcurrentHashSet<E> extends AbstractSet<E> implements Set<E>{
   private final ConcurrentMap<E, Object> theMap;

   private static final Object dummy = new Object();

   public ConcurrentHashSet(){
      theMap = new ConcurrentHashMap<E, Object>();
   }

   @Override
   public int size() {
      return theMap.size();
   }

   @Override
   public Iterator<E> iterator(){
      return theMap.keySet().iterator();
   }

   @Override
   public boolean isEmpty(){
      return theMap.isEmpty();
   }

   @Override
   public boolean add(final E o){
      return theMap.put(o, ConcurrentHashSet.dummy) == null;
   }

   @Override
   public boolean contains(final Object o){
      return theMap.containsKey(o);
   }

   @Override
   public void clear(){
      theMap.clear();
   }

   @Override
   public boolean remove(final Object o){
      return theMap.remove(o) == ConcurrentHashSet.dummy;
   }

   public boolean addIfAbsent(final E o){
      Object obj = theMap.putIfAbsent(o, ConcurrentHashSet.dummy);
      return obj == null;
   }
}

I like the idea to use Boolean.TRUE instead of an dummy object. It is a little bit more elegant. Also using NULL is also possible since it would be available in the key set even if mapped to null.
@MartinKersten fyi, ConcurrentHashMap doesn't allow null values
S
Shendor

Why not use: CopyOnWriteArraySet from java.util.concurrent?


Because CopyOnWriteArraySet copies the entire collection on any state mutation, which is not always wanted due to the performance impact. It's designed to work only in special cases.
Additionally CopyOnWriteArraySet.contains() has a run-time of O(n) (has to check ever entry) where as HashSet/HashMap has O(1).
because this works only one way: if you're reading data owned/managed by another thread. if you need to pass data to that thread, you'd have to additionally update its atomic ref to updated array - but that makes little sense