I have a Map which is to be modified by several threads concurrently.
There seem to be three different synchronized Map implementations in the Java API:
Hashtable
Collections.synchronizedMap(Map)
ConcurrentHashMap
From what I understand, Hashtable
is an old implementation (extending the obsolete Dictionary
class), which has been adapted later to fit the Map
interface. While it is synchronized, it seems to have serious scalability issues and is discouraged for new projects.
But what about the other two? What are the differences between the Maps returned by Collections.synchronizedMap(Map) and ConcurrentHashMaps? Which one fits which situation?
There is also ConcurrentSkipListMap as another thread-safe Map implementation. It is designed to be highly concurrent under load, using a skip list algorithm.
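To make the comment above concrete, here is a minimal sketch of ConcurrentSkipListMap in use. The class name SkipListMapSketch and the sample keys are made up for illustration; the point is only that, unlike ConcurrentHashMap, this map keeps its keys sorted while still being safe for concurrent use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical demo class, not part of the JDK.
class SkipListMapSketch {
    // Returns the keys in iteration order; a skip-list map keeps them sorted.
    static List<String> sortedKeys() {
        ConcurrentNavigableMap<String, Integer> scores = new ConcurrentSkipListMap<>();
        scores.put("carol", 3);
        scores.put("alice", 1);
        scores.put("bob", 2);
        return new ArrayList<>(scores.keySet());
    }

    public static void main(String[] args) {
        System.out.println(sortedKeys()); // [alice, bob, carol]
    }
}
```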
For your needs, use ConcurrentHashMap. It allows concurrent modification of the Map from several threads without the need to block them. Collections.synchronizedMap(map) creates a blocking Map which will degrade performance, albeit ensuring consistency (if used properly).
Use the second option if you need to ensure data consistency, and each thread needs to have an up-to-date view of the map. Use the first if performance is critical, and each thread only inserts data to the map, with reads happening less frequently.
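As a rough illustration of several threads modifying one ConcurrentHashMap without any external locking, here is a small sketch. The class name, the "page" key and the thread counts are assumptions picked for the example; merge is used because ConcurrentHashMap performs it atomically per key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical demo class, not part of the JDK.
class ConcurrentCounterSketch {
    // Starts the given number of threads, each incrementing the same key,
    // and returns the final count once all of them have finished.
    static int countWithThreads(int threads, int incrementsPerThread) {
        ConcurrentMap<String, Integer> hits = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    // Atomic read-modify-write on a single key; no synchronized block needed.
                    hits.merge("page", 1, Integer::sum);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return hits.get("page");
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 1000)); // 4000
    }
}
```

With a plain HashMap in place of the ConcurrentHashMap, the same code could lose updates or corrupt the map.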
╔═══════════════╦═══════════════════╦═══════════════════╦═════════════════════╗
║ Property      ║ HashMap           ║ Hashtable         ║ ConcurrentHashMap   ║
╠═══════════════╬═══════════════════╬═══════════════════╩═════════════════════╣
║ Null          ║ allowed           ║ not allowed                             ║
║ values/keys   ║                   ║                                         ║
╠═══════════════╬═══════════════════╬═════════════════════════════════════════╣
║ Thread-safety ║ no                ║ yes                                     ║
║ features      ║                   ║                                         ║
╠═══════════════╬═══════════════════╬═══════════════════╦═════════════════════╣
║ Lock          ║ not               ║ locks the whole   ║ locks a portion     ║
║ mechanism     ║ applicable        ║ map               ║ of the map          ║
╠═══════════════╬═══════════════════╩═══════════════════╬═════════════════════╣
║ Iterator      ║ fail-fast                             ║ weakly consistent   ║
╚═══════════════╩═══════════════════════════════════════╩═════════════════════╝
Regarding locking mechanism: Hashtable
locks the object, while ConcurrentHashMap
locks only the bucket.
Hashtable does not lock only a portion of the map. Look at the implementation: its methods are declared with the synchronized keyword and no explicit lock object, which basically means that it locks the whole Hashtable instance in each operation.
The "scalability issues" for Hashtable
are present in exactly the same way in Collections.synchronizedMap(Map)
- they use very simple synchronization, which means that only one thread can access the map at the same time.
This is not much of an issue when you have simple inserts and lookups (unless you do it extremely intensively), but becomes a big problem when you need to iterate over the entire Map, which can take a long time for a large Map - while one thread does that, all others have to wait if they want to insert or lookup anything.
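The iteration problem can be sketched like this. The Javadoc for Collections.synchronizedMap requires the caller to synchronize on the returned map while traversing any of its collection views, so a long traversal holds the one lock for its whole duration. The class and method names below are made up for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class, not part of the JDK.
class SynchronizedIterationSketch {
    // Sums all values of a map obtained from Collections.synchronizedMap.
    static int sumValues(Map<String, Integer> syncMap) {
        int sum = 0;
        // Hold the wrapper's lock for the whole traversal; other threads
        // calling put/get on syncMap block until this loop finishes.
        synchronized (syncMap) {
            for (int value : syncMap.values()) {
                sum += value;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> stock = Collections.synchronizedMap(new HashMap<>());
        stock.put("apples", 3);
        stock.put("pears", 4);
        System.out.println(sumValues(stock)); // 7
    }
}
```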
The ConcurrentHashMap uses very sophisticated techniques to reduce the need for synchronization and allow parallel read access by multiple threads without synchronization and, more importantly, provides an Iterator that requires no synchronization and even allows the Map to be modified during iteration (though it makes no guarantees whether or not elements that were inserted during iteration will be returned).
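Here is a small single-threaded sketch of that iterator behaviour, with made-up names and keys. It removes and inserts entries while iterating over a ConcurrentHashMap; no exception is thrown, and whether the loop ever visits the late insert is deliberately left unspecified.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class, not part of the JDK.
class WeaklyConsistentIterationSketch {
    // Removes odd keys and inserts a new key while iterating; returns the final size.
    static int mutateWhileIterating() {
        Map<Integer, String> map = new ConcurrentHashMap<>();
        for (int i = 0; i < 10; i++) {
            map.put(i, "v" + i);
        }
        for (Integer key : map.keySet()) {
            if (key % 2 != 0) {
                map.remove(key);          // removal during iteration is allowed
            }
            map.putIfAbsent(100, "late"); // may or may not be visited by this loop
        }
        return map.size(); // the 5 even keys plus key 100
    }

    public static void main(String[] args) {
        System.out.println(mutateWhileIterating()); // 6
    }
}
```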
The main difference between these two is that ConcurrentHashMap locks only the portion of the data being updated, while the rest can still be accessed by other threads. Collections.synchronizedMap(), however, locks all the data while updating, so other threads can only access the data once the lock is released. If there are many update operations and a relatively small number of read operations, you should choose ConcurrentHashMap.
Another difference is that ConcurrentHashMap will not preserve the order of elements in the Map passed in. It is similar to HashMap when storing data: there is no guarantee that the element order is preserved. Collections.synchronizedMap(), on the other hand, preserves the element order of the Map passed in. For example, if you pass a TreeMap to ConcurrentHashMap, the element order in the ConcurrentHashMap may not be the same as the order in the TreeMap, but Collections.synchronizedMap() will preserve the order.
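A short sketch of the ordering point, using made-up keys and a hypothetical helper keysOf. Note that the ConcurrentHashMap constructor copies the entries into its own hash table, while Collections.synchronizedMap only wraps the TreeMap it is given, which is why the sorted order survives.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class, not part of the JDK.
class OrderingSketch {
    // Copies the keys in the map's own iteration order.
    static List<String> keysOf(Map<String, Integer> map) {
        synchronized (map) { // required for the synchronized wrapper, harmless otherwise
            return new ArrayList<>(map.keySet());
        }
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> sorted = new TreeMap<>();
        sorted.put("pear", 2);
        sorted.put("apple", 1);
        sorted.put("zebra", 3);

        Map<String, Integer> wrapped = Collections.synchronizedMap(sorted); // view of the TreeMap
        Map<String, Integer> copied = new ConcurrentHashMap<>(sorted);      // independent copy

        System.out.println(keysOf(wrapped)); // [apple, pear, zebra]
        System.out.println(keysOf(copied));  // same keys, order not guaranteed
    }
}
```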
Furthermore, ConcurrentHashMap guarantees that no ConcurrentModificationException is thrown while one thread is updating the map and another thread is traversing an iterator obtained from the map. Collections.synchronizedMap() makes no such guarantee.
There is one post which demonstrates the differences between these two and also ConcurrentSkipListMap.
ConcurrentHashMap is preferred when you can use it - though it requires at least Java 5.
It is designed to scale well when used by multiple threads. Performance may be marginally poorer when only a single thread accesses the Map at a time, but significantly better when multiple threads access the map concurrently.
I found a blog entry that reproduces a table from the excellent book Java Concurrency In Practice, which I thoroughly recommend.
Collections.synchronizedMap makes sense really only if you need to wrap up a map with some other characteristics, perhaps some sort of ordered map, like a TreeMap.
Synchronized Map:
A synchronized Map is not very different from Hashtable and provides similar performance in concurrent Java programs. The only difference between Hashtable and SynchronizedMap is that SynchronizedMap is not a legacy class, and you can wrap any Map to create its synchronized version by using the Collections.synchronizedMap() method.
ConcurrentHashMap:
The ConcurrentHashMap class provides a concurrent version of the standard HashMap. This is an improvement on the synchronizedMap functionality provided in the Collections class.
Unlike Hashtable and Synchronized Map, it never locks the whole Map; instead it divides the map into segments and locks those. It performs better if the number of reader threads is greater than the number of writer threads.
ConcurrentHashMap is by default separated into 16 regions, each guarded by its own lock. This default number can be set while initializing a ConcurrentHashMap instance. When setting data in a particular segment, the lock for that segment is obtained. This means that two updates can still execute simultaneously and safely if they each affect separate segments, thus minimizing lock contention and so maximizing performance. (This describes the implementation up to Java 7; since Java 8 the class locks individual bins instead of fixed segments.)
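Setting that number at construction time might look like the sketch below. The three-argument constructor is part of the JDK; the class name, method name and the particular values are assumptions for the example. On Java 8 and later the Javadoc describes concurrencyLevel as the estimated number of concurrently updating threads, which the implementation may use only as a sizing hint.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class, not part of the JDK.
class ConcurrencyLevelSketch {
    static ConcurrentHashMap<String, Integer> newSessionMap() {
        int initialCapacity = 64;   // assumed expected number of entries
        float loadFactor = 0.75f;   // same as the default load factor
        int concurrencyLevel = 32;  // assumed number of concurrently updating threads
        return new ConcurrentHashMap<>(initialCapacity, loadFactor, concurrencyLevel);
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> sessions = newSessionMap();
        sessions.put("alice", 1);
        System.out.println(sessions.get("alice")); // 1
    }
}
```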
ConcurrentHashMap doesn’t throw a ConcurrentModificationException
ConcurrentHashMap doesn’t throw a ConcurrentModificationException if one thread tries to modify it while another is iterating over it
Difference between synchronizedMap and ConcurrentHashMap
Collections.synchronizedMap(HashMap) will return a collection which is almost equivalent to Hashtable, where every modification operation on the Map locks the whole Map object, while in the case of ConcurrentHashMap, thread-safety is achieved by dividing the whole Map into partitions based upon the concurrency level and locking only a particular portion instead of the whole Map.
ConcurrentHashMap does not allow null keys or null values, while a synchronized HashMap allows one null key.
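The null-handling difference is easy to check with a small sketch. The helper names are made up; the behaviour shown (a NullPointerException from ConcurrentHashMap and Hashtable, acceptance by a synchronized HashMap) is what the JDK documents for these classes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class, not part of the JDK.
class NullHandlingSketch {
    // Returns true if the map accepts a null key, false if it throws NullPointerException.
    static boolean acceptsNullKey(Map<String, String> map) {
        try {
            map.put(null, "value");
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    // Returns true if the map accepts a null value, false if it throws NullPointerException.
    static boolean acceptsNullValue(Map<String, String> map) {
        try {
            map.put("key", null);
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptsNullKey(Collections.synchronizedMap(new HashMap<>()))); // true
        System.out.println(acceptsNullKey(new ConcurrentHashMap<>()));                    // false
        System.out.println(acceptsNullValue(new ConcurrentHashMap<>()));                  // false
        System.out.println(acceptsNullKey(new Hashtable<>()));                            // false
    }
}
```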
In ConcurrentHashMap
, the lock is applied to a segment instead of an entire Map. Each segment manages its own internal hash table. The lock is applied only for update operations. Collections.synchronizedMap(Map)
synchronizes the entire map.
Hashtable and ConcurrentHashMap do not allow null keys or null values.
Collections.synchronizedMap(Map) synchronizes all operations (get, put, size, etc).
ConcurrentHashMap supports full concurrency of retrievals, and adjustable expected concurrency for updates.
As usual, there are concurrency--overhead--speed tradeoffs involved. You really need to consider the detailed concurrency requirements of your application to make a decision, and then test your code to see if it's good enough.
You are right about Hashtable, you can forget about it.
Your article mentions the fact that while Hashtable and the synchronized wrapper class provide basic thread-safety by only allowing one thread at a time to access the map, this is not 'true' thread-safety since many compound operations still require additional synchronization, for example:
synchronized (records) {
    Record rec = records.get(id);
    if (rec == null) {
        rec = new Record(id);
        records.put(id, rec);
    }
    return rec;
}
However, don't think that ConcurrentHashMap
is a simple alternative for a HashMap
with a typical synchronized
block as shown above. Read this article to understand its intricacies better.
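One of those intricacies is that wrapping a ConcurrentHashMap in synchronized (records) does not make the get-then-put sequence safe, because the map's own operations never lock on the map object. A sketch of the usual replacement is the atomic computeIfAbsent method. The Record class and the String id type below are hypothetical stand-ins for the types in the snippet above, and RecordCacheSketch is a made-up name.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical demo class, not part of the JDK.
class RecordCacheSketch {
    // Hypothetical stand-in for the Record type used in the snippet above.
    static final class Record {
        final String id;

        Record(String id) {
            this.id = id;
        }
    }

    private final ConcurrentMap<String, Record> records = new ConcurrentHashMap<>();

    // Get-or-create without an external synchronized block.
    Record getOrCreate(String id) {
        // For ConcurrentHashMap the whole call is atomic, so the mapping
        // function runs at most once per missing key.
        return records.computeIfAbsent(id, Record::new);
    }

    public static void main(String[] args) {
        RecordCacheSketch cache = new RecordCacheSketch();
        Record first = cache.getOrCreate("42");
        Record second = cache.getOrCreate("42");
        System.out.println(first == second); // true
    }
}
```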
Here are a few:
1) ConcurrentHashMap locks only a portion of the Map, but SynchronizedMap locks the whole Map.
2) ConcurrentHashMap has better performance than SynchronizedMap and is more scalable.
3) In the case of multiple readers and a single writer, ConcurrentHashMap is the best choice.
This text is from Difference between ConcurrentHashMap and hashtable in Java
We can achieve thread safety by using ConcurrentHashMap and synchronisedHashmap and Hashtable. But there is a lot of difference if you look at their architecture.
synchronisedHashmap and Hashtable
Both maintain the lock at the object level. So if you want to perform any operation like put/get, you have to acquire the lock first, and meanwhile no other thread is allowed to perform any operation. Only one thread can operate on the map at a time, so the waiting time increases. Performance is therefore relatively low compared with ConcurrentHashMap.
ConcurrentHashMap
It maintains the lock at segment level. It has 16 segments and a default concurrency level of 16, so up to 16 threads can operate on a ConcurrentHashMap at a time. Moreover, read operations don't require a lock, so any number of threads can perform a get operation on it. If thread1 wants to perform a put operation in segment 2 and thread2 wants to perform a put operation in segment 4, both are allowed to proceed. This means 16 threads can perform update (put/delete) operations on a ConcurrentHashMap at a time, so the waiting time is lower. Hence the performance is relatively better than synchronisedHashmap and Hashtable.
ConcurrentHashMap
Use ConcurrentHashMap for performance-critical applications where there are far more write operations than there are read operations.
It is thread safe without synchronizing the whole map.
Reads can happen very fast while write is done with a lock.
There is no locking at the object level.
The locking is at a much finer granularity at a hashmap bucket level.
ConcurrentHashMap doesn’t throw a ConcurrentModificationException if one thread tries to modify it while another is iterating over it.
ConcurrentHashMap uses a multitude of locks.
Read operations are non-blocking, whereas write operations take a lock on a particular segment or bucket.
SynchronizedHashMap
Synchronization at Object level.
Every read/write operation needs to acquire lock.
Locking the entire collection is a performance overhead.
This essentially gives access to only one thread to the entire map & blocks all the other threads.
It may cause contention.
SynchronizedHashMap returns Iterator, which fails-fast on concurrent modification.
Collections.synchronizedMap()
The Collections utility class provides polymorphic algorithms that operate on collections and return wrapped collections. Its synchronizedMap() method provides thread-safe functionality.
We need to use Collections.synchronizedMap() when data consistency is of utmost importance.
ConcurrentHashMap is optimized for concurrent access.
Accesses don't lock the whole map but use a finer-grained strategy, which improves scalability. There are also functional enhancements specifically for concurrent access, e.g. concurrent iterators.
There is one critical feature to note about ConcurrentHashMap other than the concurrency it provides, which is its fail-safe iterator (the Javadoc calls it weakly consistent). I have seen developers using ConcurrentHashMap just because they want to edit the entry set (put/remove) while iterating over it. Collections.synchronizedMap(Map) does not provide a fail-safe iterator; it provides a fail-fast iterator instead. Fail-fast iterators, such as those of a wrapped HashMap, track the backing map's modification count and throw ConcurrentModificationException if the map is structurally modified during iteration by anything other than the iterator itself.
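The fail-fast versus fail-safe difference can be reproduced even in a single thread. The sketch below uses made-up class, method and key names; it inserts a new key during iteration and reports whether a ConcurrentModificationException was raised.

```java
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class, not part of the JDK.
class FailFastSketch {
    // Returns true if adding a key mid-iteration raises ConcurrentModificationException.
    static boolean failsOnInsertDuringIteration(Map<String, Integer> map) {
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        try {
            for (String key : map.keySet()) {
                // Structural modification while an iterator is active.
                map.put("added-while-iterating", 0);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsOnInsertDuringIteration(
                Collections.synchronizedMap(new HashMap<>()))); // true
        System.out.println(failsOnInsertDuringIteration(
                new ConcurrentHashMap<>()));                    // false
    }
}
```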
If Data Consistency is highly important - Use Hashtable or Collections.synchronizedMap(Map). If speed/performance is highly important and Data Updating can be compromised- Use ConcurrentHashMap.
In general, if you want to use the ConcurrentHashMap
make sure you are ready to miss 'updates'
(i.e. printing contents of the HashMap does not ensure it will print the up-to-date Map) and use APIs like CyclicBarrier
to ensure consistency across your program's lifecycle.
Collections.synchronizedMap() method synchronizes all the methods of the HashMap and effectively reduces it to a data structure where one thread can enter at a time because it locks every method on a common lock.
In ConcurrentHashMap synchronization is done a little differently. Rather than locking every method on a common lock, ConcurrentHashMap uses separate lock for separate buckets thus locking only a portion of the Map. By default there are 16 buckets and also separate locks for separate buckets. So the default concurrency level is 16. That means theoretically any given time 16 threads can access ConcurrentHashMap if they all are going to separate buckets.
ConcurrentHashMap was introduced as an alternative to Hashtable in Java 1.5 as part of the concurrency package. With ConcurrentHashMap, you have a better choice: not only can it be safely used in a concurrent multi-threaded environment, it also provides better performance than Hashtable and synchronizedMap. ConcurrentHashMap performs better because it locks only a part of the Map. It allows concurrent read operations and at the same time maintains integrity by synchronizing write operations.
How ConcurrentHashMap is implemented
ConcurrentHashMap was developed as an alternative to Hashtable and supports all the functionality of Hashtable plus an additional feature, the so-called concurrency level. ConcurrentHashMap allows multiple readers to read simultaneously without locking. This is made possible by splitting the Map into parts and locking only one part of the Map on updates. By default, the concurrency level is 16, so the Map is split into 16 parts and each part is guarded by a separate lock. This means that 16 threads can work with the Map simultaneously, as long as they work with different parts of it. This makes ConcurrentHashMap highly performant without sacrificing thread-safety.
If you are interested in some important features of ConcurrentHashMap and when you should use this implementation of Map, here is a link to a good article: How to use ConcurrentHashMap in Java
Besides what has been suggested, I'd like to post the source code related to SynchronizedMap
.
To make a Map thread safe, we can call the Collections.synchronizedMap method and pass the map instance as the parameter.
The implementation of synchronizedMap
in Collections
is like below
public static <K,V> Map<K,V> synchronizedMap(Map<K,V> m) {
    return new SynchronizedMap<>(m);
}
As you can see, the input Map
object is wrapped by the SynchronizedMap
object.
Let's dig into the implementation of SynchronizedMap
,
private static class SynchronizedMap<K,V>
    implements Map<K,V>, Serializable {
    private static final long serialVersionUID = 1978198479659022715L;

    private final Map<K,V> m;     // Backing Map
    final Object      mutex;        // Object on which to synchronize

    SynchronizedMap(Map<K,V> m) {
        this.m = Objects.requireNonNull(m);
        mutex = this;
    }

    SynchronizedMap(Map<K,V> m, Object mutex) {
        this.m = m;
        this.mutex = mutex;
    }

    public int size() {
        synchronized (mutex) {return m.size();}
    }
    public boolean isEmpty() {
        synchronized (mutex) {return m.isEmpty();}
    }
    public boolean containsKey(Object key) {
        synchronized (mutex) {return m.containsKey(key);}
    }
    public boolean containsValue(Object value) {
        synchronized (mutex) {return m.containsValue(value);}
    }
    public V get(Object key) {
        synchronized (mutex) {return m.get(key);}
    }

    public V put(K key, V value) {
        synchronized (mutex) {return m.put(key, value);}
    }
    public V remove(Object key) {
        synchronized (mutex) {return m.remove(key);}
    }
    public void putAll(Map<? extends K, ? extends V> map) {
        synchronized (mutex) {m.putAll(map);}
    }
    public void clear() {
        synchronized (mutex) {m.clear();}
    }

    private transient Set<K> keySet;
    private transient Set<Map.Entry<K,V>> entrySet;
    private transient Collection<V> values;

    public Set<K> keySet() {
        synchronized (mutex) {
            if (keySet==null)
                keySet = new SynchronizedSet<>(m.keySet(), mutex);
            return keySet;
        }
    }

    public Set<Map.Entry<K,V>> entrySet() {
        synchronized (mutex) {
            if (entrySet==null)
                entrySet = new SynchronizedSet<>(m.entrySet(), mutex);
            return entrySet;
        }
    }

    public Collection<V> values() {
        synchronized (mutex) {
            if (values==null)
                values = new SynchronizedCollection<>(m.values(), mutex);
            return values;
        }
    }

    public boolean equals(Object o) {
        if (this == o)
            return true;
        synchronized (mutex) {return m.equals(o);}
    }
    public int hashCode() {
        synchronized (mutex) {return m.hashCode();}
    }
    public String toString() {
        synchronized (mutex) {return m.toString();}
    }

    // Override default methods in Map
    @Override
    public V getOrDefault(Object k, V defaultValue) {
        synchronized (mutex) {return m.getOrDefault(k, defaultValue);}
    }
    @Override
    public void forEach(BiConsumer<? super K, ? super V> action) {
        synchronized (mutex) {m.forEach(action);}
    }
    @Override
    public void replaceAll(BiFunction<? super K, ? super V, ? extends V> function) {
        synchronized (mutex) {m.replaceAll(function);}
    }
    @Override
    public V putIfAbsent(K key, V value) {
        synchronized (mutex) {return m.putIfAbsent(key, value);}
    }
    @Override
    public boolean remove(Object key, Object value) {
        synchronized (mutex) {return m.remove(key, value);}
    }
    @Override
    public boolean replace(K key, V oldValue, V newValue) {
        synchronized (mutex) {return m.replace(key, oldValue, newValue);}
    }
    @Override
    public V replace(K key, V value) {
        synchronized (mutex) {return m.replace(key, value);}
    }
    @Override
    public V computeIfAbsent(K key,
            Function<? super K, ? extends V> mappingFunction) {
        synchronized (mutex) {return m.computeIfAbsent(key, mappingFunction);}
    }
    @Override
    public V computeIfPresent(K key,
            BiFunction<? super K, ? super V, ? extends V> remappingFunction) {
        synchronized (mutex) {return m.computeIfPresent(key, remappingFunction);}
    }
    @Override
    public V compute(K key,
            BiFunction<? super K, ? super V, ? extends V> remappingFunction) {
        synchronized (mutex) {return m.compute(key, remappingFunction);}
    }
    @Override
    public V merge(K key, V value,
            BiFunction<? super V, ? super V, ? extends V> remappingFunction) {
        synchronized (mutex) {return m.merge(key, value, remappingFunction);}
    }

    private void writeObject(ObjectOutputStream s) throws IOException {
        synchronized (mutex) {s.defaultWriteObject();}
    }
}
What SynchronizedMap does can be summarized as adding a single lock to each primary method of the input Map object. No method guarded by the lock can be executed by multiple threads at the same time. That means normal operations like put and get can be executed by only one thread at a time for all data in the Map object.
It makes the Map
object thread safe now but the performance may become an issue in some scenarios.
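One way to observe that single lock from the outside is sketched below, under the assumption of Java 8 or later, where the wrapper's forEach is implemented as in the source above. Since mutex is the wrapper itself, a callback running inside forEach holds the wrapper's monitor, which Thread.holdsLock can report. The class and method names are made up for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class, not part of the JDK.
class SingleMutexSketch {
    // Reports whether the forEach callback runs while the current thread
    // holds the monitor of the map object itself.
    static boolean callbackRunsUnderMapLock(Map<String, Integer> map) {
        map.put("k", 1);
        AtomicBoolean held = new AtomicBoolean(false);
        map.forEach((key, value) -> held.set(Thread.holdsLock(map)));
        return held.get();
    }

    public static void main(String[] args) {
        // The wrapper synchronizes on itself, so the callback runs under its lock.
        System.out.println(callbackRunsUnderMapLock(
                Collections.synchronizedMap(new HashMap<>()))); // true
        // ConcurrentHashMap never uses the map object as a global lock.
        System.out.println(callbackRunsUnderMapLock(
                new ConcurrentHashMap<>()));                    // false
    }
}
```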
ConcurrentHashMap is far more complicated in its implementation; we can refer to Building a better HashMap for details. In a nutshell, it is implemented with both thread safety and performance taken into consideration.