Dictionaries are insertion ordered as of Python 3.6. It is described as a CPython implementation detail rather than a language feature. The documentation states:
dict() now uses a “compact” representation pioneered by PyPy. The memory usage of the new dict() is between 20% and 25% smaller compared to Python 3.5. PEP 468 (Preserving the order of **kwargs in a function.) is implemented by this. The order-preserving aspect of this new implementation is considered an implementation detail and should not be relied upon (this may change in the future, but it is desired to have this new dict implementation in the language for a few releases before changing the language spec to mandate order-preserving semantics for all current and future Python implementations; this also helps preserve backwards-compatibility with older versions of the language where random iteration order is still in effect, e.g. Python 3.5). (Contributed by INADA Naoki in issue 27350. Idea originally suggested by Raymond Hettinger.)
How does the new dictionary implementation perform better than the older one while preserving element order?
Update December 2017: dict
s retaining insertion order is guaranteed for Python 3.7
**kwargs
and as such the wording used is diplomatic: **kwargs
in a function signature is now guaranteed to be an insertion-order-preserving mapping. They've used the term mapping in order to not force any other implementations to make the dict ordered (and use an OrderedDict
internally) and as a way to signal that this isn't supposed to depend on the fact that the dict
is not ordered.
Are dictionaries ordered in Python 3.6+?
They are insertion ordered[1].
As of Python 3.6, for the CPython implementation of Python, dictionaries remember the order of items inserted. This is considered an implementation detail in Python 3.6; you need to use OrderedDict
if you want insertion ordering that's guaranteed across other implementations of Python (and other ordered behavior[1]).
As of Python 3.7, this is a guaranteed language feature, not merely an implementation detail. From a python-dev message by GvR:
Make it so. "Dict keeps insertion order" is the ruling. Thanks!
This simply means that you can depend on it. Other implementations of Python must also offer an insertion ordered dictionary if they wish to be a conforming implementation of Python 3.7.
How does the Python 3.6 dictionary implementation perform better[2] than the older one while preserving element order?
Essentially, by keeping two arrays.
The first array, dk_entries, holds the entries (of type PyDictKeyEntry) for the dictionary in the order that they were inserted. Preserving order is achieved by this being an append only array where new items are always inserted at the end (insertion order).
The second, dk_indices, holds the indices for the dk_entries array (that is, values that indicate the position of the corresponding entry in dk_entries). This array acts as the hash table. When a key is hashed it leads to one of the indices stored in dk_indices and the corresponding entry is fetched by indexing dk_entries. Since only indices are kept, the type of this array depends on the overall size of the dictionary (ranging from type int8_t(1 byte) to int32_t/int64_t (4/8 bytes) on 32/64 bit builds)
In the previous implementation, a sparse array of type PyDictKeyEntry
and size dk_size
had to be allocated; unfortunately, it also resulted in a lot of empty space since that array was not allowed to be more than 2/3 * dk_size
full for performance reasons. (and the empty space still had PyDictKeyEntry
size!).
This is not the case now since only the required entries are stored (those that have been inserted) and a sparse array of type intX_t
(X
depending on dict size) 2/3 * dk_size
s full is kept. The empty space changed from type PyDictKeyEntry
to intX_t
.
So, obviously, creating a sparse array of type PyDictKeyEntry
is much more memory demanding than a sparse array for storing int
s.
You can see the full conversation on Python-Dev regarding this feature if interested, it is a good read.
In the original proposal made by Raymond Hettinger, a visualization of the data structures used can be seen which captures the gist of the idea.
For example, the dictionary: d = {'timmy': 'red', 'barry': 'green', 'guido': 'blue'} is currently stored as [keyhash, key, value]: entries = [['--', '--', '--'], [-8522787127447073495, 'barry', 'green'], ['--', '--', '--'], ['--', '--', '--'], ['--', '--', '--'], [-9092791511155847987, 'timmy', 'red'], ['--', '--', '--'], [-6480567542315338377, 'guido', 'blue']] Instead, the data should be organized as follows: indices = [None, 1, None, None, None, 0, None, 2] entries = [[-9092791511155847987, 'timmy', 'red'], [-8522787127447073495, 'barry', 'green'], [-6480567542315338377, 'guido', 'blue']]
As you can visually now see, in the original proposal, a lot of space is essentially empty to reduce collisions and make look-ups faster. With the new approach, you reduce the memory required by moving the sparseness where it's really required, in the indices.
[1]: I say "insertion ordered" and not "ordered" since, with the existence of OrderedDict, "ordered" suggests further behavior that the `dict` object *doesn't provide*. OrderedDicts are reversible, provide order sensitive methods and, mainly, provide an order-sensive equality tests (`==`, `!=`). `dict`s currently don't offer any of those behaviors/methods.
[2]: The new dictionary implementations performs better **memory wise** by being designed more compactly; that's the main benefit here. Speed wise, the difference isn't so drastic, there's places where the new dict might introduce slight regressions (key-lookups, for example) while in others (iteration and resizing come to mind) a performance boost should be present.
Overall, the performance of the dictionary, especially in real-life situations, improves due to the compactness introduced.
Below is answering the original first question:
Should I use dict or OrderedDict in Python 3.6?
I think this sentence from the documentation is actually enough to answer your question
The order-preserving aspect of this new implementation is considered an implementation detail and should not be relied upon
dict
is not explicitly meant to be an ordered collection, so if you want to stay consistent and not rely on a side effect of the new implementation you should stick with OrderedDict
.
Make your code future proof :)
There's a debate about that here.
EDIT: Python 3.7 will keep this as a feature see
Update: Guido van Rossum announced on the mailing list that as of Python 3.7 dict
s in all Python implementations must preserve insertion order.
move_to_end
method and its equality is order sensitive: docs.python.org/3/library/…. See the note on Jim Fasarakis Hilliard's answer.
I wanted to add to the discussion above but don't have the reputation to comment.
Python 3.8 includes the reversed()
function on dictionaries (removing another difference from OrderedDict
.
Dict and dictviews are now iterable in reversed insertion order using reversed(). (Contributed by Rémi Lapeyre in bpo-33462.) See what's new in python 3.8
I don't see any mention of the equality operator or other features of OrderedDict
so they are still not entirely the same.
To fully answer this question in 2020, let me quote several statements from official Python docs:
Changed in version 3.7: Dictionary order is guaranteed to be insertion order. This behavior was an implementation detail of CPython from 3.6.
Changed in version 3.7: Dictionary order is guaranteed to be insertion order.
Changed in version 3.8: Dictionaries are now reversible.
Dictionaries and dictionary views are reversible.
A statement regarding OrderedDict vs Dict:
Ordered dictionaries are just like regular dictionaries but have some extra capabilities relating to ordering operations. They have become less important now that the built-in dict class gained the ability to remember insertion order (this new behavior became guaranteed in Python 3.7).
If order is so important to determine like whether two dictionaries are equal, you still need to use OrderedDict
.
To implement this feature, OrderedDict uses more memory (a bidirectional linked list is maintained internally)
from collections import OrderedDict
a = {"a": 1, "b": 2}
b = {"b": 2, "a": 1}
c = OrderedDict({"a": 1, "b": 2})
d = OrderedDict({"b": 2, "a": 1})
print(a == b) # True
print(c == d) # False
print(a == c) # True
print(a == d) # True
Success story sharing
entries
list resized? or is a blank space kept? or is it compressed from time to time?DKIX_DUMMY
with a value of-2
and the entry in theentry
array replaced byNULL
, when inserting is performed the new values are appended to the entries array, Haven't been able to discern yet, but pretty sure when the indices fills up beyond the2/3
threshold resizing is performed. This can lead to shrinking instead of growing if manyDUMMY
entries exist.d = {i:i for i in range(100)}
and you.pop
all items w/o inserting, the size won't change. When you add to it again,d[1] = 1
, the appropriate size is calculated and the dict resizes.dict
being ordered',dict
s aren't ordered in the senseOrderedDict
s are. The notable issue is equality.dict
s have order insensitive==
,OrderedDict
s have order sensitive ones. DumpingOrderedDict
s and changingdicts
to now have order sensitive comparisons could lead to a lot of breakage in old code. I'm guessing the only thing that might change aboutOrderedDict
s is its implementation.