
Should I use 'has_key()' or 'in' on Python dicts?

Given:

>>> d = {'a': 1, 'b': 2}

Which of the following is the best way to check if 'a' is in d?

>>> 'a' in d
True
>>> d.has_key('a')
True

bluish

in is definitely more pythonic.

In fact has_key() was removed in Python 3.x.


As an addition: in Python 3, to check whether something exists among the values rather than the keys, try >>> 1 in d.values()
One semi-gotcha to avoid, though, is to make sure you write "key in some_dict" rather than "key in some_dict.keys()". Both are equivalent semantically, but in Python 2 the latter is much slower (O(n) vs O(1)) because keys() builds a list. I've seen people write "in dict.keys()" thinking it's more explicit and therefore better.
@AdamParkin I demonstrated your comment in my answer stackoverflow.com/a/41390975/117471
@AdamParkin In Python 3, keys() is just a set-like view into a dictionary rather than a copy, so x in d.keys() is O(1). Still, x in d is more Pythonic.
@AdamParkin Interesting, I didn't see that. I suppose it's because x in d.keys() must construct and destroy a temporary object, complete with the memory allocation that entails, whereas x in d just does an arithmetic operation (computing the hash) and a lookup. Note that x in d.keys() only takes about 10 times as long, which is still not long really. I haven't checked, but I'm still pretty sure it's O(1).
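The points made in these comments can be checked with a small sketch like the one below, assuming Python 3. The variable names are only illustrative.

```python
d = {'a': 1, 'b': 2}

# Membership on the dict itself tests the keys (a hash lookup).
has_a = 'a' in d

# In Python 3, keys() returns a live view of the dict, not a copied list,
# so it sees keys added after the view was created.
keys_view = d.keys()
d['c'] = 3
view_sees_new_key = 'c' in keys_view

# Values are not indexed by hash, so this is a linear scan over the values.
has_value_1 = 1 in d.values()

print(has_a, view_sees_new_key, has_value_1)
```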
Alex Martelli

in wins hands-down, not just in elegance (and not being deprecated;-) but also in performance, e.g.:

$ python -mtimeit -s'd=dict.fromkeys(range(99))' '12 in d'
10000000 loops, best of 3: 0.0983 usec per loop
$ python -mtimeit -s'd=dict.fromkeys(range(99))' 'd.has_key(12)'
1000000 loops, best of 3: 0.21 usec per loop

While the following observation is not always true, you'll notice that usually, in Python, the faster solution is more elegant and Pythonic; that's why -mtimeit is SO helpful -- it's not just about saving a hundred nanoseconds here and there!-)


Thanks for this, made verifying that "in some_dict" is in fact O(1) much easier (try increasing the 99 to say 1999, and you'll find the runtime is about the same).
has_key appears to be O(1) too.
jamylak

According to the Python docs:

has_key() is deprecated in favor of key in d.


has_key() is now removed in Python 3
Mike Samuel

Use dict.has_key() if (and only if) your code is required to be runnable by Python versions earlier than 2.2 (when key in dict was introduced).


The WebSphere update in 2013 uses Jython 2.1 as its main scripting language. So this is unfortunately still a useful thing to note, five years after you noted it.
schlenk

There is one example where in actually kills your performance.

If you use in on an O(1) container that only implements __getitem__ and has_key() but not __contains__, you will turn an O(1) search into an O(N) search, because the in operator falls back to a linear search via __getitem__.

Fix is obviously trivial:

def __contains__(self, x):
    return self.has_key(x)

This answer was applicable when it was posted, but 99.95% of readers can safely ignore it. In most cases, if you're working with something this obscure you'll know it.
This really is not an issue. has_key() is specific to Python 2 dictionaries. in / __contains__ is the correct API to use; for those containers where a full scan is unavoidable there is no has_key() method anyway, and if there is an O(1) approach then that will be use-case specific, so it is up to the developer to pick the right data type for the problem.
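The fallback described in this answer can be observed with a minimal sketch, assuming Python 3. Evens and FastEvens are hypothetical classes made up for the illustration. Note that the fallback iterates over integer indices starting at 0 and compares the returned items, stopping at IndexError.

```python
class Evens:
    """Hypothetical container: item i is 2*i for 0 <= i < n. No __contains__."""

    def __init__(self, n):
        self.n = n
        self.getitem_calls = 0

    def __getitem__(self, i):
        self.getitem_calls += 1
        if not 0 <= i < self.n:
            raise IndexError(i)  # ends the fallback iteration
        return 2 * i


class FastEvens(Evens):
    def __contains__(self, x):
        # Constant-time membership test; __getitem__ is never called.
        return isinstance(x, int) and x % 2 == 0 and 0 <= x < 2 * self.n


slow = Evens(1000)
slow_found = 1998 in slow   # walks indices 0..999 through __getitem__

fast = FastEvens(1000)
fast_found = 1998 in fast   # direct arithmetic check

print(slow_found, slow.getitem_calls)
print(fast_found, fast.getitem_calls)
```

Counting the __getitem__ calls makes the difference visible without relying on timings.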
Greena modi

Solution to the warning "dict.has_key() is deprecated, use 'in'" in Sublime Text 3

Here is an example using a dictionary named 'ages':

ages = {}

# Add a few names to the dictionary
ages['Sue'] = 23
ages['Peter'] = 19
ages['Andrew'] = 78
ages['Karren'] = 45

# Use 'in' in the if condition instead of dict_name.has_key(key_name)
if 'Sue' in ages:
    print("Sue is in the dictionary. She is %d years old" % ages['Sue'])
else:
    print("Sue is not in the dictionary")

Correct, but it was already answered. Welcome to Stack Overflow, and thanks for the example, but always check the existing answers!
@igorgue I'm not sure about the downvotes on her answer. It might be similar to the ones already posted, but she provides an example. Isn't that worthy enough to be an answer on SO?
Bruno Bronosky

Expanding on Alex Martelli's performance tests with Adam Parkin's comments...

$ python3.5 -mtimeit -s'd=dict.fromkeys(range( 99))' 'd.has_key(12)'
Traceback (most recent call last):
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/timeit.py", line 301, in main
    x = t.timeit(number)
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/timeit.py", line 178, in timeit
    timing = self.inner(it, self.timer)
  File "<timeit-src>", line 6, in inner
    d.has_key(12)
AttributeError: 'dict' object has no attribute 'has_key'

$ python2.7 -mtimeit -s'd=dict.fromkeys(range(  99))' 'd.has_key(12)'
10000000 loops, best of 3: 0.0872 usec per loop

$ python2.7 -mtimeit -s'd=dict.fromkeys(range(1999))' 'd.has_key(12)'
10000000 loops, best of 3: 0.0858 usec per loop

$ python3.5 -mtimeit -s'd=dict.fromkeys(range(  99))' '12 in d'
10000000 loops, best of 3: 0.031 usec per loop

$ python3.5 -mtimeit -s'd=dict.fromkeys(range(1999))' '12 in d'
10000000 loops, best of 3: 0.033 usec per loop

$ python3.5 -mtimeit -s'd=dict.fromkeys(range(  99))' '12 in d.keys()'
10000000 loops, best of 3: 0.115 usec per loop

$ python3.5 -mtimeit -s'd=dict.fromkeys(range(1999))' '12 in d.keys()'
10000000 loops, best of 3: 0.117 usec per loop

Wonderful statistics, sometimes implicit might be better than explicit (at least in efficiency)...
Thank you, @varun. I had forgotten about this answer. I need to do this kind of testing more often. I regularly read long threads where people argue about The Best Way™ to do things. But I rarely remember how easy this was to get proof.
This experiment has a defect: it mixes the dict creation time with the key lookup time. It is better to separate the two and measure only the time spent on the lookup. Once you separate them, the timings show that both 'key in D' and 'key in D.keys()' appear to be O(1). There is no essential difference; key in D.keys() is a bit slower than key in D, but it is not O(N) vs O(1).
I used Python 3, so my conclusion applies to Python 3. In Python 2 it is likely O(N) vs O(1), but I did not see that in Python 3.
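For anyone who wants to repeat the measurement from a script rather than the command line, here is a rough sketch using the standard timeit module, assuming Python 3. The helper time_lookup and its parameters are made up for this example, and the absolute numbers will vary by machine. The setup string builds the dict outside the timed statement, so only the lookup is measured.

```python
import timeit


def time_lookup(stmt, size, number=100_000):
    """Return seconds per evaluation of stmt against a dict with `size` keys."""
    setup = "d = dict.fromkeys(range(%d))" % size
    # setup runs before each timed run and is excluded from the measurement.
    total = min(timeit.repeat(stmt, setup=setup, number=number, repeat=3))
    return total / number


results = {}
for size in (99, 1999):
    for stmt in ("12 in d", "12 in d.keys()"):
        results[(stmt, size)] = time_lookup(stmt, size)
        print("%-16s size=%-5d %.4f usec"
              % (stmt, size, results[(stmt, size)] * 1e6))
```

If the per-lookup time stays roughly flat as size grows, that is consistent with O(1) behaviour for both spellings.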
u0b34a0f6ae

has_key is a dictionary method, but in works on any collection; even when __contains__ is missing, in falls back to iterating over the collection (via __iter__ or __getitem__) to find out.


It also works on iterables such as xrange: for an integer x, "x in xrange(90, 200) <=> 90 <= x < 200"
…: This looks like a very bad idea: 50 operations instead of 2.
@Clément In Python 3, it's actually quite efficient to do in tests on range objects. I'm not so sure about its efficiency on Python 2 xrange, though. ;)
@Clément not in Python 3; __contains__ can trivially calculate if a value is in the range or not.
@AlexandreHuat Your timing includes the overhead of creating a new range instance each time. Using a single, pre-existing instance the "integer in range" test is about 40% faster in my timings.
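A short sketch of the behaviours discussed in these comments, assuming Python 3 (where range replaces xrange). It also shows that in on a generator consumes items while searching.

```python
# A range object answers membership tests for integers by calculation.
r = range(90, 200)
in_range = 150 in r
end_excluded = 200 not in r
same_as_comparison = (150 in r) == (90 <= 150 < 200)

# The step is taken into account as well.
off_step = 95 in range(90, 200, 10)

# With no __contains__, in iterates. On a generator this consumes
# every item up to and including the match.
gen = (n for n in range(5))
found = 2 in gen
leftover = list(gen)

print(in_range, end_excluded, same_as_comparison, off_step, found, leftover)
```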
Kirby

If you have something like this:

if d.has_key('a'):

change it to the following to run on Python 3.x:

if 'a' in d:

No, you inverted the test. t.has_key(ew) returns True if the value ew references is also a key in the dictionary. key not in t returns True if the value is not in the dictionary. Moreover, the key = ew alias is very, very redundant. The correct spelling is if ew in t. Which is what the accepted answer from 8 years prior already told you.
Changed the answer, just to avoid misleading people. =)
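Since the comments above mention an inverted test, here is a minimal sketch of both directions of the rewrite. The dictionary d is the one from the question, and the has_key() forms appear only as comments because they do not run on Python 3.

```python
d = {'a': 1, 'b': 2}

# Python 2 only:  if d.has_key('a'):
found = 'a' in d

# Python 2 only:  if not d.has_key('z'):
missing = 'z' not in d

print(found, missing)
```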