Extract a subset of key-value pairs from a dictionary?


poorva

You could try:

dict((k, bigdict[k]) for k in ('l', 'm', 'n'))

... or in ~~Python 3~~ Python 2.7 or later (thanks to Fábio Diniz for pointing out that it works in 2.7 too):

{k: bigdict[k] for k in ('l', 'm', 'n')}

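Update: As Håvard S points out, I'm assuming that you know the keys are going to be in the dictionary. See his answer if you can't make that assumption. Alternatively, as timbo points out in the comments, if you want a key that's missing from bigdict to map to None, you can do: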

{k: bigdict.get(k, None) for k in ('l', 'm', 'n')}

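If you're using Python 3 and you only want keys in the new dict that actually exist in the original one, you can use the fact that dictionary view objects implement some set operations: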

{k: bigdict[k] for k in bigdict.keys() & {'l', 'm', 'n'}}
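
To make the difference between the three variants above concrete, here is a minimal sketch. The sample `bigdict` and the `wanted` tuple are made up for illustration and are not from the original question.

```python
# Hypothetical sample data, not from the original question.
bigdict = {'l': 1, 'm': 2, 'x': 99}
wanted = ('l', 'm', 'n')  # 'n' is deliberately missing from bigdict

# 1. Strict lookup: raises KeyError on the first missing key.
try:
    strict = {k: bigdict[k] for k in wanted}
except KeyError as exc:
    strict = None
    missing_key = exc.args[0]

# 2. Missing keys are kept and mapped to None.
with_none = {k: bigdict.get(k) for k in wanted}

# 3. Only keys present in both; dict key views support & in Python 3.
present_only = {k: bigdict[k] for k in bigdict.keys() & set(wanted)}

print(strict, missing_key)
print(with_none)
print(present_only)
```

Which variant is right depends on whether a missing key should be an error, a placeholder, or silently skipped.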

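This will fail if bigdict does not contain k.

{k: bigdict.get(k,None) for k in ('l', 'm', 'n')} will handle the case where a specified key is missing from the source dictionary by setting that key to None in the new dictionary.

@MarkLongair Depending on the use case, {k: bigdict[k] for k in ('l','m','n') if k in bigdict} might be better, as it only stores the keys that actually have values.

bigdict.keys() & {'l', 'm', 'n'} ==> bigdict.viewkeys() & {'l', 'm', 'n'} for Python 2.7

The last solution is nice because you can just replace the '&' with a '-' to get an "all keys except" operation. Unfortunately, that results in a dictionary with differently ordered keys (even in Python 3.7 and 3.8).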


Håvard S

A bit shorter, at least:

wanted_keys = ['l', 'm', 'n'] # The keys you want
dict((k, bigdict[k]) for k in wanted_keys if k in bigdict)

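+1 for the alternative behaviour of excluding a key if it is not in bigdict, as opposed to setting it to None.

Alternatively: dict((k, bigdict.get(k, defaultVal)) for k in wanted_keys) if you must have all the keys.

This answer is saved by a "t".

There is also a slightly shorter variant (syntax-wise) of your solution when using {}, i.e. {k: bigdict[k] for k in wanted_keys if k in bigdict}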


theheadofabroom

interesting_keys = ('l', 'm', 'n')
subdict = {x: bigdict[x] for x in interesting_keys if x in bigdict}

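@loutre How else do you propose to guarantee that you extract all the data for the given keys?

Sorry, I made a mistake. I thought you were looping over "bigdict". My bad. I'm deleting my comment.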


Sklavit

A bit of a speed comparison for all the methods mentioned:

UPDATED on 2020.07.13 (thanks to @user3780389): ONLY for keys from bigdict.

 IPython 5.5.0 -- An enhanced Interactive Python.
Python 2.7.18 (default, Aug  8 2019, 00:00:00) 
[GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux2
import numpy.random as nprnd
  ...: keys = nprnd.randint(100000, size=10000)
  ...: bigdict = dict([(_, nprnd.rand()) for _ in range(100000)])
  ...: 
  ...: %timeit {key:bigdict[key] for key in keys}
  ...: %timeit dict((key, bigdict[key]) for key in keys)
  ...: %timeit dict(map(lambda k: (k, bigdict[k]), keys))
  ...: %timeit {key:bigdict[key] for key in set(keys) & set(bigdict.keys())}
  ...: %timeit dict(filter(lambda i:i[0] in keys, bigdict.items()))
  ...: %timeit {key:value for key, value in bigdict.items() if key in keys}
100 loops, best of 3: 2.36 ms per loop
100 loops, best of 3: 2.87 ms per loop
100 loops, best of 3: 3.65 ms per loop
100 loops, best of 3: 7.14 ms per loop
1 loop, best of 3: 577 ms per loop
1 loop, best of 3: 563 ms per loop

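As expected: dictionary comprehensions are the best option.

The first 3 operations do something different from the last two, and will result in an error if a key does not exist in bigdict.

Nice. Maybe worth adding {key:bigdict[key] for key in bigdict.keys() & keys} from the accepted solution, which does the filtering while actually being faster (on my machine) than the first method you listed, which doesn't filter. In fact, {key:bigdict[key] for key in set(keys) & set(bigdict.keys())} seems to be even faster for these very large sets of keys...

@telchert You are missing that, in the speed comparison given, bigdict.keys() and keys are not sets. With explicit conversion to sets, the accepted solution is not that fast.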
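
One plausible reason the last two timings above are so much slower is that `key in keys` scans the whole key collection for every item of bigdict. The sketch below is a standard-library-only rerun of that idea for Python 3; the sizes, the seed, and the function names are assumptions chosen for illustration, and the absolute numbers will differ from machine to machine.

```python
import random
import timeit

random.seed(0)
# Smaller than the original benchmark so the slow variant finishes quickly.
bigdict = {i: random.random() for i in range(10_000)}
keys = [random.randrange(10_000) for _ in range(500)]
key_set = set(keys)  # O(1) average membership tests

def by_lookup():
    # Iterate over the wanted keys; one dict lookup per key.
    return {k: bigdict[k] for k in keys}

def by_scan_list():
    # Iterate over all of bigdict; `k in keys` is a linear scan of a list.
    return {k: v for k, v in bigdict.items() if k in keys}

def by_scan_set():
    # Same scan of bigdict, but membership is tested against a set.
    return {k: v for k, v in bigdict.items() if k in key_set}

# All three build the same dictionary (every key here exists in bigdict).
assert by_lookup() == by_scan_list() == by_scan_set()

for fn in (by_lookup, by_scan_set, by_scan_list):
    print(fn.__name__, timeit.timeit(fn, number=3))
```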


Meow

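This answer uses a dictionary comprehension similar to the selected answer, but it will not raise an exception on a missing item.

Python 2 version: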

{k:v for k, v in bigDict.iteritems() if k in ('l', 'm', 'n')}

Python 3 version:

{k:v for k, v in bigDict.items() if k in ('l', 'm', 'n')}

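...but if the big dict is HUGE it will still be iterated over completely (this is an O(n) operation), while the inverse would just grab 3 items (each an O(1) operation).

The question is about a dictionary of only 16 keys.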


phimuemue

Maybe:

subdict=dict([(x,bigdict[x]) for x in ['l', 'm', 'n']])

Python 3 even supports the following:

subdict={a:bigdict[a] for a in ['l','m','n']}

Note that you can check for existence in the dictionary as follows:

subdict=dict([(x,bigdict[x]) for x in ['l', 'm', 'n'] if x in bigdict])

and, respectively, for Python 3:

subdict={a:bigdict[a] for a in ['l','m','n'] if a in bigdict}

Fails if a is not in bigdict.

The things said to work only in Python 3 also work in 2.7.


petezurich

You can also use map (which is a very useful function to get to know anyway):

sd = dict(map(lambda k: (k, d.get(k, None)), keys))  # d: the source dict, keys: the keys to keep

Example:

large_dictionary = {'a1':123, 'a2':45, 'a3':344}
list_of_keys = ['a1', 'a3']
small_dictionary = dict(map(lambda key: (key, large_dictionary.get(key, None)), list_of_keys))

PS: I borrowed the .get(key, None) from a previous answer :)


Kevin Grimm

If you want to retain the majority of the keys while removing a few, another approach is:

{k: bigdict[k] for k in bigdict.keys() if k not in ['l', 'm', 'n']}

Even shorter: {k: v for k, v in bigdict.items() if k not in ['l', 'm', 'n']}
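
As a small illustration of the "all keys except" idea, the sketch below compares the comprehension form with the set-difference form on key views. The sample data and the names `excluded`, `kept`, and `kept_unordered` are assumptions for the example, and the ordering remark relies on dicts preserving insertion order (Python 3.7+).

```python
# Hypothetical sample data for illustration.
bigdict = {'a': 1, 'l': 2, 'b': 3, 'm': 4, 'c': 5}
excluded = {'l', 'm', 'n'}  # a set gives O(1) average membership tests

# Iterating over bigdict itself keeps its insertion order (Python 3.7+).
kept = {k: v for k, v in bigdict.items() if k not in excluded}

# The set-difference form selects the same keys, but the order in which
# they end up in the new dict is not guaranteed.
kept_unordered = {k: bigdict[k] for k in bigdict.keys() - excluded}

print(kept)
print(kept == kept_unordered)  # equality of dicts ignores key order
```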


pandamonium

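Okay, this is something that has bothered me a few times, so thank you Jayesh for asking it.

The answers above seem like as good a solution as any, but if you are using this all over your code, it makes sense to wrap the functionality, IMHO. Also, there are two possible use cases here: one where you care about whether all keywords are in the original dictionary, and one where you don't. It would be nice to treat both equally.

So, for my two cents, I suggest writing a subclass of dict, e.g.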

class my_dict(dict):
    def subdict(self, keywords, fragile=False):
        d = {}
        for k in keywords:
            try:
                d[k] = self[k]
            except KeyError:
                if fragile:
                    raise
        return d

Now you can pull out a sub-dictionary with

orig_dict.subdict(keywords)

Usage examples:

#
## our keywords are letters of the alphabet
keywords = 'abcdefghijklmnopqrstuvwxyz'
#
## our dictionary maps letters to their index
d = my_dict([(k,i) for i,k in enumerate(keywords)])
print('Original dictionary:\n%r\n\n' % (d,))
#
## constructing a sub-dictionary with good keywords
oddkeywords = keywords[::2]
subd = d.subdict(oddkeywords)
print('Dictionary from odd numbered keys:\n%r\n\n' % (subd,))
#
## constructing a sub-dictionary with mixture of good and bad keywords
somebadkeywords = keywords[1::2] + 'A'
try:
    subd2 = d.subdict(somebadkeywords, fragile=True)  # fragile defaults to False, so request the KeyError explicitly
    print("We shouldn't see this message")
except KeyError:
    print("subd2 construction fails:")
    print("\toriginal dictionary doesn't contain some keys\n\n")
#
## Trying again with fragile set to false
try:
    subd3 = d.subdict(somebadkeywords, fragile=False)
    print('Dictionary constructed using some bad keys:\n%r\n\n' % (subd3,))
except KeyError:
    print("We shouldn't see this message")

If you run all of the above code, you should see (something like) the following output (sorry for the formatting):

Original dictionary:
{'a': 0, 'c': 2, 'b': 1, 'e': 4, 'd': 3, 'g': 6, 'f': 5, 'i': 8, 'h': 7, 'k': 10, 'j': 9, 'm': 12, 'l': 11, 'o': 14, 'n': 13, 'q': 16, 'p': 15, 's': 18, 'r': 17, 'u': 20, 't': 19, 'w': 22, 'v': 21, 'y': 24, 'x': 23, 'z': 25}

Dictionary from odd numbered keys:
{'a': 0, 'c': 2, 'e': 4, 'g': 6, 'i': 8, 'k': 10, 'm': 12, 'o': 14, 'q': 16, 's': 18, 'u': 20, 'w': 22, 'y': 24}

subd2 construction fails:
	original dictionary doesn't contain some keys

Dictionary constructed using some bad keys:
{'b': 1, 'd': 3, 'f': 5, 'h': 7, 'j': 9, 'l': 11, 'n': 13, 'p': 15, 'r': 17, 't': 19, 'v': 21, 'x': 23, 'z': 25}

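Subclassing requires an existing dict object to be converted into the subclass type, which can be expensive. Why not just write a simple function subdict(orig_dict, keys, …)?

@musiphil: I doubt there's much difference in overhead. The nice thing about subclassing is that the method is part of the class and doesn't need to be imported or inlined. The only potential problem or limitation of the code in this answer is that the result is not of type my_dict.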
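
For comparison, the plain-function alternative suggested in the comment could look roughly like the sketch below. The signature mirrors the `subdict` method from the answer; the body is an assumption about what such a function might do, not code from either commenter.

```python
def subdict(orig_dict, keys, fragile=False):
    """Return a new dict holding only `keys` taken from `orig_dict`.

    If `fragile` is true, a missing key raises KeyError;
    otherwise missing keys are silently skipped.
    """
    if fragile:
        return {k: orig_dict[k] for k in keys}
    return {k: orig_dict[k] for k in keys if k in orig_dict}


# Works on any plain dict, with no conversion to a subclass needed.
d = {'a': 0, 'b': 1, 'c': 2}
print(subdict(d, 'acZ'))             # missing 'Z' is skipped
try:
    subdict(d, 'acZ', fragile=True)  # missing 'Z' raises
except KeyError as exc:
    print('missing key:', exc.args[0])
```

The trade-off raised in the comments still applies: a function must be imported wherever it is used, while a method travels with the object.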


georg

Yet another one (I prefer Mark Longair's answer):

di = {'a':1,'b':2,'c':3}
req = ['a','c','w']
dict([i for i in di.iteritems() if i[0] in di and i[0] in req])

It's slow for a big dict.


DmitrySemenov

Solution

from operator import itemgetter
from typing import List, Dict, Union


def subdict(d: Union[Dict, List], columns: List[str]) -> Union[Dict, List[Dict]]:
    """Return a dict or list of dicts with subset of 
    columns from the d argument.
    """
    getter = itemgetter(*columns)

    if isinstance(d, list):
        result = []
        for subset in map(getter, d):
            record = dict(zip(columns, subset))
            result.append(record)
        return result
    elif isinstance(d, dict):
        return dict(zip(columns, getter(d)))

    raise ValueError('Unsupported type for `d`')

Usage examples

# pure dict

d = dict(a=1, b=2, c=3)
print(subdict(d, ['a', 'c']))

>>> In [5]: {'a': 1, 'c': 3}

# list of dicts

d = [
    dict(a=1, b=2, c=3),
    dict(a=2, b=4, c=6),
    dict(a=4, b=8, c=12),
]

print(subdict(d, ['a', 'c']))

>>> In [5]: [{'a': 1, 'c': 3}, {'a': 2, 'c': 6}, {'a': 4, 'c': 12}]
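
One caveat worth checking before reusing the `itemgetter` approach: `operator.itemgetter` returns a tuple only when given two or more keys, and the bare value when given exactly one, so zipping the result with a one-element column list would not behave as intended. The sketch below demonstrates this; `pick` is a hypothetical helper introduced only for the comparison.

```python
from operator import itemgetter

d = dict(a=1, b=2, c=3)

# With two or more keys, itemgetter returns a tuple of values ...
assert itemgetter('a', 'c')(d) == (1, 3)
# ... but with exactly one key it returns the bare value, not a 1-tuple.
assert itemgetter('a')(d) == 1


def pick(d, columns):
    """Hypothetical helper: subset of `d` for any number of columns."""
    return {c: d[c] for c in columns}


print(pick(d, ['a']))
print(pick(d, ['a', 'c']))
```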


ntg

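Using map (halfdanrump's answer) is best for me, though I haven't timed it...

But if you go for a dictionary comprehension, and you have a big_dict:

Make absolutely certain you loop through req. This is crucial, and it affects the running time of the algorithm (big O, theta, you name it). Also write it generically enough to avoid errors when keys are not there.

So, for example: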

big_dict = {'a':1,'b':2,'c':3,................................................}
req = ['a','c','w']

{k:big_dict.get(k,None) for k in req}
# or 
{k:big_dict[k] for k in req if k in big_dict}

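Note that in the converse case, where req is big but my_dict is small, you should loop through my_dict instead.

In general, we are doing an intersection, and the complexity of the problem is O(min(len(dict), len(req))). Python's own implementation of intersection considers the size of the two sets, so it seems optimal. Also, being written in C and part of the core library, it is probably faster than most non-optimized Python statements. Therefore, a solution that I would consider is: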

dict = {'a':1,'b':2,'c':3,................................................}
req = ['a','c','w',...................]

{k:dict[k] for k in set(req).intersection(dict.keys())}

It moves the critical operation into Python's C code and works for all cases.
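
The "loop over the smaller side" advice can also be spelled out by hand, as in the sketch below. `intersect_subdict` is a hypothetical name, and the function is only an illustration of the idea, not a claim about how Python's set intersection is implemented.

```python
def intersect_subdict(big_dict, req):
    """Hypothetical helper: entries of big_dict whose keys appear in req."""
    req = set(req)  # deduplicate and get O(1) average membership tests
    if len(req) <= len(big_dict):
        # Few requested keys: one dict lookup per requested key.
        return {k: big_dict[k] for k in req if k in big_dict}
    # Few dict entries: one set lookup per dict entry.
    return {k: v for k, v in big_dict.items() if k in req}


big_dict = {'a': 1, 'b': 2, 'c': 3}
print(intersect_subdict(big_dict, ['a', 'c', 'w']))
print(intersect_subdict(big_dict, 'abcdefghij'))
```

Either branch does work proportional to the smaller of the two collections after the one-off cost of building the set from `req`.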

Anonymous

If someone wants the first n items of the dictionary without knowing the keys:

n = 5 # First Five Items
ks = [*dikt.keys()][:n]
less_dikt = {i: dikt[i] for i in ks}
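
An alternative sketch that avoids building the full key list uses `itertools.islice` on the items view. The sample `dikt` is made up for illustration, and "first" here assumes the insertion order that dicts preserve in Python 3.7+.

```python
from itertools import islice

# Hypothetical sample data for illustration.
dikt = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
n = 2  # number of leading items to keep

# islice stops after n items, so the rest of the dict is never visited.
less_dikt = dict(islice(dikt.items(), n))
print(less_dikt)
```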
