Check if multiple strings exist in another string

python arrays string exists

How can I check whether any of the strings in an array exists in another string?

For example:

a = ['a', 'b', 'c']
str = "a123"
if a in str:
  print "some of the strings found in str"
else:
  print "no strings found in str"

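That code doesn't work; it is only meant to show what I want to achieve.

I'm surprised there aren't (yet) any answers comparing performance against a compiled regex, especially in relation to the size of the string being searched and the number of "needles".

@Pat I am not surprised. The question is not about performance. Today most programmers care more about getting it done and about readability. The performance question is valid, but it is a different question.

Using str as a variable is confusing and may result in unexpected behavior, as it is a reserved word; see link.

The regex [abc] would also work perfectly well and will be faster if there are more than a couple of candidates to test. But if the strings are arbitrary and you don't know them in advance to construct a regex, you will have to use the any(x in str for x in a) approach.

@CleverGuy You're right, though it's not a reserved word, otherwise you wouldn't be able to assign to it. It's a builtin.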

rjurney

You can use any:

a_string = "A string is more than its parts!"
matches = ["more", "wholesome", "milk"]

if any(x in a_string for x in matches):
    print("found a match")  # at least one entry of matches occurs in a_string

Similarly, to check whether all the strings from the list are found, use all instead of any.
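A minimal sketch contrasting the two, reusing the a_string and matches values from this answer (the names found_any and found_all are only illustrative):

```python
a_string = "A string is more than its parts!"
matches = ["more", "wholesome", "milk"]

# any() becomes True as soon as one candidate occurs in a_string.
found_any = any(x in a_string for x in matches)

# all() is True only if every candidate occurs in a_string.
found_all = all(x in a_string for x in matches)

print(found_any)  # True, because "more" is a substring
print(found_all)  # False, because "wholesome" and "milk" are missing
```

Both take a generator expression, so they stop evaluating as soon as the result is known.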

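any() takes an iterable. I am not sure which version of Python you are using, but in 2.6 you will need to put [] around your argument to any(): any([x in str for x in a]), so that the comprehension returns an iterable. But maybe later versions of Python already do this.

@Mark Byers: Sorry for the late comment, but is there a way to print the string that was found? How would you do that? Thank you.

Not sure I understand: if a is the list and str is the thing to match against, what is x? Python newbie ftw. :)

@emispowder It works fine for me as-is in Python 2.6.9.

@emispowder: Generator expressions were introduced in 2.4.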

zondo

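any() is by far the best approach if all you want is True or False, but if you want to know specifically which string or strings match, you can use a couple of things.

If you want the first match (with False as a default):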

match = next((x for x in a if x in str), False)

If you want to get all matches (including duplicates):

matches = [x for x in a if x in str]

If you want to get all non-duplicate matches (disregarding order):

matches = {x for x in a if x in str}

If you want to get all non-duplicate matches in the right order:

matches = []
for x in a:
    if x in str and x not in matches:
        matches.append(x)

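Please add an example for the last match too.

@OlegKokorin: It creates a list of matching strings in the order it finds them, but if two are the same it keeps only the first one.

Using an OrderedDict can probably be more performant than a list. See this answer on "Removing duplicates in lists".

Can you give an example?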
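A minimal sketch of the OrderedDict idea from the comment above, assuming the goal is the unique matches in first-seen order. The sample values and the name text are illustrative and not taken from the answer:

```python
from collections import OrderedDict

a = ['b', 'a', 'b', 'z', 'a']   # illustrative candidates with duplicates
text = "a123b"                  # illustrative string to search

# OrderedDict.fromkeys keeps the first occurrence of each key, in order,
# and membership checks are hash lookups instead of list scans.
matches = list(OrderedDict.fromkeys(x for x in a if x in text))

print(matches)  # ['b', 'a']
```

On Python 3.7 and later a plain dict preserves insertion order as well, so dict.fromkeys(...) should behave the same way there.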

jbernadas

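You should be careful if the strings in a or str get longer. The straightforward solutions take O(S*(A^2)), where S is the length of str and A is the sum of the lengths of all strings in a. For a faster solution, look at the Aho-Corasick algorithm for string matching, which runs in linear time O(S+A).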
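To make the idea concrete, here is a compact textbook-style sketch of Aho-Corasick in pure Python. It is not taken from any particular library; the function names build_automaton and find_all are hypothetical, and it assumes the keywords are non-empty strings:

```python
from collections import deque


def build_automaton(keywords):
    """Build trie transitions, failure links and output lists."""
    goto = [{}]   # goto[node][char] -> child node
    fail = [0]    # fail[node] -> node for the longest proper suffix in the trie
    out = [[]]    # out[node] -> keywords that end at this node
    for word in keywords:
        if not word:
            continue  # assumption: empty keywords are ignored
        node = 0
        for ch in word:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(word)

    # Breadth-first traversal so that shallower nodes are finished first.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            queue.append(child)
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit keywords that end at the failure target.
            out[child] = out[child] + out[fail[child]]
    return goto, fail, out


def find_all(text, keywords):
    """Return (start_index, keyword) for every occurrence in text."""
    goto, fail, out = build_automaton(keywords)
    node = 0
    found = []
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for word in out[node]:
            found.append((i - len(word) + 1, word))
    return found


print(find_all("a123", ['a', 'b', 'c']))                  # [(0, 'a')]
print(find_all("ushers", ["he", "she", "his", "hers"]))
```

For the yes/no question in this thread, bool(find_all(text, keywords)) is enough; a real implementation would stop at the first hit and reuse the automaton across searches.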

Can Aho-Corasick also find substrings instead of prefixes?

Some Python Aho-Corasick libraries are here and here.

Shankar ARUL

Just to add some diversity with regex:

import re

if any(re.findall(r'a|b|c', str, re.IGNORECASE)):
    print('possible matches thanks to regex')
else:
    print('no matches')

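Or, if your list is too long: any(re.findall(r'|'.join(a), str, re.IGNORECASE))

This works for the given use case of the question. It fails if you search for ( or *, because the regex syntax needs to be quoted.

You can escape it if necessary with '|'.join(map(re.escape, strings_to_match)). You should probably also re.compile('|'.join(...)).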
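Putting the two suggestions from that comment together, a minimal sketch might look like this (the sample strings are illustrative; strings_to_match is the name used in the comment):

```python
import re

strings_to_match = ["a", "(", "*"]   # illustrative needles with regex metacharacters

# re.escape makes "(" and "*" literal; compile once and reuse the pattern.
# Note: an empty list would give an empty pattern, which matches everything.
pattern = re.compile("|".join(map(re.escape, strings_to_match)), re.IGNORECASE)

print(bool(pattern.search("f(x)")))   # True, the "(" is found
print(bool(pattern.search("xyz")))    # False
print(pattern.findall("A * b"))       # ['A', '*']
```

pattern.search stops at the first hit, which is all the original yes/no question needs; findall is only required when the matched strings themselves are wanted.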

What is the time complexity?

Berislav Lopac

A very fast approach is to use set:

a = ['a', 'b', 'c']
str = "a123"
if set(a) & set(str):
    print("some of the strings found in str")
else:
    print("no strings found in str")

This works if a does not contain any multi-character values (in which case use any, as listed above). If so, it is simpler to specify a as a string: a = 'abc'.
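A short sketch of why the single-character restriction matters: set(text) contains individual characters only, so a multi-character candidate can never be a member. The names text and multi are illustrative:

```python
a = ['a', 'b', 'c']
text = "a123"

# Single characters: the intersection works as intended.
print(bool(set(a) & set(text)))        # True

# Multi-character candidates: set(text) is {'a', '1', '2', '3'},
# so '12' is never found even though it is a substring of text.
multi = ['12', 'zz']
print(bool(set(multi) & set(text)))    # False
print(any(x in text for x in multi))   # True
```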

zondo

You need to iterate over the elements of a.

a = ['a', 'b', 'c']
str = "a123"
found_a_string = False
for item in a:
    if item in str:
        found_a_string = True

if found_a_string:
    print("found a match")
else:
    print("no match found")

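Yes, I knew how to do that, but compared to Mark's answer, that's horrible code.

Only if you understand Mark's code. The problem you were having is that you weren't examining the elements of your array. There are a lot of terse, Pythonic ways to accomplish what you want that would hide the essence of what was wrong with your code.

It may be "horrible code", but it's exactly what any() does. Also, this gives you the actual string that matched, whereas any() just tells you there is a match.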

Domi W

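jbernadas already mentioned the Aho-Corasick algorithm as a way to reduce complexity.

Here is one way to use it in Python:

1. Download aho_corasick.py from here.
2. Put it in the same directory as your main Python file and name it aho_corasick.py.
3. Try the algorithm with the following code:

from aho_corasick import aho_corasick  # (string, keywords)
print(aho_corasick(string, ["keyword1", "keyword2"]))

Note that the search is case-sensitive.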

Jerald Cogswell

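A compact way to find multiple strings in another list of strings is to use set.intersection. This executes much faster than a list comprehension on large sets or lists.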

>>> astring = ['abc','def','ghi','jkl','mno']
>>> bstring = ['def', 'jkl']
>>> a_set = set(astring)  # convert list to set
>>> b_set = set(bstring)
>>> matches = a_set.intersection(b_set)
>>> matches
{'def', 'jkl'}
>>> list(matches) # if you want a list instead of a set
['def', 'jkl']
>>>

mluebke

a = ['a', 'b', 'c']
str =  "a123"

a_match = [True for match in a if match in str]

if True in a_match:
  print("some of the strings found in str")
else:
  print("no strings found in str")

Nilesh Birari

Just some more info on how to get all the list elements that occur in the string:

a = ['a', 'b', 'c']
str = "a123" 
list(filter(lambda x:  x in str, a))

sjd

Yet another solution, using set.intersection, for a one-liner. Note that it compares whole whitespace-separated words, not substrings.

subset = {"some", "words"}
text = "some words to be searched here"
if len(subset & set(text.split())) == len(subset):
    print("All values present in text")

if subset & set(text.split()):
    print("At least one value present in text")

balki

The regex module recommended in the Python docs supports this:

import regex  # third-party module (pip install regex), not the standard re

words = {'he', 'or', 'low'}
p = regex.compile(r"\L<name>", name=words)
m = p.findall('helloworld')
print(m)

Output:

['he', 'low', 'or']

Some implementation details: link

I couldn't find any documentation on \L. Could you point me to it?

@DaniloSouzaMorães github.com/mrabarnett/mrab-regex#named-lists-hg-issue-11

Trinadh Koya

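It depends on the context. If you want to check for a single literal (any single character such as a, e, w, etc.), in is enough:

original_word = "hackerearcth"
if 'h' in original_word:
    print("YES")

If you want to check for any of the characters in original_word, use:

if any(your_required in yourinput for your_required in original_word):
    print("YES")

If you want all of the characters in original_word to be present in the input, simply use all: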

original_word = ['h', 'a', 'c', 'k', 'e', 'r', 'e', 'a', 'r', 't', 'h']
yourinput = str(input()).lower()
if all(requested_word in yourinput for requested_word in original_word):
    print("yes")

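What would your input be? I can recognize two things: the sentence in which I'm looking for something, and the array of words I'm looking for. But you describe three variables, and I can't tell what the third one is.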

Stephen Rauch

strlist = ['SUCCESS', 'Done', 'SUCCESSFUL']
res = False
# Read the log line by line; the with block closes the file afterwards.
with open('test.txt', 'r') as flog:
    for line in flog:
        for fstr in strlist:
            if fstr in line:
                print('found')
                res = True

if res:
    print('res true')
else:
    print('res false')

https://i.stack.imgur.com/JGKMt.png

Ivan Mikhailov

I would use this kind of function for speed:

def check_string(string, substring_list):
    for substring in substring_list:
        if substring in string:
            return True
    return False

Robert I

data = "firstName and favoriteFood"
mandatory_fields = ['firstName', 'lastName', 'age']


# for each
for field in mandatory_fields:
    if field not in data:
        print("Error, missing req field {0}".format(field))

# still fine, multiple if statements
if ('firstName' not in data or
    'lastName' not in data or
    'age' not in data):
    print("Error, missing a req field")

# not very readable, list comprehension
missing_fields = [x for x in mandatory_fields if x not in data]
if missing_fields:
    print("Error, missing fields {0}".format(", ".join(missing_fields)))

Spirit of the Void

If you want exact matches of whole words, consider word-tokenizing the target string. I use the recommended word_tokenize from nltk:

from nltk.tokenize import word_tokenize

Here is the tokenized string from the accepted answer:

a_string = "A string is more than its parts!"
tokens = word_tokenize(a_string)
tokens
Out[46]: ['A', 'string', 'is', 'more', 'than', 'its', 'parts', '!']

The accepted answer gets modified as follows:

matches_1 = ["more", "wholesome", "milk"]
[x in tokens for x in matches_1]
Out[42]: [True, False, False]

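As in the accepted answer, the word "more" is still matched. However, if "mo" becomes a match string, the accepted answer still finds a match. That is behavior I did not want.

matches_2 = ["mo", "wholesome", "milk"]
[x in a_string for x in matches_2]
Out[43]: [True, False, False]

Using word tokenization, "mo" is no longer matched: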

[x in tokens for x in matches_2]
Out[44]: [False, False, False]

That is the additional behavior I wanted. This answer also addresses the duplicate question here.
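If installing nltk is not an option, a rough standard-library approximation is to split on word characters with re.findall. This is a different, much cruder tokenizer than word_tokenize (for instance, it drops punctuation instead of returning it as tokens), so treat it only as a sketch of the same whole-word idea:

```python
import re

a_string = "A string is more than its parts!"

# \w+ grabs runs of letters, digits and underscores; punctuation is dropped.
tokens = re.findall(r"\w+", a_string)
print(tokens)  # ['A', 'string', 'is', 'more', 'than', 'its', 'parts']

matches_1 = ["more", "wholesome", "milk"]
matches_2 = ["mo", "wholesome", "milk"]

print([x in tokens for x in matches_1])  # [True, False, False]
print([x in tokens for x in matches_2])  # [False, False, False]
```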
