Check if multiple strings exist in another string

python arrays string exists

How can I check if any of the strings in an array exists in another string?

Like:

a = ['a', 'b', 'c']
str = "a123"
if a in str:
  print "some of the strings found in str"
else:
  print "no strings found in str"

That code doesn't work, it's just to show what I want to achieve.

I'm surprised there aren't (yet) any answers comparing to a compiled regex in terms of perf, especially compared to size of the string and number of "needles" to search for.

@Pat I am not surprised. The question is not about performance. Today most programmers care more for getting it done and readability. The performance question is valid, but a different question.

Using str as a variable is confusing and may result in unexpected behavior as it is a reserved word; see link.

regex [abc] also works perfectly well and will be faster if there are more than a couple of candidates to test. But if the strings are arbitrary and you don't know them in advance to construct a regex, you will have to use the any(x in str for x in a) approach.

@CleverGuy You're right, though it's not a reserved word, otherwise you wouldn't be able to assign to it. It's a builtin.
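To make the builtin point concrete, here is a minimal sketch of what happens once str is rebound inside a scope. The function name shadowing_demo is made up for illustration.

```python
def shadowing_demo():
    # Rebinding str hides the builtin type for the rest of this scope.
    str = "a123"
    try:
        str(42)  # this now tries to call a string object, not the builtin type
    except TypeError as exc:
        return exc
    return None


error = shadowing_demo()
print(type(error).__name__)  # TypeError
```

Picking another name such as text or haystack avoids the problem entirely.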

rjurney

You can use any:

a_string = "A string is more than its parts!"
matches = ["more", "wholesome", "milk"]

if any(x in a_string for x in matches):
    print("At least one match found")

Similarly to check if all the strings from the list are found, use all instead of any.
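As a small side-by-side sketch of the two builtins, reusing the variables from the snippet above (found_any and found_all are names chosen just for this example):

```python
a_string = "A string is more than its parts!"
matches = ["more", "wholesome", "milk"]

# any() stops at the first needle that is found
found_any = any(x in a_string for x in matches)
# all() stops at the first needle that is missing
found_all = all(x in a_string for x in matches)

print(found_any)  # True
print(found_all)  # False
```

Note that with an empty list of needles, any() returns False and all() returns True.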

any() takes an iterable. I am not sure which version of Python you are using but in 2.6 you will need to put [] around your argument to any(). any([x in str for x in a]) so that the comprehension returns an iterable. But maybe later versions of Python already do this.

@Mark Byers: Sorry for the late comment, but is there a way to print the string that was found? How would you do this. Thank you.

Not sure I understand, if a is the list, and str is the thing to match against, what is the x? Python newbie ftw. :)

@emispowder It works fine for me as-is in Python 2.6.9.

@emispowder: Generator expressions were introduced in 2.4.

zondo

any() is by far the best approach if all you want is True or False, but if you want to know specifically which string/strings match, you can use a couple things.

If you want the first match (with False as a default):

match = next((x for x in a if x in str), False)

If you want to get all matches (including duplicates):

matches = [x for x in a if x in str]

If you want to get all non-duplicate matches (disregarding order):

matches = {x for x in a if x in str}

If you want to get all non-duplicate matches in the right order:

matches = []
for x in a:
    if x in str and x not in matches:
        matches.append(x)

please add example for the last match too

@OlegKokorin: It creates a list of matching strings in the same order it finds them, but it keeps only the first one if two are the same.

Using an OrderedDict is probably more performant than a list. See this answer on "Removing duplicates in lists"

Can you provide an example?
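One possible sketch of the ordered, de-duplicated variant, assuming Python 3.7+ where a plain dict keeps insertion order (on older versions collections.OrderedDict.fromkeys can be used the same way). The sample values are invented for the example.

```python
a = ["b", "a", "b", "c", "a"]
text = "abc123"

# dict.fromkeys keeps the first occurrence of each key, in order,
# and the membership check per key is O(1) instead of a list scan.
matches = list(dict.fromkeys(x for x in a if x in text))

print(matches)  # ['b', 'a', 'c']
```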

jbernadas

You should be careful if the strings in a or str get longer. The straightforward solutions take O(S*(A^2)), where S is the length of str and A is the sum of the lengths of all strings in a. For a faster solution, look at the Aho-Corasick algorithm for string matching, which runs in linear time O(S+A).

can Aho-Corasick also find substrings instead of prefixes ?

Some python Aho-Corasick libraries are here and here
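For readers who want to see the idea without installing anything, here is a rough pure-Python sketch of Aho-Corasick. It is not the code of any of the linked libraries; build_automaton and find_all are names made up for this example, and it assumes every keyword is a non-empty string. It also shows that matches are found anywhere in the text, not only at the start.

```python
from collections import deque


def build_automaton(keywords):
    """Build the trie, failure links and output lists for the keywords."""
    goto = [{}]      # goto[state] maps a character to the next state
    fail = [0]       # fail[state] is the fallback state on a mismatch
    output = [[]]    # output[state] lists the keywords ending at this state
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(word)

    # Breadth-first pass: shallower states are finished before deeper ones.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit keywords that end at the fallback state.
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def find_all(text, keywords):
    """Return (start_index, keyword) for every occurrence in text."""
    goto, fail, output = build_automaton(keywords)
    state = 0
    found = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in output[state]:
            found.append((i - len(word) + 1, word))
    return found


hits = find_all("ushers", ["he", "she", "his", "hers"])
print(hits)  # [(1, 'she'), (2, 'he'), (2, 'hers')]
print(bool(find_all("a123", ["a", "b", "c"])))  # True
```

If only a yes/no answer is needed, the search loop could return as soon as output[state] is non-empty instead of collecting every hit.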

Shankar ARUL

Just to add some diversity with regex:

import re

if any(re.findall(r'a|b|c', str, re.IGNORECASE)):
    print 'possible matches thanks to regex'
else:
    print 'no matches'

or if your list is too long - any(re.findall(r'|'.join(a), str, re.IGNORECASE))

This works for the given use case of the question. If you search for ( or * this fails, since the needles need to be escaped for the regex syntax.

You can escape them if necessary with '|'.join(map(re.escape, strings_to_match)). You should probably re.compile('|'.join(...)) as well.
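Putting the escaping and compiling suggestions together, a minimal sketch could look like this. The helper name compile_needles and its ignore_case parameter are made up for the example; only re.escape, re.compile and re.IGNORECASE come from the standard library.

```python
import re


def compile_needles(needles, ignore_case=False):
    """Build one compiled alternation that matches any needle literally."""
    if not needles:
        raise ValueError("needles must not be empty")
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape neutralises metacharacters such as ( and *
    return re.compile("|".join(map(re.escape, needles)), flags)


pattern = compile_needles(["(", "*", "c"])
print(bool(pattern.search("f(x)")))  # True
print(bool(pattern.search("a123")))  # False
```

Compiling once and reusing pattern.search pays off when the same needles are checked against many strings.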

And what's the time complexity?

Berislav Lopac

A surprisingly fast approach is to use set:

a = ['a', 'b', 'c']
str = "a123"
if set(a) & set(str):
    print("some of the strings found in str")
else:
    print("no strings found in str")

This only works if a does not contain any multiple-character values (if it does, use any as listed above). When every value is a single character, it's simpler to specify a as a string: a = 'abc'.
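A quick sketch of that limitation, with sample values invented for the example: set(text) only contains single characters, so a multi-character needle can never be in the intersection.

```python
a = ["ab", "c"]
text = "xaby"

set_result = bool(set(a) & set(text))
any_result = any(x in text for x in a)

print(set_result)  # False: "ab" is never a single character of text
print(any_result)  # True: "ab" is a substring of text
```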

zondo

You need to iterate on the elements of a.

a = ['a', 'b', 'c']
str = "a123"
found_a_string = False
for item in a:    
    if item in str:
        found_a_string = True

if found_a_string:
    print "found a match"
else:
    print "no match found"

Yes, I knew how to do that, but compared to Mark's answer, that's horrible code.

Only if you understand Mark's code. The problem you were having is that you weren't examining the elements of your array. There are a lot of terse, pythonic ways to accomplish what you want that would hide the essence of what was wrong with your code.

It may be 'horrible code' but it's exactly what any() does. Also, this gives you the actual string that matched, whereas any() just tells you there is a match.

Domi W

jbernadas already mentioned the Aho-Corasick-Algorithm in order to reduce complexity.

Here is one way to use it in Python:

1. Download aho_corasick.py from here.
2. Put it in the same directory as your main Python file and name it aho_corasick.py.
3. Try the algorithm with the following code:

from aho_corasick import aho_corasick  # (string, keywords)

print(aho_corasick(string, ["keyword1", "keyword2"]))

Note that the search is case-sensitive

Jerald Cogswell

A compact way to find multiple strings in another list of strings is to use set.intersection. This executes much faster than list comprehension in large sets or lists.

>>> astring = ['abc','def','ghi','jkl','mno']
>>> bstring = ['def', 'jkl']
>>> a_set = set(astring)  # convert list to set
>>> b_set = set(bstring)
>>> matches = a_set.intersection(b_set)
>>> matches
{'def', 'jkl'}
>>> list(matches) # if you want a list instead of a set
['def', 'jkl']
>>>

mluebke

a = ['a', 'b', 'c']
str =  "a123"

a_match = [True for match in a if match in str]

if True in a_match:
  print "some of the strings found in str"
else:
  print "no strings found in str"

Nilesh Birari

Just some more info on how to get all the list elements that are present in the string:

a = ['a', 'b', 'c']
str = "a123" 
list(filter(lambda x:  x in str, a))

sjd

Yet another solution with set, using set intersection, as a one-liner:

subset = {"some", "words"}
text = "some words to be searched here"
if len(subset & set(text.split())) == len(subset):
    print("All values present in text")

if subset & set(text.split()):
    print("At least one value present in text")

balki

The third-party regex module, recommended in the Python docs, supports this with named lists:

import regex

words = {'he', 'or', 'low'}
p = regex.compile(r"\L<name>", name=words)
m = p.findall('helloworld')
print(m)

output:

['he', 'low', 'or']

Some details on implementation: link

I can't find any documentation on \L. Can you point me to it?

@DaniloSouzaMorães github.com/mrabarnett/mrab-regex#named-lists-hg-issue-11

Trinadh Koya

It depends on the context. If you want to check for a single literal (any single character such as a, e, w, etc.), in is enough:

original_word = "hackerearth"
if 'h' in original_word:
    print("YES")

If you want to check whether any of the characters of original_word appears in your input, use any:

if any(your_required in yourinput for your_required in original_word):
    print("yes")

If you want all of the characters of original_word to appear in your input, use all:

original_word = ['h', 'a', 'c', 'k', 'e', 'r', 'e', 'a', 'r', 't', 'h']
yourinput = str(input()).lower()
if all(requested_word in yourinput for requested_word in original_word):
    print("yes")

What would be yourinput? I can recognise two things: the sentence where I'm looking for something. The array of words I'm looking for. But you describe three variables and I can't get what the third one is.

Stephen Rauch

strlist = ['SUCCESS', 'Done', 'SUCCESSFUL']
res = False
# Scan every line of the log for any of the target strings
with open('test.txt', 'r') as flog:
    for line in flog:
        for fstr in strlist:
            if fstr in line:
                print('found')
                res = True

if res:
    print('res true')
else:
    print('res false')

https://i.stack.imgur.com/JGKMt.png

Ivan Mikhailov

I would use this kind of function for speed:

def check_string(string, substring_list):
    for substring in substring_list:
        if substring in string:
            return True
    return False

Robert I

data = "firstName and favoriteFood"
mandatory_fields = ['firstName', 'lastName', 'age']


# for each
for field in mandatory_fields:
    if field not in data:
        print("Error, missing req field {0}".format(field))

# still fine, multiple if statements
if ('firstName' not in data or
        'lastName' not in data or
        'age' not in data):
    print("Error, missing a req field")

# not very readable, list comprehension
missing_fields = [x for x in mandatory_fields if x not in data]
if missing_fields:
    print("Error, missing fields {0}".format(", ".join(missing_fields)))

Spirit of the Void

If you want exact matches of words then consider word tokenizing the target string. I use the recommended word_tokenize from nltk:

from nltk.tokenize import word_tokenize

Here is the tokenized string from the accepted answer:

a_string = "A string is more than its parts!"
tokens = word_tokenize(a_string)
tokens
Out[46]: ['A', 'string', 'is', 'more', 'than', 'its', 'parts', '!']

The accepted answer gets modified as follows:

matches_1 = ["more", "wholesome", "milk"]
[x in tokens for x in matches_1]
Out[42]: [True, False, False]

As in the accepted answer, the word "more" is still matched. If "mo" becomes a match string, however, the accepted answer still finds a match. That is a behavior I did not want.

matches_2 = ["mo", "wholesome", "milk"]
[x in a_string for x in matches_2]
Out[43]: [True, False, False]

Using word tokenization, "mo" is no longer matched:

[x in tokens for x in matches_2]
Out[44]: [False, False, False]

That is the additional behavior that I wanted. This answer also responds to the duplicate question here.
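If pulling in nltk is not an option, a rough standard-library approximation of the same idea is to split on runs of word characters with re. This is a swapped-in technique, not nltk's tokenizer, so the tokens can differ (for example around contractions or hyphens). The function name whole_word_matches is made up for the example.

```python
import re


def whole_word_matches(text, candidates):
    """Return the candidates that occur in text as complete words."""
    # \w+ grabs runs of letters, digits and underscores as tokens
    tokens = set(re.findall(r"\w+", text))
    return [c for c in candidates if c in tokens]


a_string = "A string is more than its parts!"
print(whole_word_matches(a_string, ["more", "wholesome", "milk"]))  # ['more']
print(whole_word_matches(a_string, ["mo", "wholesome", "milk"]))    # []
```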
