Is there a NumPy function to return the first index of something in an array?

Mateen Ulhaq

Yes. Given an array, array, and a value to search for, item, you can use np.where:

itemindex = np.where(array == item)

The result is a tuple with first all the row indices, then all the column indices.

For example, if an array is two dimensions and it contained your item at two locations then

array[itemindex[0][0]][itemindex[1][0]]

would be equal to your item and so would be:

array[itemindex[0][1]][itemindex[1][1]]

If you are looking for the first row in which an item exists in the first column, this works (although it raises an IndexError if there is none): rows, columns = np.where(array == item); first_idx = sorted([r for r, c in zip(rows, columns) if c == 0])[0]

What if you want it to stop searching after finding the first value? I don't think where() is comparable to find()

Ah! If you're interested in performance, check out the answer to this question: stackoverflow.com/questions/7632963/…

np.argwhere would be slightly more useful here: itemindex = np.argwhere(array==item)[0]; array[tuple(itemindex)]

It's worth noting that this answer assumes the array is 2D. where works on any array, and will return a tuple of length 3 when used on a 3D array, etc.

Vebjorn Ljosa

If you need the index of the first occurrence of only one value, you can use nonzero (or where, which amounts to the same thing in this case):

>>> t = array([1, 1, 1, 2, 2, 3, 8, 3, 8, 8])
>>> nonzero(t == 8)
(array([6, 8, 9]),)
>>> nonzero(t == 8)[0][0]
6

If you need the first index of each of many values, you could obviously do the same as above repeatedly, but there is a trick that may be faster. The following finds the indices of the first element of each subsequence:

>>> nonzero(r_[1, diff(t)])
(array([0, 3, 5, 6, 7, 8]),)

Notice that it finds the beginning of both subsequences of 3s and both subsequences of 8s:

[1, 1, 1, 2, 2, 3, 8, 3, 8, 8]

So it's slightly different than finding the first occurrence of each value. In your program, you may be able to work with a sorted version of t to get what you want:

>>> st = sorted(t)
>>> nonzero(r_[1, diff(st)])
(array([0, 3, 5, 7]),)
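The run-start trick can also be written with explicit `np.` names and a comparison instead of `diff`, which may be easier to read. This is only a sketch: `run_starts` is a hypothetical helper name, and it uses `hstack` rather than `r_`.

```python
import numpy as np

t = np.array([1, 1, 1, 2, 2, 3, 8, 3, 8, 8])

def run_starts(values):
    """Return the index at which each run of equal values begins."""
    values = np.asarray(values)
    if values.size == 0:
        return np.array([], dtype=np.intp)
    # The first element always starts a run; a later element starts one
    # when it differs from its predecessor.
    changed = np.hstack(([True], values[1:] != values[:-1]))
    return np.flatnonzero(changed)

starts_unsorted = run_starts(t)          # starts of every run in t
starts_sorted = run_starts(np.sort(t))   # one start per distinct value
```

Note that `starts_sorted` holds positions in the sorted copy, not in the original `t`.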

Could you please explain what r_ is?

@Geoff, r_ concatenates; or, more precisely, it translates slice objects to concatenation along each axis. I could have used hstack instead; that may have been less confusing. See the documentation for more information about r_. There is also a c_.

+1, nice one! (vs NP.where) your solution is a lot simpler (and probably faster) in the case where it's only the first occurrence of a given value in a 1D array that we need

The latter case (finding the first index of all values) is given by vals, locs = np.unique(t, return_index=True)

@askewchan your version is functionally equivalent, but much, much, much slower
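For reference, the `np.unique` variant from the comments can be sketched as follows. The `first_index` dictionary is just an illustrative way to present the result, and no claim is made here about its relative speed.

```python
import numpy as np

t = np.array([1, 1, 1, 2, 2, 3, 8, 3, 8, 8])

# vals holds the sorted distinct values; locs[i] is the index in t of the
# first occurrence of vals[i].
vals, locs = np.unique(t, return_index=True)

# Map each value to the index of its first occurrence in the original array.
first_index = dict(zip(vals.tolist(), locs.tolist()))
```

Unlike the sorted run-start approach, the indices here refer to the original, unsorted `t`.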

Peter Mortensen

You can also convert a NumPy array to list in the air and get its index. For example,

l = [1,2,3,4,5] # Python list
a = numpy.array(l) # NumPy array
i = a.tolist().index(2) # i is the index of the first 2
print(i)

It will print 1.

It may be the library has changed since this was first written. But this was the first solution that worked for me.

I've made good use of this to find multiple values in a list using a list comprehension: [find_list.index(index_list[i]) for i in range(len(index_list))]

@MattWenham If it's big enough, you can convert your find_list to a NumPy array of object (or anything more specific that's appropriate) and just do find_arr[index_list].

Totally off-topic, but this is the first time I see the phrase "in the air" - what I've seen most, in its place, is probably "on the fly".

Simplicity & readability rules, but if you are using Numpy performance must matter to you. This python .index() approach unnecessarily iterates over the data at most twice!

MSeifert

Just to add a very performant and handy numba alternative based on np.ndenumerate to find the first index:

from numba import njit
import numpy as np

@njit
def index(array, item):
    for idx, val in np.ndenumerate(array):
        if val == item:
            return idx
    # If no item was found return None, other return types might be a problem due to
    # numbas type inference.

This is pretty fast and deals naturally with multidimensional arrays:

>>> arr1 = np.ones((100, 100, 100))
>>> arr1[2, 2, 2] = 2

>>> index(arr1, 2)
(2, 2, 2)

>>> arr2 = np.ones(20)
>>> arr2[5] = 2

>>> index(arr2, 2)
(5,)

This can be much faster (because it's short-circuiting the operation) than any approach using np.where or np.nonzero.

However np.argwhere could also deal gracefully with multidimensional arrays (you would need to manually cast it to a tuple and it's not short-circuited) but it would fail if no match is found:

>>> tuple(np.argwhere(arr1 == 2)[0])
(2, 2, 2)
>>> tuple(np.argwhere(arr2 == 2)[0])
(5,)
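If a missing item should not raise, the `np.argwhere` approach can be wrapped in a small guard. This is a sketch under the assumption that returning `None` is an acceptable "not found" signal; `first_index_or_none` is a hypothetical name, and the search is still not short-circuited.

```python
import numpy as np

def first_index_or_none(array, item):
    """Index tuple of the first match in row-major order, or None."""
    # argwhere returns an array of shape (n_matches, ndim).
    matches = np.argwhere(np.asarray(array) == item)
    if matches.shape[0] == 0:
        return None
    return tuple(int(i) for i in matches[0])

arr1 = np.ones((4, 4, 4))
arr1[2, 2, 2] = 2

arr2 = np.ones(20)
arr2[5] = 2

found_3d = first_index_or_none(arr1, 2)
found_1d = first_index_or_none(arr2, 2)
not_found = first_index_or_none(arr1, 5)
```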

@njit is a shorthand of jit(nopython=True) i.e. the function will be fully compiled on-the-fly at the time of the first run so that the Python interpreter calls are completely removed.

Since at least version 0.20.0 of numba, you can also write it as a generator, so that all occurrences of a specific value can be found on demand.

Peter Mortensen

l.index(x) returns the smallest i such that i is the index of the first occurrence of x in the list.

One can safely assume that the index() function in Python is implemented so that it stops after finding the first match, and this results in an optimal average performance.

For finding an element stopping after the first match in a NumPy array use an iterator (ndenumerate).

In [67]: l=range(100)

In [68]: l.index(2)
Out[68]: 2

NumPy array:

In [69]: a = np.arange(100)

In [70]: next((idx for idx, val in np.ndenumerate(a) if val==2))
Out[70]: (2L,)

Note that both index() and next raise an exception if the element is not found (ValueError and StopIteration, respectively). With next, one can pass a second argument to return a special value in case the element is not found, e.g.

In [77]: next((idx for idx, val in np.ndenumerate(a) if val==400),None)

There are other functions in NumPy (argmax, where, and nonzero) that can be used to find an element in an array, but they all have the drawback of going through the whole array looking for all occurrences, thus not being optimized for finding the first element. Note also that where and nonzero return arrays, so you need to select the first element to get the index.

In [71]: np.argmax(a==2)
Out[71]: 2

In [72]: np.where(a==2)
Out[72]: (array([2], dtype=int64),)

In [73]: np.nonzero(a==2)
Out[73]: (array([2], dtype=int64),)
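One caveat with `argmax` is worth spelling out: on a mask with no `True` entries it still returns 0, which looks exactly like a match at the first position. A minimal sketch of a guard, where `first_true` is a hypothetical helper and `-1` an assumed sentinel:

```python
import numpy as np

a = np.arange(100)

# argmax of an all-False mask is 0, although nothing matched.
zero_when_missing = int(np.argmax(a == 400))

def first_true(mask, default=-1):
    """Index of the first True in a 1D boolean mask, or default."""
    mask = np.asarray(mask, dtype=bool)
    if mask.size == 0:
        return default
    idx = int(mask.argmax())
    # Confirm the position really holds True before trusting it.
    return idx if mask[idx] else default
```

With this guard, `first_true(a == 2)` gives 2 while `first_true(a == 400)` gives the sentinel instead of a misleading 0.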

Time comparison

Just checking that for large arrays the solution using an iterator is faster when the searched item is at the beginning of the array (using %timeit in the IPython shell):

In [285]: a = np.arange(100000)

In [286]: %timeit next((idx for idx, val in np.ndenumerate(a) if val==0))
100000 loops, best of 3: 17.6 µs per loop

In [287]: %timeit np.argmax(a==0)
1000 loops, best of 3: 254 µs per loop

In [288]: %timeit np.where(a==0)[0][0]
1000 loops, best of 3: 314 µs per loop

This is an open NumPy GitHub issue.

I think you should also include a timing for the worst case (last element) just so readers know what happens to them in the worst case when they use your approach.

@MSeifert I can't get a reasonable timing for the worst case iterator solution--I'm going to delete this answer until I find out what's wrong with it

doesn't %timeit next((idx for idx, val in np.ndenumerate(a) if val==99999)) work? If you're wondering why it's 1000 times slower - it's because python loops over numpy arrays are notoriously slow.

@MSeifert no I didn't know that, but I'm also puzzled by the fact that argmax and where are much faster in this case (searched element at the end of array)

They should be as fast as if the element is at the beginning. They always process the whole array so they always take the same time (at least they should).

Matt

If you're going to use this as an index into something else, you can use boolean indices if the arrays are broadcastable; you don't need explicit indices. The absolute simplest way to do this is to simply index based on a truth value.

other_array[first_array == item]

Any boolean operation works:

a = numpy.arange(100)
other_array[first_array > 50]

The nonzero method takes booleans, too:

index = numpy.nonzero(first_array == item)[0][0]

The two zeros are for the tuple of indices (assuming first_array is 1D) and then the first item in the array of indices.
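A self-contained version of the snippets above might look like this. The contents of `first_array` and `other_array` are invented for illustration, and the `None` fallback is an assumption.

```python
import numpy as np

# Hypothetical data: two arrays of the same length.
first_array = np.array([10, 60, 20, 70, 60])
other_array = np.array(["a", "b", "c", "d", "e"])
item = 60

# Boolean masks select matching positions without explicit indices.
selected = other_array[first_array == item]
above = other_array[first_array > 50]

# When the position itself is needed, take the first entry from nonzero.
hits = np.nonzero(first_array == item)[0]
index = int(hits[0]) if hits.size else None
```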

Alok Nayak

For one-dimensional sorted arrays, it is much simpler and more efficient, O(log(n)), to use numpy.searchsorted, which returns a NumPy integer (the position). For example,

arr = np.array([1, 1, 1, 2, 3, 3, 4])
i = np.searchsorted(arr, 3)

Just make sure the array is already sorted

Also check if returned index i actually contains the searched element, since searchsorted's main objective is to find indices where elements should be inserted to maintain order.

if i < len(arr) and arr[i] == 3:
    print("present")
else:
    print("not present")

searchsorted isn't nlog(n) since it doesn't sort the array before searching, it assumes that the argument array is already sorted. check out the documentation of numpy.searchsorted (link above)

It's mlog(n): m binary searches inside a list of length n.

It's m log(n) if m elements are searched for, i.e. when an array of m values is passed instead of a single element like 3. It is log(n) for this question's requirement, which is about finding one element.
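The case with m queries discussed in these comments can be sketched as below. The `values` array and the `-1` sentinel are assumptions for illustration, and the sketch assumes `arr` is sorted and non-empty.

```python
import numpy as np

arr = np.array([1, 1, 1, 2, 3, 3, 4])   # must already be sorted
values = np.array([3, 5, 0, 2])         # hypothetical queries

# Leftmost insertion point for every query, one binary search each.
pos = np.searchsorted(arr, values)

# A query larger than every element yields pos == len(arr); clip it so the
# membership check below does not index out of bounds.
safe = np.minimum(pos, len(arr) - 1)
found = arr[safe] == values

# First index of each query value, or -1 where the value is absent.
first_idx = np.where(found, pos, -1)
```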

1''

For 1D arrays, I'd recommend np.flatnonzero(array == value)[0], which is equivalent to both np.nonzero(array == value)[0][0] and np.where(array == value)[0][0] but avoids the ugliness of unboxing a 1-element tuple.
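A small wrapper shows how this might be used when the value can be absent. `first_flat_index` and its `default` parameter are hypothetical; for arrays with more than one dimension the result is an index into the flattened array, which `np.unravel_index` can convert back.

```python
import numpy as np

def first_flat_index(array, value, default=-1):
    """Flat index of the first element equal to value, or default."""
    hits = np.flatnonzero(np.asarray(array) == value)
    return int(hits[0]) if hits.size else default

t = np.array([1, 1, 1, 2, 2, 3, 8, 3, 8, 8])
first_eight = first_flat_index(t, 8)
missing = first_flat_index(t, 5)

# For a 2D array, convert the flat index back to (row, column).
grid = np.arange(6).reshape(2, 3)
row_col = tuple(int(i) for i in np.unravel_index(first_flat_index(grid, 5), grid.shape))
```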

Peter Mortensen

To index on any criteria, you can do something like the following:

In [1]: from numpy import *
In [2]: x = arange(125).reshape((5,5,5))
In [3]: y = indices(x.shape)
In [4]: locs = y[:,x >= 120] # put whatever you want in place of x >= 120
In [5]: pts = hsplit(locs, len(locs[0]))
In [6]: for pt in pts:
   .....:         print(', '.join(str(p[0]) for p in pt))
4, 4, 0
4, 4, 1
4, 4, 2
4, 4, 3
4, 4, 4

And here's a quick function to do what list.index() does, except doesn't raise an exception if it's not found. Beware -- this is probably very slow on large arrays. You can probably monkey patch this on to arrays if you'd rather use it as a method.

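def ndindex(ndarray, item):
    # Returns the index of the first match as a list, or None if not found.
    if len(ndarray.shape) == 1:
        try:
            return [ndarray.tolist().index(item)]
        except ValueError:
            pass
    else:
        for i, subarray in enumerate(ndarray):
            try:
                return [i] + ndindex(subarray, item)
            except TypeError:
                # ndindex returned None for this subarray; keep looking.
                pass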

In [1]: ndindex(x, 103)
Out[1]: [4, 0, 3]

Noyer282

An alternative to selecting the first element from np.where() is to use a generator expression together with enumerate, such as:

>>> import numpy as np
>>> x = np.arange(100)   # x = array([0, 1, 2, 3, ... 99])
>>> next(i for i, x_i in enumerate(x) if x_i == 2)
2

For a two dimensional array one would do:

>>> x = np.arange(100).reshape(10,10)   # x = array([[0, 1, 2,... 9], [10,..19],])
>>> next((i,j) for i, x_i in enumerate(x) 
...            for j, x_ij in enumerate(x_i) if x_ij == 2)
(0, 2)

The advantage of this approach is that it stops checking the elements of the array after the first match is found, whereas np.where checks all elements for a match. A generator expression would be faster if there's a match early in the array.

In case there might not be a match in the array at all, this method also lets you conveniently specify a fallback value. If the first example were to return None as a fallback, it would become next((i for i, x_i in enumerate(x) if x_i == 2), None).

Peter Mortensen

There are lots of operations in NumPy that could perhaps be put together to accomplish this. This will return indices of elements equal to item:

numpy.nonzero(array - item)

You could then take the first elements of the lists to get a single element.

wouldn't that give the indices of all elements that are not equal to item?
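The comment is right, and a quick check makes the difference visible. The sample data is made up; the comparison-based form is the one used in the other answers.

```python
import numpy as np

array = np.array([5, 3, 5, 8])   # hypothetical sample data
item = 5

# array - item is zero exactly where array == item, so nonzero() of the
# difference reports the positions that do NOT match.
not_equal = np.nonzero(array - item)[0]

# Comparing first gives the matching positions instead.
equal = np.nonzero(array == item)[0]
first = int(equal[0]) if equal.size else None
```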

Eelco Hoogendoorn

The numpy_indexed package (disclaimer, I am its author) contains a vectorized equivalent of list.index for numpy.ndarray; that is:

sequence_of_arrays = [[0, 1], [1, 2], [-5, 0]]
arrays_to_query = [[-5, 0], [1, 0]]

import numpy_indexed as npi
idx = npi.indices(sequence_of_arrays, arrays_to_query, missing=-1)
print(idx)   # [2, -1]

This solution has vectorized performance, generalizes to ndarrays, and has various ways of dealing with missing values.

njp

Another option not previously mentioned is the bisect module, which also works on lists, but requires a pre-sorted list/array:

import bisect
import numpy as np
z = np.array([104,113,120,122,126,138])
bisect.bisect_left(z, 122)

yields

3

bisect also returns a result when the number you're looking for doesn't exist in the array, so that the number can be inserted in the correct place.
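Because of that, the returned position has to be checked before it is treated as "found". A minimal sketch, where `bisect_index` is a hypothetical helper and `None` an assumed "not found" value:

```python
import bisect
import numpy as np

z = np.array([104, 113, 120, 122, 126, 138])   # must already be sorted

def bisect_index(sorted_values, x):
    """Index of the first occurrence of x in sorted_values, or None."""
    i = bisect.bisect_left(sorted_values, x)
    # bisect_left only gives an insertion point; confirm x is really there.
    if i < len(sorted_values) and sorted_values[i] == x:
        return i
    return None

present = bisect_index(z, 122)
absent = bisect_index(z, 121)      # insertion point is 3, but z[3] != 121
too_large = bisect_index(z, 200)   # insertion point is len(z)
```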

Dmitriy Work

Comparison of 8 methods

TL;DR:

(Note: applicable to 1d arrays under 100M elements.)

For maximum performance, use index_of__v5 (numba + numpy.ndenumerate + for loop; see the code below).

If numba is not available:

Use index_of__v7 (for loop + enumerate) if the target value is expected to be found within the first 100k elements.

Otherwise, use index_of__v2/v3/v4 (numpy.argmax or numpy.flatnonzero based).

https://i.stack.imgur.com/VH8n8.png

Powered by perfplot

import numpy as np
from numba import njit

# Based on: numpy.argmax()
# Proposed by: John Haberstroh (https://stackoverflow.com/a/67497472/7204581)
def index_of__v1(arr: np.array, v):
    is_v = (arr == v)
    return is_v.argmax() if is_v.any() else -1


# Based on: numpy.argmax()
def index_of__v2(arr: np.array, v):
    return (arr == v).argmax() if v in arr else -1


# Based on: numpy.flatnonzero()
# Proposed by: 1'' (https://stackoverflow.com/a/42049655/7204581)
def index_of__v3(arr: np.array, v):
    idxs = np.flatnonzero(arr == v)
    return idxs[0] if len(idxs) > 0 else -1


# Based on: numpy.argmax()
def index_of__v4(arr: np.array, v):
    return np.r_[False, (arr == v)].argmax() - 1


# Based on: numba, for loop
# Proposed by: MSeifert (https://stackoverflow.com/a/41578614/7204581)
@njit
def index_of__v5(arr: np.array, v):
    for idx, val in np.ndenumerate(arr):
        if val == v:
            return idx[0]
    return -1


# Based on: numpy.ndenumerate(), for loop
def index_of__v6(arr: np.array, v):
    return next((idx[0] for idx, val in np.ndenumerate(arr) if val == v), -1)


# Based on: enumerate(), for loop
# Proposed by: Noyer282 (https://stackoverflow.com/a/40426159/7204581)
def index_of__v7(arr: np.array, v):
    return next((idx for idx, val in enumerate(arr) if val == v), -1)


# Based on: list.index()
# Proposed by: Hima (https://stackoverflow.com/a/23994923/7204581)
def index_of__v8(arr: np.array, v):
    l = list(arr)
    try:
        return l.index(v)
    except ValueError:
        return -1

John Haberstroh

There is a fairly idiomatic and vectorized way to do this built into numpy. It uses a quirk of the np.argmax() function to accomplish this -- if many values match, it returns the index of the first match. The trick is that for booleans, there will only ever be two values: True (1) and False (0). Therefore, the returned index will be that of the first True.

For the simple example provided, you can see it work with the following

>>> np.argmax(np.array([1,2,3]) == 2)
1

A great example is computing buckets, e.g. for categorizing. Say you have an array of cut points, and you want the "bucket" that corresponds to each element of your array. The algorithm is to compute the first index of cuts where x < cuts (after padding cuts with np.inf). You can use broadcasting to perform the comparisons, then apply argmax along the cuts axis.

>>> cuts = np.array([10, 50, 100])
>>> cuts_pad = np.array([*cuts, np.inf])
>>> x   = np.array([7, 11, 80, 443])
>>> bins = np.argmax( x[:, np.newaxis] < cuts_pad[np.newaxis, :], axis = 1)
>>> print(bins)
[0 1 2 3]

As expected, each value from x falls into one of the sequential bins, with well-defined and easy to specify edge case behavior.
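For sorted cut points, the same bins can also be obtained with `np.searchsorted`. This is a different technique from the argmax trick in the answer, shown here only for comparison; it assumes `cuts` is sorted in ascending order and avoids building the full comparison matrix.

```python
import numpy as np

cuts = np.array([10, 50, 100])   # assumed to be sorted
x = np.array([7, 11, 80, 443])

# side="right" returns the first index i with x < cuts[i]; values beyond
# the last cut get len(cuts), so no padding is needed.
bins_sorted = np.searchsorted(cuts, x, side="right")

# Broadcasting version from the answer, for comparison.
cuts_pad = np.append(cuts, np.inf)
bins_argmax = np.argmax(x[:, np.newaxis] < cuts_pad[np.newaxis, :], axis=1)
```

Both arrays agree on this input, including for a value that sits exactly on a cut point such as 10, which lands in bin 1 either way.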

Statham

Note: this is for python 2.7 version

You can use a lambda function to deal with the problem, and it works both on NumPy array and list.

your_list = [11, 22, 23, 44, 55]
result = filter(lambda x:your_list[x]>30, range(len(your_list)))
#result: [3, 4]

import numpy as np
your_numpy_array = np.array([11, 22, 23, 44, 55])
result = filter(lambda x: your_numpy_array[x] > 30, range(len(your_numpy_array)))
#result: [3, 4]

And you can use

result[0]

to get the first index of the filtered elements.

For python 3.6, use

list(result)

instead of

result

This results in <filter object at 0x0000027535294D30> on Python 3 (tested on Python 3.6.3). Perhaps update for Python 3?
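On Python 3 the filter object is lazy, which can be used to stop at the first match instead of building the whole list. A minimal sketch; the names `matches`, `first`, and `rest` are illustrative only.

```python
your_list = [11, 22, 23, 44, 55]

# In Python 3, filter() returns a lazy iterator rather than a list.
matches = filter(lambda i: your_list[i] > 30, range(len(your_list)))

# next() pulls only the first matching index; None is returned if there is none.
first = next(matches, None)

# The iterator continues where it stopped, so this holds the remaining indices.
rest = list(matches)
```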

Sangavi Loganathan

Use ndindex

Sample array

arr = np.array([[1,4],
                 [2,3]])
print(arr)

...[[1,4],
    [2,3]]

create an empty list to store the index and the element tuples

 index_elements = []
 for i in np.ndindex(arr.shape):
     index_elements.append((arr[i],i))

convert the list of tuples into dictionary

 index_elements = dict(index_elements)

The keys are the elements and the values are their indices - use keys to access the index

 index_elements[4]

  ... (0,1)
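One thing to watch with this approach: when a value occurs more than once, building the dictionary from a list of pairs keeps the last index, not the first. A sketch of a variant that keeps the first occurrence, using a made-up array with a repeated value:

```python
import numpy as np

# Hypothetical array in which 4 appears twice.
arr = np.array([[1, 4],
                [4, 3]])

first_index = {}
for i in np.ndindex(arr.shape):
    # setdefault stores the index only the first time a value is seen.
    first_index.setdefault(arr[i].item(), i)

# For contrast: dict() over (value, index) pairs keeps the last index seen.
last_index = dict((arr[i].item(), i) for i in np.ndindex(arr.shape))
```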

Matt Raymond

For my use case, I could not sort the array ahead of time because the order of the elements is important. This is my all-NumPy implementation:

import numpy as np

# The array in question
arr = np.array([1,2,1,2,1,5,5,3,5,9]) 

# Find all of the present values
vals=np.unique(arr)
# Make all indices up-to and including the desired index positive
cum_sum=np.cumsum(arr==vals.reshape(-1,1),axis=1)
# Add zeros to account for the n-1 shape of diff and the all-positive array of the first index
bl_mask=np.concatenate([np.zeros((cum_sum.shape[0],1)),cum_sum],axis=1)>=1
# The desired indices
idx=np.where(np.diff(bl_mask))[1]

# Show results
print(list(zip(vals,idx)))

>>> [(1, 0), (2, 1), (3, 7), (5, 5), (9, 9)]

I believe it accounts for unsorted arrays with duplicate values.

ben othman zied

index_lst_form_numpy = pd.DataFrame(df).reset_index()["index"].tolist()

Pobaranchuk

Found another solution with loops:

new_array_of_indicies = []

for i in range(len(some_array)):
  if some_array[i] == some_value:
    new_array_of_indicies.append(i)

Loops are very slow in Python; they should be avoided if there is another solution.

This solution should be avoided as it will be too slow.
