I have some data either in a list of lists or a list of tuples, like this:
data = [[1,2,3], [4,5,6], [7,8,9]]
data = [(1,2,3), (4,5,6), (7,8,9)]
And I want to sort by the 2nd element in each subset, meaning sorting by 2, 5, 8, where 2 is from (1,2,3), 5 is from (4,5,6) and 8 is from (7,8,9). What is the common way to do this? Should I store tuples or lists in my list?
sorted_by_second = sorted(data, key=lambda tup: tup[1])
or:
data.sort(key=lambda tup: tup[1]) # sorts in place
The default sort mode is ascending. To sort in descending order, use the option reverse=True:
sorted_by_second = sorted(data, key=lambda tup: tup[1], reverse=True)
or:
data.sort(key=lambda tup: tup[1], reverse=True) # sorts in place
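To see the difference between the two calls above, here is a quick check on the question's list-of-tuples data: sorted() returns a new list and leaves data untouched, while list.sort() reorders data itself and returns None.

```python
data = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

# sorted() builds a new list; the original is left as it was
by_second_desc = sorted(data, key=lambda tup: tup[1], reverse=True)
print(by_second_desc)  # [(7, 8, 9), (4, 5, 6), (1, 2, 3)]
print(data)            # [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

# list.sort() reorders the list in place and returns None
result = data.sort(key=lambda tup: tup[1], reverse=True)
print(result)  # None
print(data)    # [(7, 8, 9), (4, 5, 6), (1, 2, 3)]
```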
from operator import itemgetter
data.sort(key=itemgetter(1))
Using the itemgetter class sorts 126% faster on average than the equivalent lambda function.
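To sort by several elements at once, pass several indices; for example, by the third element and then by the second:
data.sort(key=itemgetter(2,1))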
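A short sketch of what itemgetter actually returns, using the question's data: itemgetter(1) is a callable that fetches element 1 from whatever it is given, and with several indices it returns a tuple, so multi-index keys compare element by element.

```python
from operator import itemgetter

data = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

# itemgetter(1) returns a callable that fetches element 1 of its argument
get_second = itemgetter(1)
print(get_second((1, 2, 3)))  # 2

# With several indices it returns a tuple, which compares lexicographically
get_third_then_second = itemgetter(2, 1)
print(get_third_then_second((1, 2, 3)))  # (3, 2)

# Sorting with it gives the same order as the equivalent lambda
print(sorted(data, key=itemgetter(1)) == sorted(data, key=lambda tup: tup[1]))  # True
```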
For sorting by multiple criteria, for instance by the second and then the third element of each tuple, let
data = [(1,2,3),(1,2,1),(1,1,4)]
and define a lambda that returns a tuple describing the priority, for instance
sorted(data, key=lambda tup: (tup[1], tup[2]))
which gives
[(1, 1, 4), (1, 2, 1), (1, 2, 3)]
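Because Python's sort is stable (items with equal keys keep their relative order), the same ordering can also be built in two passes: sort by the secondary key first, then by the primary key. A sketch with the data above:

```python
data = [(1, 2, 3), (1, 2, 1), (1, 1, 4)]

# Pass 1: order by the tie-breaker (third element)
by_third = sorted(data, key=lambda tup: tup[2])
# Pass 2: order by the primary key (second element); stability keeps
# the pass-1 order among items whose second elements are equal
two_pass = sorted(by_third, key=lambda tup: tup[1])

print(two_pass)  # [(1, 1, 4), (1, 2, 1), (1, 2, 3)]
```

The tuple key is usually simpler; the two-pass form becomes useful when the criteria need different directions.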
I just want to add to Stephen's answer: if you want to sort the list from high to low, another way besides the ones in the comments above is simply to pass this argument:
reverse=True
so the call becomes:
data.sort(key=lambda tup: tup[1], reverse=True)
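One detail worth knowing: reverse=True is not the same as reversing an ascending result afterwards. Items with equal keys keep their original relative order under reverse=True, whereas slicing with [::-1] flips them too. A small sketch with made-up data:

```python
data = [(1, 'a'), (2, 'b'), (1, 'c'), (2, 'd')]

# reverse=True sorts descending but keeps equal-key items in their original order
desc = sorted(data, key=lambda tup: tup[0], reverse=True)
print(desc)     # [(2, 'b'), (2, 'd'), (1, 'a'), (1, 'c')]

# Reversing an ascending sort afterwards flips the ties as well
flipped = sorted(data, key=lambda tup: tup[0])[::-1]
print(flipped)  # [(2, 'd'), (2, 'b'), (1, 'c'), (1, 'a')]
```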
Stephen's answer is the one I'd use. For completeness, here's the DSU (decorate-sort-undecorate) pattern with list comprehensions:
decorated = [(tup[1], tup) for tup in data]
decorated.sort()
undecorated = [tup for second, tup in decorated]
Or, more tersely:
[b for a,b in sorted((tup[1], tup) for tup in data)]
As noted in the Python Sorting HowTo, this has been unnecessary since Python 2.4, when key functions became available.
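One subtlety of the plain DSU version: when two second elements tie, the comparison falls through to the whole original tuple, so ties are broken by tuple contents rather than by original position (and it would fail if the items themselves were not comparable). A variant that inserts the original index as a tie-breaker reproduces the stable key-function result; a sketch with made-up data:

```python
data = [(3, 1, 'x'), (1, 1, 'y'), (2, 0, 'z')]

# Decorate with (key, original index, item); the index breaks ties,
# so the items themselves are never compared
decorated = [(tup[1], i, tup) for i, tup in enumerate(data)]
decorated.sort()
undecorated = [tup for _, _, tup in decorated]
print(undecorated)  # [(2, 0, 'z'), (3, 1, 'x'), (1, 1, 'y')]

# Plain DSU breaks the tie on the whole tuple instead
plain = [b for a, b in sorted((tup[1], tup) for tup in data)]
print(plain)        # [(2, 0, 'z'), (1, 1, 'y'), (3, 1, 'x')]
```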
In order to sort a list of tuples (<word>, <count>), by count in descending order and by word in alphabetical order:
data = [
('betty', 1),
('bought', 1),
('a', 1),
('bit', 1),
('of', 1),
('butter', 2),
('but', 1),
('the', 1),
('was', 1),
('bitter', 1)]
I use this method:
sorted(data, key=lambda tup:(-tup[1], tup[0]))
and it gives me the result:
[('butter', 2),
('a', 1),
('betty', 1),
('bit', 1),
('bitter', 1),
('bought', 1),
('but', 1),
('of', 1),
('the', 1),
('was', 1)]
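Negating the key only works for numbers. A variant that needs no negation, and so also works when the descending field is not numeric, is two stable passes with reverse=True on the second. A sketch with a shortened copy of the word counts above, which reproduces the same ordering:

```python
from operator import itemgetter

data = [('betty', 1), ('bought', 1), ('a', 1), ('butter', 2), ('but', 1)]

# Pass 1: alphabetical by word (the secondary key)
by_word = sorted(data, key=itemgetter(0))
# Pass 2: descending by count; reverse=True keeps the pass-1 (alphabetical)
# order among words with equal counts
result = sorted(by_word, key=itemgetter(1), reverse=True)
print(result)
# [('butter', 2), ('a', 1), ('betty', 1), ('bought', 1), ('but', 1)]
```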
Without lambda:
def sec_elem(s):
    return s[1]
sorted(data, key=sec_elem)
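Since the key function only indexes into each item, it behaves the same whether the inner items are lists or tuples, which covers both layouts from the question. A sketch (sec_elem is the function above, repeated so the snippet runs on its own):

```python
def sec_elem(s):
    # Return the second element; works for any sequence (list, tuple, ...)
    return s[1]

list_data = [[7, 8, 9], [1, 2, 3], [4, 5, 6]]
tuple_data = [(7, 8, 9), (1, 2, 3), (4, 5, 6)]

print(sorted(list_data, key=sec_elem))   # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(sorted(tuple_data, key=sec_elem))  # [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
```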
itemgetter() is somewhat faster than lambda tup: tup[1], but the increase is relatively modest (around 10 to 25 percent).
(IPython session)
>>> from operator import itemgetter
>>> from numpy.random import randint
>>> values = randint(0, 9, 30000).reshape((10000,3))
>>> tpls = [tuple(values[i,:]) for i in range(len(values))]
>>> tpls[:5] # display sample from list
[(1, 0, 0),
(8, 5, 5),
(5, 4, 0),
(5, 7, 7),
(4, 2, 1)]
>>> sorted(tpls[:5], key=itemgetter(1)) # example sort
[(1, 0, 0),
(4, 2, 1),
(5, 4, 0),
(8, 5, 5),
(5, 7, 7)]
>>> %timeit sorted(tpls, key=itemgetter(1))
100 loops, best of 3: 4.89 ms per loop
>>> %timeit sorted(tpls, key=lambda tup: tup[1])
100 loops, best of 3: 6.39 ms per loop
>>> %timeit sorted(tpls, key=(itemgetter(1,0)))
100 loops, best of 3: 16.1 ms per loop
>>> %timeit sorted(tpls, key=lambda tup: (tup[1], tup[0]))
100 loops, best of 3: 17.1 ms per loop
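The session above needs IPython and NumPy. A rough, standard-library-only way to run a similar comparison is the timeit module; the data size and repeat counts below are arbitrary choices, and timings vary by machine and Python version, so no numbers are shown here.

```python
import random
import timeit
from operator import itemgetter

# 10,000 random 3-tuples of integers 0..8, similar in shape to the session above
random.seed(0)
tpls = [tuple(random.randint(0, 8) for _ in range(3)) for _ in range(10_000)]

candidates = {
    "itemgetter(1)": itemgetter(1),
    "lambda tup: tup[1]": lambda tup: tup[1],
    "itemgetter(1, 0)": itemgetter(1, 0),
    "lambda tup: (tup[1], tup[0])": lambda tup: (tup[1], tup[0]),
}

timings = {}
for name, key in candidates.items():
    # Best of 3 repeats, 10 sorts each; report milliseconds per sort
    best = min(timeit.repeat(lambda: sorted(tpls, key=key), repeat=3, number=10))
    timings[name] = best / 10 * 1000
    print(f"{name:30} {timings[name]:.2f} ms per sort")
```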
@Stephen's answer is to the point! Here is an example for better visualization. Shout out to the Ready Player One fans! =)
>>> gunters = [('2044-04-05', 'parzival'), ('2044-04-07', 'aech'), ('2044-04-06', 'art3mis')]
>>> gunters.sort(key=lambda tup: tup[0])
>>> print(gunters)
[('2044-04-05', 'parzival'), ('2044-04-06', 'art3mis'), ('2044-04-07', 'aech')]
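key is a function that will be called on each of the collection's items to produce the value used for comparison. It plays a role similar to the compareTo method in Java, except that it extracts a sort key from one item instead of comparing two items.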
The parameter passed to key must be something that is callable. Here, the use of lambda creates an anonymous function (which is a callable).
The syntax of lambda is the word lambda, followed by zero or more parameter names, a colon, and then a single expression (not a block of statements).
In this example, we are sorting a list of tuples that hold the time of a certain event and an actor's name. We sort this list by the time of event occurrence, which is the 0th element of each tuple.
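The dates here are ISO YYYY-MM-DD strings, so comparing them as plain text already gives chronological order. If you prefer to be explicit, or the dates arrive in another format, the key can parse each one first. A sketch (date.fromisoformat needs Python 3.7+; the DD/MM/YYYY list is a hypothetical input made up for illustration):

```python
from datetime import date, datetime

gunters = [('2044-04-05', 'parzival'), ('2044-04-07', 'aech'), ('2044-04-06', 'art3mis')]

# ISO dates: convert explicitly to date objects for the comparison
by_date = sorted(gunters, key=lambda tup: date.fromisoformat(tup[0]))
print(by_date)
# [('2044-04-05', 'parzival'), ('2044-04-06', 'art3mis'), ('2044-04-07', 'aech')]

# Hypothetical DD/MM/YYYY strings do NOT sort chronologically as plain text,
# so parse them with strptime inside the key
gunters_dmy = [('07/04/2044', 'aech'), ('05/04/2044', 'parzival'), ('06/05/2043', 'art3mis')]
by_date_dmy = sorted(gunters_dmy, key=lambda tup: datetime.strptime(tup[0], '%d/%m/%Y'))
print(by_date_dmy)
# [('06/05/2043', 'art3mis'), ('05/04/2044', 'parzival'), ('07/04/2044', 'aech')]
```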
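Note: s.sort([cmp[, key[, reverse]]]) sorts the items of s in place. That is the Python 2 signature; in Python 3 the cmp argument no longer exists and the signature is s.sort(*, key=None, reverse=False).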
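In Python 3, an old-style comparison function (returning a negative number, zero, or a positive number) can still be used by wrapping it with functools.cmp_to_key. A sketch; compare_second is a hypothetical comparator written for this example:

```python
from functools import cmp_to_key

data = [(4, 5, 6), (1, 2, 3), (7, 8, 9)]

def compare_second(a, b):
    # Old-style comparator: -1 if a < b, 0 if equal, 1 if a > b (by second element)
    return (a[1] > b[1]) - (a[1] < b[1])

# cmp_to_key turns the comparator into something usable as key=
data.sort(key=cmp_to_key(compare_second))
print(data)  # [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
```

For a simple "compare by one element" rule like this, key=itemgetter(1) is the more direct choice; cmp_to_key is mainly useful when porting existing comparator code.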
I use this in my code:
# To sort the list based on each element's second integer (elem[1])
sorted(d2, key=lambda elem: elem[1])
Depending on which element you want to sort by, put its index in the brackets:
(elem[*insert the index of the element you are sorting it by*])
sorted creates a new list. To sort in place, use .sort(key=...)
Sorting a tuple is quite simple:
tuple(sorted(t))
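Tuples are immutable, so there is no in-place sort; sorted() accepts any iterable and returns a list, which tuple() converts back. The same key argument works, for example on a tuple of tuples (t below is just example data):

```python
from operator import itemgetter

t = ((7, 8, 9), (1, 2, 3), (4, 5, 6))

# sorted() returns a list; wrap it in tuple() to get a tuple back
by_second = tuple(sorted(t, key=itemgetter(1)))
print(by_second)  # ((1, 2, 3), (4, 5, 6), (7, 8, 9))
print(t)          # unchanged: ((7, 8, 9), (1, 2, 3), (4, 5, 6))
```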
Or use key=itemgetter(1), with from operator import itemgetter at the beginning of the file.
sort here is a method of Python's list object, which receives a lambda function as its key parameter. You may name the lambda's parameter tup, t, or whatever you like and it'll still work. Here tup stands for each of the list's tuples, and the index 1 means that sorting will be performed by the second values of the tuples from the original list (2, 5, 8).
While I find the lambda approach to be simpler than the unintuitive itemgetter class, itemgetter does indeed appear to be faster. I'm curious as to why this is. My crude suspicion is that a lambda incurs the hidden cost of capturing all local variables into a closure context, whereas an itemgetter instance does not. tl;dr: Always use itemgetter, because speed wins.
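One way to probe the closure suspicion, sketched for CPython 3, is to inspect each function's __closure__ attribute. A lambda captures only the free variables it actually references, and lambda tup: tup[1] references none, so it has no closure at all. A more likely source of the gap is that itemgetter is implemented in C, so sorting with it avoids running a Python-level function for every element. make_key below is a hypothetical helper, used only to show what a capturing lambda looks like:

```python
plain_key = lambda tup: tup[1]

def make_key(index):
    # This lambda refers to `index` from the enclosing scope,
    # so it gets a closure holding just that one variable
    return lambda tup: tup[index]

closed_key = make_key(1)

print(plain_key.__closure__)            # None: nothing captured
print(closed_key.__code__.co_freevars)  # ('index',)
print(len(closed_key.__closure__))      # 1
```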