Let's say I have the following models
class Photo(models.Model):
tags = models.ManyToManyField(Tag)
class Tag(models.Model):
name = models.CharField(max_length=50)
In a view I have a list with active filters called categories. I want to filter Photo objects which have all tags present in categories.
I tried:
Photo.objects.filter(tags__name__in=categories)
But this matches any item in categories, not all items.
So if categories would be ['holiday', 'summer'] I want Photo's with both a holiday and summer tag.
Can this be achieved?
Photo.objects.filter(tags__name='holiday').filter(tags__name='summer')
is the way to go. (This is same as jpic's example). Each filter
should add more JOIN
s to query, so you could take annotation approach if they are too many.
Summary:
One option is, as suggested by jpic and sgallen in the comments, to add .filter()
for each category. Each additional filter
adds more joins, which should not be a problem for small set of categories.
There is the aggregation approach. This query would be shorter and perhaps quicker for a large set of categories.
You also have the option of using custom queries.
Some examples
Test setup:
class Photo(models.Model):
tags = models.ManyToManyField('Tag')
class Tag(models.Model):
name = models.CharField(max_length=50)
def __unicode__(self):
return self.name
In [2]: t1 = Tag.objects.create(name='holiday')
In [3]: t2 = Tag.objects.create(name='summer')
In [4]: p = Photo.objects.create()
In [5]: p.tags.add(t1)
In [6]: p.tags.add(t2)
In [7]: p.tags.all()
Out[7]: [<Tag: holiday>, <Tag: summer>]
Using chained filters approach:
In [8]: Photo.objects.filter(tags=t1).filter(tags=t2)
Out[8]: [<Photo: Photo object>]
Resulting query:
In [17]: print Photo.objects.filter(tags=t1).filter(tags=t2).query
SELECT "test_photo"."id"
FROM "test_photo"
INNER JOIN "test_photo_tags" ON ("test_photo"."id" = "test_photo_tags"."photo_id")
INNER JOIN "test_photo_tags" T4 ON ("test_photo"."id" = T4."photo_id")
WHERE ("test_photo_tags"."tag_id" = 3 AND T4."tag_id" = 4 )
Note that each filter
adds more JOINS
to the query.
Using annotation approach:
In [29]: from django.db.models import Count
In [30]: Photo.objects.filter(tags__in=[t1, t2]).annotate(num_tags=Count('tags')).filter(num_tags=2)
Out[30]: [<Photo: Photo object>]
Resulting query:
In [32]: print Photo.objects.filter(tags__in=[t1, t2]).annotate(num_tags=Count('tags')).filter(num_tags=2).query
SELECT "test_photo"."id", COUNT("test_photo_tags"."tag_id") AS "num_tags"
FROM "test_photo"
LEFT OUTER JOIN "test_photo_tags" ON ("test_photo"."id" = "test_photo_tags"."photo_id")
WHERE ("test_photo_tags"."tag_id" IN (3, 4))
GROUP BY "test_photo"."id", "test_photo"."id"
HAVING COUNT("test_photo_tags"."tag_id") = 2
ANDed Q objects would not work:
In [9]: from django.db.models import Q
In [10]: Photo.objects.filter(Q(tags__name='holiday') & Q(tags__name='summer'))
Out[10]: []
In [11]: from operator import and_
In [12]: Photo.objects.filter(reduce(and_, [Q(tags__name='holiday'), Q(tags__name='summer')]))
Out[12]: []
Resulting query:
In [25]: print Photo.objects.filter(Q(tags__name='holiday') & Q(tags__name='summer')).query
SELECT "test_photo"."id"
FROM "test_photo"
INNER JOIN "test_photo_tags" ON ("test_photo"."id" = "test_photo_tags"."photo_id")
INNER JOIN "test_tag" ON ("test_photo_tags"."tag_id" = "test_tag"."id")
WHERE ("test_tag"."name" = holiday AND "test_tag"."name" = summer )
Another approach that works, although PostgreSQL only, is using django.contrib.postgres.fields.ArrayField
:
Example copied from docs:
>>> Post.objects.create(name='First post', tags=['thoughts', 'django'])
>>> Post.objects.create(name='Second post', tags=['thoughts'])
>>> Post.objects.create(name='Third post', tags=['tutorial', 'django'])
>>> Post.objects.filter(tags__contains=['thoughts'])
<QuerySet [<Post: First post>, <Post: Second post>]>
>>> Post.objects.filter(tags__contains=['django'])
<QuerySet [<Post: First post>, <Post: Third post>]>
>>> Post.objects.filter(tags__contains=['django', 'thoughts'])
<QuerySet [<Post: First post>]>
ArrayField
has some more powerful features such as overlap and index transforms.
This also can be done by dynamic query generation using Django ORM and some Python magic :)
from operator import and_
from django.db.models import Q
categories = ['holiday', 'summer']
res = Photo.filter(reduce(and_, [Q(tags__name=c) for c in categories]))
The idea is to generate appropriate Q objects for each category and then combine them using AND operator into one QuerySet. E.g. for your example it'd be equal to
res = Photo.filter(Q(tags__name='holiday') & Q(tags__name='summer'))
filter
would be the same as using and
for Q objects in one filter... My mistake.
filter
to exclude
and use a negate operator. Like so: res = Photo.exclude(~reduce(and_, [Q(tags__name=c) for c in categories]))
I use a little function that iterates filters over a list for a given operator an a column name :
def exclusive_in (cls,column,operator,value_list):
myfilter = column + '__' + operator
query = cls.objects
for value in value_list:
query=query.filter(**{myfilter:value})
return query
and this function can be called like that:
exclusive_in(Photo,'tags__name','iexact',['holiday','summer'])
it also work with any class and more tags in the list; operators can be anyone like 'iexact','in','contains','ne',...
If you struggled with this problem as i did and nothing mentioned helped you, maybe this one will solve your issue
Instead of chaining filter, in some cases it would be better just to store ids of previous filter
tags = [1, 2]
for tag in tags:
ids = list(queryset.filter(tags__id=tag).values_list("id", flat=True))
queryset = queryset.filter(id__in=ids)
Using this approach will help you to avoid stacking JOIN
in SQL query:
My solution: let say author is list of elements that need to match all item in list, so:
for a in author:
queryset = queryset.filter(authors__author_first_name=a)
if not queryset:
break
for category in categories:
query = Photo.objects.filter(tags_name=category)
this piece of code , filters your photos which have all the tags name coming from categories.
If we want to do it dynamically, followed the example:
tag_ids = [t1.id, t2.id]
qs = Photo.objects.all()
for tag_id in tag_ids:
qs = qs.filter(tag__id=tag_id)
print qs
queryset = Photo.objects.filter(tags__name="vacaciones") | Photo.objects.filter(tags__name="verano")
Success story sharing
t3
, and a photo has the tagst2
andt3
. Then this photo will still match the given query.Photo.objects.filter(tags__in=tags)
matches photos that have any of the tags, not only those that has all. Some of those that only has one of the desired tags, may have exactly the amount of tags that you are looking for, and some of those that has all the desired tags, may also have additional tags.