Note: At the time of writing, Django is at v1.4.1
So what actually happens when you have a Django query like the following?
Django is smart enough to know that you want to use the results from the first unevaluated QuerySet as the input for the __in clause and, in turn, generates a subquery. The raw query generated is something like this:
Awesome. Django saved us an extra round trip by inlining the query instead of evaluating the first query separately.
In this scenario, nothing goes wrong at all. We’ve saved a round trip to the database and all is good. Now let’s throw caching into the mix.
In this scenario, we want to cache the entire list of cities. The QuerySet gets cached for us like we expect, but when Django gets to the filter(), it sees that you’re passing it a QuerySet and generates a subquery again. We’ve effectively cached nothing, because the exact same query is still being run.
This also applies when you want to use the same __in across multiple filters.
Again, the exact same subquery is repeated across both queries.
Django does this solely because the argument passed into filter() is a QuerySet. If we pass a list instead, Django will use an array of ids inside the IN clause.
Let’s look at an improved version with caching.
The only thing that has changed is that the QuerySet is now forced to evaluate by converting it to a list. cities is a list, so Django won’t try to perform a subquery anymore. Depending on what you’re trying to achieve, this could also be simplified to caching just the primary keys as a list.
The resulting SQL will be something like:
We all know that Django’s QuerySets are lazy, right? No query is actually performed until the QuerySet is iterated over. Once a QuerySet has been iterated over, it won’t query again for the results. The results get cached into an internal _result_cache.
At the moment, Django is not smart enough to detect that a QuerySet has already been evaluated before generating a subquery, effectively causing the query to be run again when we already know the results.
Picture this scenario:
Since we’ve already evaluated the QuerySet before passing it to the __in clause, it would be smart for Django to reuse the ids we’ve already fetched, but it doesn’t. So pay attention and be careful.
I’ve submitted a patch to Django core to try and get that behavior fixed: