Welcome to django-model-values’s documentation.
Taking the O out of ORM.
Introduction
Provides Django model utilities for encouraging direct data access instead of unnecessary object overhead. Implemented through compatible method and operator extensions [1] to QuerySets and Managers.
The primary motivation is the experiential observation that the active record pattern, specifically Model.save, is the root of all evil.
The secondary goal is to provide a more intuitive data layer, similar to PyData projects such as pandas.
Usage: instantiate the custom manager in your models.
Updates
The Bad:
book = Book.objects.get(pk=pk)
book.rating = 5.0
book.save()
This example is ubiquitous and even encouraged in many Django circles. It’s also an epic fail.
- Runs an unnecessary select query, as no fields need to be read.
- Updates all fields instead of just the one needed.
- Therefore also suffers from race conditions.
- And is relatively verbose, without addressing errors yet.
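The race condition named above can be illustrated without a database. The following is only a minimal sketch: a plain dict stands in for a table row, and load, save_all_fields and update_column are hypothetical helpers invented for this illustration, not Django APIs.

```python
# A toy "table" with one row, keyed by primary key (stand-in for a database).
table = {1: {"rating": 3.0, "quantity": 10}}


def load(pk):
    """Mimic Book.objects.get(pk=pk): copy the whole row into memory."""
    return dict(table[pk])


def save_all_fields(pk, obj):
    """Mimic a plain Model.save(): write back every field of the copy."""
    table[pk] = dict(obj)


def update_column(pk, **values):
    """Mimic filter(pk=pk).update(...): write only the named columns."""
    table[pk].update(values)


# Worker A loads the row, intending to change only the rating.
stale = load(1)
# Meanwhile worker B changes the quantity.
update_column(1, quantity=9)
# Worker A saves its stale copy and silently reverts B's change.
stale["rating"] = 5.0
save_all_fields(1, stale)
lost_update = dict(table[1])

# Reset and repeat, with worker A using a column update instead.
table[1] = {"rating": 3.0, "quantity": 10}
update_column(1, quantity=9)
update_column(1, rating=5.0)
safe_update = dict(table[1])
```

In the first run the quantity is back at 10 even though worker A never meant to touch it; in the second run both changes survive.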
The solution is relatively well-known, and endorsed by Django’s own docs, but remains under-utilized.
The Ugly:
Book.objects.filter(pk=pk).update(rating=5.0)
So why not provide syntactic support for the better approach? The Manager supports filtering by primary key, since that’s so common. The QuerySet supports column updates.
The Good:
Book.objects[pk]['rating'] = 5.0
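To make the subscript syntax less magical, here is a rough sketch of how such sugar could be layered over filter and update. It is not the library's implementation: ToyQuerySet and ToyManager are hypothetical classes operating on a list of dicts rather than a database.

```python
class ToyQuerySet:
    """Hypothetical stand-in for a QuerySet over a list of dict rows."""

    def __init__(self, rows):
        self.rows = rows

    def filter(self, **conditions):
        # Keep only the rows whose fields equal every given condition.
        matched = [
            row for row in self.rows
            if all(row[key] == value for key, value in conditions.items())
        ]
        return ToyQuerySet(matched)

    def update(self, **values):
        # Write the given columns on every matched row; return the row count.
        for row in self.rows:
            row.update(values)
        return len(self.rows)

    def __setitem__(self, field, value):
        # qs['rating'] = 5.0 is sugar for qs.update(rating=5.0)
        self.update(**{field: value})


class ToyManager(ToyQuerySet):
    def __getitem__(self, pk):
        # manager[pk] is sugar for manager.filter(pk=pk)
        return self.filter(pk=pk)


books = ToyManager([{"pk": 1, "rating": 3.0}, {"pk": 2, "rating": 4.0}])
books[1]["rating"] = 5.0  # only the row with pk 1 is touched
```

The point of the sketch is that the subscript form adds no new behavior; it is only shorter spelling for the filter-then-update call.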
But one might posit…
- “Isn’t the encapsulation save provides worth it in principle?”
- “Doesn’t the new update_fields option fix this in practice?”
- “What if the object is cached or has custom logic in the save method?”
No, no, and good luck with that. [2] Consider a more realistic example which addresses these concerns.
The Bad:
try:
    book = Book.objects.get(pk=pk)
except Book.DoesNotExist:
    changed = False
else:
    changed = book.publisher != publisher
    if changed:
        book.publisher = publisher
        book.pubdate = today
        book.save(update_fields=['publisher', 'pubdate'])
This solves the most severe problem, though with more verbosity and still an unnecessary read. [3]
Note that handling pubdate in the save implementation would only spare the caller one line of code.
But the real problem is how to handle custom logic when update_fields isn’t specified.
There’s no one obvious correct behavior, which is why projects like django-model-utils have to track the changes on the object itself. [4]
A better approach would be an update_publisher method which does all and only what is required.
So what would such an implementation be? A straightforward update won’t work, yet only a minor tweak is needed.
The Ugly:
changed = Book.objects.filter(pk=pk).exclude(publisher=publisher) \
    .update(publisher=publisher, pubdate=today)
Now the update is only executed if necessary.
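The conditional update can be tried directly against a database. The sketch below uses the standard library's sqlite3 with a hand-written statement that is only roughly what an ORM would emit; the table layout, the change_publisher helper, and the assumption that publisher is never NULL are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE book (id INTEGER PRIMARY KEY, publisher TEXT NOT NULL, pubdate TEXT)"
)
conn.execute("INSERT INTO book VALUES (1, 'Old Press', '2000-01-01')")


def change_publisher(pk, publisher, today):
    """Update only if the publisher differs; return the number of rows changed."""
    cursor = conn.execute(
        "UPDATE book SET publisher = ?, pubdate = ? WHERE id = ? AND publisher != ?",
        (publisher, today, pk, publisher),
    )
    return cursor.rowcount


first = change_publisher(1, "New Press", "2024-01-01")     # publisher differs
second = change_publisher(1, "New Press", "2024-06-01")    # already up to date
missing = change_publisher(99, "New Press", "2024-01-01")  # no such row
row = conn.execute("SELECT publisher, pubdate FROM book WHERE id = 1").fetchone()
```

A single statement covers all three cases of the longer version: the row is changed, the row is already current, or the row does not exist. The returned count plays the role of changed, and pubdate is left alone when nothing needed updating.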
And this can be generalized with a little inspiration from {get,update}_or_create.
The Good:
changed = Book.objects[pk].change({'pubdate': today}, publisher=publisher)
Selects
Direct column access has some of the clunkiest syntax: values_list(..., flat=True).
QuerySets override __getitem__, as well as comparison operators for simple filters.
Both are common syntax in panel data layers.
The Bad:
{book.pk: book.name for book in qs}
(book.name for book in qs.filter(name__isnull=False))
if qs.filter(author=author):
The Ugly:
dict(qs.values_list('pk', 'name'))
qs.exclude(name=None).values_list('name', flat=True)
if qs.filter(author=author).exists():
The Good:
dict(qs['pk', 'name'])
qs['name'] != None
if author in qs['author']:
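For readers unfamiliar with operator overloading, the following sketch shows one way comparison operators could be translated into Django-style lookups. The Column class and its (method, lookups) return value are hypothetical, chosen only to show the idea, and say nothing about how the library does it internally.

```python
class Column:
    """Hypothetical column wrapper turning operators into lookup keywords."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, value):
        return ("filter", {self.name: value})

    def __ne__(self, value):
        # There is no "not equal" lookup, so the condition maps to exclude().
        return ("exclude", {self.name: value})

    def __lt__(self, value):
        return ("filter", {self.name + "__lt": value})

    def __ge__(self, value):
        return ("filter", {self.name + "__gte": value})


# Each comparison yields the queryset method name and its keyword arguments.
not_null = Column("name") != None
cheap = Column("quantity") < 10
```

A real implementation would apply the result to a queryset rather than return a tuple, but the mapping from operator to lookup is the essential step.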
Aggregation
Once accustomed to working with data values, a richer set of aggregations becomes possible. Again the method names mirror projects like pandas whenever applicable.
The Bad:
collections.Counter(book.author for book in qs)
sum(book.rating for book in qs) / len(qs)
counts = collections.Counter()
for book in qs:
counts[book.author] += book.quantity
The Ugly:
dict(qs.values_list('author').annotate(models.Count('author')))
qs.aggregate(models.Avg('rating'))['rating__avg']
dict(qs.values_list('author').annotate(models.Sum('quantity')))
The Good:
dict(qs['author'].value_counts())
qs['rating'].mean()
dict(qs['quantity'].groupby('author').sum())
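The difference between aggregating in Python and aggregating in the database can be seen with plain SQL. This is a sketch using sqlite3 with an invented three-row table; the statements approximate what the queries above ask the database to do and are not the SQL Django generates verbatim.

```python
import collections
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (author TEXT, rating REAL, quantity INTEGER)")
rows = [("A", 4.0, 2), ("A", 5.0, 3), ("B", 3.0, 7)]
conn.executemany("INSERT INTO book VALUES (?, ?, ?)", rows)

# Python-side aggregation: every row has to be fetched first.
py_counts = collections.Counter(author for author, _, _ in rows)

# Database-side aggregation: only the results cross the wire.
db_counts = dict(conn.execute("SELECT author, COUNT(*) FROM book GROUP BY author"))
db_mean = conn.execute("SELECT AVG(rating) FROM book").fetchone()[0]
db_sums = dict(conn.execute("SELECT author, SUM(quantity) FROM book GROUP BY author"))
```

Both approaches agree on the numbers; the database versions simply avoid loading every row into memory.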
Expressions
F expressions are similarly extended to easily create Q, Func, and OrderBy objects.
Note they can be used directly even without a custom manager.
The Bad:
(book for book in qs if book.author.startswith('A') or book.author.startswith('B'))
(book.title[:10] for book in qs)
for book in qs:
    book.rating += 1
    book.save()
The Ugly:
qs.filter(Q(author__startswith='A') | Q(author__startswith='B'))
qs.values_list(functions.Substr('title', 1, 10), flat=True)
qs.update(rating=models.F('rating') + 1)
The Good:
qs[F.any(map(F.author.startswith, 'AB'))]
qs[F.title[:10]]
qs['rating'] += 1
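The increment example is worth seeing at the SQL level, since an expression-based update computes the new value inside the database instead of reading it into Python first. The sketch below uses sqlite3 with an invented table and a hand-written statement as an approximation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, rating REAL)")
conn.executemany("INSERT INTO book (rating) VALUES (?)", [(3.0,), (4.5,)])

# One statement increments every row in place; no row is loaded into Python.
updated = conn.execute("UPDATE book SET rating = rating + 1").rowcount
ratings = [rating for (rating,) in conn.execute("SELECT rating FROM book ORDER BY id")]
```

Because the arithmetic happens in the database, concurrent increments cannot overwrite each other with stale values the way the load-modify-save loop can.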
Conditionals
Annotations and updates with Case and When expressions.
See also bulk_changed and bulk_change for efficient bulk operations on primary keys.
The Bad:
collections.Counter('low' if book.quantity < 10 else 'high' for book in qs).items()
for author, quantity in items:
    for book in qs.filter(author=author):
        book.quantity = quantity
        book.save()
The Ugly:
qs.values_list(models.Case(
    models.When(quantity__lt=10, then=models.Value('low')),
    models.When(quantity__gte=10, then=models.Value('high')),
    output_field=models.CharField(),
)).annotate(count=models.Count('*'))
cases = (models.When(author=author, then=models.Value(quantity)) for author, quantity in items)
qs.update(quantity=models.Case(*cases, default='quantity'))
The Good:
qs[{F.quantity < 10: 'low', F.quantity >= 10: 'high'}].value_counts()
qs['quantity'] = {F.author == author: quantity for author, quantity in items}
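Both conditional forms correspond to SQL CASE expressions. As a rough sketch, the sqlite3 example below builds them by hand; the table, the sample data, and the items mapping are invented for illustration, and the SQL is an approximation of what such queries boil down to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (author TEXT, quantity INTEGER)")
conn.executemany("INSERT INTO book VALUES (?, ?)", [("A", 5), ("B", 20), ("C", 8)])

# Conditional annotation: bucket each row, then count rows per bucket.
level_counts = dict(conn.execute(
    "SELECT CASE WHEN quantity < 10 THEN 'low' ELSE 'high' END AS level, COUNT(*) "
    "FROM book GROUP BY level"
))

# Conditional update: one statement sets a different quantity per author
# and leaves every unmatched row unchanged.
items = {"A": 1, "B": 2}
cases = " ".join("WHEN author = ? THEN ?" for _ in items)
params = [value for pair in items.items() for value in pair]
conn.execute(f"UPDATE book SET quantity = CASE {cases} ELSE quantity END", params)
quantities = dict(conn.execute("SELECT author, quantity FROM book"))
```

The update touches the database once regardless of how many authors are in items, whereas the nested loop issues a select per author and a save per book.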
Footnotes
[1] The only incompatible changes are edge cases which aren’t documented behavior, such as queryset comparison.
[2] In the vast majority of instances of that idiom, the object is immediately discarded and no custom logic is necessary. Furthermore the dogma of a model knowing how to serialize itself doesn’t inherently imply a single all-purpose instance method. Specialized classmethods or manager methods would be just as encapsulated.
[3] Premature optimization? While debatable with respect to general object overhead, nothing good can come from running superfluous database queries.
[4] Supporting update_fields with custom logic also results in complex conditionals, ironic given that OO methodology ostensibly favors separate methods over large switch statements.