Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Fix prefetch_related updates.#4553

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
lovelydinosaur merged 1 commit intomasterfromfix-prefetch-related-bug
Oct 11, 2016

Conversation

@lovelydinosaur
Copy link
Contributor

Closes#2442.

@lovelydinosaurlovelydinosaur added this to the3.5.0 Release milestoneOct 11, 2016
@lovelydinosaurlovelydinosaur merged commitd0b3b64 intomasterOct 11, 2016
@lovelydinosaurlovelydinosaur deleted the fix-prefetch-related-bug branchOctober 11, 2016 10:07
@spectras
Copy link

spectras commentedNov 4, 2016
edited
Loading

Although this does fix the bug where people update an object and forget to reload it, it can have a big performance impact when unneeded. I just noticed this change because many PUT operation on my objects suddenly got bumped from 9 to 14 requests, from 8 to 12, etc. That's quite a performance hit.

The point being: the object is loaded with multipleprefetch_related clauses, but they are either not changed, or not reflected in the serialized form, or the code updating them takes care of keeping the prefetch cache updated manually.
For instance, complex user and group permissions are loaded, used to determine whether current user can modify the object (and which fields he can modify), then discarded.

I would believe that code modifying data is reponsible for reloading it. Plus, that's consistent with Django form views. But at least, is it possible to opt out of this mechanism without making my ownUpdateMixin that I use instead of DRF's provided one?

(on a more sentimental issue, I'm feeling after carefully managing my reloads, I'm taking the performance hit because some other users were too lazy to do it themselves)

@lovelydinosaur
Copy link
ContributorAuthor

I'm not exactly clear what "carefully managing my reloads" means here - can you give an example of howyou're managing this in your codebase so that this hasn't been an issue for you.

Wedo need to prioritize not breaking user code more highly than making user code more performant, but happy to consider tweaks if a good case can be made.

@spectras
Copy link

spectras commentedNov 7, 2016
edited
Loading

Well, for everyprefetch_related added when updating an instance there are 4 typical use cases in my codebase:

  • It's used read-only. For instance, permissions. Nothing special needs to be done as I know for sure the update will not affect the permissions.

  • It's used read-write, but only for modifying related instances, knowing none will be added or removed. Just be careful to use the instances from the prefetch cache for updating, and they will be up-to-date.

  • The whole set of related objects is overwritten. In this case, I just replace the prefetch cache with the new set. For instance:

    # pictures comes from a PrimaryKeyRelatedField with many=True# <here, I update picture attachments in the database>, then:instance._prefetched_objects_cache[instance.pictures.prefetch_cache_name]=pictures

    (Actually, I put this in a small utility function so I only have one place to update if theprefetch_related() internals change).

  • Something too complex to easily maintain the prefetch cache manually. In that case, I finish myperform_update with the single line:

    serializer.instance=self.get_object()

    That is roughly equivalent to what was introduced in this pull request. But in my codebase, this is used only as a last resort. And it's only rarely actually needed. And when it is, it's because complex changes were done which already triggered many queries, making the reload relatively cheap (4 additional queries on top of 40 don't bother me as much as 4 on top of 6).


By the way, Django's generic form views do not reload the object upon update. It's unlikely to bite often because it sends a redirect, but if your `get_absolute_url()` does use some of the prefetched objects to build the url, user code is responsible for reloading the object, either in his `MyView.form_valid()` method or his `MyForm.save()` method. Otherwise, it will still see the old values.

The fact that prefetched data is not updated on saves is basic Django knowledge that any Django developer should know.

instance=Parent.objects.prefetch_related('children').get(pk=42)Child.objects.filter(parent__pk=42).delete()instance.children.all()# still shows all children despite having deleted them all

And tickets on that subject on Django tracker are systematically closed as Invalid with a core developer saying it is by design (example) I'm not sure DRF working around this is really helpful in the long run. Rather, I would believe making it inconsistent with Django just adds confusion… and useless requests in all cases where reloads are not needed.

@lovelydinosaur
Copy link
ContributorAuthor

The change here resolves an easily reproducible bug, and is only applied toprefetch_related cases. Can't see us changing that. I'd suggest a view mixin if you don't want the behavior applied.

The ticket linked is closed as invalid because there's no desire to attempt to resolve this at the ORM level. That doesn't mean that we can't or shouldn't resolve it at the generic view level - it's not the same consideration.

@spectras
Copy link

spectras commentedNov 7, 2016
edited
Loading

Perhaps the specific ticket was not the best example, I just took the first that showed up in google.

The point being: prefetched data is not refreshed when saving an instance. This is expected behavior for all Django APIs, be it ORM, forms or views, and I would have expected DRF to behave the same.
Also, why address data pre-loaded throughprefetch_related but not data pre-loaded throughselect_related? Or models loaded in a separate query? Or additional context loaded by third-party modules? None of those will be updated automatically.

It seems odd that if I doprefetch_related('parent'), DRF will reload the object, but if I doselect_related('parent') it will not.

It seems odd that adding aprefetch_related() clause to my querysets, with the intent of reducing the number of queries, actually increases that number because it has the unpredictable effect of triggering a reload by DRF.

It seems odd also that saving the object might succeed - potentially generating side-effects such as logging an audit entry, then raise aDoesNotExist exception if the object gets deleted in between the save and the reload. (*)

But alright, mixin it will be for sure. I don't want to have to track the exact queryset methods invoked to guess whether DRF will reload the object or not. Explicit is better than implicit.

__
(*) actually it's even worse: if I update the model in such a way it's now filtered out by the queryset, the reload will always fail. Consider:

# This viewset works as expected:classDraftViewSet(viewsets.ModelViewSet):queryset=Document.objects.filter(status=Document.DRAFT)serializer_class=DocumentSerializer# This viewset will *always* return a 404 when changing the document status# Actual database outcome depends on Django settings:#    -> if ATOMIC_REQUESTS is enabled, the model update is cancelled#    -> if ATOMIC_REQUESTS is disabled (default), the model remains updated despite the 404classDraftViewSet(viewsets.ModelViewSet):queryset=Document.objects.filter(status=Document.DRAFT).prefetch_related('attachments')serializer_class=DocumentSerializer

Only difference comes from adding theprefetch_related. Good luck tracking the root cause as a user.

@lovelydinosaur
Copy link
ContributorAuthor

lovelydinosaur commentedNov 7, 2016
edited
Loading

but not data pre-loaded through select_related

I've currently no reason to believe we have a behavioral bug there.

Or additional context loaded by third-party modules?

I'm not aware of any public API for doing so (other than ad-hoc assignments to the instance or queryset, which we shouldn't support), or have any reason to believe we have a bug there.

It seems odd that adding a prefetch_related() clause to my querysets, with the intent of reducing the number of queries, actually increases that number because it has the unpredictable effect of triggering a reload by DRF.

So don't performprefetch_related in update cases (modifyget_queryset to avoid it) - you'd end up with incorrect results if we didn't apply this fix, so you're in a better position anyway (It's just that now the equation is: "don't useprefetch_related on updates in order to avoid a DB reload", rather than "don't useprefetch_related on updates in order to avoid incorrect responses")

@lovelydinosaur
Copy link
ContributorAuthor

lovelydinosaur commentedNov 7, 2016
edited
Loading

It seems odd that adding a prefetch_related() clause to my querysets

Sure, but it's better than returning incorrect results.

It seems odd also that saving the object might succeed - potentially generating side-effects such as logging an audit entry, then raise a DoesNotExist exception if the object gets deleted in between the save and the reload.

If you think that we've behavior here that can beimproved, then sure, let's discuss that. Note that thereare already other similar cases, eg. application level validation might pass but database level validation fail in some race conditions.

if I update the model in such a way it's now filtered out by the queryset, the reload will always fail.

Grand, so let's get together a failing test case for that and then useinstance.refresh_from_db() instead in our implementation. (Started along that road at#4661)

(Even better, consider if we can alter the code here so that weonly invalidate theprefetch_related cache, without having to reload anything else - demonstrate that gives the correct behavior and we can happily take that path.)

@lovelydinosaur
Copy link
ContributorAuthor

Interested to know if#4668 addresses your issue sufficiently or not. Certainly a better approach.

@spectras
Copy link

Although it does leave open the case where prefetched data is not updatedand is used in the representation, it does fix the odd404 error.

I will still use a mixin, because I manually keep the prefetch cache up-to-date when I update a full set of objects:

instance._prefetched_objects_cache[instance.pictures.prefetch_cache_name]=pictures

But with the new approach:

  1. this should trigger much less queries, only prefetched data used in serialization will be actually reloaded.
  2. with this, maybe 1/3 of my viewsets will need the mixin, as opposed to all of them using the old approach.

That gives me an idea, posting on the other issue in 5 minutes.

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

No reviews

Assignees

No one assigned

Labels

Projects

None yet

Milestone

3.5.0 Release

Development

Successfully merging this pull request may close these issues.

3 participants

@lovelydinosaur@spectras

[8]ページ先頭

©2009-2025 Movatter.jp