The N+1 problem is one of the most common and foundational problems for applications that expose a GraphQL API. It doesn't noticeably affect performance until the application grows, and then it becomes a critical issue. That's a major risk, because solving it at that point is difficult. In addition, the Django ORM adds an extra layer of trouble, since any model instance may lazily hit the database the moment a related field is accessed.
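To make it concrete, here is a minimal sketch of the problem in plain Django ORM terms. The Post and Comment models are hypothetical, but they are the shape the examples below assume:

    from django.db import models

    class Post(models.Model):
        name = models.CharField(max_length=255)

    class Comment(models.Model):
        post = models.ForeignKey(Post, related_name="comments", on_delete=models.CASCADE)
        text = models.TextField()

    # One query for the posts...
    for post in Post.objects.all():
        # ...plus one query per post to count its comments: the classic N+1.
        print(post.name, post.comments.count())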
I wrote down a few tips on how to keep using Graphene object types with Django models while preventing, or at least controlling, N+1 queries.
Don't use DjangoObjectType from graphene-django
Using auto-generated types from models is very seductive. However, it mostly hides the problem, making it harder to detect. A pure graphene ObjectType is powerful enough, and even though this tip doesn't solve the issue by itself, it highlights it.
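For contrast, this is roughly what the auto-generated version looks like with graphene-django; the comments relation is resolved implicitly, so nothing in the type hints that accessing it may trigger a query per post:

    from graphene_django import DjangoObjectType

    class PostObjectType(DjangoObjectType):
        class Meta:
            model = Post
            fields = ("id", "name", "comments")

The explicit graphene.ObjectType equivalent makes that access visible: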
class PostObjectType(graphene.ObjectType):
    id = graphene.ID(required=True)
    name = graphene.String(required=True)
    comments = graphene.List(graphene.NonNull(CommentObjectType), required=True)

    @staticmethod
    def resolve_comments(root, _info):
        return root.comments.all()  # <-- this line highlights the problem
Force database usage inside "root" resolvers and mutations
Since we always have a single "root" resolver or mutation, retrieving all the required data there is the ideal solution. That gives us a single, clear point for generating the most efficient database query.
For the example above, the "root" resolver might look like:
class Query(graphene.ObjectType):
    posts = graphene.List(PostObjectType, required=True)

    @staticmethod
    def resolve_posts(_, info):
        # <-- calling prefetch_related here solves the problem
        return Post.objects.prefetch_related("comments").all()
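One way to pin this behaviour down is a small test. The sketch below assumes the schema is wired up as schema = graphene.Schema(query=Query) and uses the hypothetical Post and Comment models from earlier; with the prefetch in place, resolving any number of posts and their comments should take exactly two queries:

    import graphene
    from django.test import TestCase

    schema = graphene.Schema(query=Query)  # assumed wiring of the Query type above

    class PostsQueryTests(TestCase):
        def setUp(self):
            for name in ("first", "second"):
                post = Post.objects.create(name=name)
                Comment.objects.create(post=post, text=f"comment on {name}")

        def test_posts_and_comments_use_two_queries(self):
            # One query for the posts, one prefetch query for all their comments,
            # regardless of how many posts exist.
            with self.assertNumQueries(2):
                result = schema.execute("{ posts { id name comments { id } } }")
            self.assertIsNone(result.errors)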
Managing ObjectType-level database access
However, a missing prefetch_related in the root resolver still produces an N+1 problem. Fortunately, we already have explicit resolvers for related fields, so we can extend them to check whether the field has been cached.
import logging

logger = logging.getLogger(__name__)


class PostObjectType(graphene.ObjectType):
    ...

    @staticmethod
    def resolve_comments(root, _info):
        if not is_field_prefetched(root, "comments"):
            logger.warning("Post.comments isn't prefetched, resolving it may produce N+1 problem")
        return root.comments.all()


def is_field_prefetched(instance, field):
    field = instance._meta.get_field(field)
    # Forward relations (select_related, FK access) are cached in _state.fields_cache,
    # while prefetch_related results live in _prefetched_objects_cache.
    if field.name in instance._state.fields_cache:
        return True
    elif field.name in getattr(instance, '_prefetched_objects_cache', {}):
        return True
    return False
While this example keeps working without a prefetch_related call, it logs a warning that helps investigate and solve the problem. This matters most when PostObjectType is returned from several root resolvers or as a nested object field.
Use dataloaders
The dataloader pattern is widely known as a way to solve the N+1 problem. A brief explanation can be found here: https://docs.graphene-python.org/en/latest/execution/dataloader/
Dataloaders can serve either as an extra caching layer or as a replacement for prefetching. For the example above, using them generates an extra query to load all the comments, but it guarantees a single query whether we resolve one object or a whole list.
class PostObjectType(graphene.ObjectType):
    ...

    @staticmethod
    def resolve_comments(root, _info):
        dataloader = get_dataloader("post_comments")
        return dataloader.load(root.id)
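The get_dataloader helper is left undefined here; it is assumed to return a per-request loader instance (for example, one stored on info.context). A minimal sketch of such a loader, following the promise-based batching approach from the graphene documentation and using the hypothetical Comment model from earlier, could look like this:

    from collections import defaultdict

    from promise import Promise
    from promise.dataloader import DataLoader

    class PostCommentsLoader(DataLoader):
        def batch_load_fn(self, post_ids):
            # A single query for the comments of every requested post.
            comments_by_post = defaultdict(list)
            for comment in Comment.objects.filter(post_id__in=post_ids):
                comments_by_post[comment.post_id].append(comment)
            # DataLoader expects results in the same order as the incoming keys.
            return Promise.resolve([comments_by_post[post_id] for post_id in post_ids])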
Although dataloaders are a well-known and commonly used solution to prevent N+1 queries, I believe they are not a silver bullet. Combining them with the previous techniques may be the best possible approach.
Conclusion
The N+1 problem isn't extremely difficult to manage, but it requires attention. Following and combining the tips above could be summarised as "less magic and more logging". That helps avoid most performance issues related to the problem and stays pretty clear to anyone investigating the code.