DataLoader for GraphQL Implementations

A popular library used in GraphQL implementations is DataLoader, and the name is fairly descriptive of its purpose. As described in the repository for the JavaScript (Node.js) implementation:

“DataLoader is a generic utility to be used as part of your application’s data fetching layer to provide a simplified and consistent API over various remote data sources such as databases or web services via batching and caching.”

DataLoader solves the N+1 problem, which otherwise requires a resolver to make many individual requests to a database (or another data source, e.g. another API), one per item in a collection, resulting in inefficient and slow data retrieval.

A DataLoader serves as a batching and caching layer that combines multiple requests into a single request. It groups identical requests together and executes them more efficiently, minimizing the number of database or API round trips.
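To make the N+1 problem concrete, here is a minimal sketch in plain JavaScript. The in-memory db object, its queryCount counter, and the findPostsByUserIds helper are hypothetical stand-ins for a real data source, and no DataLoader is involved yet. It only compares how many queries the per-item and batched approaches issue.

```javascript
// Hypothetical in-memory "database" that counts how many queries it receives.
const db = {
  queryCount: 0,
  users: [
    { id: 1, name: "John" },
    { id: 2, name: "Jane" },
  ],
  posts: [
    { id: 1, userId: 1, title: "Post 1" },
    { id: 2, userId: 2, title: "Post 2" },
    { id: 3, userId: 1, title: "Post 3" },
  ],
  // Counts as one query per call, however many user ids are passed.
  findPostsByUserIds(userIds) {
    this.queryCount += 1;
    return this.posts.filter((post) => userIds.includes(post.userId));
  },
};

// N+1 style: one query per user (the "N"), on top of the query that
// fetched the users in the first place (the "+1", not counted here).
function postsPerUserNaive(users) {
  return users.map((user) => db.findPostsByUserIds([user.id]));
}

// Batched style: a single query for all users, then grouped in memory.
function postsPerUserBatched(users) {
  const posts = db.findPostsByUserIds(users.map((user) => user.id));
  return users.map((user) => posts.filter((post) => post.userId === user.id));
}

db.queryCount = 0;
const naive = postsPerUserNaive(db.users);
const naiveQueries = db.queryCount;

db.queryCount = 0;
const batched = postsPerUserBatched(db.users);
const batchedQueries = db.queryCount;

console.log({ naiveQueries, batchedQueries });
```

With two users the difference is small, but the naive count grows with the number of users while the batched count stays at one.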

DataLoader Operation:

  1. Create a new instance of DataLoader, specifying a batch loading function. This function would define how to load the data for a given set of keys.
  2. The resolver iterates through the collection and, instead of fetching the related data itself, adds the keys for the data to be fetched to the DataLoader instance.
  3. DataLoader collects the keys, deduplicates them, and executes the batch loading function once for the whole set.
  4. Once the batch is executed, DataLoader returns the results, associating them with their respective keys.
  5. The resolver can then access the response data and resolve the field or relationships as needed.
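The steps above can be sketched with a deliberately small stand-in for DataLoader. TinyLoader is a hypothetical class written for this illustration, not the real library: it only batches and deduplicates keys, and it schedules the batch with queueMicrotask, which is a simplification of how the real library decides when to dispatch.

```javascript
// Simplified stand-in for DataLoader (not the real library): it only batches.
class TinyLoader {
  constructor(batchFn) {
    this.batchFn = batchFn; // step 1: the batch loading function
    this.queue = []; // pending { key, resolve, reject } entries
  }

  // Step 2: resolvers call load(key) and receive a promise immediately.
  load(key) {
    return new Promise((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // The first key of a batch schedules one dispatch for after the
      // current synchronous work has finished.
      if (this.queue.length === 1) {
        queueMicrotask(() => this.dispatch());
      }
    });
  }

  // Steps 3 and 4: deduplicate keys, run the batch once, map results to keys.
  async dispatch() {
    const pending = this.queue;
    this.queue = [];
    const uniqueKeys = [...new Set(pending.map((entry) => entry.key))];
    try {
      const values = await this.batchFn(uniqueKeys);
      const byKey = new Map(uniqueKeys.map((key, i) => [key, values[i]]));
      pending.forEach(({ key, resolve }) => resolve(byKey.get(key)));
    } catch (error) {
      pending.forEach(({ reject }) => reject(error));
    }
  }
}

// Hypothetical batch function that records the key sets it was called with.
const batchCalls = [];
const loader = new TinyLoader(async (keys) => {
  batchCalls.push(keys);
  return keys.map((key) => `value-for-${key}`);
});

// Step 5: three load() calls made together are served by a single batch.
const demo = Promise.all([
  loader.load(1),
  loader.load(2),
  loader.load(1),
]).then((results) => {
  console.log(results); // values in the order they were requested
  console.log(batchCalls); // the key sets passed to the batch function
  return results;
});
```

The key design point is that load() never fetches anything itself. It only records the key and hands back a promise, which is what lets separate resolvers contribute to the same batch.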

DataLoader also caches the results of previous requests, so if the same key is requested again, DataLoader retrieves it from the cache instead of making another request. This caching further improves performance and reduces redundant fetching.
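The caching behavior can be illustrated in isolation with a small per-key memoization sketch. The withCache helper and the fetch function passed to it are hypothetical names for this example. The real library combines this kind of cache with batching, and it also offers a way to clear a key when the underlying data changes.

```javascript
// Simplified illustration of per-key caching (not the real library).
// fetchFn is a hypothetical function that loads the value for a single key.
function withCache(fetchFn) {
  const cache = new Map(); // key -> promise of the value
  return {
    load(key) {
      if (!cache.has(key)) {
        cache.set(key, Promise.resolve(fetchFn(key)));
      }
      return cache.get(key);
    },
    // Drop a key, for example after the underlying record has been updated.
    clear(key) {
      cache.delete(key);
    },
  };
}

let fetchCount = 0;
const cachedLoader = withCache(async (id) => {
  fetchCount += 1;
  return { id, name: `User ${id}` };
});

const cacheDemo = (async () => {
  const first = await cachedLoader.load(1);
  const second = await cachedLoader.load(1); // served from the cache
  const countAfterRepeat = fetchCount;

  cachedLoader.clear(1);
  await cachedLoader.load(1); // the key was cleared, so this fetches again
  return { sameObject: first === second, countAfterRepeat, fetchCount };
})();

cacheDemo.then((summary) => console.log(summary));
```

Caching the promise rather than the resolved value means that two requests for the same key made before the first one completes still share a single fetch.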

DataLoader Implementation Examples

JavaScript & Node.js

The following is a basic implementation of DataLoader for GraphQL using Apollo Server.

    const { ApolloServer, gql } = require("apollo-server");
    // The dataloader package exports the DataLoader class directly
    const DataLoader = require("dataloader");

    // Simulated data source
    const db = {
      users: [
        { id: 1, name: "John" },
        { id: 2, name: "Jane" },
      ],
      posts: [
        { id: 1, userId: 1, title: "Post 1" },
        { id: 2, userId: 2, title: "Post 2" },
        { id: 3, userId: 1, title: "Post 3" },
      ],
    };

    // Simulated asynchronous batch loading function. It receives an array of
    // user IDs and returns one array of posts per ID, in the same order.
    const batchPostsByUserIds = async (userIds) => {
      console.log("Fetching posts for user ids:", userIds);
      const posts = db.posts.filter((post) => userIds.includes(post.userId));
      return userIds.map((userId) => posts.filter((post) => post.userId === userId));
    };

    // Create a DataLoader instance
    const postsLoader = new DataLoader(batchPostsByUserIds);

    const resolvers = {
      Query: {
        getUserById: (_, { id }) => {
          // ID arguments arrive as strings, so compare as strings
          return db.users.find((user) => String(user.id) === String(id));
        },
      },
      User: {
        posts: (user) => {
          // Use DataLoader to load posts for the user
          return postsLoader.load(user.id);
        },
      },
    };

    // Define the GraphQL schema
    const typeDefs = gql`
      type User {
        id: ID!
        name: String!
        posts: [Post]
      }

      type Post {
        id: ID!
        title: String!
      }

      type Query {
        getUserById(id: ID!): User
      }
    `;

    // Create Apollo Server instance
    const server = new ApolloServer({ typeDefs, resolvers });

    // Start the server
    server.listen().then(({ url }) => {
      console.log(`Server running at ${url}`);
    });

In this example I create a DataLoader instance, postsLoader, using the DataLoader class from the dataloader package. I define a batch loading function, batchPostsByUserIds, that takes an array of user IDs and retrieves the corresponding posts for each user from the db.posts array. The function returns an array of arrays, where each sub-array contains the posts for a specific user, in the same order as the user IDs it was given.
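The shape of the batch function's return value is worth seeing on its own, since DataLoader matches results to keys by position. The sketch below is a standalone variant of the batch function with its own posts array and no server or library; the user ID 99 is a made-up key with no posts, included to show that a key without data still needs its own (empty) entry.

```javascript
// Standalone copy of the sample posts so the sketch runs on its own.
const posts = [
  { id: 1, userId: 1, title: "Post 1" },
  { id: 2, userId: 2, title: "Post 2" },
  { id: 3, userId: 1, title: "Post 3" },
];

// Returns one array of posts per requested user ID, in the same order as
// userIds, with an empty array for users that have no posts.
async function batchPostsByUserIds(userIds) {
  const byUser = new Map(userIds.map((userId) => [userId, []]));
  for (const post of posts) {
    if (byUser.has(post.userId)) {
      byUser.get(post.userId).push(post);
    }
  }
  return userIds.map((userId) => byUser.get(userId));
}

// Request the users out of order and include one without posts.
const requestedIds = [2, 1, 99];
const titlesPerUser = batchPostsByUserIds(requestedIds).then((results) =>
  results.map((userPosts) => userPosts.map((post) => post.title))
);

titlesPerUser.then((titles) => console.log(titles));
```

Grouping through a Map keeps the work to a single pass over the posts, where the version in the server example filters the list once per user. Either approach satisfies the same ordering requirement.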

In the User resolver I use the load method of DataLoader to load the posts for a user. The load method handles batching and caching behind the scenes, ensuring that redundant requests are minimized and results are cached for subsequent requests.

When the GraphQL server receives a query for the posts field of a User, the DataLoader automatically batches the requests for multiple users and executes the batch loading function to retrieve the posts.

This example demonstrates a very basic implementation of DataLoader in a GraphQL server. In a real-world scenario there would of course be a number of additional capabilities and implementation details that you’d need to work on for your particular situation.

Spring Boot Java Implementation

To extend the examples to another stack, the following is a Spring Boot example.

First add the dependencies.

    <dependencies>
      <!-- GraphQL for Spring Boot -->
      <dependency>
        <groupId>com.graphql-java</groupId>
        <artifactId>graphql-spring-boot-starter</artifactId>
        <version>5.0.2</version>
      </dependency>

      <!-- DataLoader (published as java-dataloader; its classes live in the org.dataloader package) -->
      <dependency>
        <groupId>com.graphql-java</groupId>
        <artifactId>java-dataloader</artifactId>
        <version>3.4.0</version>
      </dependency>
    </dependencies>

Next create the components and configure DataLoader.

    import graphql.Scalars;
    import graphql.schema.GraphQLList;
    import graphql.schema.GraphQLObjectType;
    import org.dataloader.BatchLoader;
    import org.dataloader.DataLoader;
    import org.dataloader.DataLoaderRegistry;
    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.context.annotation.Bean;

    import java.util.List;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.CompletionStage;
    import java.util.stream.Collectors;

    // GraphQLResolver, GraphQLSchemaProvider, and GraphQLContext come from the
    // Spring Boot GraphQL starter. Their packages differ between starter
    // versions, so this import is kept as given and may need adjusting, along
    // with imports for the other two types.
    import com.graphql.spring.boot.context.GraphQLContext;

    @SpringBootApplication
    public class DataLoaderExampleApplication {

      // Simulated data source
      private static class Db {
        List<User> users = List.of(
            new User(1, "John"),
            new User(2, "Jane")
        );
        List<Post> posts = List.of(
            new Post(1, 1, "Post 1"),
            new Post(2, 2, "Post 2"),
            new Post(3, 1, "Post 3")
        );
      }

      // User class
      private static class User {
        private final int id;
        private final String name;

        User(int id, String name) {
          this.id = id;
          this.name = name;
        }

        int getId() {
          return id;
        }

        String getName() {
          return name;
        }
      }

      // Post class
      private static class Post {
        private final int id;
        private final int userId;
        private final String title;

        Post(int id, int userId, String title) {
          this.id = id;
          this.userId = userId;
          this.title = title;
        }

        int getId() {
          return id;
        }

        int getUserId() {
          return userId;
        }

        String getTitle() {
          return title;
        }
      }

      // DataLoader batch loading function: returns one list of posts per
      // user ID, in the same order as the IDs it receives
      private static class BatchPostsByUserIds implements BatchLoader<Integer, List<Post>> {
        private final Db db;

        BatchPostsByUserIds(Db db) {
          this.db = db;
        }

        @Override
        public CompletionStage<List<List<Post>>> load(List<Integer> userIds) {
          System.out.println("Fetching posts for user ids: " + userIds);
          List<List<Post>> result = userIds.stream()
              .map(userId -> db.posts.stream()
                  .filter(post -> post.getUserId() == userId)
                  .collect(Collectors.toList()))
              .collect(Collectors.toList());
          return CompletableFuture.completedFuture(result);
        }
      }

      // GraphQL resolver
      private static class UserResolver implements GraphQLResolver<User> {
        private final DataLoader<Integer, List<Post>> postsDataLoader;

        UserResolver(DataLoader<Integer, List<Post>> postsDataLoader) {
          this.postsDataLoader = postsDataLoader;
        }

        // Return the future rather than calling join(). The value is only
        // available after the DataLoader has been dispatched, so blocking
        // here would stall the resolver and prevent batching.
        public CompletableFuture<List<Post>> getPosts(User user) {
          return postsDataLoader.load(user.getId());
        }
      }

      // GraphQL configuration
      @Bean
      public GraphQLSchemaProvider graphQLSchemaProvider() {
        return (graphQLSchemaBuilder, environment) -> {
          // Define the GraphQL schema. Post is declared first because the
          // User type refers to it.
          GraphQLObjectType postObjectType = GraphQLObjectType.newObject()
              .name("Post")
              .field(field -> field.name("id").type(Scalars.GraphQLInt))
              .field(field -> field.name("title").type(Scalars.GraphQLString))
              .build();

          GraphQLObjectType userObjectType = GraphQLObjectType.newObject()
              .name("User")
              .field(field -> field.name("id").type(Scalars.GraphQLInt))
              .field(field -> field.name("name").type(Scalars.GraphQLString))
              .field(field -> field.name("posts").type(new GraphQLList(postObjectType)))
              .build();

          GraphQLObjectType queryObjectType = GraphQLObjectType.newObject()
              .name("Query")
              .field(field -> field.name("getUserById")
                  .type(userObjectType)
                  .argument(arg -> arg.name("id").type(Scalars.GraphQLInt))
                  // Named env so it does not clash with the enclosing
                  // lambda's environment parameter
                  .dataFetcher(env -> {
                    // Retrieve the requested user ID
                    int userId = env.getArgument("id");
                    // Fetch the user by ID from the data source
                    Db db = new Db();
                    return db.users.stream()
                        .filter(user -> user.getId() == userId)
                        .findFirst()
                        .orElse(null);
                  }))
              .build();

          return graphQLSchemaBuilder.query(queryObjectType).build();
        };
      }

      // DataLoader registry bean
      @Bean
      public DataLoaderRegistry dataLoaderRegistry() {
        DataLoaderRegistry dataLoaderRegistry = new DataLoaderRegistry();
        Db db = new Db();
        dataLoaderRegistry.register("postsDataLoader", DataLoader.newDataLoader(new BatchPostsByUserIds(db)));
        return dataLoaderRegistry;
      }

      // GraphQL context builder
      @Bean
      public GraphQLContext.Builder graphQLContextBuilder(DataLoaderRegistry dataLoaderRegistry) {
        return new GraphQLContext.Builder().dataLoaderRegistry(dataLoaderRegistry);
      }

      public static void main(String[] args) {
        SpringApplication.run(DataLoaderExampleApplication.class, args);
      }
    }

In this example I define the Db class as a simulated data source with users and posts lists. I create a BatchPostsByUserIds class that implements the BatchLoader interface from DataLoader for batch loading of posts based on user IDs.

The UserResolver class is a GraphQL resolver that uses the postsDataLoader to load posts for a specific user.

For the configuration I define the schema using GraphQLSchemaProvider and create a GraphQLObjectType for User and for Post, plus a Query object type with a resolver for the getUserById field.

The dataLoaderRegistry bean registers the postsDataLoader with the DataLoader registry.

This implementation will efficiently batch and cache requests for loading posts based on user IDs.
