graphql-java/java-dataloader


A Java 11 port of Facebook DataLoader

License

Apache-2.0 license


java-dataloader

This small and simple utility library is a pure Java 11 port of Facebook DataLoader.

It can serve as an integral part of your application's data layer to provide a consistent API over various back-ends and reduce message communication overhead through batching and caching.

An important use case for java-dataloader is improving the efficiency of GraphQL query execution. GraphQL fields are resolved independently and, with a true graph of objects, you may be fetching the same object many times.

A naive implementation of GraphQL data fetchers can easily lead to the dreaded "n+1" fetch problem.

Most of the code is ported directly from Facebook's reference implementation, with one IMPORTANT adaptation to make it work for Java 11 (more on this below).

Before reading on, be sure to take a short dive into the original documentation provided by Lee Byron (@leebyron) and Nicholas Schrock (@schrockn) from Facebook, the creators of the original data loader.

Features

java-dataloader is a feature-complete port of the Facebook reference implementation with one major difference. These features are:

Simple, intuitive API, using generics and fluent coding
Define batch load function with lambda expression
Schedule a load request in queue for batching
Add load requests from anywhere in code
Request returns a CompletableFuture<V> of the requested value
Can create multiple requests at once
Caches load requests, so data is only fetched once
Can clear individual cache keys, so data is re-fetched on next batch queue dispatch
Can prime the cache with key/values, to avoid data being fetched needlessly
Can configure cache key function with lambda expression to extract cache key from complex data loader key types
Individual batch futures complete / resolve as batch is processed
Results are ordered according to insertion order of load requests
Deals with partial errors when a batch future fails
Can disable batching and/or caching in configuration
Can supply your own CacheMap<K, V> implementations
Can supply your own ValueCache<K, V> implementations
Has very high test coverage

Getting started!

Installing

Gradle users configure the java-dataloader dependency in build.gradle:

    repositories {
        mavenCentral()
    }

    dependencies {
        implementation 'com.graphql-java:java-dataloader:4.0.0'
    }

Building

To build from source use the Gradle wrapper:

./gradlew clean build

Examples

A DataLoader object requires a BatchLoader function that is responsible for loading a promise of values given a list of keys.

    BatchLoader<Long, User> userBatchLoader = new BatchLoader<Long, User>() {
        @Override
        public CompletionStage<List<User>> load(List<Long> userIds) {
            return CompletableFuture.supplyAsync(() -> {
                return userManager.loadUsersById(userIds);
            });
        }
    };

    DataLoader<Long, User> userLoader = DataLoaderFactory.newDataLoader(userBatchLoader);

You can then use it to load values, which will be CompletableFuture promises to values:

    CompletableFuture<User> load1 = userLoader.load(1L);

Or you can use it to compose future computations as follows. The key requirement is that you call dataloader.dispatch() or its variant dataloader.dispatchAndJoin() at some point in order to make the underlying calls happen to the batch loader.

In this version of data loader, this does not happen automatically. More on this in Manual dispatching.

    userLoader.load(1L)
            .thenAccept(user -> {
                System.out.println("user = " + user);
                userLoader.load(user.getInvitedByID())
                        .thenAccept(invitedBy -> {
                            System.out.println("invitedBy = " + invitedBy);
                        });
            });

    userLoader.load(2L)
            .thenAccept(user -> {
                System.out.println("user = " + user);
                userLoader.load(user.getInvitedByID())
                        .thenAccept(invitedBy -> {
                            System.out.println("invitedBy = " + invitedBy);
                        });
            });

    userLoader.dispatchAndJoin();

As stated on the original Facebook project:

A naive application may have issued four round-trips to a backend for the required information, but with DataLoader this application will make at most two.

DataLoader allows you to decouple unrelated parts of your application without sacrificing the performance of batch data-loading. While the loader presents an API that loads individual values, all concurrent requests will be coalesced and presented to your batch loading function. This allows your application to safely distribute data fetching requirements throughout your application and maintain minimal outgoing data requests.

In the example above, the first call to dispatch will cause the batched user keys (1 and 2) to be fired at the BatchLoader function to load 2 users.

Since each thenAccept callback made more calls to userLoader to get the "user they have invited", another 2 user keys are given to the BatchLoader function for them.

In this case userLoader.dispatchAndJoin() is used to make a dispatch call, wait for it (aka join it), see if the data loader has more batched entries (which it does), and then repeat this until the data loader's internal queue of keys is empty. At this point we have made 2 batched calls instead of the naive 4 calls we might have made if we did not "batch" the calls to load data.
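To make the "2 batched calls instead of 4" arithmetic concrete, here is a minimal, single-threaded sketch of the queue-then-dispatch idea using only the JDK. `TinyLoader` and `TinyLoaderDemo` are hypothetical names invented for illustration; they are not the org.dataloader implementation, and the batch function is synchronous for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical stand-in for a data loader; NOT the org.dataloader implementation.
final class TinyLoader<K, V> {
    private final Function<List<K>, List<V>> batchFunction;
    private final Map<K, CompletableFuture<V>> cache = new HashMap<>();
    private final List<K> queue = new ArrayList<>();
    int batchCalls = 0;

    TinyLoader(Function<List<K>, List<V>> batchFunction) {
        this.batchFunction = batchFunction;
    }

    // Queue the key (only the first time it is seen) and return its future.
    CompletableFuture<V> load(K key) {
        return cache.computeIfAbsent(key, k -> {
            queue.add(k);
            return new CompletableFuture<>();
        });
    }

    // Send every queued key to the batch function in one call.
    void dispatch() {
        if (queue.isEmpty()) {
            return;
        }
        List<K> keys = new ArrayList<>(queue);
        queue.clear(); // callbacks run below may queue further keys
        batchCalls++;
        List<V> values = batchFunction.apply(keys);
        for (int i = 0; i < keys.size(); i++) {
            cache.get(keys.get(i)).complete(values.get(i));
        }
    }

    // Keep dispatching until callbacks stop queueing new keys.
    void dispatchAndJoin() {
        while (!queue.isEmpty()) {
            dispatch();
        }
    }
}

final class TinyLoaderDemo {
    // Each "user" is modelled as the id of the user who invited them (id + 2).
    static int run() {
        TinyLoader<Long, Long> invitedBy = new TinyLoader<>(ids -> {
            List<Long> inviters = new ArrayList<>();
            for (Long id : ids) {
                inviters.add(id + 2);
            }
            return inviters;
        });
        invitedBy.load(1L).thenAccept(inviter -> invitedBy.load(inviter));
        invitedBy.load(2L).thenAccept(inviter -> invitedBy.load(inviter));
        invitedBy.dispatchAndJoin();
        return invitedBy.batchCalls;
    }

    public static void main(String[] args) {
        System.out.println("batch calls = " + run());
    }
}
```

The first dispatch sends keys 1 and 2 together; completing those futures runs the callbacks, which queue keys 3 and 4 for the second dispatch.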

Batching requires batched backing APIs

You will notice in our BatchLoader example that the backing service had the ability to get a list of users given a list of user ids in one call.

    public CompletionStage<List<User>> load(List<Long> userIds) {
        return CompletableFuture.supplyAsync(() -> {
            return userManager.loadUsersById(userIds);
        });
    }

This is an important consideration. By using dataloader you have batched up the requests for N keys in a list of keys that can be retrieved at one time.

If you don't have batched backing services, then you can't be as efficient as possible, as you will have to make one call per key, N calls in total.

    BatchLoader<Long, User> lessEfficientUserBatchLoader = new BatchLoader<Long, User>() {
        @Override
        public CompletionStage<List<User>> load(List<Long> userIds) {
            return CompletableFuture.supplyAsync(() -> {
                //
                // notice how it makes N calls to load by single user id out of the batch of N keys
                //
                return userIds.stream()
                        .map(id -> userManager.loadUserById(id))
                        .collect(Collectors.toList());
            });
        }
    };

That said, with key caching turned on (the default), it will still be more efficient using dataloader than without it.

Calling the batch loader function with call context environment

Often there is a need to call the batch loader function with some sort of call context environment, such as the calling user's security credentials or the database connection parameters.

You can do this by implementing an org.dataloader.BatchLoaderContextProvider and using one of the batch loading interfaces such as org.dataloader.BatchLoaderWithContext.

It will be given an org.dataloader.BatchLoaderEnvironment parameter and it can then ask it for the context object.

    DataLoaderOptions options = DataLoaderOptions.newOptions()
            .setBatchLoaderContextProvider(() -> SecurityCtx.getCallingUserCtx())
            .build();

    BatchLoaderWithContext<String, String> batchLoader = new BatchLoaderWithContext<String, String>() {
        @Override
        public CompletionStage<List<String>> load(List<String> keys, BatchLoaderEnvironment environment) {
            SecurityCtx callCtx = environment.getContext();
            return callDatabaseForResults(callCtx, keys);
        }
    };

    DataLoader<String, String> loader = DataLoaderFactory.newDataLoader(batchLoader, options);

The batch loading code will now receive this environment object and it can be used to get context, perhaps allowing it to connect to other systems.

You can also pass in context objects per load call. This will be captured and passed to the batch loader function.

You can gain access to them as a map by key or as the original list of context objects.

    DataLoaderOptions options = DataLoaderOptions.newOptions()
            .setBatchLoaderContextProvider(() -> SecurityCtx.getCallingUserCtx())
            .build();

    BatchLoaderWithContext<String, String> batchLoader = new BatchLoaderWithContext<String, String>() {
        @Override
        public CompletionStage<List<String>> load(List<String> keys, BatchLoaderEnvironment environment) {
            SecurityCtx callCtx = environment.getContext();
            //
            // this is the load context objects in map form by key
            // in this case [ keyA : contextForA, keyB : contextForB ]
            //
            Map<Object, Object> keyContexts = environment.getKeyContexts();
            //
            // this is load context in list form
            // in this case [ contextForA, contextForB ]
            //
            List<Object> keyContextsList = environment.getKeyContextsList();
            return callDatabaseForResults(callCtx, keys);
        }
    };

    DataLoader<String, String> loader = DataLoaderFactory.newDataLoader(batchLoader, options);
    loader.load("keyA", "contextForA");
    loader.load("keyB", "contextForB");

Returning a Map of results from your batch loader

Often there is not a 1:1 mapping of your batch loaded keys to the values returned.

For example, let's assume you want to load users from a database, you could probably use a query that looks like this:

    SELECT * FROM User WHERE id IN (keys)

Given say 10 user id keys you might only get 7 results back. This can be more naturally represented in a map than in an ordered list of values from the batch loader function.

You can use org.dataloader.MappedBatchLoader for this purpose.

When the map is processed by the DataLoader code, any keys that are missing in the map will be replaced with null values. The semantic that the number of DataLoader.load requests are matched with an equal number of values is kept.

The keys provided MUST be first class keys since they will be used to examine the returned map and create the list of results, with nulls filling in for missing values.
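The null-filling step described above can be sketched with plain collections. `MappedResults.toOrderedList` is a hypothetical helper written for illustration, not a method of the library; it also shows why keys need well-behaved `equals`/`hashCode`, since each requested key is looked up in the returned map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating the map-to-list step; not library code.
final class MappedResults {
    // One value per requested key, in request order; keys absent from the map become null.
    static <K, V> List<V> toOrderedList(List<K> keys, Map<K, V> found) {
        List<V> values = new ArrayList<>(keys.size());
        for (K key : keys) {
            values.add(found.get(key));
        }
        return values;
    }
}
```

For keys `[1, 2, 3]` and a map holding only entries for 1 and 3, the result has three elements with `null` in the middle, so the number of values still matches the number of load requests.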

    MappedBatchLoaderWithContext<Long, User> mapBatchLoader = new MappedBatchLoaderWithContext<Long, User>() {
        @Override
        public CompletionStage<Map<Long, User>> load(Set<Long> userIds, BatchLoaderEnvironment environment) {
            SecurityCtx callCtx = environment.getContext();
            return CompletableFuture.supplyAsync(() -> userManager.loadMapOfUsersByIds(callCtx, userIds));
        }
    };

    DataLoader<Long, User> userLoader = DataLoaderFactory.newMappedDataLoader(mapBatchLoader);

    // ...

Returning a stream of results from your batch publisher

It may be that your batch loader function can use a Reactive Streams Publisher, where values are emitted as an asynchronous stream.

For example, let's say you wanted to load many users from a service without forcing the service to load all users into its memory (which may exert considerable pressure on it).

An org.dataloader.BatchPublisher may be used to load this data:

    BatchPublisher<Long, User> batchPublisher = new BatchPublisher<Long, User>() {
        @Override
        public void load(List<Long> userIds, Subscriber<User> userSubscriber) {
            Publisher<User> userResults = userManager.streamUsersById(userIds);
            userResults.subscribe(userSubscriber);
        }
    };

    DataLoader<Long, User> userLoader = DataLoaderFactory.newPublisherDataLoader(batchPublisher);

    // ...

Rather than waiting for all user values to be returned in one batch, this DataLoader will complete the CompletableFuture<User> returned by DataLoader#load(Long) as each value is published.

This pattern means that data loader values can (in theory) be satisfied more quickly than if we wait for all results in the batch to be retrieved and hence the overall result may finish more quickly.

If an exception is thrown, the remaining futures yet to be completed are completed exceptionally.

You MUST ensure that the values are streamed in the same order as the keys provided, with the same cardinality (i.e. the number of values must match the number of keys).

Failing to do so will result in incorrect data being returned from DataLoader#load.
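The ordering requirement follows from how values are matched to futures: the n-th published value completes the n-th pending future. The sketch below illustrates this with the JDK's `java.util.concurrent.Flow` types rather than the Reactive Streams interfaces the library uses. `InOrderSubscriber` is a hypothetical class, and failing the leftover futures in `onComplete` is an assumption of this sketch, not documented library behaviour.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Flow;

// Hypothetical subscriber: completes one pending future per published value, in key order.
final class InOrderSubscriber<V> implements Flow.Subscriber<V> {
    private final List<CompletableFuture<V>> pending;
    private int next = 0;

    InOrderSubscriber(List<CompletableFuture<V>> pending) {
        this.pending = pending;
    }

    @Override
    public void onSubscribe(Flow.Subscription subscription) {
        if (!pending.isEmpty()) {
            subscription.request(pending.size());
        }
    }

    @Override
    public void onNext(V value) {
        // The n-th value is assumed to belong to the n-th key
        if (next < pending.size()) {
            pending.get(next++).complete(value);
        }
    }

    @Override
    public void onError(Throwable error) {
        failRemaining(error);
    }

    @Override
    public void onComplete() {
        // Assumption of this sketch: too few values is treated as an error
        failRemaining(new IllegalStateException("fewer values than keys"));
    }

    private void failRemaining(Throwable error) {
        while (next < pending.size()) {
            pending.get(next++).completeExceptionally(error);
        }
    }
}
```

Because matching is purely positional, a value published out of order would silently complete the wrong key's future, which is why the mapped variant exists for unordered sources.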

BatchPublisher is the reactive version of BatchLoader.

Returning a mapped stream of results from your batch publisher

Your publisher may not necessarily return values in the same order in which it processes keys and it may not be able to find a value for each key presented.

For example, let's say your batch publisher function loads user data which is spread across shards, with some shards responding more quickly than others.

In instances like these, org.dataloader.MappedBatchPublisher can be used.

    MappedBatchPublisher<Long, User> mappedBatchPublisher = new MappedBatchPublisher<Long, User>() {
        @Override
        public void load(Set<Long> userIds, Subscriber<Map.Entry<Long, User>> userEntrySubscriber) {
            Publisher<Map.Entry<Long, User>> userEntries = userManager.streamUsersById(userIds);
            userEntries.subscribe(userEntrySubscriber);
        }
    };

    DataLoader<Long, User> userLoader = DataLoaderFactory.newMappedPublisherDataLoader(mappedBatchPublisher);

    // ...

Like the BatchPublisher, if an exception is thrown, the remaining futures yet to be completed are completed exceptionally.

Unlike the BatchPublisher, however, it is not necessary to return values in the same order as the provided keys, or even the same number of values.

MappedBatchPublisher is the reactive version of MappedBatchLoader.

Error object is not a thing in a type safe Java world

In the reference JS implementation, if the batch loader returns an Error object, the load() promise is rejected with that error. This allows fine-grained (per object in the list) error handling. If I ask for keys A, B, C and B errors out, the promise for B can contain a specific error.

This is not quite as loose in a Java implementation, as Java is a type safe language.

A batch loader function is defined as BatchLoader<K, V>, meaning for a key of type K it returns a value of type V.

It can't just return some Exception as an object of type V. Type safety matters.

However, you can use the Try data type, which can encapsulate a computation that succeeded or returned an exception.

    Try<String> tryS = Try.tryCall(() -> {
        if (rollDice()) {
            return "OK";
        } else {
            throw new RuntimeException("Bang");
        }
    });

    if (tryS.isSuccess()) {
        System.out.println("It worked " + tryS.get());
    } else {
        System.out.println("It failed with exception : " + tryS.getThrowable());
    }

DataLoader supports this type, and you can use this form to create a batch loader that returns a list of Try objects, some of which may have succeeded and some of which may have failed. From that, data loader can infer the right behavior in terms of the load(x) promise.

    DataLoader<String, User> dataLoader = DataLoaderFactory.newDataLoaderWithTry(new BatchLoader<String, Try<User>>() {
        @Override
        public CompletionStage<List<Try<User>>> load(List<String> keys) {
            return CompletableFuture.supplyAsync(() -> {
                List<Try<User>> users = new ArrayList<>();
                for (String key : keys) {
                    Try<User> userTry = loadUser(key);
                    users.add(userTry);
                }
                return users;
            });
        }
    });

In the above example, if one of the Try objects represents a failure, then its load() promise will complete exceptionally and you can react to that in a type safe manner.
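How a list of per-key outcomes turns into per-key futures can be sketched as follows. `Outcome` is a deliberately simplified, hypothetical stand-in for the idea behind Try (it is not org.dataloader.Try), and `OutcomeResolver` is likewise invented for illustration.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical, simplified success-or-failure holder; NOT org.dataloader.Try.
final class Outcome<V> {
    final V value;
    final Throwable error;

    private Outcome(V value, Throwable error) {
        this.value = value;
        this.error = error;
    }

    static <V> Outcome<V> succeeded(V value) {
        return new Outcome<>(value, null);
    }

    static <V> Outcome<V> failed(Throwable error) {
        return new Outcome<>(null, error);
    }

    boolean isSuccess() {
        return error == null;
    }
}

final class OutcomeResolver {
    // Complete the i-th future normally or exceptionally according to the i-th outcome.
    static <V> void resolve(List<CompletableFuture<V>> futures, List<Outcome<V>> outcomes) {
        for (int i = 0; i < futures.size(); i++) {
            Outcome<V> outcome = outcomes.get(i);
            if (outcome.isSuccess()) {
                futures.get(i).complete(outcome.value);
            } else {
                futures.get(i).completeExceptionally(outcome.error);
            }
        }
    }
}
```

A failure for one key leaves the other keys in the same batch unaffected, which is the per-object error granularity the JS implementation gets from returning Error objects.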

Caching

DataLoader has a two tiered caching system in place.

The first cache is represented by the interface org.dataloader.CacheMap. It will cache CompletableFutures by key and hence future load(key) calls will be given the same future and hence the same value.

This cache can only work local to the JVM, since it caches CompletableFutures, which cannot be serialised across a network, say.

The second level cache is a value cache represented by the interface org.dataloader.ValueCache. By default, this is not enabled and is a no-op.

The value cache uses an async API pattern to encapsulate the idea that the value cache could be in a remote place such as Redis or Memcached.

Custom future caches

The default future cache behind DataLoader is an in-memory HashMap. There is no expiry on this, and it lives for as long as the data loader lives.

However, you can create your own custom future cache and supply it to the data loader on construction via the org.dataloader.CacheMap interface.

    MyCustomCache customCache = new MyCustomCache();
    DataLoaderOptions options = DataLoaderOptions.newOptions().setCacheMap(customCache).build();
    DataLoaderFactory.newDataLoader(userBatchLoader, options);

You could choose to use one of the fancy cache implementations from Guava or Caffeine and wrap it in a CacheMap wrapper ready for data loader. They can do fancy things like time eviction and efficient LRU caching.
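As a rough idea of what LRU eviction of cached futures looks like without any third-party library, the sketch below uses the JDK's access-ordered `LinkedHashMap`. `LruFutureCache` is a hypothetical class; it does not implement org.dataloader.CacheMap (a real integration would adapt it to that interface), and it is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical LRU store of futures; adapting it to org.dataloader.CacheMap is left out.
final class LruFutureCache<K, V> {
    private final Map<K, CompletableFuture<V>> entries;

    LruFutureCache(int maxEntries) {
        // accessOrder = true keeps the least recently used entry first
        this.entries = new LinkedHashMap<K, CompletableFuture<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, CompletableFuture<V>> eldest) {
                return size() > maxEntries;
            }
        };
    }

    boolean containsKey(K key) {
        return entries.containsKey(key);
    }

    // Reading an entry marks it as recently used.
    CompletableFuture<V> get(K key) {
        return entries.get(key);
    }

    // Adding beyond capacity evicts the least recently used entry.
    void set(K key, CompletableFuture<V> future) {
        entries.put(key, future);
    }

    int size() {
        return entries.size();
    }
}
```

Evicting a future only means a later load for that key goes back to the batch loader; callers already holding the evicted future are unaffected.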

As stated above, a custom org.dataloader.CacheMap is a local cache of CompletableFutures to values, not values per se.

If you want to externally cache values then you need to use the org.dataloader.ValueCache interface.

Custom value caches

The org.dataloader.ValueCache allows you to use an external cache.

The API of ValueCache has been designed to be asynchronous because it is expected that the value cache could be outside your JVM. It uses CompletableFutures to get and set values into cache, which may involve a network call and hence exceptional failures to get or set values.

The ValueCache API is batch oriented. If you have a backing cache that can do batch cache fetches (such as Redis), then you can use the ValueCache.getValues() call directly. However, if you don't have such a backing cache, then the default implementation will break apart the batch of cache values into individual requests to ValueCache.getValue() for you.

This library does not ship with any implementations of ValueCache because it does not want to have production dependencies on external cache libraries, but you can easily write your own.

The tests have an example based on Caffeine.

Disabling caching

In certain uncommon cases, a DataLoader which does not cache may be desirable.

    DataLoaderFactory.newDataLoader(userBatchLoader, DataLoaderOptions.newOptions().setCachingEnabled(false).build());

Calling the above will ensure that every call to .load() will produce a new promise, and requested keys will not be saved in memory.

However, when the memoization cache is disabled, your batch function will receive a list of keys which may contain duplicates! Each key will be associated with each call to .load(). Your batch loader MUST provide a value for each instance of the requested key, as per the contract.

    userDataLoader.load("A");
    userDataLoader.load("B");
    userDataLoader.load("A");

    userDataLoader.dispatch();

    // will result in keys to the batch loader with [ "A", "B", "A" ]
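One way a batch function can honour the "one value per requested key instance" contract when duplicates arrive is to fetch each distinct key once and then answer every occurrence in request order. The helper below is a hypothetical, JDK-only sketch of that idea; `fetchDistinct` stands for whatever backend call you have.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper for batch functions that may receive duplicate keys.
final class DuplicateKeyBatch {
    // Fetch each distinct key once, then answer every occurrence in request order.
    static <K, V> List<V> answerEachOccurrence(List<K> keys, Function<Set<K>, Map<K, V>> fetchDistinct) {
        Map<K, V> found = fetchDistinct.apply(new LinkedHashSet<>(keys));
        List<V> values = new ArrayList<>(keys.size());
        for (K key : keys) {
            values.add(found.get(key));
        }
        return values;
    }
}
```

For keys `[ "A", "B", "A" ]` the backend is asked for `A` and `B` only, yet three values are returned, one per position.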

More complex cache behavior can be achieved by calling .clear() or .clearAll() rather than disabling the cache completely.

Caching errors

If a batch load fails (that is, a batch function returns a rejected CompletionStage), then the requested values will not be cached. However, if a batch function returns a Try or Throwable instance for an individual value, then that will be cached to avoid frequently loading the same problem object.

In some circumstances you may wish to clear the cache for these individual problems:

    userDataLoader.load("r2d2").whenComplete((user, throwable) -> {
        if (throwable != null) {
            userDataLoader.clear("r2d2");
            throwable.printStackTrace();
        } else {
            processUser(user);
        }
    });

Statistics on what is happening

DataLoader keeps statistics on what is happening. It can tell you the number of objects asked for, the cache hit number, the number of objects asked for via batching and so on.

Knowing how your data behaves is important for understanding how efficiently you are serving it via this pattern.

    Statistics statistics = userDataLoader.getStatistics();

    System.out.println(format("load : %d", statistics.getLoadCount()));
    System.out.println(format("batch load: %d", statistics.getBatchLoadCount()));
    System.out.println(format("cache hit: %d", statistics.getCacheHitCount()));
    // the ratio is a floating point value, so it needs %f rather than %d
    System.out.println(format("cache hit ratio: %f", statistics.getCacheHitRatio()));

DataLoaderRegistry can also roll up the statistics for all data loaders inside it.

You can configure the statistics collector used when you build the data loader:

    DataLoaderOptions options = DataLoaderOptions.newOptions()
            .setStatisticsCollector(() -> new ThreadLocalStatisticsCollector())
            .build();
    DataLoader<String, User> userDataLoader = DataLoaderFactory.newDataLoader(userBatchLoader, options);

Which collector you use is up to you. It ships with the following: SimpleStatisticsCollector, ThreadLocalStatisticsCollector, DelegatingStatisticsCollector and NoOpStatisticsCollector.

The scope of a data loader is important

If you are serving web requests then the data can be specific to the user requesting it. If you have user specific data then you will not want to cache data meant for user A and later give it to user B in a subsequent request.

The scope of your DataLoader instances is important. You will want to create them per web request to ensure data is only cached within that web request and no more.

If your data can be shared across web requests then use a custom org.dataloader.ValueCache to keep values in a common place.

Data loaders are stateful components that contain promises (with context) that are likely to share the same affinity as the request.

Manual dispatching

The original Facebook DataLoader was written in Javascript for NodeJS.

NodeJS is single-threaded in nature, but simulates asynchronous logic by invoking functions on separate threads in an event loop, as explained in this post on StackOverflow.

NodeJS generates so-called 'ticks' in which queued functions are dispatched for execution, and Facebook DataLoader uses the nextTick() function in NodeJS to automatically dequeue load requests and send them to the batch execution function for processing.

Here there is an IMPORTANT DIFFERENCE compared to how java-dataloader operates!!

In NodeJS the batch preparation will not affect the asynchronous processing behaviour in any way. It will just prepare batches in 'spare time' as it were.

This is different in Java as you will actually delay the execution of your load requests until the moment where you make a call to dataLoader.dispatch().

Does this make Java DataLoader any less useful than the reference implementation? We would argue this is not the case, and there are also gains to this different mode of operation:

In contrast to the NodeJS implementation, you as developer are in full control of when batches are dispatched
You can attach any logic that determines when a dispatch takes place
You still retain all other features, full caching support and batching (e.g. to optimize message bus traffic, GraphQL query execution time, etc.)

However, with batch execution control comes responsibility! If you forget to make the call to dispatch() then the futures in the load request queue will never be batched, and thus will never complete! So be careful when crafting your loader designs.

The BatchLoader Scheduler

By default, when dataLoader.dispatch() is called, the BatchLoader / MappedBatchLoader function will be invoked immediately.

However, you can provide your own BatchLoaderScheduler that allows this call to be done some time into the future.

You will be passed a callback (ScheduledBatchLoaderCall / ScheduledMappedBatchLoaderCall) and you are expected to eventually call this callback method to make the batch loading happen.

The following is a BatchLoaderScheduler that waits 10 milliseconds before invoking the batch loading functions.

    new BatchLoaderScheduler() {

        @Override
        public <K, V> CompletionStage<List<V>> scheduleBatchLoader(ScheduledBatchLoaderCall<V> scheduledCall, List<K> keys, BatchLoaderEnvironment environment) {
            return CompletableFuture.supplyAsync(() -> {
                snooze(10);
                return scheduledCall.invoke();
            }).thenCompose(Function.identity());
        }

        @Override
        public <K, V> CompletionStage<Map<K, V>> scheduleMappedBatchLoader(ScheduledMappedBatchLoaderCall<K, V> scheduledCall, List<K> keys, BatchLoaderEnvironment environment) {
            return CompletableFuture.supplyAsync(() -> {
                snooze(10);
                return scheduledCall.invoke();
            }).thenCompose(Function.identity());
        }

        @Override
        public <K> void scheduleBatchPublisher(ScheduledBatchPublisherCall scheduledCall, List<K> keys, BatchLoaderEnvironment environment) {
            snooze(10);
            scheduledCall.invoke();
        }
    };

You are given the keys to be loaded and an optional BatchLoaderEnvironment for informative purposes. You can't change the list of keys that will be loaded via this mechanism.

Also note, because there is a max batch size, it is possible for this scheduling to happen N times for a given dispatch() call. The total set of keys will be sliced into batches themselves and then the BatchLoaderScheduler will be called for each batch of keys.

Do not assume that a single call to dispatch() results in a single call to BatchLoaderScheduler.
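The slicing of a dispatch into several batches can be pictured with the following JDK-only sketch. `BatchSlicer` is a hypothetical helper, not the library's internal code; it only shows why one dispatch of N keys can lead to several scheduler invocations.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing how a max batch size splits one dispatch into several batches.
final class BatchSlicer {
    static <K> List<List<K>> slice(List<K> keys, int maxBatchSize) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("maxBatchSize must be >= 1");
        }
        List<List<K>> batches = new ArrayList<>();
        for (int from = 0; from < keys.size(); from += maxBatchSize) {
            int to = Math.min(from + maxBatchSize, keys.size());
            // copy the view so each batch is independent of the source list
            batches.add(new ArrayList<>(keys.subList(from, to)));
        }
        return batches;
    }
}
```

With five queued keys and a max batch size of two, the result is three batches, so a scheduler would be called three times for that single dispatch.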

This code is inspired by the scheduling code in the reference JS implementation.

Scheduled Registry Dispatching

ScheduledDataLoaderRegistry is a registry that allows for dispatching to be done on a schedule. It contains a predicate that is evaluated (per data loader contained within) when dispatchAll is invoked.

If that predicate is true, it will make a dispatch call on the data loader, otherwise it will schedule a task to perform that check again. Once a predicate has evaluated to true, it will not reschedule, and another call to dispatchAll is required to be made.

This allows you to do things like "dispatch ONLY if the queue depth is > 10 deep or more than 200 millis have passed since it was last dispatched".

    DispatchPredicate depthOrTimePredicate = DispatchPredicate
            .dispatchIfDepthGreaterThan(10)
            .or(DispatchPredicate.dispatchIfLongerThan(Duration.ofMillis(200)));

    ScheduledDataLoaderRegistry registry = ScheduledDataLoaderRegistry.newScheduledRegistry()
            .dispatchPredicate(depthOrTimePredicate)
            .schedule(Duration.ofMillis(10))
            .register("users", userDataLoader)
            .build();

The above acts as a kind of minimum batch depth, with a time overload. It won't dispatch if the loader depth is less than or equal to 10, but if 200ms pass it will dispatch.
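The "depth or time" rule is plain predicate composition. The sketch below reproduces the logic with a hypothetical `QueuePredicate` interface so the boundary cases are easy to check; it is not the library's DispatchPredicate, and the two inputs (queue depth and time since last dispatch) are passed in explicitly as an assumption of the sketch.

```java
import java.time.Duration;

// Hypothetical two-argument predicate mirroring the "depth or time" rule; not DispatchPredicate.
interface QueuePredicate {
    boolean test(int queueDepth, Duration sinceLastDispatch);

    // Dispatch when either rule says so.
    default QueuePredicate or(QueuePredicate other) {
        return (depth, elapsed) -> test(depth, elapsed) || other.test(depth, elapsed);
    }

    // True only when the queue is strictly deeper than the given depth.
    static QueuePredicate depthGreaterThan(int minDepth) {
        return (depth, elapsed) -> depth > minDepth;
    }

    // True only when strictly more time than the limit has passed.
    static QueuePredicate longerThan(Duration limit) {
        return (depth, elapsed) -> elapsed.compareTo(limit) > 0;
    }
}
```

With `depthGreaterThan(10).or(longerThan(Duration.ofMillis(200)))`, a depth of exactly 10 after 50 ms does not dispatch, while a depth of 11, or any depth after 201 ms, does.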

Chaining DataLoader calls

It's natural to want to have chained DataLoader calls.

    CompletableFuture<Object> chainedCalls = dataLoaderA.load("user1")
            .thenCompose(userAsKey -> dataLoaderB.load(userAsKey));

However, the challenge here is how to be efficient in batching terms.

This is discussed in detail in the #54 issue.

Since CompletableFutures are async and can complete at some time in the future, when is the best time to call dispatch again when a load call has completed to maximize batching?

The most naive approach is to immediately dispatch the second chained call as follows:

    CompletableFuture<Object> chainedWithImmediateDispatch = dataLoaderA.load("user1")
            .thenCompose(userAsKey -> {
                CompletableFuture<Object> loadB = dataLoaderB.load(userAsKey);
                dataLoaderB.dispatch();
                return loadB;
            });

The above will work; however, the window for batching together multiple calls to dataLoaderB will be very small, and it will likely result in batch sizes of 1.

This is a very difficult problem to solve because you have to balance two competing design ideals: maximizing the batching window of secondary calls, while keeping that window small enough that your customer requests don't take longer than necessary.

If the batching window is wide you will maximize the number of keys presented to a BatchLoader but your request latency will increase.
If the batching window is narrow you will reduce your request latency, but also you will reduce the number of keys presented to a BatchLoader.

ScheduledDataLoaderRegistry ticker mode

The ScheduledDataLoaderRegistry offers one solution to this called "ticker mode", where it will continually reschedule secondary DataLoader calls after the initial dispatch() call is made.

The batch window of time is controlled by the schedule duration set up when the ScheduledDataLoaderRegistry is created.

    ScheduledExecutorService executorService = Executors.newSingleThreadScheduledExecutor();

    ScheduledDataLoaderRegistry registry = ScheduledDataLoaderRegistry.newScheduledRegistry()
            .register("a", dataLoaderA)
            .register("b", dataLoaderB)
            .scheduledExecutorService(executorService)
            .schedule(Duration.ofMillis(10))
            .tickerMode(true) // ticker mode is on
            .build();

    CompletableFuture<Object> chainedCalls = dataLoaderA.load("user1")
            .thenCompose(userAsKey -> dataLoaderB.load(userAsKey));

When ticker mode is on, the chained dataloader calls will complete, but the batching window size will depend on how quickly the first level of DataLoader calls returned compared to the schedule of the ScheduledDataLoaderRegistry.

If you use ticker mode, then you MUST call registry.close() on the ScheduledDataLoaderRegistry at the end of the request (say), otherwise it will continue to reschedule tasks to the ScheduledExecutorService associated with the registry.

You will want to look at sharing the ScheduledExecutorService in some way between requests when creating the ScheduledDataLoaderRegistry, otherwise you will be creating a thread per ScheduledDataLoaderRegistry instance created, and with enough concurrent requests you may create too many threads.

ScheduledDataLoaderRegistry dispatching algorithm

When ticker mode is false the ScheduledDataLoaderRegistry algorithm is as follows:

Nothing starts scheduled - some code must call registry.dispatchAll() a first time
Then for every DataLoader in the registry
- The DispatchPredicate is called to test if the data loader should be dispatched
  - If it returns false then a task is scheduled to re-evaluate this specific dataloader in the near future
  - If it returns true, then dataLoader.dispatch() is called and the dataloader is not rescheduled again
The re-evaluation tasks are run periodically according to the registry.getScheduleDuration()

When ticker mode is true the ScheduledDataLoaderRegistry algorithm is as follows:

Nothing starts scheduled - some code must call registry.dispatchAll() a first time
Then for every DataLoader in the registry
- The DispatchPredicate is called to test if the data loader should be dispatched
  - If it returns false then a task is scheduled to re-evaluate this specific dataloader in the near future
  - If it returns true, then dataLoader.dispatch() is called and a task is scheduled to re-evaluate this specific dataloader in the near future
The re-evaluation tasks are run periodically according to the registry.getScheduleDuration()
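The only difference between the two algorithms is what happens after a successful dispatch. The simulation below models a single data loader and drives the re-evaluation "ticks" by hand instead of using a ScheduledExecutorService, so it stays deterministic. `RescheduleSimulation` is a hypothetical class written for illustration, not registry code.

```java
import java.util.function.BooleanSupplier;

// Hypothetical simulation of the steps above for ONE data loader; ticks are driven manually.
final class RescheduleSimulation {
    private final boolean tickerMode;
    private final BooleanSupplier dispatchPredicate;
    private boolean scheduled = false;
    int dispatches = 0;

    RescheduleSimulation(boolean tickerMode, BooleanSupplier dispatchPredicate) {
        this.tickerMode = tickerMode;
        this.dispatchPredicate = dispatchPredicate;
    }

    // The first call that some code must make; nothing is scheduled before it.
    void dispatchAll() {
        evaluate();
    }

    // Stands in for the periodic re-evaluation task; a no-op when nothing is scheduled.
    void tick() {
        if (scheduled) {
            scheduled = false;
            evaluate();
        }
    }

    boolean isScheduled() {
        return scheduled;
    }

    private void evaluate() {
        if (dispatchPredicate.getAsBoolean()) {
            dispatches++;
            // only ticker mode keeps re-evaluating after a dispatch
            scheduled = tickerMode;
        } else {
            scheduled = true;
        }
    }
}
```

With a predicate that is always true, one `dispatchAll()` followed by two ticks yields a single dispatch when ticker mode is off and three dispatches when it is on, which is why ticker mode needs an explicit close.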

Instrumenting the data loader code

ADataLoader can have aDataLoaderInstrumentation associated with it. This callback interface is intended to provideinsight into working of theDataLoader such as how long it takes to run or to allow for logging of key events.

You set theDataLoaderInstrumentation into theDataLoaderOptions at build time.

DataLoaderInstrumentationtimingInstrumentation =newDataLoaderInstrumentation() {@OverridepublicDataLoaderInstrumentationContext<DispatchResult<?>>beginDispatch(DataLoader<?, ?>dataLoader) {longthen =System.currentTimeMillis();returnDataLoaderInstrumentationHelper.whenCompleted((result,err) -> {longms =System.currentTimeMillis() -then;System.out.println(format("dispatch time: %d ms",ms));                });            }@OverridepublicDataLoaderInstrumentationContext<List<?>>beginBatchLoader(DataLoader<?, ?>dataLoader,List<?>keys,BatchLoaderEnvironmentenvironment) {longthen =System.currentTimeMillis();returnDataLoaderInstrumentationHelper.whenCompleted((result,err) -> {longms =System.currentTimeMillis() -then;System.out.println(format("batch loader time: %d ms",ms));                });            }        };DataLoaderOptionsoptions =DataLoaderOptions.newOptions().setInstrumentation(timingInstrumentation).build();DataLoader<String,User>userDataLoader =DataLoaderFactory.newDataLoader(userBatchLoader,options);

The example reports how long the overall DataLoader dispatch takes and how long the batch loader takes to run.
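The begin/complete callback pattern used in the example can be illustrated in isolation. The following is a minimal sketch using only the JDK: CompletionContext, TimingSketch, beginTimed and instrument are hypothetical names chosen for illustration and are not the library's types. The clock is passed in as a parameter so the behaviour can be checked without real delays.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;

// Hypothetical stand-in for an instrumentation context: one callback fired on completion.
interface CompletionContext<T> {
    void onCompleted(T result, Throwable error);
}

final class TimingSketch {
    // The "begin" step: capture a start time now and return a context that
    // reports the elapsed time to the sink once the work has completed.
    static <T> CompletionContext<T> beginTimed(LongSupplier clock, LongConsumer sink) {
        long then = clock.getAsLong();
        return (result, error) -> sink.accept(clock.getAsLong() - then);
    }

    // Attaches the context to asynchronous work so that it is notified
    // whether the work succeeds or fails.
    static <T> CompletableFuture<T> instrument(CompletableFuture<T> work, CompletionContext<T> context) {
        return work.whenComplete(context::onCompleted);
    }
}
```

The start time is captured when the context is created and the elapsed time is computed only when the future completes, which is the same shape as the then / ms pair in the example above.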

Instrumenting the DataLoaderRegistry

You can also associate a DataLoaderInstrumentation with a DataLoaderRegistry. Every DataLoader registered will be changed so that the registry DataLoaderInstrumentation is associated with it. This allows you to set just the one DataLoaderInstrumentation in place and it applies to all data loaders.

    DataLoader<String, User> userDataLoader = DataLoaderFactory.newDataLoader(userBatchLoader);
    DataLoader<String, User> teamsDataLoader = DataLoaderFactory.newDataLoader(teamsBatchLoader);

    DataLoaderRegistry registry = DataLoaderRegistry.newRegistry()
            .instrumentation(timingInstrumentation)
            .register("users", userDataLoader)
            .register("teams", teamsDataLoader)
            .build();

    DataLoader<String, User> changedUsersDataLoader = registry.getDataLoader("users");

The timingInstrumentation here will be associated with the DataLoader under the key users and the key teams. Note that since DataLoader is immutable, a new changed object is created, so you must use the registry to get the instrumented DataLoader.
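Why the lookup through the registry matters can be shown with a small model of the copy-on-register behaviour described above. This is a sketch under stated assumptions: ImmutableLoader, RegistrySketch and withInstrumentation are hypothetical names, and a plain String stands in for the DataLoaderInstrumentation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical immutable loader: "changing" the instrumentation returns a new object.
final class ImmutableLoader {
    final String name;
    final String instrumentation; // stands in for a DataLoaderInstrumentation

    ImmutableLoader(String name, String instrumentation) {
        this.name = name;
        this.instrumentation = instrumentation;
    }

    ImmutableLoader withInstrumentation(String newInstrumentation) {
        return new ImmutableLoader(name, newInstrumentation);
    }
}

// Hypothetical registry that applies its own instrumentation to every registered loader.
final class RegistrySketch {
    private final Map<String, ImmutableLoader> loaders = new LinkedHashMap<>();
    private final String instrumentation;

    RegistrySketch(String instrumentation) {
        this.instrumentation = instrumentation;
    }

    RegistrySketch register(String key, ImmutableLoader loader) {
        // The registry stores a changed copy; the caller's reference is left untouched.
        loaders.put(key, loader.withInstrumentation(instrumentation));
        return this;
    }

    ImmutableLoader getDataLoader(String key) {
        return loaders.get(key);
    }
}
```

In this model the object passed to register() never gains the instrumentation; only the copy returned by getDataLoader() carries it.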

Other information sources

Contributing

All your feedback and help to improve this project is very welcome. Please create issues for your bugs, ideas and enhancement requests, or better yet, contribute directly by creating a PR.

When reporting an issue, please add detailed instructions and, if possible, a code snippet or test that can be used as a reproducer of your problem.

When creating a pull request, please adhere to the current coding style where possible, and create tests with your code so it keeps providing an excellent test coverage level. PRs without tests may not be accepted unless they only deal with minor changes.

Acknowledgements

This library was originally written for use within a VertX world and it used the vertx-core Future classes to implement itself. All the heavy lifting has been done by this project: vertx-dataloader, including the extensive testing (which itself came from Facebook).

This particular port was done to reduce the dependency on Vertx and to write a pure Java 11 implementation with no dependencies, and also to use the more normative Java CompletableFuture.

vertx-core is not a lightweight library by any means, so having a pure Java 11 implementation is very desirable.

This library is entirely inspired by the great works of Lee Byron and Nicholas Schrock from Facebook, whom we would like to thank, and especially @leebyron for taking the time and effort to provide 100% coverage on the codebase. The original set of tests was also ported.

Licensing

This project is licensed under the Apache License v2.0.
