DataLoader is a generic utility to be used as part of your application's data fetching layer to provide a simplified and consistent API over various remote data sources such as databases or web services via batching and caching.

A port of the "Loader" API originally developed by @schrockn at Facebook in 2010 as a simplifying force to coalesce the sundry key-value store back-end APIs which existed at the time. At Facebook, "Loader" became one of the implementation details of the "Ent" framework, a privacy-aware data entity loading and caching layer within web server product code. This ultimately became the underpinning for Facebook's GraphQL server implementation and type definitions.

DataLoader is a simplified version of this original idea implemented in JavaScript for Node.js services. DataLoader is often used when implementing a graphql-js service, though it is also broadly useful in other situations.
This mechanism of batching and caching data requests is certainly not unique to Node.js or JavaScript; it is also the primary motivation for Haxl, Facebook's data loading library for Haskell. More about how Haxl works can be read in this blog post.
DataLoader is provided so that it may be useful not just to build GraphQL services for Node.js but also as a publicly available reference implementation of this concept in the hopes that it can be ported to other languages. If you port DataLoader to another language, please open an issue to include a link from this repository.
First, install DataLoader using npm.
```sh
npm install --save dataloader
```
To get started, create a `DataLoader`. Each `DataLoader` instance represents a unique cache. Typically instances are created per request when used within a web server like `express` if different users can see different things.
Note: DataLoader assumes a JavaScript environment with global ES6 `Promise` and `Map` classes, available in all supported versions of Node.js.
Batching is not an advanced feature, it's DataLoader's primary feature. Create loaders by providing a batch loading function.
```js
const DataLoader = require('dataloader');

const userLoader = new DataLoader(keys => myBatchGetUsers(keys));
```
A batch loading function accepts an Array of keys, and returns a Promise which resolves to an Array of values*.

Then load individual values from the loader. DataLoader will coalesce all individual loads which occur within a single frame of execution (a single tick of the event loop) and then call your batch function with all requested keys.
```js
const user = await userLoader.load(1);
const invitedBy = await userLoader.load(user.invitedByID);
console.log(`User 1 was invited by ${invitedBy}`);

// Elsewhere in your application
const user = await userLoader.load(2);
const lastInvited = await userLoader.load(user.lastInvitedID);
console.log(`User 2 last invited ${lastInvited}`);
```
A naive application may have issued four round-trips to a backend for the required information, but with DataLoader this application will make at most two.

DataLoader allows you to decouple unrelated parts of your application without sacrificing the performance of batch data-loading. While the loader presents an API that loads individual values, all concurrent requests will be coalesced and presented to your batch loading function. This allows your application to safely distribute data fetching requirements throughout your application and maintain minimal outgoing data requests.

A batch loading function accepts an Array of keys, and returns a Promise which resolves to an Array of values or Error instances. The loader itself is provided as the `this` context.
```js
async function batchFunction(keys) {
  const results = await db.fetchAllKeys(keys);
  return keys.map(key => results[key] || new Error(`No result for ${key}`));
}

const loader = new DataLoader(batchFunction);
```
There are a few constraints this function must uphold:
- The Array of values must be the same length as the Array of keys.
- Each index in the Array of values must correspond to the same index in the Array of keys.
For example, if your batch function was provided the Array of keys: `[ 2, 9, 6, 1 ]`, and loading from a back-end service returned the values:
```js
{ id: 9, name: 'Chicago' }
{ id: 1, name: 'New York' }
{ id: 2, name: 'San Francisco' }
```
Our back-end service returned results in a different order than we requested, likely because it was more efficient for it to do so. Also, it omitted a result for key `6`, which we can interpret as no value existing for that key.

To uphold the constraints of the batch function, it must return an Array of values the same length as the Array of keys, and re-order them to ensure each index aligns with the original keys `[ 2, 9, 6, 1 ]`:
```js
[
  { id: 2, name: 'San Francisco' },
  { id: 9, name: 'Chicago' },
  null, // or perhaps `new Error()`
  { id: 1, name: 'New York' },
];
```
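One common way to satisfy both constraints is to index the back-end results by key before mapping over the original keys. The following is a minimal sketch, not part of DataLoader itself; `db.fetchCities` is a hypothetical helper that may return rows in any order and may omit missing keys:

```js
// Sketch: align out-of-order, possibly incomplete back-end results with the
// requested keys, assuming each returned row carries an `id` field.
async function batchGetCities(keys) {
  const rows = await db.fetchCities(keys);
  const rowsByID = new Map(rows.map(row => [row.id, row]));
  // Map over the original keys so the result length and order match the input.
  return keys.map(key => rowsByID.get(key) || new Error(`No result for ${key}`));
}

const cityLoader = new DataLoader(batchGetCities);
```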
By default DataLoader will coalesce all individual loads which occur within a single frame of execution before calling your batch function with all requested keys. This ensures no additional latency while capturing many related requests into a single batch. In fact, this is the same behavior used in Facebook's original PHP implementation in 2010. See `enqueuePostPromiseJob` in the source code for more details about how this works.
However sometimes this behavior is not desirable or optimal. Perhaps you expect requests to be spread out over a few subsequent ticks because of an existing use of `setTimeout`, or you just want manual control over dispatching regardless of the run loop. DataLoader allows providing a custom batch scheduler to provide these or any other behaviors.

A custom scheduler is provided as `batchScheduleFn` in options. It must be a function which is passed a callback and is expected to call that callback in the immediate future to execute the batch request.

As an example, here is a batch scheduler which collects all requests over a 100ms window of time (and as a consequence, adds 100ms of latency):
```js
const myLoader = new DataLoader(myBatchFn, {
  batchScheduleFn: callback => setTimeout(callback, 100),
});
```
As another example, here is a manually dispatched batch scheduler:
```js
function createScheduler() {
  let callbacks = [];
  return {
    schedule(callback) {
      callbacks.push(callback);
    },
    dispatch() {
      callbacks.forEach(callback => callback());
      callbacks = [];
    },
  };
}

const { schedule, dispatch } = createScheduler();
const myLoader = new DataLoader(myBatchFn, { batchScheduleFn: schedule });

myLoader.load(1);
myLoader.load(2);

dispatch();
```
DataLoader provides a memoization cache for all loads which occur in a single request to your application. After `.load()` is called once with a given key, the resulting value is cached to eliminate redundant loads.

DataLoader caching does not replace Redis, Memcache, or any other shared application-level cache. DataLoader is first and foremost a data loading mechanism, and its cache only serves the purpose of not repeatedly loading the same data in the context of a single request to your application. To do this, it maintains a simple in-memory memoization cache (more accurately: `.load()` is a memoized function).
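As a quick illustration of that memoization (a sketch, assuming the default cache is enabled), loading the same key twice passes the key to the batch function only once, and both loads resolve to the same cached value:

```js
const userLoader = new DataLoader(async keys => {
  console.log(keys);
  return keys.map(id => ({ id }));
});

const [a, b] = await Promise.all([userLoader.load(1), userLoader.load(1)]);
// > [ 1 ]    (the duplicate key never reaches the batch function)
console.log(a === b);
// > true     (both loads resolve to the same cached value)
```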
Avoid multiple requests from different users using the DataLoader instance, which could result in cached data incorrectly appearing in each request. Typically, DataLoader instances are created when a request begins, and are not used once the request ends.

For example, when using with `express`:
```js
function createLoaders(authToken) {
  return {
    users: new DataLoader(ids => genUsers(authToken, ids)),
  };
}

const app = express();

app.get('/', function (req, res) {
  const authToken = authenticateUser(req);
  const loaders = createLoaders(authToken);
  res.send(renderPage(req, loaders));
});

app.listen();
```
Subsequent calls to `.load()` with the same key will result in that key not appearing in the keys provided to your batch function. However, the resulting Promise will still wait on the current batch to complete. This way both cached and uncached requests will resolve at the same time, allowing DataLoader optimizations for subsequent dependent loads.

In the example below, User `1` happens to be cached. However, because User `1` and `2` are loaded in the same tick, they will resolve at the same time. This means both `user.bestFriendID` loads will also happen in the same tick, which results in two total requests (the same as if User `1` had not been cached).
```js
userLoader.prime(1, { bestFriend: 3 });

async function getBestFriend(userID) {
  const user = await userLoader.load(userID);
  return await userLoader.load(user.bestFriendID);
}

// In one part of your application
getBestFriend(1);

// Elsewhere
getBestFriend(2);
```
Without this optimization, if the cached User `1` resolved immediately, this could result in three total requests since each `user.bestFriendID` load would happen at different times.
In certain uncommon cases, clearing the request cache may be necessary.
The most common case where clearing the loader's cache is necessary is after a mutation or update within the same request, when a cached value could be out of date and future loads should not use any possibly cached value.
Here's a simple example using SQL UPDATE to illustrate.
```js
// Request begins...
const userLoader = new DataLoader(...);

// And a value happens to be loaded (and cached).
const user = await userLoader.load(4);

// A mutation occurs, invalidating what might be in cache.
await sqlRun('UPDATE users WHERE id=4 SET username="zuck"');
userLoader.clear(4);

// Later the value is loaded again so the mutated data appears.
const user = await userLoader.load(4);

// Request completes.
```
If a batch load fails (that is, a batch function throws or returns a rejected Promise), then the requested values will not be cached. However if a batch function returns an `Error` instance for an individual value, that `Error` will be cached to avoid frequently loading the same `Error`.
In some circumstances you may wish to clear the cache for these individual Errors:
```js
try {
  const user = await userLoader.load(1);
} catch (error) {
  if (/* determine if the error should not be cached */) {
    userLoader.clear(1);
  }
  throw error;
}
```
In certain uncommon cases, a DataLoader which does not cache may be desirable. Calling `new DataLoader(myBatchFn, { cache: false })` will ensure that every call to `.load()` will produce a new Promise, and requested keys will not be saved in memory.

However, when the memoization cache is disabled, your batch function will receive an array of keys which may contain duplicates! Each key will be associated with each call to `.load()`. Your batch loader should provide a value for each instance of the requested key.
For example:
```js
const myLoader = new DataLoader(
  keys => {
    console.log(keys);
    return someBatchLoadFn(keys);
  },
  { cache: false },
);

myLoader.load('A');
myLoader.load('B');
myLoader.load('A');

// > [ 'A', 'B', 'A' ]
```
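If the underlying back-end can't tolerate duplicate keys, one option is to de-duplicate inside the batch function and fan the results back out, one value per requested key. A small sketch of ordinary batch-function code, not a DataLoader feature:

```js
const myLoader = new DataLoader(
  async keys => {
    const uniqueKeys = [...new Set(keys)];
    const values = await someBatchLoadFn(uniqueKeys);
    const valueByKey = new Map(uniqueKeys.map((key, i) => [key, values[i]]));
    // Return one value per requested key, duplicates included.
    return keys.map(key => valueByKey.get(key));
  },
  { cache: false },
);
```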
More complex cache behavior can be achieved by calling `.clear()` or `.clearAll()` rather than disabling the cache completely. For example, this DataLoader will provide unique keys to a batch function due to the memoization cache being enabled, but will immediately clear its cache when the batch function is called so later requests will load new values.
```js
const myLoader = new DataLoader(keys => {
  myLoader.clearAll();
  return someBatchLoadFn(keys);
});
```
As mentioned above, DataLoader is intended to be used as a per-request cache. Since requests are short-lived, DataLoader uses an infinitely growing `Map` as a memoization cache. This should not pose a problem as most requests are short-lived and the entire cache can be discarded after the request completes.

However, this memoization caching strategy isn't safe when using a long-lived DataLoader, since it could consume too much memory. If using DataLoader in this way, you can provide a custom Cache instance with whatever behavior you prefer, as long as it follows the same API as `Map`.

The example below uses an LRU (least recently used) cache to limit total memory to hold at most 100 cached values via the `lru_map` npm package.
```js
import { LRUMap } from 'lru_map';

const myLoader = new DataLoader(someBatchLoadFn, {
  cacheMap: new LRUMap(100),
});
```
More specifically, any object that implements the `get()`, `set()`, `delete()` and `clear()` methods can be provided. This allows for custom Maps which implement various cache algorithms.
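As an illustration, a hand-rolled cache map only needs those four methods. The sketch below (the `BoundedCacheMap` name and its eviction policy are made up for the example) caps the number of cached entries by evicting the oldest one:

```js
class BoundedCacheMap {
  constructor(maxSize) {
    this.maxSize = maxSize;
    this.map = new Map();
  }
  get(key) {
    return this.map.get(key);
  }
  set(key, value) {
    if (this.map.size >= this.maxSize) {
      // Evict the oldest inserted entry to bound memory use.
      const oldestKey = this.map.keys().next().value;
      this.map.delete(oldestKey);
    }
    this.map.set(key, value);
  }
  delete(key) {
    this.map.delete(key);
  }
  clear() {
    this.map.clear();
  }
}

const myLoader = new DataLoader(someBatchLoadFn, {
  cacheMap: new BoundedCacheMap(100),
});
```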
DataLoader creates a public API for loading data from a particular data back-end with unique keys such as the `id` column of a SQL table or document name in a MongoDB database, given a batch loading function.

Each `DataLoader` instance contains a unique memoized cache. Use caution when used in long-lived applications or those which serve many users with different access permissions, and consider creating a new instance per web request.

Create a new `DataLoader` given a batch loading function and options.
- `batchLoadFn`: A function which accepts an Array of keys, and returns a Promise which resolves to an Array of values.
- `options`: An optional object of options:
| Option Key | Type | Default | Description |
|---|---|---|---|
| `batch` | Boolean | `true` | Set to `false` to disable batching, invoking `batchLoadFn` with a single load key. This is equivalent to setting `maxBatchSize` to `1`. |
| `maxBatchSize` | Number | `Infinity` | Limits the number of items that get passed in to the `batchLoadFn`. May be set to `1` to disable batching. |
| `batchScheduleFn` | Function | See Batch scheduling | A function to schedule the later execution of a batch. The function is expected to call the provided callback in the immediate future. |
| `cache` | Boolean | `true` | Set to `false` to disable memoization caching, creating a new Promise and new key in the `batchLoadFn` for every load of the same key. This is equivalent to setting `cacheMap` to `null`. |
| `cacheKeyFn` | Function | `key => key` | Produces a cache key for a given load key. Useful when objects are keys and two objects should be considered equivalent (see the sketch after this table). |
| `cacheMap` | Object | `new Map()` | Instance of `Map` (or an object with a similar API) to be used as cache. May be set to `null` to disable caching. |
| `name` | String | `null` | The name given to this `DataLoader` instance. Useful for APM tools. |
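As referenced in the `cacheKeyFn` row above, a cache key function can canonicalize object keys so that structurally equal objects share one cache entry and one batched key. A small sketch, with a hypothetical `batchGetTranslations` batch function and keys shaped like `{ id, locale }`:

```js
const translationLoader = new DataLoader(batchGetTranslations, {
  // Two key objects with the same contents map to the same cache key.
  cacheKeyFn: key => `${key.id}:${key.locale}`,
});

// The second call is a cache hit, so only one key reaches the batch function.
translationLoader.load({ id: 7, locale: 'en' });
translationLoader.load({ id: 7, locale: 'en' });
```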
Loads a key, returning a `Promise` for the value represented by that key.

- `key`: A key value to load.
Loads multiple keys, promising an array of values:
```js
const [a, b] = await myLoader.loadMany(['a', 'b']);
```
This is similar to the more verbose:
```js
const [a, b] = await Promise.all([myLoader.load('a'), myLoader.load('b')]);
```
However, it is different in the case where any load fails. Where `Promise.all()` would reject, `loadMany()` always resolves; however, each result is either a value or an Error instance.
```js
var [a, b, c] = await myLoader.loadMany(['a', 'b', 'badkey']);
// c instanceof Error
```
- `keys`: An array of key values to load.

Clears the value at `key` from the cache, if it exists. Returns itself for method chaining.

- `key`: A key value to clear.

Clears the entire cache. To be used when some event results in unknown invalidations across this particular `DataLoader`. Returns itself for method chaining.

Primes the cache with the provided key and value. If the key already exists, no change is made. (To forcefully prime the cache, clear the key first with `loader.clear(key).prime(key, value)`.) Returns itself for method chaining.
To prime the cache with an error at a key, provide an Error instance.
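A short sketch of both forms, forcing a fresh value and priming a known failure (the keys and values here are made up):

```js
// Overwrite any existing cache entry for key 1 with a fresh value.
userLoader.clear(1).prime(1, { id: 1, name: 'Ada' });

// Prime key 2 with an Error so later loads of that key reject
// without ever reaching the batch function.
userLoader.prime(2, new Error('User 2 is not visible to this viewer'));
```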
DataLoader pairs nicely with GraphQL. GraphQL fields are designed to be stand-alone functions. Without a caching or batching mechanism, it's easy for a naive GraphQL server to issue new database requests each time a field is resolved.
Consider the following GraphQL request:
```graphql
{
  me {
    name
    bestFriend {
      name
    }
    friends(first: 5) {
      name
      bestFriend {
        name
      }
    }
  }
}
```

Naively, if `me`, `bestFriend` and `friends` each need to request the backend, there could be at most 13 database requests!
When using DataLoader, we could define the `User` type using the SQLite example with clearer code and at most 4 database requests, and possibly fewer if there are cache hits.
```js
const UserType = new GraphQLObjectType({
  name: 'User',
  fields: () => ({
    name: { type: GraphQLString },
    bestFriend: {
      type: UserType,
      resolve: user => userLoader.load(user.bestFriendID),
    },
    friends: {
      args: {
        first: { type: GraphQLInt },
      },
      type: new GraphQLList(UserType),
      resolve: async (user, { first }) => {
        const rows = await queryLoader.load([
          'SELECT toID FROM friends WHERE fromID=? LIMIT ?',
          user.id,
          first,
        ]);
        return rows.map(row => userLoader.load(row.toID));
      },
    },
  }),
});
```
In many applications, a web server using DataLoader serves requests to many different users with different access permissions. It may be dangerous to use one cache across many users, so it is encouraged to create a new DataLoader per request:
```js
function createLoaders(authToken) {
  return {
    users: new DataLoader(ids => genUsers(authToken, ids)),
    cdnUrls: new DataLoader(rawUrls => genCdnUrls(authToken, rawUrls)),
    stories: new DataLoader(keys => genStories(authToken, keys)),
  };
}

// When handling an incoming web request:
const loaders = createLoaders(request.query.authToken);

// Then, within application logic:
const user = await loaders.users.load(4);
const pic = await loaders.cdnUrls.load(user.rawPicUrl);
```
Creating an object where each key is a `DataLoader` is one common pattern which provides a single value to pass around to code which needs to perform data loading, such as part of the `rootValue` in a graphql-js request.
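A minimal sketch of that pattern with graphql-js and Express, assuming a `schema` defined elsewhere and the `createLoaders` helper above; here the loaders object is passed as `contextValue` so resolvers can read it from their third argument, while the `rootValue` mentioned above is another common carrier:

```js
const { graphql } = require('graphql');

app.get('/graphql', async (req, res) => {
  // A fresh set of loaders for every incoming request.
  const loaders = createLoaders(req.query.authToken);
  const result = await graphql({
    schema,
    source: req.query.query,
    contextValue: { loaders },
  });
  res.json(result);
});
```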
Occasionally, some kind of value can be accessed in multiple ways. For example, perhaps a "User" type can be loaded not only by an "id" but also by a "username" value. If the same user is loaded by both keys, then it may be useful to fill both caches when a user is loaded from either source:
```js
const userByIDLoader = new DataLoader(async ids => {
  const users = await genUsersByID(ids);
  for (let user of users) {
    usernameLoader.prime(user.username, user);
  }
  return users;
});

const usernameLoader = new DataLoader(async names => {
  const users = await genUsernames(names);
  for (let user of users) {
    userByIDLoader.prime(user.id, user);
  }
  return users;
});
```
Since DataLoader caches values, it's typically assumed these values will be treated as if they were immutable. While DataLoader itself doesn't enforce this, you can create a higher-order function to enforce immutability with `Object.freeze()`:
```js
function freezeResults(batchLoader) {
  return keys => batchLoader(keys).then(values => values.map(Object.freeze));
}

const myLoader = new DataLoader(freezeResults(myBatchLoader));
```
DataLoader expects batch functions which return an Array of the same length as the provided keys. However, this is not always a common return format from other libraries. A DataLoader higher-order function can convert from one format to another. The example below converts a `{ key: value }` result to the format DataLoader expects.
```js
function objResults(batchLoader) {
  return keys =>
    batchLoader(keys).then(objValues =>
      keys.map(key => objValues[key] || new Error(`No value for ${key}`)),
    );
}

const myLoader = new DataLoader(objResults(myBatchLoader));
```
Looking to get started with a specific back-end? Try the loaders in the examples directory.
Listed in alphabetical order
- Elixir
- Golang
- Java
- .Net
- Perl
- PHP
- Python
- ReasonML
- Ruby
- Rust
- Swift
- C++
DataLoader Source Code Walkthrough (YouTube):
A walkthrough of the DataLoader v1 source code. While the source has changed since this video was made, it is still a good overview of the rationale of DataLoader and how it works.
This repository is managed by EasyCLA. Project participants must sign the free GraphQL Specification Membership agreement before making a contribution. You only need to do this one time, and it can be signed by individual contributors or their employers.
To initiate the signature process please open a PR against this repo. The EasyCLA bot will block the merge if we still need a membership agreement from you.
You can find detailed information here. If you have issues, please email operations@graphql.org.

If your company benefits from GraphQL and you would like to provide essential financial support for the systems and people that power our community, please also consider membership in the GraphQL Foundation.