Posted on Mar 27, 2020 • Edited on Sep 23, 2023

Caching & IOStrategy @ SwayDB

#java #scala #kotlin #storage

We all know that Memory IO is 50-200 times faster than Disk IO!

Caching plays a decent role in boosting read and compaction performance, but some machines & smaller devices (eg: mobile phones) might not have enough memory for a cache, so a configurable cache that can be disabled, partial or full is required.

Note - A Segment (.seg) file in SwayDB is simply a byte array that stores other byte arrays like keys, values, indexes etc (Array<Array<Byte>>). All these bytes can be cached based on any condition, which is configurable.

Configuring IO and Cache

When accessing any file with a custom format we generally:

Open the file (OpenResource).
Read the file header or info (ReadDataOverview) to understand the file's content eg: format etc.
Finally read the content of the file (Compressed or Uncompressed data).

The following sample ioStrategy function does exactly that: we get an IOAction that describes what IO is being performed by SwayDB, and in our function we define how we want to perform IO (IOStrategy) for that action and also configure caching for the read data/bytes.

.ioStrategy(
  (IOAction ioAction) -> {
    if (ioAction.isOpenResource()) {
      //Here we are just opening the file so do synchronised IO because
      //blocking when opening a file might be cheaper than thread
      //context switching. Also set cacheOnAccess to true so that other
      //concurrent threads accessing the same file channel do not
      //open multiple channels to the same file.
      return new IOStrategy.SynchronisedIO(true);
    } else if (ioAction.isReadDataOverview()) {
      //Data overview is always small and less than 128 bytes and can be
      //read synchronously to avoid switching threads. Also cache
      //this data (cacheOnAccess) for the benefit of other threads and to save IO.
      return new IOStrategy.SynchronisedIO(true);
    } else {
      //Here we are reading actual content of the file which can be compressed
      //or uncompressed.
      IOAction.DataAction action = (IOAction.DataAction) ioAction;
      if (action.isCompressed()) {
        //If the data is compressed we do not want multiple threads to concurrently
        //decompress it so perform either Async or Sync IO for decompression
        //and then cache the compressed data. You can also read the compressed
        //and decompressed size with the following code
        //IOAction.ReadCompressedData dataAction = (IOAction.ReadCompressedData) action;
        //dataAction.compressedSize();
        //dataAction.decompressedSize();
        return new IOStrategy.AsyncIO(true);
      } else {
        //Else the data is not compressed so we allow concurrent access to it.
        //Here cacheOnAccess can also be set to true but that could allow multiple
        //threads to concurrently cache the same data. If cacheOnAccess is required
        //then use Async or Sync IO instead.
        return new IOStrategy.ConcurrentIO(false);
      }
    }
  }
)
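
To make the cacheOnAccess idea a bit more concrete, here is a minimal sketch in plain Java of what "synchronised IO with cacheOnAccess" means conceptually. This is not SwayDB's implementation: the class SynchronisedCachedRead and its Supplier-based "read" are hypothetical names made up for illustration. The point is only that one thread at a time performs the read, and with cacheOnAccess set to true the result is kept so later callers do not repeat the IO.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical helper for illustration only; it is NOT part of SwayDB.
final class SynchronisedCachedRead<T> {
    private final Supplier<T> read;      // the (expensive) IO, e.g. opening a file
    private final boolean cacheOnAccess; // keep the result after the first read?
    private T cached;                    // null until a result has been cached

    SynchronisedCachedRead(Supplier<T> read, boolean cacheOnAccess) {
        this.read = read;
        this.cacheOnAccess = cacheOnAccess;
    }

    // Only one thread at a time can be inside this method, so concurrent
    // callers never perform the same read in parallel.
    synchronized T get() {
        if (cached != null) return cached;
        T value = read.get();
        if (cacheOnAccess) cached = value;
        return value;
    }

    public static void main(String[] args) {
        AtomicInteger opens = new AtomicInteger();
        SynchronisedCachedRead<String> channel =
            new SynchronisedCachedRead<>(() -> "channel-" + opens.incrementAndGet(), true);
        channel.get();
        channel.get();
        System.out.println(opens.get()); // prints 1: the second call hit the cache
    }
}
```

With cacheOnAccess set to false the same two calls would run the read twice, which is the trade-off the comments in the ioStrategy function describe.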

You will find the above ioStrategy property in all data-blocks that form a Segment - SortedKeyIndex, RandomKeyIndex, BinarySearchIndex, MightContainIndex & ValuesConfig.

A Segment itself is also a data-block and its ioStrategy can also be configured via SegmentConfig.

Cache control/limit with MemoryCache

Caching should be controlled so that it does not lead to memory overflow!

You can enable or disable caching for any or all of the following

Bytes within a Segment (ByteCacheOnly).
Parsed key-values (KeyValueCacheOnly).
Or all the above (MemoryCache.All).

By default ByteCacheOnly is used because KeyValueCacheOnly uses an in-memory SkipList and inserts to a large SkipList are expensive, which is not useful for the general use-case. But KeyValueCacheOnly can be useful for applications that perform multiple reads of the same data when that data rarely changes.

An Actor configuration is also required here, which manages the cache in the background. You can configure the Actor to be a Basic, Timer or TimerLoop.

The following demonstrates how to configure all caches.

//Byte cache only
.setMemoryCache(
  MemoryCache
    .byteCacheOnlyBuilder()
    .minIOSeekSize(4096)
    .skipBlockCacheSeekSize(StorageUnits.mb(4))
    .cacheCapacity(StorageUnits.gb(2))
    .actorConfig(new ActorConfig.Basic((ExecutionContext) DefaultConfigs.sweeperEC()))
)
//or key-value cache only
.setMemoryCache(
  MemoryCache
    .keyValueCacheOnlyBuilder()
    .cacheCapacity(StorageUnits.gb(3))
    .maxCachedKeyValueCountPerSegment(Optional.of(100))
    .actorConfig(new Some(new ActorConfig.Basic((ExecutionContext) DefaultConfigs.sweeperEC())))
)
//or enable both the above.
.setMemoryCache(
  MemoryCache
    .allBuilder()
    .minIOSeekSize(4096)
    .skipBlockCacheSeekSize(StorageUnits.mb(4))
    .cacheCapacity(StorageUnits.gb(1))
    .maxCachedKeyValueCountPerSegment(Optional.of(100))
    .sweepCachedKeyValues(true)
    .actorConfig(new ActorConfig.Basic((ExecutionContext) DefaultConfigs.sweeperEC()))
)

`minIOSeekSize`

The blockSize, which sets the minimum number of bytes to read for each IO. For example, in the above configuration if you ask for 6000 bytes then 4096 * 2 bytes will be read.

The value to set depends on your machine's block size. On Mac this can be read with the following command:

diskutil info / | grep "Block Size"

which returns

Device Block Size: 4096 Bytes
Allocation Block Size: 4096 Bytes
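
The rounding described for minIOSeekSize (6000 requested bytes becoming 4096 * 2) is just a round-up to the next multiple of the block size. Here is a small sketch of that arithmetic. The class and method names are made up for this example and are not SwayDB APIs, and the sketch only models the read size, not how SwayDB aligns the read offset.

```java
// Hypothetical helper for illustration only; it is NOT part of SwayDB.
final class BlockAlignedRead {

    // Rounds the requested number of bytes up to the next multiple of blockSize.
    static long alignedReadSize(long requestedBytes, long blockSize) {
        if (requestedBytes < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("requestedBytes must be >= 0 and blockSize > 0");
        }
        long blocks = (requestedBytes + blockSize - 1) / blockSize; // ceiling division
        return blocks * blockSize;
    }

    public static void main(String[] args) {
        System.out.println(alignedReadSize(6000, 4096)); // prints 8192 (4096 * 2)
        System.out.println(alignedReadSize(4096, 4096)); // prints 4096
        System.out.println(alignedReadSize(1, 4096));    // prints 4096
    }
}
```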

`skipBlockCacheSeekSize`

This skips the BlockCache and performs direct IO if the data size is greater than this value.

`cacheCapacity`

Sets the total memory capacity. On overflow the oldest data in the cache is dropped by the Actor.
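
As a rough mental model of "drop the oldest data on overflow", here is a minimal sketch of a byte-capacity-bounded cache in plain Java. It is an assumption-laden illustration, not SwayDB's cache: the class CapacityBoundedCache is hypothetical, it evicts inline on put instead of in the background via an Actor, and "oldest" here simply means oldest by insertion order.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical cache for illustration only; it is NOT SwayDB's implementation.
final class CapacityBoundedCache {
    private final long capacityBytes;
    private long usedBytes = 0;
    // LinkedHashMap keeps insertion order, so iteration starts at the oldest entry.
    private final LinkedHashMap<String, byte[]> entries = new LinkedHashMap<>();

    CapacityBoundedCache(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    synchronized void put(String key, byte[] value) {
        byte[] previous = entries.remove(key); // re-inserting makes the key the newest
        if (previous != null) usedBytes -= previous.length;
        entries.put(key, value);
        usedBytes += value.length;
        evictOldestWhileOverCapacity();
    }

    synchronized byte[] get(String key) {
        return entries.get(key);
    }

    synchronized long usedBytes() {
        return usedBytes;
    }

    // Drops entries from the oldest end until the cache fits the capacity again.
    private void evictOldestWhileOverCapacity() {
        Iterator<Map.Entry<String, byte[]>> oldestFirst = entries.entrySet().iterator();
        while (usedBytes > capacityBytes && oldestFirst.hasNext()) {
            usedBytes -= oldestFirst.next().getValue().length;
            oldestFirst.remove();
        }
    }

    public static void main(String[] args) {
        CapacityBoundedCache cache = new CapacityBoundedCache(10);
        cache.put("a", new byte[4]);
        cache.put("b", new byte[4]);
        cache.put("c", new byte[4]); // 12 bytes > 10, so "a" (the oldest) is dropped
        System.out.println(cache.get("a") == null); // prints true
        System.out.println(cache.usedBytes());      // prints 8
    }
}
```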

`maxCachedKeyValueCountPerSegment`

If set, each Segment is initialised with a dedicated LimitSkipList. This cache is managed by the Actor or by the Segment itself if it gets deleted or when the max limit is reached.

`sweepCachedKeyValues`

Enables clearing cached key-values via the Actor. If false, key-values are kept in-memory indefinitely unless the Segment gets deleted. This configuration can be used for smaller databases (eg: application configs) that read the same data more often.

Memory-mapping (MMAP)

MMAP can also be optionally enabled for all files.

Map<Integer, String, Void> map =
  MapConfig.functionsOff(Paths.get("myMap"), intSerializer(), stringSerializer())
    .setMmapAppendix(true) //enable MMAP for appendix files
    .setMmapMaps(true) //enable MMAP for LevelZero write-ahead log files
    .setSegmentConfig( //configuring MMAP for Segment files
      SegmentConfig
        .builder()
        ...
        //either disable memory-mapping Segments
        .mmap(MMAP.disabled())
        //or enable for writes and reads.
        .mmap(MMAP.writeAndRead())
        //or enable for reads only.
        .mmap(MMAP.readOnly())
        ...
    )
    .get();

map.put(1, "one");
map.get(1); //Optional[one]
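
If memory-mapping is new to you, this is roughly what a read-only mapping looks like with plain java.nio. It is only a sketch of the underlying JDK mechanism, not SwayDB's code: the class MmapReadSketch and the temp file are made up for this example.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical example class; it only demonstrates the JDK's memory-mapping API.
final class MmapReadSketch {

    // Maps the whole file into memory (read-only) and decodes it as UTF-8.
    static String readViaMmap(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mapped =
                channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            byte[] bytes = new byte[mapped.remaining()];
            mapped.get(bytes); // reads go through the page cache, not explicit read() calls
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mmap-sketch", ".seg");
        file.toFile().deleteOnExit();
        Files.write(file, "one".getBytes(StandardCharsets.UTF_8));
        System.out.println(readViaMmap(file)); // prints one
    }
}
```

The mapping stays valid after the channel is closed, which is why the buffer can be read inside or outside the try block.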

Summary

You are in full control of Caching & IO and can configure it to suit your application's needs. If your IOStrategy configurations use only AsyncIO and ConcurrentIO then you can truly build reactive applications which are non-blocking end-to-end other than the file system IO performed by java.nio.* classes. Support for Libio to provide async file system IO can be implemented as a feature if requested.