ReGrid File Storage
ReGrid is a distributed large-file storage layer on top of RethinkDB. ReGrid is inspired by GridFS from MongoDB. With ReGrid, a large 4GB file can be broken up into chunks and stored on a RethinkDB cluster. Later, the file can be retrieved by streaming it back to the client. The figure below shows ReGrid storing a large video file in chunks across a three-node cluster.
(Note: Please ask before using figures in presentations, videos, or other works. Thanks.)
- Physical view refers to the low-level view of the physical topology, location, and layout of raw file data.
- Logical view refers to the high-level view of the file system's organization of files, regardless of the physical layout of data.
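To make the chunking idea concrete, the C# sketch below computes how many chunk documents a file of a given length produces. It is a hypothetical helper, not part of ReGrid, and the 255 KiB (261,120-byte) chunk size used in the usage note is an assumption borrowed from GridFS's default; ReGrid's own default may differ.

```csharp
using System;

// Hypothetical helper (not part of ReGrid): the arithmetic of splitting a
// file of fileLength bytes into chunks of chunkSize bytes each.
public static class ChunkMath
{
    // Number of chunks: ceiling(fileLength / chunkSize).
    public static long ChunkCount(long fileLength, int chunkSize)
    {
        Validate(fileLength, chunkSize);
        return (fileLength + chunkSize - 1) / chunkSize;
    }

    // Size of the final chunk; every earlier chunk is exactly chunkSize bytes.
    public static long LastChunkSize(long fileLength, int chunkSize)
    {
        Validate(fileLength, chunkSize);
        if (fileLength == 0) return 0;
        long remainder = fileLength % chunkSize;
        return remainder == 0 ? chunkSize : remainder;
    }

    private static void Validate(long fileLength, int chunkSize)
    {
        if (fileLength < 0) throw new ArgumentOutOfRangeException(nameof(fileLength));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
    }
}
```

For a 4 GiB file (4,294,967,296 bytes) and 261,120-byte chunks, `ChunkCount` returns 16,449 and `LastChunkSize` returns 65,536, so every chunk except the last is full.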
NuGet Package: RethinkDb.Driver.ReGrid

    Install-Package RethinkDb.Driver.ReGrid

A Bucket is a logical set of files organized together. File read/download and write/upload operations are performed using a Bucket.
- A Bucket requires a RethinkDB database.
- A RethinkDB database can be partitioned into several Buckets.
- Multiple Buckets in the same RethinkDB database are differentiated by a Bucket's name.
- The default name for a Bucket is `fs`.
The figure below illustrates the logical separation of buckets within a single `MyFiles` database:
In Figure 2 above, there are three logical file Bucket stores in the `MyFiles` RethinkDB database. It is important to note that `video.mp4` from the `fs` bucket is not the same file as `video.mp4` from the `dev` bucket. Buckets can be used to organize files in any way app developers see fit.
To create a Bucket named `dev` in `MyFiles`, simply:

    var bucket = new Bucket(conn, "MyFiles", bucketName: "dev");
    bucket.Mount(); // required before use...

Mounting the `dev` Bucket before use is required. `Mount` ensures that the necessary tables and indexes exist.
A path is specified when a File is uploaded into a Bucket. Multiple uploads to the same path cause the file to be revisioned. Figure 3 below shows `/video.mp4` uploaded and revisioned 5 times.
| Positive revision numbers | Negative revision numbers |
|---|---|
| 0: The original stored file. | -1: The most recent revision. |
| 1: The first revision. | -2: The second most recent revision. |
| 2: The second revision. | -3: The third most recent revision. |
| etc... | etc... |
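The revision numbering in the table can be modeled with a small in-memory index. This is a hypothetical sketch, not ReGrid's implementation: `RevisionIndex`, `Upload`, and `Resolve` are illustrative names. Because the key is the (bucket, path) pair, it also shows why `/video.mp4` in the `fs` bucket and `/video.mp4` in the `dev` bucket are distinct files.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory model (not ReGrid's implementation) of how
// (bucket, path) pairs map to revisions and how revision numbers resolve.
public sealed class RevisionIndex
{
    // Each (bucket, path) key holds its revision ids, oldest first.
    private readonly Dictionary<(string Bucket, string Path), List<Guid>> _revisions = new();

    // Uploading to an existing path appends a new revision instead of overwriting.
    public Guid Upload(string bucket, string path)
    {
        var key = (bucket, path);
        if (!_revisions.TryGetValue(key, out var list))
        {
            list = new List<Guid>();
            _revisions[key] = list;
        }
        var id = Guid.NewGuid();
        list.Add(id);
        return id;
    }

    // revision >= 0 counts forward from the original (0 = original upload);
    // revision < 0 counts back from the newest (-1 = most recent).
    public Guid Resolve(string bucket, string path, int revision)
    {
        if (!_revisions.TryGetValue((bucket, path), out var list))
            throw new KeyNotFoundException($"{bucket}:{path} has no revisions");
        int index = revision >= 0 ? revision : list.Count + revision;
        if (index < 0 || index >= list.Count)
            throw new ArgumentOutOfRangeException(nameof(revision));
        return list[index];
    }
}
```

With five uploads to `/video.mp4`, revision `0` and revision `-5` both resolve to the original upload, and `-1` resolves to the fifth.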
The following code uploads a file to a Bucket:

    // Upload a file using byte[]
    var fileId = bucket.Upload("/video.mp4", videoBytes);

    // Upload a file using an IO stream
    Guid uploadId;
    using (var fileStream = File.Open("C:\\video.mp4", FileMode.Open))
    using (var uploadStream = bucket.OpenUploadStream("/video.mp4"))
    {
        uploadId = uploadStream.FileInfo.Id;
        fileStream.CopyTo(uploadStream);
    }

`fileId` is the file reference for that specific revision. `bucket` offers many more methods, including ones that work with IO streams and `async` variants.
`UploadOptions` can be specified to control `ChunkSizeBytes`, which sets the size of each chunk document stored in RethinkDB. Optionally, arbitrary additional `Metadata` can also be stored along with the uploaded file.

    var opts = new UploadOptions();
    opts.SetMetadata(new
        {
            UserId = "123",
            LastAccess = R.Now(),
            Roles = R.Array("admin", "office"),
            ContentType = "application/pdf"
        });
    var id = bucket.Upload(testFile, TestBytes.HalfChunk, opts);

    var fileInfo = bucket.GetFileInfo(id);
    fileInfo.Metadata["UserId"].Value<string>().Should().Be("123");

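The effect of `ChunkSizeBytes` can be illustrated by splitting a source stream into fixed-size pieces. `Chunker.Split` below is a hypothetical sketch, not ReGrid's upload code; it only shows that every chunk except the last is exactly `chunkSizeBytes` long.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical sketch (not ReGrid's upload code): read a source stream and
// yield consecutive chunks of at most chunkSizeBytes bytes.
public static class Chunker
{
    public static IEnumerable<byte[]> Split(Stream source, int chunkSizeBytes)
    {
        if (chunkSizeBytes <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSizeBytes));
        var buffer = new byte[chunkSizeBytes];
        while (true)
        {
            // Fill the buffer completely unless the stream ends first;
            // Stream.Read may return fewer bytes than requested.
            int filled = 0;
            while (filled < chunkSizeBytes)
            {
                int read = source.Read(buffer, filled, chunkSizeBytes - filled);
                if (read == 0) break;
                filled += read;
            }
            if (filled == 0) yield break;

            // Copy out only the bytes actually read for this chunk.
            var chunk = new byte[filled];
            Array.Copy(buffer, chunk, filled);
            yield return chunk;

            // A short chunk means the stream has ended.
            if (filled < chunkSizeBytes) yield break;
        }
    }
}
```

The inner loop matters: `Stream.Read` is allowed to return fewer bytes than requested, so a single short read does not by itself mean the stream has ended.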
Files can be downloaded by name:

    // Download to a byte[]
    var bytes = bucket.DownloadAsBytesByName("/video.mp4");

    // Download revision 0 to a file stream on the client
    using (var localFileStream = File.Open("C:\\video_original.mp4", FileMode.Create))
    {
        bucket.DownloadToStreamByName("/video.mp4", localFileStream, revision: 0);
    }

Use caution with `DownloadAsBytes`: it returns a `byte[]`, which has a maximum size of `int.MaxValue`. For relatively large files, use `DownloadToStream` instead; it has no maximum size limit beyond the client host's OS limitations.
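A caller that knows a file's length in advance could choose between the two download styles with a check like the one below. `DownloadPlanner` and `DownloadStrategy` are hypothetical names, not ReGrid APIs; the default limit mirrors the `int.MaxValue` bound mentioned above.

```csharp
using System;

// Hypothetical helper (not a ReGrid API): decide whether a file of a known
// length should be buffered into a byte[] or streamed to the client.
public enum DownloadStrategy { Bytes, Stream }

public static class DownloadPlanner
{
    // The default limit mirrors the int.MaxValue bound described above. The
    // runtime's real per-array limit is slightly lower, and callers may pass a
    // much smaller limit to cap client memory use.
    public static DownloadStrategy Choose(long fileLength, long bufferLimit = int.MaxValue)
    {
        if (fileLength < 0) throw new ArgumentOutOfRangeException(nameof(fileLength));
        return fileLength <= bufferLimit ? DownloadStrategy.Bytes : DownloadStrategy.Stream;
    }
}
```

In practice a much lower limit than `int.MaxValue` is usually sensible, since buffering a multi-gigabyte file keeps the whole file in client memory at once.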
ReGrid supports starting downloads at an offset by seeking into part of a large file.

    var opts = new DownloadOptions { Seekable = true };
    using (var stream = bucket.OpenDownloadStream("/video.mp4", options: opts))
    {
        stream.Seek(1024 * 1024 * 20, SeekOrigin.Begin); // start reading 20MB into the file...
    }

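Seeking in a chunked file comes down to integer division: an absolute byte position maps to a chunk number plus an offset inside that chunk. `SeekMath.Locate` is a hypothetical sketch of that mapping, not ReGrid's internal code, and it assumes every chunk except the last has the same size.

```csharp
using System;

// Hypothetical sketch (not ReGrid's code): translate an absolute byte
// position into (chunk index, offset within that chunk) for fixed-size chunks.
public static class SeekMath
{
    public static (long ChunkIndex, int OffsetInChunk) Locate(long position, int chunkSize)
    {
        if (position < 0) throw new ArgumentOutOfRangeException(nameof(position));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        // Whole chunks before the position, then the remainder inside the next one.
        return (position / chunkSize, (int)(position % chunkSize));
    }
}
```

With assumed 261,120-byte chunks, seeking 20 MiB (20,971,520 bytes) lands in chunk 80 at offset 81,920, so in principle the 80 earlier chunks need not be fetched.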
By default, ReGrid will Soft delete files. The examples below show how to delete a file in ReGrid:

    var file = bucket.GetFileInfoByName(testfile);

    // Soft delete
    bucket.DeleteRevision(file.Id, mode: DeleteMode.Soft);

    // Hard delete
    bucket.DeleteRevision(file.Id, mode: DeleteMode.Hard);

Remember, multiple uploads to the same file path do not overwrite a file. Uploading files to the same path causes the file to be revisioned. Deleting a file means deleting a revision of that file.
A convenience method, `DeleteAllRevisions`, deletes a file's revisions one by one, iteratively. If a failure occurs during this iterative deletion, some revisions may still exist, and the file may not appear fully removed from the file system.
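The partial-failure behavior follows directly from deleting revisions one at a time. The sketch below is hypothetical and is not the `DeleteAllRevisions` source: `IterativeDelete.DeleteAll` is an illustrative name, and its `deleteRevision` delegate stands in for a per-revision delete call. It reports which revisions were removed before a failure and which remain.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of deleting revisions one by one (not ReGrid's code).
public static class IterativeDelete
{
    public static (List<Guid> Deleted, List<Guid> Remaining, Exception Error) DeleteAll(
        IReadOnlyList<Guid> revisionIds, Action<Guid> deleteRevision)
    {
        var deleted = new List<Guid>();
        for (int i = 0; i < revisionIds.Count; i++)
        {
            try
            {
                deleteRevision(revisionIds[i]);
                deleted.Add(revisionIds[i]);
            }
            catch (Exception ex)
            {
                // Each revision delete is a separate operation, so earlier
                // deletes are not rolled back when a later one fails.
                var remaining = new List<Guid>();
                for (int j = i; j < revisionIds.Count; j++) remaining.Add(revisionIds[j]);
                return (deleted, remaining, ex);
            }
        }
        return (deleted, new List<Guid>(), null);
    }
}
```

Returning the remaining ids, rather than only rethrowing, gives a caller what it needs to retry the rest of the deletion later.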
Soft deletes simply set the `status` flag of a `FileInfo` document. Because this touches a single document, the operation is fast and atomic.
Hard deletes, like Soft deletes, set the `status` flag of a `FileInfo` document. However, Hard delete operations also involve deleting multiple documents. RethinkDB only supports atomic operations per document, so a full and complete Hard delete of a logical File and its revision is inherently non-atomic at the physical layer. If a Hard delete operation fails and is left incomplete, the `GridUtility` class contains operations to clean up and restart partially deleted files.
Recommended Usage: Always use Soft delete to delete files. Space occupied by Soft deleted files and their associated chunks can be reclaimed later using the `GridUtility` class. If overwrite semantics are desired, delete the original file before uploading a new file to the same path.
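The Soft/Hard delete trade-off can be modeled in memory. `ToyGridStore` below is hypothetical: its `FileStatus` values, chunk bookkeeping, and `Reclaim` method are illustrative stand-ins, not ReGrid's schema or the `GridUtility` API. A Soft delete is one record update; a Hard delete touches many records and can stop partway, after which a cleanup pass finishes the job.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical status values for a FileInfo-like record (not ReGrid's schema).
public enum FileStatus { Completed, Deleted }

// Hypothetical in-memory model: one status record per file plus its chunks.
public sealed class ToyGridStore
{
    public readonly Dictionary<Guid, FileStatus> Files = new();
    public readonly Dictionary<Guid, List<int>> Chunks = new(); // fileId -> chunk numbers

    public Guid Add(int chunkCount)
    {
        var id = Guid.NewGuid();
        Files[id] = FileStatus.Completed;
        Chunks[id] = Enumerable.Range(0, chunkCount).ToList();
        return id;
    }

    // Soft delete: a single-record update; chunks are left in place.
    public void SoftDelete(Guid id) => Files[id] = FileStatus.Deleted;

    // Hard delete: flag the file, remove chunks one by one, then the record.
    // failAfterChunks simulates a failure partway through.
    public void HardDelete(Guid id, int failAfterChunks = int.MaxValue)
    {
        Files[id] = FileStatus.Deleted;
        var chunks = Chunks[id];
        int removed = 0;
        while (chunks.Count > 0)
        {
            if (removed == failAfterChunks) throw new InvalidOperationException("simulated failure");
            chunks.RemoveAt(chunks.Count - 1);
            removed++;
        }
        Chunks.Remove(id);
        Files.Remove(id);
    }

    // Cleanup pass: finish removing every file flagged Deleted, whether it was
    // Soft deleted or a Hard delete stopped midway. Returns files reclaimed.
    public int Reclaim()
    {
        var deleted = Files.Where(f => f.Value == FileStatus.Deleted).Select(f => f.Key).ToList();
        foreach (var id in deleted)
        {
            Chunks.Remove(id);
            Files.Remove(id);
        }
        return deleted.Count;
    }
}
```

Because the status flag is set first, a cleanup pass only needs to look for flagged records to find both Soft deleted files and interrupted Hard deletes.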