complyue/jdfs
Just Data FileSystem (JDFS) is a networked userspace filesystem that offloads every responsibility (such as access control) beyond plain data availability and consistency. This purpose has a few implications, including:
- It's highly vulnerable if exposed to untrusted environments. When access must cross trust boundaries, some other means, e.g. SSH tunneling or a VPN, should be put in place to guard the exposed mountpoints.
- Files and directories on the jdfs host's local filesystem are exposed to jdfc with owner identity mapped: files owned by the uid/gid running the jdfs process appear on jdfc as if owned by the uid/gid that mounted the JDFS mountpoint, and file creation, reading, writing and deletion all follow this proxy relationship.
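The ownership rule above can be written as a small pure function. This is a minimal Go sketch, assuming uid and gid are mapped independently and that IDs other than the jdfs process's own are passed through unchanged (the README doesn't say what happens to them). `Owner` and `mapOwner` are hypothetical names, not jdfs's actual API.

```go
package main

// Owner is a hypothetical uid/gid pair as reported by stat(2).
type Owner struct {
	UID, GID uint32
}

// mapOwner translates the owner of a file on the jdfs host into the owner
// a jdfc client would display.
//   - server:  the identity the jdfs process runs as
//   - mounter: the identity that mounted JDFS on the jdfc side
//
// Assumption: uid and gid are mapped independently, and any ID that is not
// the server's own is passed through unchanged.
func mapOwner(file, server, mounter Owner) Owner {
	out := file
	if file.UID == server.UID {
		out.UID = mounter.UID
	}
	if file.GID == server.GID {
		out.GID = mounter.GID
	}
	return out
}
```

The reverse direction needs no code of its own in this model: a file the mounting user creates through jdfc is created on the host by the jdfs process, so it ends up owned by the server identity, and `mapOwner` shows it back to the client as owned by the mounter.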
Simply deployed alone (1 jdfs <=> n jdfc), JDFS seeks to replace NFS in many HPC scenarios where it sucks.
But the main purpose of JDFS is to make it easy to contribute data-focused, performance-critical parts (i.e. components at various granularities, of which jdfs, the service/server, and jdfc, the consumer/client, are the most coarse-grained) to analytical solutions (e.g. a homegrown array database).
> In my opinion, what’s going to happen over the next five years is that everyone is going to move from business intelligence to data science, and this data will be a sea change from what I’ll call stupid analytics, to what I’ll call smart analytics, which is correlations, data clustering, predictive modeling, data mining, Bayes classification.
>
> All of these words mean complex analytics. All that stuff is defined on arrays, and none of it is in SQL. So the world will move to smart analytics from stupid analytics, and that’s where we are.
>
> -- Michael Stonebraker, 2014
The JDFS server is stateful. In contrast to NFS, a jdfs process basically proxies all file operations on behalf of its jdfc, and those operations reach it as follows:
- fsync
  - always mapped 1 to 1
- open/close
  - mapped 1 to 1 from jdfc on Linux
  - forged by osxfuse from jdfc on macOS
- read/write/mmap
  - forged by all FUSE kernels with writeback cache enabled
Every new connection is treated by jdfs as a fresh mount: a fresh server process is started to proxy all operations from the connecting jdfc.
All server-side state, including resources held from the OS's perspective, is naturally freed/released because that jdfs process simply exits once the underlying JDFS connection is disconnected.