Scratch spaces for all your persistent mutable data needs
JuliaPackaging/Scratch.jl
This repository implements the scratch spaces API for package-specific mutable containers of data. These spaces can contain datasets, text, binaries, or any other kind of data that would be convenient to store in a location specific to your package. As compared to Artifacts, these containers of data are mutable.

Because the scratch space location on disk is not very user-friendly, scratch spaces should, in general, not be used for storing files that the user must interact with through a file browser. In that event, packages should simply write out to disk at a location given by the user. Scratch spaces are designed for data caches that are completely managed by a package and should be removed when the package itself is uninstalled.

In the current implementation, scratch spaces are removed during Pkg garbage collection if the owning package has been removed. Users can also request a full wipe of all scratch spaces to clean up unused disk space through `clear_scratchspaces!()`, or a more targeted wipe of a particular package through `clear_scratchspaces!(pkg)`.

Scratch space usage is performed primarily through one function: `get_scratch!()`. It provides a single interface for creating and getting previously-created spaces, either tied to a package by its UUID, or as a global scratch space that can be accessed by any package. Here is an example where a package creates a scratch space that is namespaced to its own UUID:

    module ScratchExample
    using Scratch

    # This will be filled in inside `__init__()`
    download_cache = ""

    # Downloads a resource, stores it within a scratchspace
    function download_dataset(url)
        fname = joinpath(download_cache, basename(url))
        if !isfile(fname)
            download(url, fname)
        end
        return fname
    end

    function __init__()
        global download_cache = @get_scratch!("downloaded_files")
    end

    end # module ScratchExample

Note that we initialize the `download_cache` within `__init__()` so that our packages are as relocatable as possible; we typically do not want to bake absolute paths into our precompiled files. This makes use of the `@get_scratch!()` macro, which is identical to the `get_scratch!()` method, except it automatically determines the UUID of the calling module, if possible. The user can manually pass in a `Module` as well for a slightly more verbose incantation:

    function __init__()
        global download_cache = get_scratch!(@__MODULE__, "downloaded_files")
    end

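The same load-time initialization can also be written with a typed `const` `Ref` instead of an untyped global. The following is only a sketch: the module name `RelocatableSketch`, the helper `cache_path`, and the use of `mktempdir()` as a stand-in for `@get_scratch!("downloaded_files")` are assumptions, chosen so that the snippet runs without Scratch.jl installed.

```julia
module RelocatableSketch

# Only an empty Ref exists at precompile time, so no absolute path is baked in.
const cache_dir = Ref{String}()

function __init__()
    # Assumption: in a real package this line would be
    #     cache_dir[] = @get_scratch!("downloaded_files")
    cache_dir[] = mktempdir()
end

# Hypothetical helper: resolve a file name inside the cache at call time.
cache_path(name::AbstractString) = joinpath(cache_dir[], name)

end # module RelocatableSketch

# `__init__()` has already run by the time the module definition finishes evaluating.
println(RelocatableSketch.cache_path("example.csv"))
```

Because the path is looked up through the `Ref` on every call, moving the depot or deploying the precompiled package to another machine only changes what `__init__()` stores, not the compiled code.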
If a user wishes to manually delete a scratch space, the method `delete_scratch!(key; pkg_uuid)` is the natural analog to `get_scratch!()`. In general, however, users will not need to do so, because scratch spaces are garbage collected by `Pkg.gc()` automatically.

For a full listing of docstrings and methods, see the Scratch Space Reference section.

To regenerate data when a scratch space is found to be empty, check the contents of the directory when you first call `get_scratch!()`, and if it's empty, run your generation function:

    using Scratch

    function get_dataset_dir()
        dataset_dir = @get_scratch!("dataset")
        if isempty(readdir(dataset_dir))
            perform_expensive_dataset_generation(dataset_dir)
        end
        return dataset_dir
    end

This ensures your package is resilient against situations such as scratch spaces being deleted by a user who has called `clear_scratchspaces!()` to free up disk space.
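
The regenerate-on-empty check can be exercised without Scratch.jl by passing the directory in as an argument. This is a minimal sketch under stated assumptions: `ensure_dataset!` and `generate_dataset` are hypothetical names, and a temporary directory stands in for the path that `@get_scratch!("dataset")` would return.

```julia
# Hypothetical generator: writes a single placeholder file into `dir`.
function generate_dataset(dir::AbstractString)
    write(joinpath(dir, "data.txt"), "generated")
    return nothing
end

# Run `generate(dir)` only when `dir` has no contents, then return `dir`.
function ensure_dataset!(generate::Function, dir::AbstractString)
    if isempty(readdir(dir))
        generate(dir)
    end
    return dir
end

dir = mktempdir()                        # stand-in for @get_scratch!("dataset")
ensure_dataset!(generate_dataset, dir)   # first call populates the directory
rm(joinpath(dir, "data.txt"))            # simulate the contents being wiped
ensure_dataset!(generate_dataset, dir)   # next call regenerates them
```

Note that an emptiness check only detects a full wipe; a package that must also survive individual files going missing would need to check for those files explicitly.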
To keep a scratch space from being shared across versions of your package, make use of the `key` parameter and the version of your package at compile-time:

    module VersionSpecificExample
    using Pkg.TOML, Scratch

    # Get the current version at compile-time, that's fine it's not going to change. ;)
    function get_version()
        return VersionNumber(TOML.parsefile(joinpath(dirname(@__DIR__), "Project.toml"))["version"])
    end
    const pkg_version = get_version()

    # This will be filled in by `__init__()`; it might change if we get deployed somewhere
    const version_specific_scratch = Ref{String}()

    function __init__()
        # This space will be unique between versions of my package that differ in major or
        # minor version, but allows patch releases to share the same space.
        scratch_name = "data_for_version-$(pkg_version.major).$(pkg_version.minor)"
        # No `global` is needed: we set the contents of the `const` Ref, not the binding.
        version_specific_scratch[] = @get_scratch!(scratch_name)
    end

    end # module

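To see which releases end up sharing a space under this naming scheme, the key construction can be pulled out into a small function. This is only an illustration: `scratch_key` is a hypothetical helper that mirrors the string format used above, and it does not touch Scratch.jl at all.

```julia
# Hypothetical helper mirroring the key format from the example above.
scratch_key(v::VersionNumber) = "data_for_version-$(v.major).$(v.minor)"

# Patch releases map to the same key, so they share one scratch space.
println(scratch_key(v"1.2.0") == scratch_key(v"1.2.7"))

# A minor (or major) bump produces a different key, and so a fresh space.
println(scratch_key(v"1.2.0") == scratch_key(v"1.3.0"))
```

Including `v.patch` in the key would give every release its own space, at the cost of regenerating the data on each upgrade.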
Artifacts should, in general, be used when dealing with storing data that is write-once, read-many times. Because Artifacts are read-only and content-addressed, they are very easy to transmit from machine to machine, which is why we use them extensively in the package ecosystem. Scratch spaces, on the other hand, are mutable and not easily distributed; they should generally follow a write-many, read-many access pattern. Scratch spaces are well-suited for storing machine-specific data, such as compiled objects, results of host introspection, or user-specific data.

Once you're satisfied with a dataset that has been cooking inside a scratch space, and you're ready to share it with the world as an immutable artifact, you can use `create_artifact()` to create an artifact from the space, `archive_artifact()` to get a tarball that you can upload somewhere, and `bind_artifact!()` to write out an `Artifacts.toml` that allows others to download and use it:

    using Pkg, Scratch, Pkg.Artifacts

    function export_scratch(scratch_name::String, github_repo::String)
        scratch_dir = @get_scratch!(scratch_name)

        # Copy space directory over to an Artifact
        hash = create_artifact() do artifact_dir
            rm(artifact_dir)
            cp(scratch_dir, artifact_dir)
        end

        # Archive artifact out to a tarball. Since `upload_tarball()` is not a function that
        # exists, users must either write it themselves (uploading to whatever hosting
        # provider they prefer), or run each line of this `do`-block manually, upload the
        # tarball manually, record its URL, and pass that to `bind_artifact!()`.
        mktempdir() do upload_dir
            tarball_path = joinpath(upload_dir, "$(scratch_name).tar.gz")
            tarball_hash = archive_artifact(hash, tarball_path)

            # Upload tarball to a hosted site somewhere. Note: this function does not
            # exist, it's put here simply to show the flow of events.
            tarball_url = upload_tarball(tarball_path)

            # Bind artifact to an Artifacts.toml file in the current directory; this file can
            # be used by others to download and use your newly-created Artifact!
            bind_artifact!(
                joinpath(@__DIR__, "./Artifacts.toml"),
                scratch_name,
                hash;
                download_info=[(tarball_url, tarball_hash)],
                force=true,
            )
        end
    end

You can disable logging to `logs/scratch_usage.toml` by setting `JULIA_SCRATCH_TRACK_ACCESS` to `0` in your environment.

The package is tested and works correctly with Julia 1.0 and above. However, Pkg's built-in garbage collection, i.e. `Pkg.gc()`, is only aware of scratch spaces for Julia 1.6 and above.