bair-climate-initiative/multiearth

Download and access remote sensing data from any platform

License

MIT license

MultiEarth: Download any remote sensing data from any provider.

🔥 Warning 🔥 This is an early alpha version of MultiEarth: things will change quickly, with little/no warning. The current MultiEarth explainer image above is aspirational: we're actively working on adding more data providers.

Quick Start

Install MultiEarth as a library and download about 18MB of Copernicus DEM data from Microsoft Planetary Computer -- this small example should Just Work™ without any additional authentication.

# OPTIONAL: set up a conda environment (use at least python 3.7)
conda create -n multiearth python=3.8 geopandas
conda activate multiearth

git clone git@github.com:bair-climate-initiative/multiearth.git
cd multiearth
pip install -e .

# Take a look at the download using a dry run (you could also set dry_run in the config file):
python multiearth/cli.py --config config/demo.yaml system.dry_run=True

# If everything looks good, remove the dry_run and download Copernicus DEM data from Microsoft Planetary Computer
python multiearth/cli.py --config config/demo.yaml

# see the extracted data in the output directory
ls data/demo-extraction-dem-glo-90/cop-dem-glo-90/

Quick Explanation: The config we're providing, config/demo.yaml, contains a fully annotated example: take a look at it to get a sense of the config options and how to control MultiEarth. While playing with MultiEarth, set the dry_run config option to display a summary of the assets without downloading anything, e.g. system.dry_run=True. Note that to download more/different data from Microsoft Planetary Computer, you'll want to authenticate with them (see the instructions under Provider Configurations).

Documentation

See the Quick Start instructions above and then consult config/demo.yaml for annotated configuration (we'll keep this annotated config updated).

MultiEarth Configuration

The following describes some common goals for configuring MultiEarth, such as specifying a data collection, geographical region, and time range to extract data from, or specifying a provider to download data from. Consult config/demo.yaml for an annotated configuration. The configuration schemas are defined in multiearth/config.py: take a look at ConfigSchema.

The data to download is specified through the collections config option of each provider listed in providers. For instance, to download Copernicus DEM from Microsoft Planetary Computer (MPC), which has the collection id cop-dem-glo-90 (see below on how to find this), the config could look like:

providers:
  # provider id
  - id: MPC
    # collections describe the assets to extract.
    # the collection id, e.g. cop-dem-glo-90
    # is the id used to find the collection
    # in this case copernicus DEM global 90m
    # from the provider, in this case
    # Microsoft Planetary Computer
    collections:
      - id: cop-dem-glo-90
        outdir: data/demo-extraction-dem-glo-90
        # Explicitly set the assets to be downloaded or use `all` to download all assets, like this:
        # assets:
        #  - all
        assets:
          - data

Finding the provider id: see Provider Configurations below.

Finding the collection id: This depends on the individual provider (see Provider Configurations below).

Finding the assets: This depends on the individual provider (see Provider Configurations below), but the following seems to be a pretty solid method:

Create a config with your desired collection id and set the assets option to ["all"], like this (also setting max_items to 1 to speed things up):

providers:
  - id: MPC
    collections:
      - id: landsat-8-c2-l2
        assets:
          - all
        max_items: 1
...

Run a dry run to see what assets will be downloaded:

python multiearth/cli.py --config path/to/your/config.yaml system.dry_run=True

which will print out a list of assets that will be downloaded and their descriptions, e.g.:

To Extract:
Microsoft Planetary Computer (MPC): landsat-8-c2-l2
Collection               | Key                 | Description
--------------------------------------------------------------------------------
landsat-8-c2-l2          | ANG                 | Angle Coefficients
landsat-8-c2-l2          | SR_B1               | Coastal/Aerosol Band (B1)
landsat-8-c2-l2          | SR_B2               | Blue Band (B2)
landsat-8-c2-l2          | SR_B3               | Green Band (B3)
...

Let's say we want the RGB channels (see the descriptions), so we then update our config to download only the assets we want:

assets:
  - SR_B2
  - SR_B3
  - SR_B4
...

Selecting a Region and Time Range

Specify region:

Use (something like) https://geojson.io to specify the region you care about in geojson format.
Save it to a file, e.g. my_region.json.
Set the aoi_file key to that file, either under the collection you want to extract or under default_collection if you want to use it for multiple collections.
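If you would rather script the region than draw it by hand, the steps above can be sketched in Python. This is a minimal sketch and not part of MultiEarth: the helper name write_bbox_aoi is hypothetical, and it assumes the AOI file is a GeoJSON FeatureCollection like the one geojson.io saves.

```python
import json
from pathlib import Path


def write_bbox_aoi(path, west, south, east, north):
    """Write a rectangular area of interest as a GeoJSON FeatureCollection.

    Hypothetical helper (not part of MultiEarth). Coordinates are
    longitude/latitude in degrees, in that order, as GeoJSON expects.
    """
    # A GeoJSON polygon ring must be closed: first point == last point.
    ring = [
        [west, south],
        [east, south],
        [east, north],
        [west, north],
        [west, south],
    ]
    feature_collection = {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "properties": {},
                "geometry": {"type": "Polygon", "coordinates": [ring]},
            }
        ],
    }
    Path(path).write_text(json.dumps(feature_collection, indent=2))
    return feature_collection
```

Calling write_bbox_aoi("my_region.json", west, south, east, north) with your own bounds produces a file that the aoi_file key can point at.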

Specify a time range by using a single date+time, or a range ('/' separator), formatted to RFC 3339, section 5.6. See config/example.yaml for an example. Use double dots .. for open date ranges.
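To make the format concrete, here is a minimal sketch of how such a string splits into a start and an end. The function split_datetime_range is hypothetical and is not how MultiEarth or its providers parse the value. It relies on datetime.fromisoformat, which accepts date-only strings such as 2021-04-01 but, before Python 3.11, rejects some valid RFC 3339 forms such as a trailing Z.

```python
from datetime import datetime
from typing import Optional, Tuple


def split_datetime_range(value: str) -> Tuple[Optional[datetime], Optional[datetime]]:
    """Split 'start/end' into (start, end); '..' marks an open end.

    Hypothetical helper for illustration only. A single date+time
    (no '/') is returned as both start and end.
    """

    def parse(part: str) -> Optional[datetime]:
        part = part.strip()
        if part == "..":
            return None  # open-ended side of the range
        return datetime.fromisoformat(part)

    if "/" not in value:
        instant = parse(value)
        return instant, instant
    start, end = value.split("/", 1)
    return parse(start), parse(end)
```

With this sketch, "2021-04-01/.." yields a start date and no end, which is the open range the double dots describe.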

Output Directory and Data Format: The saved data will be placed in the directory format {outdir}/{collection_id}/{item_id}/{asset_id}.{asset_appendix}.
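As an illustration of that layout, the sketch below assembles one output path. The helper asset_output_path is hypothetical, and the item id and the tif appendix in the example are made-up placeholders; MultiEarth's own path-building code may differ in detail.

```python
from pathlib import PurePosixPath


def asset_output_path(outdir, collection_id, item_id, asset_id, asset_appendix):
    """Build {outdir}/{collection_id}/{item_id}/{asset_id}.{asset_appendix}.

    Hypothetical helper mirroring the documented layout.
    """
    filename = f"{asset_id}.{asset_appendix}"
    return PurePosixPath(outdir) / collection_id / item_id / filename


# "example_item" and "tif" are placeholders, not real MultiEarth output.
example = asset_output_path(
    "data/demo-extraction-dem-glo-90", "cop-dem-glo-90", "example_item", "data", "tif"
)
print(example)
```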

Defaults when downloading multiple collections

You can specify a default_collection in your config; any key a collection doesn't set itself is inherited from it, e.g.

# fallback for each collection
# where each of these entries can be overridden
# in each collection config under "collections"
default_collection:
  # will output to ${output}/collection_name/ by default, can override as an entry in the collection config
  outdir: data/demo-extraction
  # default datetime range for each collection,
  # can override as an entry in the collection config
  # Single date+time, or a range ('/' separator),
  # formatted to RFC 3339, section 5.6.
  # Use double dots .. for open date ranges.
  datetime: 2021-04-01/2021-04-23
  # default aoi for each collection (use geojson format - see geojson.io)
  # can override as an entry in the collection config
  # this demo contains a small section in Yosemite
  aoi_file: config/aoi/demo.json
  # Max number of items
  # (not assets, e.g. each item could have 3 images)
  # to download. -1 for unlimited (or limit set)
  # by the provider
  max_items: -1
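The inheritance rule can be pictured as a dictionary merge in which the collection's own keys win. The sketch below is illustrative only: resolve_collection is a hypothetical name, and a plain dict merge stands in for whatever MultiEarth does internally.

```python
def resolve_collection(default_collection, collection):
    """Fill keys missing from `collection` with values from `default_collection`.

    Illustrative only. Keys set on the collection take precedence.
    """
    return {**default_collection, **collection}


default_collection = {
    "outdir": "data/demo-extraction",
    "datetime": "2021-04-01/2021-04-23",
    "aoi_file": "config/aoi/demo.json",
    "max_items": -1,
}
collections = [
    {"id": "cop-dem-glo-90", "assets": ["data"]},
    # This collection overrides max_items and inherits everything else.
    {"id": "landsat-8-c2-l2", "assets": ["SR_B2"], "max_items": 1},
]
resolved = [resolve_collection(default_collection, c) for c in collections]
```

Here the first collection inherits max_items: -1, while the second keeps its own max_items: 1.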

Dry run and DEBUG are your friends. You have lots of friends. When dialing in your configuration, keep the system.dry_run=True option on your call to multiearth/cli.py (or set it in your config). Also, set the system.log_level=DEBUG option to see more verbose output.
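The key.subkey=value arguments on the command line look like OmegaConf-style dotlist overrides. As a rough, standard-library-only illustration of what such an override does to a nested config, consider the sketch below. It is not MultiEarth's or OmegaConf's actual parser, and apply_override is a hypothetical name.

```python
def apply_override(config: dict, override: str) -> dict:
    """Apply one 'a.b=value' override to a nested dict, in place.

    Hypothetical stand-in for a real dotlist parser: it only understands
    booleans and otherwise keeps the value as a string.
    """
    dotted_key, raw_value = override.split("=", 1)
    literals = {"true": True, "false": False}
    value = literals.get(raw_value.lower(), raw_value)

    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        node = node.setdefault(name, {})  # create intermediate sections
    node[leaf] = value
    return config


cfg = {"system": {"dry_run": False}}
apply_override(cfg, "system.dry_run=True")
apply_override(cfg, "system.log_level=DEBUG")
print(cfg)
```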

Programmatic API Usage

Programmatic MultiEarth API usage is still under development, but very much a part of our roadmap. For now, you can roughly do the following (let us know if you're interested in API support and how you'd like to use MultiEarth in this context):

from omegaconf import OmegaConf

from multiearth.api import extract_assets
from multiearth.config import ConfigSchema

dict_cfg = dict(
    providers=[
        dict(
            id="MPC",
            collections=[
                dict(
                    id="cop-dem-glo-90",
                    outdir="data",
                    assets=["all"],
                    aoi_file="config/aoi/demo.json",
                    datetime="2021-04-01/2021-04-23",
                )
            ],
        )
    ]
)
in_cfg = OmegaConf.create(dict_cfg)
cfg_schema = OmegaConf.structured(ConfigSchema)
cfg = OmegaConf.merge(cfg_schema, in_cfg)
success = extract_assets(cfg)
print("Successfully extracted assets." if success else "Asset extraction failed.")

Provider Configurations

🔥 Warning 🔥 This is a very early alpha version of MultiEarth: there are only a few providers and meta-providers supported at the moment. Let us know if you need other providers and we can prioritize adding them.

Each provider needs its own authentication and setup. For each provider you'd like to use, follow the set-up instructions below.

Microsoft Planetary Computer (provider key: MPC)

Make sure to run the following and enter your API key (this helps increase the amount of data you can download from MPC).

planetarycomputer configure

Finding the collection id:

Go to the MPC Data Catalog
Find/click-on the desired collection
In the Collection Overview Page, click on the "Example Notebook" tab
The example notebook will contain an example of accessing the collection using the collection id.

NASA EarthData (provider key: EARTHDATA)

NASA EarthData provides access to a diverse range of subproviders (around 60!), where each subprovider has different data sources.

Access

For NASA EarthData, you need to create an account at: https://urs.earthdata.nasa.gov/
Note: if using data from the ASF subprovider, you must also accept the EULA by logging into https://auth.asf.alaska.edu/
Add a ~/.netrc file (if it doesn't exist) and then append the following contents:

machine urs.earthdata.nasa.gov
    login <username>
    password <password>
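Before starting a long extraction, it can help to confirm that the entry is readable. The following is a minimal sketch using Python's standard netrc module. The helper name has_earthdata_login is an assumption and is not part of MultiEarth.

```python
import netrc
from pathlib import Path

EARTHDATA_HOST = "urs.earthdata.nasa.gov"


def has_earthdata_login(netrc_path=None):
    """Return True if the netrc file has a login and password for EarthData.

    Hypothetical helper, not part of MultiEarth. Defaults to ~/.netrc.
    """
    path = Path(netrc_path) if netrc_path else Path.home() / ".netrc"
    if not path.exists():
        return False
    try:
        entry = netrc.netrc(str(path)).authenticators(EARTHDATA_HOST)
    except netrc.NetrcParseError:
        return False
    # authenticators() returns (login, account, password) or None.
    return entry is not None and bool(entry[0]) and bool(entry[2])
```

Calling has_earthdata_login() with no argument checks ~/.netrc. Pass a path to check a different file.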

EarthData is a provider of providers, so you must include a subprovider_id in your kwargs argument to the provider, like the following example that accesses ASO data from NSIDC from EarthData (config/nsidc.yaml):

providers:
  - id: EARTHDATA
    kwargs:
      subprovider_id: NSIDC_ECS
    collections:
      - id: ASO_50M_SD
        assets:
          - all
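A config that omits subprovider_id is easy to write by accident, so a small pre-flight check can catch it early. This is a sketch under stated assumptions: check_earthdata_provider is a hypothetical helper that operates on the provider entry as a plain dict, and it does not reflect how MultiEarth itself validates configs.

```python
def check_earthdata_provider(provider: dict) -> None:
    """Raise ValueError if an EARTHDATA provider entry lacks subprovider_id.

    Hypothetical pre-flight check; other provider ids are ignored.
    """
    if provider.get("id") != "EARTHDATA":
        return
    kwargs = provider.get("kwargs") or {}
    if not kwargs.get("subprovider_id"):
        raise ValueError(
            "EARTHDATA requires kwargs.subprovider_id, e.g. NSIDC_ECS"
        )


# Mirrors the YAML example above, expressed as a dict.
check_earthdata_provider(
    {
        "id": "EARTHDATA",
        "kwargs": {"subprovider_id": "NSIDC_ECS"},
        "collections": [{"id": "ASO_50M_SD", "assets": ["all"]}],
    }
)
```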

Finding the Provider ID: Consult earthdata_providers.py for a list of providers and their provider ids.

Finding the collection id: TODO (this depends on the provider and we need to figure out a general approach)

Radiant MLHub (provider key: RADIANT)

🔥 Warning 🔥 Radiant MLHub is under development and may be rough around the edges. Let us know if you have any issues.

To query and access the data, you need to obtain an API key from Radiant MLHub. There are two ways to set up your API key with MultiEarth.

You can set the environment variable MLHUB_API_KEY as instructed by the official documentation
You can hardcode it as the kwarg api_key in the config under the provider section

providers:
  - id: RADIANT
    kwargs:
      api_key: <your_api_key>
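The two options can be combined into a single lookup, sketched below. The helper resolve_mlhub_api_key is hypothetical, and the precedence it uses (config kwarg first, then the MLHUB_API_KEY environment variable) is an assumption of this sketch rather than documented MultiEarth behavior.

```python
import os


def resolve_mlhub_api_key(provider_kwargs=None, environ=None):
    """Pick the Radiant MLHub API key from config kwargs or the environment.

    Hypothetical helper. Assumed precedence: kwargs["api_key"], then the
    MLHUB_API_KEY environment variable.
    """
    provider_kwargs = provider_kwargs or {}
    environ = os.environ if environ is None else environ
    key = provider_kwargs.get("api_key") or environ.get("MLHUB_API_KEY")
    if not key:
        raise RuntimeError(
            "No Radiant MLHub API key: set MLHUB_API_KEY or kwargs.api_key"
        )
    return key
```

Keeping the key in the environment variable avoids committing it to a config file by mistake.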

See radiant_ml_landcover.yaml for an example of how to configure a Radiant MLHub collection.

Metloom (provider key: METLOOM)

🔥 Warning 🔥 Metloom is under development and may be rough around the edges. Let us know if you have any issues.

Metloom uses the Metloom library to pull data from the SNOTEL and CDEC data sources.

See metloom.yaml for an example of how to configure a Metloom collection. Check metloom.py for a list of valid SNOTEL and CDEC assets.

Contributing and Development

The general flow for development looks like this:

Read the Getting Started Guide - make sure you can successfully download some data, and make sure to install this repository in editable mode: pip install -e .
Create a new branch for your feature.
Edit the code.
Run linters and tests (see subsections below)
Commit your changes, push to the branch, and open a pull request.
???
Profit $$$

Linting

Following TorchGeo (and literally copying their docs), we use the following linters:

black for code formatting
isort for import ordering
pyupgrade for upgrading syntax to newer Python idioms
flake8 for style and lint checks
pydocstyle for docstrings
mypy for static type analysis

Use git pre-commit hooks to automatically run these checks before each commit. pre-commit is a tool that automatically runs linters locally, so that you don't have to remember to run them manually and then have your code flagged by CI. You can set up pre-commit with:

pip install pre-commit
pre-commit install
pre-commit run --all-files

Note: a small, but unexpected, oddity is that mypy runs differently as a pre-commit hook vs. as a standalone command (mypy .). Make sure both the pre-commit hook and the standalone command pass before committing.

Testing

Data sources and providers have integration tests that are implemented via Jupyter Notebooks. Serving as both an explanatory medium and documentation, the notebooks inside the nbs folder are a great way to verify that:

Required data is pulled correctly from the specified provider
Data is able to be loaded and processed (and therefore not corrupted on download)
A visualization is provided for users to know what their data should look like

To run notebook integration tests, follow these steps:

Install required test packages with pip install -e .[tests]
Execute pytest with pytest --nbmake nbs/*

Adding New Tests

When writing a test notebook, please ensure you are meeting the following criteria:

The notebook has a simple but descriptive file name.
Download the smallest amount of data/assets possible to successfully run the test.
- Avoid using assets: all; use max_items: 1 instead.
- See nbs/grace-fo-plot.ipynb for an example.

Your tests may require additional libraries or dependencies to load or plot the data that are not required by the main MultiEarth library. To properly execute these tests, please add your dependencies to the setup.cfg file under the [options.extras_require] -> tests section.

Automated Test Runs and Authentication

Tests are executed automatically via GitHub Actions when a pull request is opened against the main or release branches. You are able to verify that your PR passes tests within the PR itself.

If you are adding a new provider to MultiEarth which requires credentials in order to pull data, please work with the project maintainers to add a test username and password to the proper dotfiles and Actions Secrets for test runners to appropriately pull the data.

Adding authentication for your tests will require editing the .github/workflows/nb_integration.yaml file. Work with the project maintainers to achieve this.

Useful links

Use https://geojson.io to extract a region-of-interest

OmegaConf References for Config

Passing in configs

Related Projects

Sat-Extractor. Sat-Extractor has a similar goal to MultiEarth, though at the moment it has been designed to run on Google Compute Engine, and as of the start of MultiEarth, Sat-Extractor can only be used with Sen2 and LandSats out-of-the-box. By starting MultiEarth with Microsoft's Planetary Computer, MultiEarth immediately has access to their full data catalog: https://planetarycomputer.microsoft.com/catalog (which subsumes the data accessible by Sat-Extractor plus ~100 other sources). Still, Sat-Extractor is an awesome and highly-configurable project: please use and support it if Sat-Extractor aligns with your goals =).
openEO is a very well done project. We'll eventually add them as a provider. A key difference is that we wanted anyone to be able to add a new provider/data-source by opening a PR, rather than integrating with the openEO API.
