# olmoearth_projects
This repository contains configuration files, model checkpoint references, and documentation for several remote sensing models built on top of OlmoEarth at Ai2. It also includes tooling and tutorials for building new models using various components of OlmoEarth.
The models available here are:
- Live Fuel Moisture Content Mapping
- Forest Loss Driver Classification
- Mangrove Mapping
- Ecosystem Type Mapping
- Land Use / Land Cover Mapping in Southern Kenya
The links above provide more details about the training data and intended use case for each model.
Here are tutorials for applying OlmoEarth for new tasks:
- Fine-tuning OlmoEarth for Segmentation
- Computing Embeddings using OlmoEarth
- Fine-tuning OlmoEarth in rslearn
These tutorials use all or a subset of the components of OlmoEarth:
- olmoearth_pretrain, the OlmoEarth pre-trained model.
- rslearn, our tool for obtaining satellite images and other geospatial data from online data sources, and for fine-tuning remote sensing foundation models.
- olmoearth_run, our higher-level infrastructure that automates various steps on top of rslearn, such as window creation and inference post-processing.
We recommend installing using uv. See Installing uv for instructions to install uv. Once uv is installed:
```
git clone https://github.com/allenai/olmoearth_projects.git
cd olmoearth_projects
uv sync
source .venv/bin/activate
```

There are three steps to applying the models in this repository:
- Customize the prediction request geometry, which specifies the spatial and temporal extent to run the model on.
- Execute the olmoearth_run steps to build an rslearn dataset for inference, and to apply the model on the dataset.
- Collect and visualize the outputs.
The configuration files for each project are stored under `olmoearth_run_data/PROJECT_NAME/`. There are three configuration files:
- `dataset.json`: this is an rslearn dataset configuration file that specifies the types of satellite images that need to be downloaded to run the model, and how to obtain them. Most models rely on some combination of Sentinel-1 and Sentinel-2 satellite images, and are configured to download those images from Microsoft Planetary Computer.
- `model.yaml`: this is an rslearn model configuration file that specifies the model architecture, fine-tuning hyperparameters, data loading steps, etc.
- `olmoearth_run.yaml`: this is an olmoearth_run configuration file that specifies how the prediction request geometry should be translated into rslearn windows, and how the inference outputs should be combined together.
Some projects also include an example `prediction_request_geometry.geojson`, but this will need to be modified to specify your target region. The spatial extent is specified with standard GeoJSON features; you can use geojson.io to draw polygons on a map and get the corresponding GeoJSON. The temporal extent is specified using properties on each feature:
```json
{
  "type": "FeatureCollection",
  "properties": {},
  "features": [
    {
      "type": "Feature",
      "geometry": {
        // ...
      },
      "properties": {
        "oe_start_time": "2024-01-01T00:00:00+00:00",
        "oe_end_time": "2024-02-01T00:00:00+00:00"
      }
    }
  ]
}
```

Here, the `oe_start_time` and `oe_end_time` indicate that the prediction for the location of this feature should be based on satellite images around January 2024. The per-model documentation details how these timestamps should be chosen. Some models, like forest loss driver classification, provide project-specific tooling for generating the prediction request geometry.
Consult the per-model documentation to download the associated fine-tuned model checkpoint. For example:
```
mkdir ./checkpoints
wget https://huggingface.co/allenai/OlmoEarth-v1-FT-LFMC-Base/resolve/main/model.ckpt -O checkpoints/lfmc.ckpt
```

Set needed environment variables:
```
export NUM_WORKERS=32
export WANDB_PROJECT=lfmc
export WANDB_NAME=lfmc_inference_run
export WANDB_ENTITY=YOUR_WANDB_ENTITY
```

Then, execute olmoearth_run:
```
mkdir ./project_data
python -m olmoearth_projects.main olmoearth_run olmoearth_run --config_path $PWD/olmoearth_run_data/lfmc/ --checkpoint_path $PWD/checkpoints/lfmc.ckpt --scratch_path project_data/lfmc/
```

The results directory (`project_data/lfmc/results/results_raster/` in the example) should be populated with one or more GeoTIFFs. You can visualize these in GIS software like QGIS:
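Before opening a GIS tool, you can sanity-check the run from Python by confirming the results directory actually contains GeoTIFFs. A small sketch, assuming the example paths above; the `list_results` helper is hypothetical, not part of olmoearth_projects.

```python
from pathlib import Path

def list_results(results_dir):
    """Return the GeoTIFF paths in a results_raster directory, sorted by name."""
    return sorted(Path(results_dir).glob("*.tif"))

# Hypothetical path from the example run above.
for tif in list_results("project_data/lfmc/results/results_raster"):
    print(tif.name, tif.stat().st_size, "bytes")
```

For inspecting pixel values or georeferencing, a library like rasterio would be the natural next step, but it is an extra dependency and not required here.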
```
qgis project_data/lfmc/results/results_raster/*.tif
```

We have released model checkpoints for each of the fine-tuned models in this repository, but you can reproduce them by fine-tuning the pre-trained OlmoEarth checkpoint on each task's training dataset.
First, consult the per-model documentation above for the URL of the rslearn dataset tar file, and download and extract it. For example, for the LFMC model:
```
wget https://huggingface.co/datasets/allenai/olmoearth_projects_lfmc/resolve/main/dataset.tar
tar xvf dataset.tar
```

Set the environment variables expected by the fine-tuning procedure (it uses W&B):
```
export DATASET_PATH=/path/to/extracted/data/
export NUM_WORKERS=32
export TRAINER_DATA_PATH=./trainer_data
export PREDICTION_OUTPUT_LAYER=output
export WANDB_PROJECT=olmoearth_projects
export WANDB_NAME=my_training_run
export WANDB_ENTITY=...
```

Then run fine-tuning using the model configuration file in the `olmoearth_run_data`, e.g.:
```
rslearn model fit --config olmoearth_run_data/lfmc/model.yaml
```

Losses and metrics should then be logged to your W&B project. The checkpoints will be saved in the `TRAINER_DATA_PATH` (e.g. `./trainer_data`); two checkpoints should be saved: the latest checkpoint (`last.ckpt`) and the best checkpoint (`epoch=....ckpt`). You can use the best checkpoint for the Applying Existing Models section in lieu of the checkpoint that we provide.
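If you script the hand-off from training to inference, you may want to resolve which checkpoint file to pass automatically. This is a small sketch assuming the checkpoint naming described above (`last.ckpt` plus a best `epoch=....ckpt`); the `best_checkpoint` helper is hypothetical and not part of rslearn.

```python
from pathlib import Path

def best_checkpoint(trainer_data_path):
    """Return the best checkpoint (epoch=*.ckpt) if one exists, else last.ckpt.

    If several epoch=* checkpoints are present, the most recently
    modified one is returned.
    """
    trainer_data = Path(trainer_data_path)
    epoch_ckpts = sorted(
        trainer_data.glob("epoch=*.ckpt"),
        key=lambda p: p.stat().st_mtime,
    )
    if epoch_ckpts:
        return epoch_ckpts[-1]
    return trainer_data / "last.ckpt"

print(best_checkpoint("./trainer_data"))
```

The returned path can then be supplied as `--checkpoint_path` in the Applying Existing Models section.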
If training fails halfway, you can resume it from `last.ckpt`:
```
rslearn model fit --config olmoearth_run_data/lfmc/model.yaml --ckpt_path $TRAINER_DATA_PATH/last.ckpt
```

This code is licensed under the OlmoEarth Artifact License.