Sen4AgriNet: A Sentinel-2 multi-year, multi-country benchmark dataset for crop classification and segmentation with deep learning
Orion-AI-Lab/S4A
A Sentinel-2 multi-year, multi-country benchmark dataset for crop classification and segmentation with deep learning
Contributors: Sykas D., Zografakis D., Sdraka M.
Supplementary repo with DL experiments using the Sen4AgriNet dataset: Sen4AgriNet-Models.
This repository provides a native PyTorch Dataset class for the Sen4AgriNet dataset (`patches_dataset.py`). It should work with PyTorch 1.7.1+ and Python 3.8.5+.
The Dataset relies heavily on cocoapi for data loading and indexing, so make sure you have it installed:

    pip3 install pycocotools

Then make sure every other requirement is installed:

    pip3 install -r requirements.txt

In order to use the provided PyTorch Dataset class, the required netCDF files of Sen4AgriNet must be downloaded and placed inside the `dataset/netcdf/` folder. These files are available for download at Dropbox, Google Drive and HuggingFace Hub.
Then, three separate COCO files must be created: one for training, one for validation and one for testing. Alternatively, the predefined COCO files for the 3 Scenarios can be downloaded from here.
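If you prefer to build your own split, the idea can be sketched as below. This is only an illustration, assuming the standard COCO layout (`images` entries with an `id`, `annotations` entries with an `image_id`). The helper `split_coco`, the ratios, and the synthetic entries are hypothetical, and a random split by patch is not necessarily how the predefined Scenarios were built.

```python
import random


def split_coco(coco, ratios=(0.6, 0.2, 0.2), seed=0):
    """Split a COCO-style dict into train/val/test dicts, patch by patch.

    Hypothetical helper, not part of this repository.
    """
    images = list(coco["images"])
    random.Random(seed).shuffle(images)  # reproducible shuffle

    n_train = round(len(images) * ratios[0])
    n_val = round(len(images) * ratios[1])
    parts = {
        "train": images[:n_train],
        "val": images[n_train:n_train + n_val],
        "test": images[n_train + n_val:],
    }

    # Keys other than images/annotations (e.g. categories) are copied as-is.
    shared = {k: v for k, v in coco.items() if k not in ("images", "annotations")}

    splits = {}
    for name, imgs in parts.items():
        ids = {img["id"] for img in imgs}
        splits[name] = {
            **shared,
            "images": imgs,
            # Keep only the annotations that belong to this split's images.
            "annotations": [
                ann for ann in coco.get("annotations", []) if ann["image_id"] in ids
            ],
        }
    return splits


# Tiny synthetic COCO dict (made-up entries, for illustration only).
coco = {
    "images": [{"id": i, "file_name": f"patch_{i}.nc"} for i in range(10)],
    "annotations": [{"id": i, "image_id": i, "category_id": 1} for i in range(10)],
    "categories": [{"id": 1, "name": "crop"}],
}
splits = split_coco(coco)
print({name: len(part["images"]) for name, part in splits.items()})
```

Each of the three resulting dictionaries could then be written to its own JSON file with `json.dump` and loaded with `pycocotools.coco.COCO`.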
After this initial setup, `patches_dataset.py` can be used in a PyTorch deep learning pipeline to load, prepare and return patches from the dataset according to the split dictated by the COCO files. This Dataset class has the following features:
- Reads the netCDF files of the dataset containing the Sentinel-2 observations over time and the corresponding labels.
- Isolates the Sentinel-2 bands requested by the user.
- Computes the median Sentinel-2 image on a given frequency, e.g. monthly (or loads precomputed medians, if any).
- Returns the timeseries of median images inside a predefined window.
- Normalizes the images.
- Returns Hollstein masks for clouds, cirrus, shadow or snow.
- Returns a parcel mask: 1 for parcel, 0 for non-parcel.
- Can alternatively return binary labels: 1 for crops, 0 for non-crops.
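
The median and window steps in the list above can be sketched with plain NumPy. This is a simplified illustration on synthetic data, not the code used in `patches_dataset.py`: the helpers `monthly_medians` and `take_window` are hypothetical, and the sketch assumes a fixed monthly frequency and a single band.

```python
from datetime import date

import numpy as np


def monthly_medians(dates, images):
    """Return a (12, H, W) array of per-month median images.

    Months without any observation are filled with NaN.
    """
    images = np.asarray(images, dtype=float)
    months = np.array([d.month for d in dates])
    out = np.full((12,) + images.shape[1:], np.nan)
    for m in range(1, 13):
        in_month = images[months == m]
        if in_month.shape[0] > 0:
            out[m - 1] = np.median(in_month, axis=0)
    return out


def take_window(medians, start_month, window_len):
    """Select `window_len` consecutive monthly medians, `start_month` being 1-based."""
    return medians[start_month - 1:start_month - 1 + window_len]


# Four synthetic 2x2 observations: three in January, one in March.
dates = [date(2020, 1, 5), date(2020, 1, 20), date(2020, 1, 25), date(2020, 3, 10)]
images = np.stack([np.full((2, 2), v) for v in (1.0, 5.0, 3.0, 7.0)])

medians = monthly_medians(dates, images)
window = take_window(medians, start_month=1, window_len=6)
print(medians[:3, 0, 0])  # January median, empty February, March
print(window.shape)
```

In this toy example February has no observations, so its median stays NaN until it is filled in by some other step.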
This is roughly the way that our `patches_dataset.py` works. The whole procedure is also described in the provided notebook.
- Open a netCDF file for exploration.

    import netCDF4
    from pathlib import Path

    patch = netCDF4.Dataset(Path('data/2020_31TCG_patch_14_14.nc'), 'r')
    patch

Outputs

    """
    <class 'netCDF4._netCDF4.Dataset'>
    root group (NETCDF4 data model, file format HDF5):
        title: S4A Patch Dataset
        authors: Papoutsis I., Sykas D., Zografakis D., Sdraka M.
        patch_full_name: 2020_31TCG_patch_14_14
        patch_year: 2020
        patch_name: patch_14_14
        patch_country_code: ES
        patch_tile: 31TCG
        creation_date: 27 Apr 2021
        references: Documentation available at .
        institution: National Observatory of Athens.
        version: 21.03
        _format: NETCDF4
        _nco_version: netCDF Operators version 4.9.1 (Homepage = http://nco.sf.net, Code = http://github.com/nco/nco)
        _xarray_version: 0.17.0
        dimensions(sizes):
        variables(dimensions):
        groups: B01, B02, B03, B04, B05, B06, B07, B08, B09, B10, B11, B12, B8A, labels, parcels
    """

- Visualize a single timestamp.

    import xarray as xr

    band_data = xr.open_dataset(xr.backends.NetCDF4DataStore(patch['B02']))
    band_data.B02.isel(time=0).plot()

- Visualize the labels:

    labels = xr.open_dataset(xr.backends.NetCDF4DataStore(patch['labels']))
    labels.labels.plot()

- Visualize the parcels:

    parcels = xr.open_dataset(xr.backends.NetCDF4DataStore(patch['parcels']))
    parcels.parcels.plot()

- Plot the median of observations for each month:

    import pandas as pd

    # Or maybe aggregate based on a given frequency
    # Refer to
    # https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases
    group_freq = '1MS'

    # Grab year from netcdf4's global attribute
    year = patch.patch_year

    # output intervals
    date_range = pd.date_range(start=f'{year}-01-01', end=f'{int(year) + 1}-01-01', freq=group_freq)

    # Aggregate based on given frequency
    band_data = band_data.groupby_bins(
        'time',
        bins=date_range,
        right=True,
        include_lowest=False,
        labels=date_range[:-1]
    ).median(dim='time')

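
A note on the bin edges: with `right=True` and `include_lowest=False`, the bins follow the `pandas.cut` convention of being open on the left and closed on the right. The standard-library sketch below mimics that rule to show where a timestamp lands when it falls exactly on an edge. The helper `bin_index` is hypothetical and only imitates the binning rule; it does not call xarray or pandas.

```python
from bisect import bisect_left
from datetime import datetime

# Monthly bin edges for 2020: 13 edges delimit 12 bins.
edges = [datetime(2020, m, 1) for m in range(1, 13)] + [datetime(2021, 1, 1)]


def bin_index(ts):
    """Index of the right-closed bin (edges[i], edges[i + 1]] containing ts.

    Returns None when ts is on or before the first edge, or after the last one.
    """
    i = bisect_left(edges, ts)
    if i == 0 or i == len(edges):
        return None
    return i - 1


for ts in (
    datetime(2020, 1, 1),   # exactly on the first edge: outside every bin
    datetime(2020, 1, 15),  # inside the January bin
    datetime(2020, 2, 1),   # exactly on an edge: counted in the January bin
):
    print(ts.date(), bin_index(ts))
```

So if an observation is stamped exactly at midnight on the first day of a month, this convention assigns it to the previous month's bin, and one stamped exactly at the start of the year is dropped. Whether that matters depends on the time resolution of the stored timestamps.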
If you plot right now, you might notice that some months are empty:
(Optional) Fill in empty months:

    import matplotlib.pyplot as plt

    band_data = band_data.interpolate_na(dim='time_bins', method='linear', fill_value='extrapolate')

    fig, axes = plt.subplots(nrows=3, ncols=4, figsize=(18, 12))

    for i, season in enumerate(band_data.B02):
        ax = axes.flat[i]
        cax = band_data.B02.isel(time_bins=i).plot(ax=ax)

    for i, ax in enumerate(axes.flat):
        ax.axes.get_xaxis().set_ticklabels([])
        ax.axes.get_yaxis().set_ticklabels([])
        ax.axes.axis('tight')
        ax.set_xlabel('')
        ax.set_ylabel('')
        ax.set_title(f'Month:{i+1}')

    plt.tight_layout()
    plt.show()

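
To make the gap-filling step more concrete, here is a minimal NumPy sketch of linear interpolation with extrapolation at the ends, applied to a single pixel's monthly series. It is a simplification: the hypothetical helper `fill_linear` treats the months as equally spaced positions, whereas `interpolate_na` by default uses the actual coordinate values along `time_bins`.

```python
import numpy as np


def fill_linear(values):
    """Fill NaNs in a 1-D series by linear interpolation, extrapolating at both ends."""
    values = np.asarray(values, dtype=float)
    x = np.arange(len(values))
    known = ~np.isnan(values)
    if known.sum() < 2:
        raise ValueError("need at least two valid points to interpolate")
    xk, yk = x[known], values[known]

    # Interior gaps: np.interp interpolates linearly but clamps outside [xk[0], xk[-1]].
    filled = np.interp(x, xk, yk)

    # Replace the clamped ends with a straight line through the two nearest valid points.
    left = x < xk[0]
    right = x > xk[-1]
    filled[left] = yk[0] + (x[left] - xk[0]) * (yk[1] - yk[0]) / (xk[1] - xk[0])
    filled[right] = yk[-1] + (x[right] - xk[-1]) * (yk[-1] - yk[-2]) / (xk[-1] - xk[-2])
    return filled


series = [np.nan, 1.0, np.nan, 3.0, np.nan]
print(fill_linear(series))
```

Keep in mind that extrapolated months are guesses based on the nearest two valid medians, so they can drift far from plausible reflectance values when the gap at either end of the year is long.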
Please refer to the provided notebook for a detailed usage example of the provided `PatchesDataset`.
- Read the COCO file to be used.

    from pathlib import Path
    from pycocotools.coco import COCO

    root_path_coco = Path('coco_files/')
    coco_train = COCO(root_path_coco / 'coco_example.json')

- Initialize the PatchesDataset.

    from torch.utils.data import DataLoader
    from patches_dataset import PatchesDataset
    from utils.config import LINEAR_ENCODER

    root_path_netcdf = Path('dataset/netcdf')  # Path to the netCDF files

    dataset_train = PatchesDataset(
        root_path_netcdf=root_path_netcdf,
        coco=coco_train,
        group_freq='1MS',
        prefix='test_patchesdataset',
        bands=['B02', 'B03', 'B04'],
        linear_encoder=LINEAR_ENCODER,
        saved_medians=False,
        window_len=6,
        requires_norm=False,
        return_masks=False,
        clouds=False,
        cirrus=False,
        shadow=False,
        snow=False,
        output_size=(183, 183)
    )

- Initialize the Dataloader.

    dataloader_train = DataLoader(
        dataset_train,
        batch_size=1,
        shuffle=True,
        num_workers=4,
        pin_memory=True
    )

- Get a batch.

    batch = next(iter(dataloader_train))

The `batch` variable is a dictionary containing the keys: `medians`, `labels`, `idx`.
`batch['medians']` contains a PyTorch tensor of size `[1, 6, 3, 183, 183]` where:
- batch size: 1
- timestamps: 6
- bands: 3
- height: 183
- width: 183
`batch['labels']` contains the corresponding labels of the medians, which is a PyTorch tensor of size `[1, 183, 183]` where:
- batch size: 1
- height: 183
- width: 183
`batch['idx']` contains the index of the returned timeseries.
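
As a quick sanity check on these shapes, the sketch below builds a stand-in batch and folds the time axis into the channel axis, which is one possible way to feed the medians to a purely 2-D model. It uses NumPy arrays with the sizes described above instead of real tensors (an assumption made to keep the example light). PyTorch tensors expose the same `.shape` and `.reshape` calls.

```python
import numpy as np

# Stand-in batch with the sizes described above (zeros instead of real data).
batch = {
    "medians": np.zeros((1, 6, 3, 183, 183), dtype=np.float32),
    "labels": np.zeros((1, 183, 183), dtype=np.int64),
    "idx": np.array([0]),
}

# Unpack: batch size, timestamps, bands, height, width.
b, t, c, h, w = batch["medians"].shape

# Fold the 6 timestamps x 3 bands into 18 channels per pixel.
stacked = batch["medians"].reshape(b, t * c, h, w)

print(stacked.shape)          # (batch, timestamps * bands, height, width)
print(batch["labels"].shape)  # (batch, height, width)
```

Temporal models would instead keep the original five-dimensional layout and consume the timestamp axis directly.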
Dataset Webpage: https://www.sen4agrinet.space.noa.gr/
Please visit Sen4AgriNet-Models for a complete experimentation pipeline using the Sen4AgriNet dataset.
To cite please use:

    @ARTICLE{9749916,
      author={Sykas, Dimitrios and Sdraka, Maria and Zografakis, Dimitrios and Papoutsis, Ioannis},
      journal={IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing},
      title={A Sentinel-2 multi-year, multi-country benchmark dataset for crop classification and segmentation with deep learning},
      year={2022},
      doi={10.1109/JSTARS.2022.3164771}
    }
