scores: Metrics for the verification, evaluation and optimisation of forecasts, predictions or models.
A list of over 60 metrics, statistical techniques and data processing tools contained in scores is available here.
scores is a Python package containing mathematical functions for the verification, evaluation and optimisation of forecasts, predictions or models. It supports labelled n-dimensional (multidimensional) data, which is used in many scientific fields and in machine learning. At present, scores primarily supports the geoscience communities; in particular, the meteorological, climatological and oceanographic communities.
Documentation: scores.readthedocs.io
Source code: github.com/nci/scores
Tutorial gallery: available here
Journal article: scores: A Python package for verifying and evaluating models and predictions with xarray
Below is a curated selection of the metrics, tools and statistical tests included in scores. (Click here for the full list.)
Category | Description | Selection of Included Functions |
---|---|---|
Continuous | Scores for evaluating single-valued continuous forecasts. | MAE, MSE, RMSE, Additive Bias, Multiplicative Bias, Percent Bias, Pearson's Correlation Coefficient, Kling-Gupta Efficiency, Flip-Flop Index, Quantile Loss, Quantile Interval Score, Interval Score, Murphy Score, and threshold weighted scores for expectiles, quantiles and Huber Loss. |
Probability | Scores for evaluating forecasts that are expressed as predictive distributions, ensembles, and probabilities of binary events. | Brier Score, Continuous Ranked Probability Score (CRPS) for Cumulative Distribution Functions (CDF) and ensembles (including threshold weighted versions), Receiver Operating Characteristic (ROC), Isotonic Regression (reliability diagrams). |
Categorical | Scores for evaluating forecasts of categories. | 18 binary contingency table (confusion matrix) metrics, the FIxed Risk Multicategorical (FIRM) Score, and the SEEPS score. |
Spatial | Scores that take into account spatial structure. | Fractions Skill Score. |
Statistical Tests | Tools to conduct statistical tests and generate confidence intervals. | Diebold Mariano. |
Processing Tools | Tools to pre-process data. | Data matching, Discretisation, Cumulative Distribution Function Manipulation. |
Emerging | Emerging scores that are still undergoing mathematical peer review. They may change in line with the peer review process. | Risk Matrix Score. |
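
To make the Categorical row concrete, here is a minimal plain-Python sketch of how a binary contingency table (confusion matrix) is counted and how three commonly used metrics follow from it. The function name `contingency_counts` and the event data are made up for illustration; this is not the scores API or its implementation.

```python
def contingency_counts(forecast_events, observed_events):
    """Count hits, misses, false alarms and correct negatives.

    Hypothetical helper for illustration only (not part of scores).
    """
    hits = misses = false_alarms = correct_negatives = 0
    for forecast, observed in zip(forecast_events, observed_events):
        if forecast and observed:
            hits += 1
        elif observed:
            misses += 1
        elif forecast:
            false_alarms += 1
        else:
            correct_negatives += 1
    return hits, misses, false_alarms, correct_negatives


# Made-up binary event data (True = event forecast / event observed).
forecast_events = [True, True, False, False, True, False, True, False]
observed_events = [True, False, False, True, True, False, True, False]

hits, misses, false_alarms, correct_negatives = contingency_counts(
    forecast_events, observed_events
)
total = hits + misses + false_alarms + correct_negatives

# Fraction of all cases that were forecast correctly.
accuracy = (hits + correct_negatives) / total
# Fraction of observed events that were forecast.
probability_of_detection = hits / (hits + misses)
# Fraction of forecast events that did not occur.
false_alarm_ratio = false_alarms / (hits + false_alarms)

print(hits, misses, false_alarms, correct_negatives)
print(accuracy, probability_of_detection, false_alarm_ratio)
```

The four counts are the only inputs the derived metrics need, which is why a whole family of binary metrics can share a single contingency table.
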
scores not only includes common scores (e.g., MAE, RMSE), it also includes novel scores not commonly found elsewhere (e.g., FIRM, Flip-Flop Index), complex scores (e.g., threshold weighted CRPS), and statistical tests (e.g., the Diebold Mariano test). Additionally, it provides pre-processing tools for preparing data for scores in a variety of formats including cumulative distribution functions (CDF). scores provides its own implementations where relevant to avoid extensive dependencies.
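For readers new to the common scores named above, the following plain-Python sketch shows what MAE, MSE, RMSE and additive bias compute on a small made-up sample. The function names and values are illustrative assumptions; the package's own implementations operate on labelled array data rather than Python lists.

```python
import math


def mean_absolute_error(forecast, observed):
    """Average magnitude of the errors (illustrative helper)."""
    return sum(abs(f - o) for f, o in zip(forecast, observed)) / len(forecast)


def mean_squared_error(forecast, observed):
    """Average squared error (illustrative helper)."""
    return sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(forecast)


def additive_bias(forecast, observed):
    """Average signed error, forecast minus observed (illustrative helper)."""
    return sum(f - o for f, o in zip(forecast, observed)) / len(forecast)


# Made-up values chosen so that positive and negative errors cancel.
forecast = [3.0, 5.0, 8.0, 10.0]
observed = [1.0, 6.0, 11.0, 8.0]

mae = mean_absolute_error(forecast, observed)
mse = mean_squared_error(forecast, observed)
rmse = math.sqrt(mse)
bias = additive_bias(forecast, observed)

print(mae, mse, bias)  # 2.0 4.5 0.0
```

In this sample the additive bias is zero even though the MAE is 2.0, which illustrates why more than one score is usually needed to characterise forecast performance.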
scores primarily supports xarray datatypes for Earth system data, allowing it to work with NetCDF4, HDF5, Zarr and GRIB data formats among others. scores uses Dask for scaling and performance. Some metrics work with pandas and we aim to expand this capability.
All of the scores and metrics in this package have undergone a thorough scientific and software review. Every score has a companion Jupyter Notebook tutorial that demonstrates its use in practice.
To find out more about contributing, see our contributing guide.
All interactions in discussions, issues, emails and code (e.g., pull requests, code comments) will be managed according to the expectations outlined in the code of conduct and in accordance with all relevant laws and obligations. This project is an inclusive, respectful and open project with high standards for respectful behaviour and language. The code of conduct is the Contributor Covenant, adopted by over 40,000 open source projects. Any concerns will be dealt with fairly and respectfully, with the processes described in the code of conduct.
The installation guide describes four different use cases for installing, using and working with this package.
Most users currently want the all installation option. This includes the mathematical functions (scores, metrics, statistical tests etc.), the tutorial dependencies and development libraries.
```bash
# From a local checkout of the Git repository
pip install -e .[all]
```
To install the mathematical functions ONLY (no tutorial dependencies, no developer libraries), use the default minimal installation option. The minimal option is a stable version with limited dependencies. It can be installed from the Python Package Index (PyPI) or with conda.
```bash
# From PyPI
pip install scores

# From conda-forge
conda install conda-forge::scores
```
(Note: at present, only the minimal installation option is available from conda. In time, we intend to add more installation options to conda.)
Here is a short example of the use of scores:
```python
>>> import scores
>>> forecast = scores.sample_data.simple_forecast()
>>> observed = scores.sample_data.simple_observations()
>>> mean_absolute_error = scores.continuous.mae(forecast, observed)
>>> print(mean_absolute_error)
<xarray.DataArray ()>
array(2.)
```
Jupyter Notebook tutorials are provided for each metric and statistical test in scores, as well as for some of the key features of scores (e.g., dimension handling and weighting results).
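As a rough illustration of what dimension handling and weighting mean for labelled data, the sketch below uses plain xarray and NumPy (not the scores API) on made-up data. It reduces an absolute error over one named dimension while preserving another, then computes a latitude-weighted mean. The cosine-of-latitude weights are an assumption chosen for the example; the tutorials describe how the package itself exposes these features.

```python
import numpy as np
import xarray as xr

# Made-up forecast and observation grids with labelled dimensions.
coords = {"time": [0, 1], "lat": [0.0, 30.0, 60.0]}
forecast = xr.DataArray(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], dims=("time", "lat"), coords=coords
)
observed = xr.DataArray(
    [[2.0, 2.0, 1.0], [4.0, 7.0, 3.0]], dims=("time", "lat"), coords=coords
)

abs_error = abs(forecast - observed)

# Dimension handling: average over "time" only, so that one value
# is preserved for each latitude.
mae_per_lat = abs_error.mean(dim="time")

# Weighting: give each latitude a weight proportional to cos(latitude)
# (an illustrative choice), then average over every dimension.
lat_weights = np.cos(np.deg2rad(forecast["lat"]))
weighted_mae = abs_error.weighted(lat_weights).mean(dim=("time", "lat"))

print(mae_per_lat.values)
print(float(weighted_mae))
```

Because the largest errors in this sample sit at the highest latitude, which receives the smallest weight, the weighted result is lower than the unweighted mean.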
To watch a PyCon AU 2024 conference presentation about scores, click here.
All metrics, statistical techniques and data processing tools in scores work with xarray. Some metrics work with pandas. As such, scores works with any data source for which xarray or pandas can be used. See the data sources page and this tutorial for more information on finding, downloading and working with different sources of data.
scores is archived on Zenodo. Click here to see the latest version on Zenodo.
If you use scores for a published work, we would appreciate you citing our paper:
Leeuwenburg, T., Loveday, N., Ebert, E. E., Cook, H., Khanarmuei, M., Taggart, R. J., Ramanathan, N., Carroll, M., Chong, S., Griffiths, A., & Sharples, J. (2024). scores: A Python package for verifying and evaluating models and predictions with xarray. Journal of Open Source Software, 9(99), 6889. https://doi.org/10.21105/joss.06889
BibTeX:
```bibtex
@article{Leeuwenburg_scores_A_Python_2024,
  author = {Leeuwenburg, Tennessee and Loveday, Nicholas and Ebert, Elizabeth E. and Cook, Harrison and Khanarmuei, Mohammadreza and Taggart, Robert J. and Ramanathan, Nikeeth and Carroll, Maree and Chong, Stephanie and Griffiths, Aidan and Sharples, John},
  doi = {10.21105/joss.06889},
  journal = {Journal of Open Source Software},
  month = jul,
  number = {99},
  pages = {6889},
  title = {{scores: A Python package for verifying and evaluating models and predictions with xarray}},
  url = {https://joss.theoj.org/papers/10.21105/joss.06889},
  volume = {9},
  year = {2024}
}
```