jogihood/rrg-metric

A Python package for evaluating radiology report generation using multiple standard and medical-specific metrics.

A Python package for evaluating Radiology Report Generation (RRG) using multiple metrics including:
BLEU, ROUGE, METEOR, BERTScore, F1RadGraph, and F1CheXbert.

Features

  • Multiple evaluation metrics supported:
    • BLEU (Bilingual Evaluation Understudy)
    • ROUGE (Recall-Oriented Understudy for Gisting Evaluation)
    • METEOR (Metric for Evaluation of Translation with Explicit ORdering)
    • BERTScore
    • F1 RadGraph (Specialized for radiology report graphs)
    • F1 CheXbert (Specialized for chest X-ray reports)
  • Easy-to-use API
  • Support for batch processing
  • Detailed per-sample and aggregated results
  • Visualization tools for correlation analysis

TODO

  • Add CLI usage
  • Add SembScore (CheXbert Vector Similarity)

Installation

  1. Clone the repository:

git clone https://github.com/jogihood/rrg-metric.git
cd rrg-metric

  2. Create and activate a conda environment using the provided environment.yml:

conda env create -f environment.yml
conda activate rrg-metric

Alternatively, you can install the required packages using pip:

pip install -r requirements.txt

Usage

Metric Computation

Here's a simple example of how to use the package:

import rrg_metric

# Example usage
predictions = ["Normal chest x-ray", "Bilateral pleural effusions noted"]
ground_truth = ["Normal chest radiograph", "Small bilateral pleural effusions present"]

# Compute BLEU score
results = rrg_metric.compute(
    metric="bleu",
    preds=predictions,
    gts=ground_truth,
    per_sample=True,
    verbose=True,
)

print(f"Total BLEU score: {results['total_results']}")
if results['per_sample_results']:
    print(f"Per-sample scores: {results['per_sample_results']}")

Visualization (Beta)

The package provides visualization tools for correlation analysis between metric scores and radiologist error counts:

For preprocessing tools related to radiology error validation (ReXVal), please check: https://github.com/jogihood/rexval-preprocessor

import rrg_metric
import matplotlib.pyplot as plt

# Example data
metric_scores = [0.8, 0.7, 0.9, 0.6, 0.85]
error_counts = [1, 2, 0, 3, 1]

# Create correlation plot
ax, tau, tau_ci = rrg_metric.plot_corr(
    metric="BLEU",
    metric_scores=metric_scores,
    radiologist_error_counts=error_counts,
    error_type="total",    # or "significant"
    color='blue',          # custom color
    scatter_alpha=0.6,     # scatter point transparency
    show_tau=True,         # show Kendall's tau in title
)

print(f"Kendall's tau: {tau:.3f}")
print(f"95% CI: [{tau_ci[0]:.3f}, {tau_ci[1]:.3f}]")
plt.show()

Parameters

compute(metric, preds, gts, per_sample=False, verbose=False)

Required Parameters:

  • metric (str): The evaluation metric to use. Must be one of: ["bleu", "rouge", "meteor", "bertscore", "f1radgraph", "f1chexbert"]
  • preds (List[str]): List of model predictions/generated texts
  • gts (List[str]): List of ground truth/reference texts

Optional Parameters:

  • per_sample (bool, default=False): If True, returns scores for each individual prediction-reference pair
  • verbose (bool, default=False): If True, displays progress bars and loading messages
  • f1radgraph_model_type / f1radgraph_reward_level: RadGraph-specific parameters; the default values are recommended (see the sketch below)
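As a minimal sketch of how these parameters fit together (the report texts here are made up, and the RadGraph options are left at their recommended defaults), a medical-specific metric is computed the same way as BLEU:

import rrg_metric

preds = ["No acute cardiopulmonary abnormality."]
gts = ["No acute cardiopulmonary process."]

# f1radgraph with default RadGraph settings; per_sample=True also returns
# one score per prediction-reference pair.
results = rrg_metric.compute(
    metric="f1radgraph",
    preds=preds,
    gts=gts,
    per_sample=True,
)

print(results['total_results'])
print(results['per_sample_results'])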

plot_corr(metric, metric_scores, radiologist_error_counts, error_type="total", ax=None, **params)

Required Parameters:

  • metric (str): Name of the metric being visualized
  • metric_scores (List[float]): List of metric scores
  • radiologist_error_counts (List[float]): List of radiologist error counts

Optional Parameters:

  • error_type (str, default="total"): Type of error to plot. Must be either "total" or "significant"
  • ax (matplotlib.axes.Axes, default=None): Matplotlib axes to plot on. If None, a new figure and axes are created (see the sketch below)
  • Additional parameters for plot customization (see docstring for details)
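As a minimal sketch (the scores and error counts below are made up), the ax parameter makes it possible to place correlation plots for several metrics in a single figure:

import matplotlib.pyplot as plt
import rrg_metric

# Hypothetical scores for two metrics over the same five reports
bleu_scores = [0.8, 0.7, 0.9, 0.6, 0.85]
bert_scores = [0.75, 0.7, 0.92, 0.55, 0.8]
error_counts = [1, 2, 0, 3, 1]

# Create one figure with two subplots and pass each axes to plot_corr
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
rrg_metric.plot_corr(metric="BLEU", metric_scores=bleu_scores,
                     radiologist_error_counts=error_counts,
                     error_type="total", ax=ax1)
rrg_metric.plot_corr(metric="BERTScore", metric_scores=bert_scores,
                     radiologist_error_counts=error_counts,
                     error_type="total", ax=ax2)
plt.tight_layout()
plt.show()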

Available Metrics

The package supports the following metrics:

  1. bleu: Basic BLEU score computation
  2. rouge: ROUGE-L score for evaluating summary quality
  3. meteor: METEOR score for machine translation evaluation
  4. bertscore: Contextual embedding-based evaluation using BERT
  5. f1radgraph: Specialized metric for evaluating radiology report graphs
  6. f1chexbert: Specialized metric for chest X-ray report evaluation

You can check available metrics using:

print(rrg_metric.AVAILABLE_METRICS)
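For example, AVAILABLE_METRICS can be iterated to score the same prediction/reference pair with every supported metric (a minimal sketch; f1radgraph and f1chexbert load model weights, so the first run may be slow):

import rrg_metric

preds = ["Normal chest x-ray"]
gts = ["Normal chest radiograph"]

# Compute every registered metric for the same pair of reports
for metric in rrg_metric.AVAILABLE_METRICS:
    results = rrg_metric.compute(metric=metric, preds=preds, gts=gts)
    print(metric, results['total_results'])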

Requirements

  • Python 3.10+
  • PyTorch
  • Transformers
  • Evaluate
  • RadGraph
  • F1CheXbert
  • Matplotlib
  • Seaborn
  • Other dependencies listed in requirements.txt

Contributing

This repository is still under active development. If you encounter any issues or bugs, I would really appreciate it if you could submit a Pull Request. Your contributions will help make this package more robust and useful for the community!
