# rrg-metric
A Python package for evaluating Radiology Report Generation (RRG) using multiple metrics including:
BLEU, ROUGE, METEOR, BERTScore, F1RadGraph, F1CheXbert, and SembScore.
- Multiple evaluation metrics supported:
- BLEU
- ROUGE
- METEOR
- BERTScore
- F1 RadGraph
- F1 CheXbert
- SembScore (CheXbert vector similarity)
- RaTEScore (Entity-aware metric)
- Easy-to-use API
- Support for batch processing
- Detailed per-sample and aggregated results
- Visualization tools for correlation analysis
- Add CLI usage
- Add GREEN score
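SembScore is described above as a CheXbert vector similarity: both reports are embedded with CheXbert and the embeddings are compared. The embedding model is out of scope here, but the comparison step is plain cosine similarity. A minimal illustration (the `cosine_similarity` helper below is not part of this package):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Parallel vectors score ~1.0; orthogonal vectors score 0.0
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # ~1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))            # 0.0
```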
- Clone the repository:

```bash
git clone https://github.com/jogihood/rrg-metric.git
cd rrg-metric
```
- Create and activate a conda environment using the provided `environment.yml`:

```bash
conda env create -f environment.yml
conda activate rrg-metric
```
Alternatively, you can install the required packages using pip:

```bash
pip install -r requirements.txt
```
Here's a simple example of how to use the package:

```python
import rrg_metric

# Example usage
predictions = ["Normal chest x-ray", "Bilateral pleural effusions noted"]
ground_truth = ["Normal chest radiograph", "Small bilateral pleural effusions present"]

# Compute BLEU score
results = rrg_metric.compute(
    metric="bleu",
    preds=predictions,
    gts=ground_truth,
    per_sample=True,
    verbose=True,
)

print(f"Total BLEU score: {results['total_results']}")
if results['per_sample_results']:
    print(f"Per-sample scores: {results['per_sample_results']}")
```
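The package delegates BLEU to an underlying implementation, so the sketch below is background only, not the package's code: a self-contained unigram BLEU (BLEU-1) with clipped counts and a brevity penalty, showing what the score rewards on the example pair above.

```python
import math
from collections import Counter

def bleu1(pred: str, ref: str) -> float:
    """Unigram precision with brevity penalty (BLEU-1, illustration only)."""
    p, r = pred.lower().split(), ref.lower().split()
    if not p:
        return 0.0
    clipped = sum((Counter(p) & Counter(r)).values())  # clipped unigram matches
    precision = clipped / len(p)
    # Penalize candidates shorter than the reference
    bp = 1.0 if len(p) >= len(r) else math.exp(1 - len(r) / len(p))
    return bp * precision

# "normal" and "chest" match; "x-ray" vs "radiograph" does not
print(bleu1("Normal chest x-ray", "Normal chest radiograph"))  # ~0.667
```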
The package provides visualization tools for correlation analysis between metric scores and radiologist error counts. For preprocessing tools related to radiology error validation (ReXVal), please check: https://github.com/jogihood/rexval-preprocessor

```python
import rrg_metric
import matplotlib.pyplot as plt

# Example data
metric_scores = [0.8, 0.7, 0.9, 0.6, 0.85]
error_counts = [1, 2, 0, 3, 1]

# Create correlation plot
ax, tau, tau_ci = rrg_metric.plot_corr(
    metric="BLEU",
    metric_scores=metric_scores,
    radiologist_error_counts=error_counts,
    error_type="total",   # or "significant"
    color='blue',         # custom color
    scatter_alpha=0.6,    # scatter point transparency
    show_tau=True,        # show Kendall's tau in title
)

print(f"Kendall's tau: {tau:.3f}")
print(f"95% CI: [{tau_ci[0]:.3f}, {tau_ci[1]:.3f}]")
plt.show()
```
Parameters of `rrg_metric.compute`:

- `metric` (str): The evaluation metric to use. Must be one of: `["bleu", "rouge", "meteor", "bertscore", "f1radgraph", "chexbert", "ratescore"]`
- `preds` (List[str]): List of model predictions/generated texts
- `gts` (List[str]): List of ground-truth/reference texts
- `per_sample` (bool, default=False): If True, returns scores for each individual prediction-reference pair
- `verbose` (bool, default=False): If True, displays progress bars and loading messages
- `f1radgraph_model_type` / `f1radgraph_reward_level`: Parameters for RadGraph; the default values are recommended
- `cache_dir`: `cache_dir` for Hugging Face model downloads
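Because metric backends can take a while to load, it can help to fail fast on bad inputs before calling `compute`. A caller-side sketch based on the parameter list above (the `validate_inputs` helper and `SUPPORTED_METRICS` constant are illustrative assumptions, not part of the package):

```python
SUPPORTED_METRICS = ["bleu", "rouge", "meteor", "bertscore",
                     "f1radgraph", "chexbert", "ratescore"]

def validate_inputs(metric, preds, gts):
    """Caller-side sanity checks mirroring compute()'s documented contract."""
    if metric not in SUPPORTED_METRICS:
        raise ValueError(f"Unknown metric {metric!r}; expected one of {SUPPORTED_METRICS}")
    if len(preds) != len(gts):
        raise ValueError(f"preds ({len(preds)}) and gts ({len(gts)}) must have equal length")

validate_inputs("bleu", ["Normal chest x-ray"], ["Normal chest radiograph"])  # passes silently
```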
Parameters of `rrg_metric.plot_corr`:

- `metric` (str): Name of the metric being visualized
- `metric_scores` (List[float]): List of metric scores
- `radiologist_error_counts` (List[float]): List of radiologist error counts
- `error_type` (str, default="total"): Type of error to plot. Must be either "total" or "significant"
- `ax` (matplotlib.axes.Axes, default=None): Matplotlib axes for plotting. If None, creates a new figure and axes
- Additional parameters for plot customization (see the docstring for details)
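`plot_corr` reports Kendall's tau with a 95% CI; which variant it computes (tau-b, bootstrap CI, etc.) depends on the implementation. To show what the statistic captures, here is a pure-Python tau-a over the example data from the snippet above (a sketch that treats tied pairs as neither concordant nor discordant):

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        s = (xi - xj) * (yi - yj)
        if s > 0:
            concordant += 1   # ranks agree
        elif s < 0:
            discordant += 1   # ranks disagree (ties contribute nothing)
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)

metric_scores = [0.8, 0.7, 0.9, 0.6, 0.85]
error_counts = [1, 2, 0, 3, 1]
# Strongly negative: higher metric scores track fewer radiologist errors
print(kendall_tau_a(metric_scores, error_counts))  # -0.9
```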
- Python 3.10+
- Other dependencies listed in `requirements.txt`
This repository is still under active development. If you encounter any issues or bugs, I would really appreciate it if you could submit a Pull Request. Your contributions will help make this package more robust and useful for the community!