# rrg-metric
A Python package for evaluating Radiology Report Generation (RRG) using multiple metrics, including BLEU, ROUGE, METEOR, BERTScore, F1RadGraph, and F1CheXbert.
## Features

- Multiple evaluation metrics supported:
- BLEU (Bilingual Evaluation Understudy)
- ROUGE (Recall-Oriented Understudy for Gisting Evaluation)
- METEOR (Metric for Evaluation of Translation with Explicit ORdering)
- BERTScore
- F1 RadGraph (Specialized for radiology report graphs)
- F1 CheXbert (Specialized for chest X-ray reports)
- Easy-to-use API
- Support for batch processing
- Detailed per-sample and aggregated results
- Visualization tools for correlation analysis
## TODO

- Add CLI usage
- Add SembScore (CheXbert Vector Similarity)
## Installation

1. Clone the repository:

```bash
git clone https://github.com/jogihood/rrg-metric.git
cd rrg-metric
```
2. Create and activate a conda environment using the provided `environment.yml`:

```bash
conda env create -f environment.yml
conda activate rrg-metric
```
Alternatively, you can install the required packages using pip:

```bash
pip install -r requirements.txt
```
## Usage

Here's a simple example of how to use the package:

```python
import rrg_metric

# Example usage
predictions = ["Normal chest x-ray", "Bilateral pleural effusions noted"]
ground_truth = ["Normal chest radiograph", "Small bilateral pleural effusions present"]

# Compute BLEU score
results = rrg_metric.compute(
    metric="bleu",
    preds=predictions,
    gts=ground_truth,
    per_sample=True,
    verbose=True,
)

print(f"Total BLEU score: {results['total_results']}")
if results['per_sample_results']:
    print(f"Per-sample scores: {results['per_sample_results']}")
```
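With `per_sample=True`, the per-sample results come back as a list of scores, so aggregating or inspecting them is plain Python. A minimal sketch (the score values below are made up for illustration, not real output):

```python
from statistics import mean

# Hypothetical per-sample BLEU scores (illustrative values, not real output)
per_sample_results = [0.42, 0.31, 0.55, 0.18]

# Mean score, plus the index of the weakest prediction-reference pair
worst = min(range(len(per_sample_results)), key=per_sample_results.__getitem__)
print(f"mean={mean(per_sample_results):.3f}, worst sample index={worst}")
# prints: mean=0.365, worst sample index=3
```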
## Visualization

The package provides visualization tools for correlation analysis between metric scores and radiologist error counts.

For preprocessing tools related to radiology error validation (ReXVal), please check: https://github.com/jogihood/rexval-preprocessor

```python
import rrg_metric
import matplotlib.pyplot as plt

# Example data
metric_scores = [0.8, 0.7, 0.9, 0.6, 0.85]
error_counts = [1, 2, 0, 3, 1]

# Create correlation plot
ax, tau, tau_ci = rrg_metric.plot_corr(
    metric="BLEU",
    metric_scores=metric_scores,
    radiologist_error_counts=error_counts,
    error_type="total",   # or "significant"
    color='blue',         # custom color
    scatter_alpha=0.6,    # scatter point transparency
    show_tau=True,        # show Kendall's tau in title
)

print(f"Kendall's tau: {tau:.3f}")
print(f"95% CI: [{tau_ci[0]:.3f}, {tau_ci[1]:.3f}]")
plt.show()
```
### `compute()` parameters

- `metric` (str): The evaluation metric to use. Must be one of: `["bleu", "rouge", "meteor", "bertscore", "f1radgraph", "f1chexbert"]`
- `preds` (List[str]): List of model predictions/generated texts
- `gts` (List[str]): List of ground truth/reference texts
- `per_sample` (bool, default=False): If True, returns scores for each individual prediction-reference pair
- `verbose` (bool, default=False): If True, displays progress bars and loading messages
- `f1radgraph_model_type` / `f1radgraph_reward_level`: Parameters for RadGraph. The default values are recommended
### `plot_corr()` parameters

- `metric` (str): Name of the metric being visualized
- `metric_scores` (List[float]): List of metric scores
- `radiologist_error_counts` (List[float]): List of radiologist error counts
- `error_type` (str, default="total"): Type of error to plot. Must be either "total" or "significant"
- `ax` (matplotlib.axes.Axes, default=None): Matplotlib axes for plotting. If None, creates a new figure and axes
- Additional parameters for plot customization (see docstring for details)
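For intuition about the Kendall's tau that `plot_corr` reports: tau measures how consistently higher metric scores go with lower error counts. The uncorrected tau-a variant can be computed by hand as a sketch (the package's implementation may apply a tie correction such as tau-b, so the exact value can differ):

```python
from itertools import combinations

def kendall_tau_a(x, y):
    # Count concordant and discordant pairs; tied pairs count as neither
    concordant = discordant = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        s = (xi - xj) * (yi - yj)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Same example data as above: better scores should track fewer errors
metric_scores = [0.8, 0.7, 0.9, 0.6, 0.85]
error_counts = [1, 2, 0, 3, 1]
print(kendall_tau_a(metric_scores, error_counts))  # -0.9 (strong negative correlation)
```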
## Supported Metrics

The package supports the following metrics:

- `bleu`: Basic BLEU score computation
- `rouge`: ROUGE-L score for evaluating summary quality
- `meteor`: METEOR score for machine translation evaluation
- `bertscore`: Contextual embedding-based evaluation using BERT
- `f1radgraph`: Specialized metric for evaluating radiology report graphs
- `f1chexbert`: Specialized metric for chest X-ray report evaluation
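For a sense of what the `rouge` option measures: ROUGE-L is an F-measure over the longest common subsequence (LCS) of tokens. A self-contained sketch (whitespace tokenization and lowercasing are simplifying assumptions here; the package's actual implementation may tokenize differently):

```python
def lcs_len(a, b):
    # Classic dynamic-programming longest-common-subsequence length
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ta in enumerate(a, 1):
        for j, tb in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if ta == tb else max(dp[i-1][j], dp[i][j-1])
    return dp[len(a)][len(b)]

def rouge_l(pred, ref):
    p, r = pred.lower().split(), ref.lower().split()
    lcs = lcs_len(p, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(p), lcs / len(r)
    return 2 * precision * recall / (precision + recall)

# "normal chest" is the LCS of the two reports: precision = recall = 2/3
print(f"{rouge_l('Normal chest x-ray', 'Normal chest radiograph'):.3f}")  # 0.667
```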
You can check available metrics using:

```python
print(rrg_metric.AVAILABLE_METRICS)
```
## Requirements

- Python 3.10+
- PyTorch
- Transformers
- Evaluate
- RadGraph
- F1CheXbert
- Matplotlib
- Seaborn
- Other dependencies listed in `requirements.txt`
## Contributing

This repository is still under active development. If you encounter any issues or bugs, I would really appreciate it if you could submit a Pull Request. Your contributions will help make this package more robust and useful for the community!