jogihood/rrg-metric

A Python package for evaluating radiology report generation using multiple standard and medical-specific metrics.

A Python package for evaluating Radiology Report Generation (RRG) using multiple metrics including:
BLEU, ROUGE, METEOR, BERTScore, F1RadGraph, and F1CheXbert.

Features

  • Multiple evaluation metrics supported:
    • BLEU (Bilingual Evaluation Understudy)
    • ROUGE (Recall-Oriented Understudy for Gisting Evaluation)
    • METEOR (Metric for Evaluation of Translation with Explicit ORdering)
    • BERTScore
    • F1 RadGraph (Specialized for radiology report graphs)
    • F1 CheXbert (Specialized for chest X-ray reports)
  • Easy-to-use API
  • Support for batch processing
  • Detailed per-sample and aggregated results
  • Visualization tools for correlation analysis

TODO

  • Add CLI usage
  • Add SembScore (CheXbert Vector Similarity)

Installation

  1. Clone the repository:

git clone https://github.com/jogihood/rrg-metric.git
cd rrg-metric

  2. Create and activate a conda environment using the provided environment.yml:

conda env create -f environment.yml
conda activate rrg-metric

Alternatively, you can install the required packages using pip:

pip install -r requirements.txt

Usage

Metric Computation

Here's a simple example of how to use the package:

import rrg_metric

# Example usage
predictions = ["Normal chest x-ray", "Bilateral pleural effusions noted"]
ground_truth = ["Normal chest radiograph", "Small bilateral pleural effusions present"]

# Compute BLEU score
results = rrg_metric.compute(
    metric="bleu",
    preds=predictions,
    gts=ground_truth,
    per_sample=True,
    verbose=True,
)

print(f"Total BLEU score: {results['total_results']}")
if results['per_sample_results']:
    print(f"Per-sample scores: {results['per_sample_results']}")

Visualization (Beta)

The package provides visualization tools for correlation analysis between metric scores and radiologist error counts:

For preprocessing tools related to radiology error validation (ReXVal), please check: https://github.com/jogihood/rexval-preprocessor

import rrg_metric
import matplotlib.pyplot as plt

# Example data
metric_scores = [0.8, 0.7, 0.9, 0.6, 0.85]
error_counts = [1, 2, 0, 3, 1]

# Create correlation plot
ax, tau, tau_ci = rrg_metric.plot_corr(
    metric="BLEU",
    metric_scores=metric_scores,
    radiologist_error_counts=error_counts,
    error_type="total",    # or "significant"
    color='blue',          # custom color
    scatter_alpha=0.6,     # scatter point transparency
    show_tau=True,         # show Kendall's tau in title
)

print(f"Kendall's tau: {tau:.3f}")
print(f"95% CI: [{tau_ci[0]:.3f}, {tau_ci[1]:.3f}]")
plt.show()

Parameters

compute(metric, preds, gts, per_sample=False, verbose=False)

Required Parameters:

  • metric (str): The evaluation metric to use. Must be one of: ["bleu", "rouge", "meteor", "bertscore", "f1radgraph", "f1chexbert"]
  • preds (List[str]): List of model predictions/generated texts
  • gts (List[str]): List of ground truth/reference texts

Optional Parameters:

  • per_sample (bool, default=False): If True, returns scores for each individual prediction-reference pair
  • verbose (bool, default=False): If True, displays progress bars and loading messages
  • f1radgraph_model_type / f1radgraph_reward_level: RadGraph-specific parameters; the default values are recommended (see the sketch below)
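As a minimal sketch of how these parameters fit together (the report texts here are made up, and the RadGraph options are left at their recommended defaults), a medical-specific metric is computed the same way as BLEU:

import rrg_metric

preds = ["No acute cardiopulmonary abnormality."]
gts = ["No acute cardiopulmonary process."]

# f1radgraph with default RadGraph settings; per_sample=True also returns
# one score per prediction-reference pair.
results = rrg_metric.compute(
    metric="f1radgraph",
    preds=preds,
    gts=gts,
    per_sample=True,
)

print(results['total_results'])
print(results['per_sample_results'])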

plot_corr(metric, metric_scores, radiologist_error_counts, error_type="total", ax=None, **params)

Required Parameters:

  • metric (str): Name of the metric being visualized
  • metric_scores (List[float]): List of metric scores
  • radiologist_error_counts (List[float]): List of radiologist error counts

Optional Parameters:

  • error_type (str, default="total"): Type of error to plot. Must be either "total" or "significant"
  • ax (matplotlib.axes.Axes, default=None): Matplotlib axes to plot on. If None, a new figure and axes are created (see the sketch below)
  • Additional parameters for plot customization (see docstring for details)
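As a minimal sketch (the scores and error counts below are made up), the ax parameter makes it possible to place correlation plots for several metrics in a single figure:

import matplotlib.pyplot as plt
import rrg_metric

# Hypothetical scores for two metrics over the same five reports
bleu_scores = [0.8, 0.7, 0.9, 0.6, 0.85]
bert_scores = [0.75, 0.7, 0.92, 0.55, 0.8]
error_counts = [1, 2, 0, 3, 1]

# Create one figure with two subplots and pass each axes to plot_corr
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
rrg_metric.plot_corr(metric="BLEU", metric_scores=bleu_scores,
                     radiologist_error_counts=error_counts,
                     error_type="total", ax=ax1)
rrg_metric.plot_corr(metric="BERTScore", metric_scores=bert_scores,
                     radiologist_error_counts=error_counts,
                     error_type="total", ax=ax2)
plt.tight_layout()
plt.show()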

Available Metrics

The package supports the following metrics:

  1. bleu: Basic BLEU score computation
  2. rouge: ROUGE-L score for evaluating summary quality
  3. meteor: METEOR score for machine translation evaluation
  4. bertscore: Contextual embedding-based evaluation using BERT
  5. f1radgraph: Specialized metric for evaluating radiology report graphs
  6. f1chexbert: Specialized metric for chest X-ray report evaluation

You can check available metrics using:

print(rrg_metric.AVAILABLE_METRICS)
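For example, AVAILABLE_METRICS can be iterated to score the same prediction/reference pair with every supported metric (a minimal sketch; f1radgraph and f1chexbert load model weights, so the first run may be slow):

import rrg_metric

preds = ["Normal chest x-ray"]
gts = ["Normal chest radiograph"]

# Compute every registered metric for the same pair of reports
for metric in rrg_metric.AVAILABLE_METRICS:
    results = rrg_metric.compute(metric=metric, preds=preds, gts=gts)
    print(metric, results['total_results'])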

Requirements

  • Python 3.10+
  • PyTorch
  • Transformers
  • Evaluate
  • RadGraph
  • F1CheXbert
  • Matplotlib
  • Seaborn
  • Other dependencies listed in requirements.txt

Contributing

This repository is still under active development. If you encounter any issues or bugs, I would really appreciate it if you could submit a Pull Request. Your contributions will help make this package more robust and useful for the community!
