A Python Data Valuation Package
uvanlp/valda
Valda is a Python package for data valuation in machine learning. If you are interested in
- analyzing the contribution of individual training examples to the final classification performance, or
- identifying some noisy examples in the training set,
you may be interested in the functions provided by this package.
The current version works with any scikit-learn classifier, as well as with user-defined classifiers built with PyTorch. It supports five data valuation methods:
- Leave-one-out (LOO)
- Data Shapley with the TMC algorithm (TMC-Shapley) from Ghorbani and Zou (2019)
- Beta Shapley from Kwon and Zou (2022)
- Class-wise Shapley (CS-Shapley) from Schoch et al. (2022)
- Influence Function (IF) from Koh and Liang (2017)
  - IF only works with classifiers built with PyTorch, because it requires gradient computation.
  - The current version only supports first-order gradient computation; we will add second-order computation soon.
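To make the simplest of these methods concrete, here is a minimal from-scratch sketch of leave-one-out valuation. It does not use Valda's API: the names `loo_values` and `TinyCentroidClassifier` are hypothetical, and the toy classifier is only a dependency-free stand-in for any object with scikit-learn style `fit` and `score` methods. The value of a training example is taken to be the development-set accuracy with the full training set minus the accuracy obtained when that example is removed.

```python
class TinyCentroidClassifier:
    """Hypothetical stand-in classifier with scikit-learn style fit/score."""

    def fit(self, X, y):
        # Compute one centroid (mean feature vector) per class label.
        sums, counts = {}, {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for j, v in enumerate(x):
                acc[j] += v
            counts[label] = counts.get(label, 0) + 1
        self.centroids_ = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict(self, X):
        # Assign each input to the class with the nearest centroid.
        def sq_dist(a, b):
            return sum((u - v) ** 2 for u, v in zip(a, b))
        return [min(self.centroids_, key=lambda c: sq_dist(x, self.centroids_[c]))
                for x in X]

    def score(self, X, y):
        # Mean accuracy, as scikit-learn classifiers report from score().
        preds = self.predict(X)
        return sum(p == t for p, t in zip(preds, y)) / len(y)


def loo_values(make_model, X_train, y_train, X_dev, y_dev):
    """Leave-one-out value of each training example (assumed sign convention:
    full-data score minus the score without that example)."""
    full_score = make_model().fit(X_train, y_train).score(X_dev, y_dev)
    values = []
    for i in range(len(X_train)):
        # Retrain from scratch on every example except the i-th one.
        X_rest = [x for j, x in enumerate(X_train) if j != i]
        y_rest = [t for j, t in enumerate(y_train) if j != i]
        score = make_model().fit(X_rest, y_rest).score(X_dev, y_dev)
        values.append(full_score - score)
    return values


# Toy data: the last training example (feature 6.0) is deliberately mislabeled.
X_train = [[0.0], [1.0], [9.0], [10.0], [6.0]]
y_train = [0, 0, 1, 1, 0]
X_dev = [[0.5], [2.0], [4.0], [5.5], [8.0]]
y_dev = [0, 0, 0, 1, 1]

values = loo_values(TinyCentroidClassifier, X_train, y_train, X_dev, y_dev)
# The mislabeled example (index 4) receives the lowest, negative value.
print(values)
```

A negative value means that removing the example improves development-set accuracy, which is the signal used to flag potentially noisy training examples. Because the model is retrained once per training example, the cost grows linearly with the training set size.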
Please check out a simple tutorial on Google Colab to see how to use this package.