North China University of Technology
Beijing Union University
Beijing Union University
North China University of Technology
2024, Volume E107.D, Issue 7, Pages 890-893
Currently, the most advanced knowledge distillation models use a metric-learning approach based on probability distributions. However, the correlation between supervised probability distributions is typically geometric and implicit, causing inefficiency and an inability to capture structural feature representations across different tasks. To overcome this problem, we propose a knowledge distillation loss that uses the robust sliced Wasserstein distance with geometric median (GMSW) to estimate the differences between teacher and student representations. Owing to the intuitive geometric properties of GMSW, the student model can effectively learn to align its hidden states with those produced by the teacher model, thereby establishing a robust correlation among implicit features. In experiments, our method outperforms state-of-the-art models in both high-resource and low-resource settings.
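To make the GMSW idea concrete, below is a minimal PyTorch sketch of one plausible reading of such a loss: project teacher and student hidden states onto random directions, compute the per-slice 1-D Wasserstein distance by sorting, and aggregate slices with the median (the geometric median of scalars coincides with the ordinary median), which is more robust to outlier projections than the mean used in plain sliced Wasserstein. The function name `gmsw_distill_loss`, the argument names, and the assumption that teacher and student provide the same number of token vectors of equal width are ours for illustration; this is not the authors' released implementation.

```python
import torch

def gmsw_distill_loss(teacher_h: torch.Tensor,
                      student_h: torch.Tensor,
                      n_slices: int = 128,
                      p: int = 2) -> torch.Tensor:
    """Sketch of a robust sliced Wasserstein distillation loss.

    teacher_h, student_h: (n_tokens, hidden_dim) hidden states,
    assumed to have the same shape (a projection layer would be
    needed if the teacher and student widths differ).
    """
    d = teacher_h.size(-1)

    # Random projection directions on the unit sphere: (hidden_dim, n_slices).
    theta = torch.randn(d, n_slices, device=teacher_h.device)
    theta = theta / theta.norm(dim=0, keepdim=True)

    # Project both sets of hidden states: (n_tokens, n_slices).
    t_proj = teacher_h @ theta
    s_proj = student_h @ theta

    # In 1-D, the p-Wasserstein distance between two empirical
    # distributions with equal sample counts is the L_p distance
    # between their sorted samples.
    t_sorted, _ = torch.sort(t_proj, dim=0)
    s_sorted, _ = torch.sort(s_proj, dim=0)
    per_slice = (t_sorted - s_sorted).abs().pow(p).mean(dim=0).pow(1.0 / p)

    # Robust aggregation: median over slices rather than the mean,
    # so a few badly aligned projection directions do not dominate.
    return per_slice.median()
```

In this sketch the only change from the standard sliced Wasserstein loss is the final aggregation step; swapping `per_slice.median()` for `per_slice.mean()` recovers the non-robust variant, which makes the robustness trade-off easy to ablate.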