3 April 2023

Synthesizing audio from tongue motion during speech using tagged MRI via transformer

Xiaofeng Liu,¹,² Fangxu Xing,¹,² Jerry Prince,³ Maureen Stone,⁴ Georges El Fakhri,¹,² Jonghye Woo¹,²

ORCID: Xiaofeng Liu https://orcid.org/0000-0002-4514-2016; Fangxu Xing https://orcid.org/0000-0002-0517-0952; Jonghye Woo https://orcid.org/0000-0002-5621-9218

¹Massachusetts General Hospital (United States)
²Harvard Univ. (United States)
³Johns Hopkins Univ. (United States)
⁴Univ. of Maryland School of Dentistry (United States)

Proceedings Volume 12464, Medical Imaging 2023: Image Processing; 1246410 (2023); https://doi.org/10.1117/12.2653345
Event: SPIE Medical Imaging, 2023, San Diego, California, United States

Abstract

Investigating the relationship between intelligible speech and the internal tissue-point motion of the tongue and oropharyngeal muscle deformation measured from tagged MRI can aid in advancing speech motor control theories and in developing novel treatments for speech-related disorders. However, elucidating the relationship between these two sources of information is challenging, due in part to the disparity in data structure between spatiotemporal motion fields (i.e., 4D motion fields) and one-dimensional audio waveforms. In this work, we present an efficient encoder-decoder translation network for exploring the predictive information inherent in 4D motion fields, using 2D spectrograms as a surrogate for the audio data. Specifically, our encoder combines 3D convolutional spatial modeling with transformer-based temporal modeling. The extracted features are processed by an asymmetric 2D convolutional decoder to generate spectrograms corresponding to the input 4D motion fields. Furthermore, we incorporate generative adversarial training into our framework to improve the quality of the generated spectrograms. Experiments on 63 paired motion-field sequences and speech waveforms demonstrate that our framework can generate clear audio waveforms from a sequence of motion fields. Thus, our framework has the potential to improve our understanding of the relationship between these two modalities and to inform the development of treatments for speech disorders.
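The abstract does not give layer sizes, frame counts, or spectrogram parameters, so the following is only a minimal NumPy sketch under assumed toy settings, not the authors' implementation. It illustrates two ideas from the abstract: (a) turning a 1D waveform into a 2D log-magnitude spectrogram, the kind of surrogate target the abstract describes, and (b) single-head scaled dot-product self-attention across time frames, the core operation of transformer-based temporal modeling. The function names `log_spectrogram` and `self_attention`, the 8 kHz sample rate, the STFT size and hop, and all array shapes are hypothetical. A random linear projection stands in for the 3D convolutional spatial encoder; the asymmetric 2D decoder and the adversarial training are not shown.

```python
import numpy as np

def log_spectrogram(wave, n_fft=256, hop=64, eps=1e-8):
    """Log-magnitude STFT: 1D waveform -> 2D array of shape (n_fft // 2 + 1, n_frames).

    Hypothetical parameters; the paper's spectrogram settings are not stated in the abstract.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    # Slice overlapping, Hann-windowed frames: shape (n_frames, n_fft)
    frames = np.stack([wave[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, n_fft // 2 + 1)
    return np.log(mag + eps).T                 # frequency on rows, time on columns

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over time; x has shape (T, d)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[1])        # (T, T) frame-to-frame affinities
    scores -= scores.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True) # softmax over time: each row sums to 1
    return weights @ v, weights

# Toy usage (assumed sizes only, not the study's data)
rng = np.random.default_rng(0)
sr = 8000  # hypothetical sample rate
wave = np.sin(2 * np.pi * 1000 * np.arange(sr) / sr)  # 1 s, 1 kHz tone
spec = log_spectrogram(wave)
print(spec.shape)  # (129, 122)

# Hypothetical 4D motion field: T frames of a (D, H, W) grid with 3 displacement components
motion = rng.standard_normal((20, 8, 8, 8, 3))
proj = rng.standard_normal((8 * 8 * 8 * 3, 16)) / np.sqrt(8 * 8 * 8 * 3)
feats = motion.reshape(20, -1) @ proj  # linear stand-in for the 3D conv spatial encoder
w_q, w_k, w_v = (rng.standard_normal((16, 16)) / 4 for _ in range(3))
ctx, attn = self_attention(feats, w_q, w_k, w_v)
print(ctx.shape)  # (20, 16)
```

In this sketch, each time frame of the motion sequence is first reduced to a feature vector, and self-attention then lets every frame's representation draw on every other frame, which is one way a temporal model can relate motion at one instant to sound produced around it. Mapping the resulting (T, d) features onto the (frequency, time) grid of the spectrogram is the decoder's job and is left out here.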

Citation

Xiaofeng Liu, Fangxu Xing, Jerry Prince, Maureen Stone, Georges El Fakhri, and Jonghye Woo, "Synthesizing audio from tongue motion during speech using tagged MRI via transformer," Proc. SPIE 12464, Medical Imaging 2023: Image Processing, 1246410 (3 April 2023); https://doi.org/10.1117/12.2653345

ACCESS THE FULL ARTICLE