Presentation + Paper
3 April 2023
Synthesizing audio from tongue motion during speech using tagged MRI via transformer
Xiaofeng Liu (1,2; https://orcid.org/0000-0002-4514-2016), Fangxu Xing (1,2; https://orcid.org/0000-0002-0517-0952), Jerry Prince (3), Maureen Stone (4), Georges El Fakhri (1,2), Jonghye Woo (1,2; https://orcid.org/0000-0002-5621-9218)

(1) Massachusetts General Hospital (United States)
(2) Harvard Univ. (United States)
(3) Johns Hopkins Univ. (United States)
(4) Univ. of Maryland School of Dentistry (United States)
Abstract
Investigating the relationship between internal tissue point motion of the tongue and oropharyngeal muscle deformation measured from tagged MRI and intelligible speech can aid in advancing speech motor control theories and developing novel treatment methods for speech-related disorders. However, elucidating the relationship between these two sources of information is challenging, due in part to the disparity in data structure between spatiotemporal motion fields (i.e., 4D motion fields) and one-dimensional audio waveforms. In this work, we present an efficient encoder-decoder translation network for exploring the predictive information inherent in 4D motion fields via 2D spectrograms as a surrogate of the audio data. Specifically, our encoder is based on 3D convolutional spatial modeling and transformer-based temporal modeling. The extracted features are processed by an asymmetric 2D convolution decoder to generate spectrograms that correspond to 4D motion fields. Furthermore, we incorporate a generative adversarial training approach into our framework to improve the synthesis quality of our generated spectrograms. We experiment on 63 paired motion field sequences and speech waveforms, demonstrating that our framework enables the generation of clear audio waveforms from a sequence of motion fields. Thus, our framework has the potential to improve our understanding of the relationship between these two modalities and inform the development of treatments for speech disorders.
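The abstract relies on a 2D spectrogram as a surrogate for the one-dimensional audio waveform. As a rough illustration of that conversion only, the sketch below computes a magnitude spectrogram from a toy waveform using a short-time Fourier transform written with the Python standard library. The function names, the frame length, the hop size, and the test tone are all assumptions made for this example; the abstract does not state which spectrogram parameters or tooling the authors used.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def magnitude_spectrogram(wave, frame_len=64, hop=32):
    """Turn a 1D waveform into a 2D list indexed as [frame][frequency bin].

    frame_len and hop are illustrative values, not taken from the paper.
    """
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1  # non-negative frequencies only
    frames = []
    for start in range(0, len(wave) - frame_len + 1, hop):
        # Window one overlapping segment of the waveform.
        seg = [wave[start + i] * window[i] for i in range(frame_len)]
        mags = []
        for k in range(n_bins):
            # Naive DFT of the segment at bin k (slow but dependency-free).
            acc = sum(
                seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Toy "audio": a 128 Hz tone sampled at 1024 Hz, i.e. 8 cycles per 64-sample frame.
sample_rate = 1024
tone = [math.sin(2.0 * math.pi * 128 * t / sample_rate) for t in range(512)]

spec = magnitude_spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(spec[0]), peak_bin)
```

The result is a time-by-frequency grid, which is the kind of 2D target a convolutional decoder can generate; recovering an audible waveform from such a grid needs a separate inversion step that this sketch does not cover.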
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Xiaofeng Liu, Fangxu Xing, Jerry Prince, Maureen Stone, Georges El Fakhri, and Jonghye Woo, "Synthesizing audio from tongue motion during speech using tagged MRI via transformer," Proc. SPIE 12464, Medical Imaging 2023: Image Processing, 1246410 (3 April 2023); https://doi.org/10.1117/12.2653345
KEYWORDS
Transformers
Modeling
3D modeling
Magnetic resonance imaging
Tongue
Education and training
Deformation
Gallium nitride
Windows
Feature extraction
