
Advance online publication

Advance online publication allows users to view and download accepted articles before the production process has been finalised.

Displaying 1-20 of 20 articles from this issue

Classifying Japanese fricative /s/ and affricate /ts/ at word-initial position using logarithmic durations
Shigeaki Amano, Kimiko Yamakawa, Mariko Kondo
Article ID: e24.126
Published: 2025
Advance online publication: July 11, 2025
DOI: https://doi.org/10.1250/ast.e24.126
JOURNAL | OPEN ACCESS | ADVANCE PUBLICATION
Recent studies have demonstrated that combinations of logarithmic durations can classify duration-sensitive phonemes, such as Japanese singleton and geminate consonants, at various speaking rates. The acoustic features of the Japanese fricative /s/ and affricate /ts/ are related to duration; therefore, a combination of logarithmic durations can likely classify these consonants. To examine this possibility, discriminant models using linear and logarithmic durations, with and without a speaking-rate-related variable, were compared in terms of their performance in classifying /s/ and /ts/ at word-initial position at various speaking rates. The results indicate that the discriminant model using logarithmic duration with a speaking-rate-related variable can classify the consonants better than the other models, indicating the importance of logarithmic duration. The results are considered in the framework of logarithmic information processing in the brain.
Download PDF (442K)
Bolt-Clamped Langevin Transducer (BLT) Type Ultrasonic Emitter with External Resonance Frequency Adjustment Mechanism
Hikaru Miura, Takashi Kasashima
Article ID: e25.45
Published: 2025
Advance online publication: July 11, 2025
DOI: https://doi.org/10.1250/ast.e25.45
JOURNAL | OPEN ACCESS | ADVANCE PUBLICATION
In general, the resonance frequencies of ultrasonic emitters that use bolt-clamped Langevin transducers differ slightly. However, when using these emitters in an arrayed device, the resonance frequencies of each emitter must be matched. In this paper, the arbitrary reduction of the resonance frequency by adding a small amount of mass after fabrication is examined. The resonance frequency was lowered by adding more mass, demonstrating the utility of this method.
Download PDF (681K)
Room sound field adjustment through active control of specific acoustic impedance
Renta Kushiro, Akira Omoto
Article ID: e25.28
Published: 2025
Advance online publication: July 09, 2025
DOI: https://doi.org/10.1250/ast.e25.28
JOURNAL | OPEN ACCESS | ADVANCE PUBLICATION
A method for adjusting room acoustics through active control using conventional loudspeakers as secondary sources is proposed to suppress inhomogeneity caused by standing waves and room modes. Unlike conventional active control, the control target is the specific acoustic impedance, which is the ratio of sound pressure to particle velocity. The aim is to approach a sound field with only direct waves and no reflections. First, to perform the control, the condition for the maximum absorption coefficient of the virtual boundary surface was determined from the perspective of impedance matching, and an error function was set. Next, a particle velocity measurement method was introduced to obtain the values of specific acoustic impedance, and the weighting of sound pressure and particle velocity was modified. Furthermore, using these tools, experiments were conducted using both a real sound field and simulations to verify the effect of impedance control. Finally, impedance control was reinterpreted from the perspective of microphone directivity, clarifying the control mechanism. The results confirm the method’s effectiveness in low-frequency sound field adjustment, where passive absorbers are insufficient.
Download PDF (1536K)
Aircraft type identification using convolutional neural network applied with Swarm Learning
Junichi Mori, Funa Kodomari, Takenobu Tsuchiya, Makoto Morinaga, Ippei ...
Article ID: e24.127
Published: 2025
Advance online publication: July 05, 2025
DOI: https://doi.org/10.1250/ast.e24.127
JOURNAL | OPEN ACCESS | ADVANCE PUBLICATION
At civil airports and military facilities, various aircraft types are operated. For the mitigation and management of aircraft noise, several surveys and studies have been conducted around airports. Generally, to build a valuable database from these aircraft noise observation data, it is necessary to identify the aircraft types. To achieve this, we are developing an AI model that identifies aircraft types by applying machine learning techniques and using measured acoustic aircraft noise data. In this study, to improve the generalization performance of the model by expanding the dataset for training used in machine learning, we applied Swarm Learning technology, which enables machine learning to be executed under distributed conditions without centralizing measured noise data, and verified its accuracy. In the study, the accuracy of convolutional neural networks using general procedures with all datasets was compared with the results analyzed using Swarm Learning, where the dataset was divided into several groups. As a result, although Swarm Learning showed a slight decrease in accuracy compared with convolutional neural networks, its accuracy remained very high at 94%, demonstrating that it is a sufficiently effective method considering the effort required to centralize measured data in one location.
Download PDF (928K)
Investigation of enhancement effect of underwater acoustic streaming by cavity structure located away from vibration source using simulation and experiment considering cavitation
Yimeng Wang, Manabu Aoyagi
Article ID: e25.34
Published: 2025
Advance online publication: July 02, 2025
DOI: https://doi.org/10.1250/ast.e25.34
JOURNAL | OPEN ACCESS | ADVANCE PUBLICATION
The effect of the cylinder with a cavity around the target location on underwater acoustic streaming at 28.2 kHz was investigated. Finite element analysis was performed to optimize the dimensions of the transparent acrylic cylinder by the resonance frequency analysis of the whole structure to increase sound pressure in the cavity. Simulation methods ignoring cavitation bubbles and considering bubbles were used to obtain distributions of sound pressure and acoustic streaming at the initial period and stable state separately. For comparison, particle image velocimetry experiments were conducted using the adjusted and original cylinders. The results showed that when the gap was smaller than 25 mm, the cavity had an obvious enhancement effect on streaming velocity, increasing it to twice the maximum value.
Download PDF (2380K)
Estimating articulatory gestures in singing using singing voice perception
Jun Takahashi, Itsuki Shishime, Natsuki Toda, Hironori Takemoto
Article ID: e25.22
Published: 2025
Advance online publication: June 28, 2025
DOI: https://doi.org/10.1250/ast.e25.22
JOURNAL | OPEN ACCESS | ADVANCE PUBLICATION
This study investigated whether individuals with musical experience could accurately perceive changes in the singing voice and underlying vocal tract movements of a singer who completed one year of vocal training, using only the singing voice. Vocal tract modifications were analyzed using real-time magnetic resonance imaging and evaluated by professional singers, instrumentalists, and students. Professional singers demonstrated more nuanced evaluations of vocal tract shape. Instrumentalists, while capable of assessing voice quality, showed less differentiation across vocal tract features. Students used narrower rating ranges and struggled to assess both aspects. These findings indicate that musical background influences evaluative tendencies regarding voice quality and vocal tract configurations.
Download PDF (1915K)
Effects of auditory selective attention in depth direction on target sound detection
Ryo Teraoka, Yuki Tanaka, Wataru Teramoto
Article ID: e25.08
Published: 2025
Advance online publication: June 26, 2025
DOI: https://doi.org/10.1250/ast.e25.08
JOURNAL | OPEN ACCESS | ADVANCE PUBLICATION
Auditory spatial attention is crucial for extracting relevant sounds from background noise in noisy environments. Despite its significance in daily life, the effect of auditory spatial attention on the depth direction remains poorly understood. The present study aimed to investigate how auditory selective attention influences the detection of target sounds in the depth direction using sensitivity (d′), false alarm rates, and reaction time (RT) for the target sound. In each trial, either a target or distractor sound was presented from one of the five distances (32, 64, 96, 128, and 160 cm). The listeners were directed to respond as soon as they heard the target sound, while ignoring distractor sounds. The results indicated that directing attention to a specific distance significantly increased the sensitivity (d′) at that distance compared to other distances. Furthermore, the false alarm rate was the lowest at the attended position and progressively increased as sound positions deviated from the focus of attention. However, no significant effect of attention on the RT was observed. These findings suggest that auditory selective attention is not limited to the horizontal direction but can also operate along the depth direction in reverberant environments, expanding our understanding of auditory spatial attention.
Download PDF (698K)
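The sensitivity index d′ reported in the abstract above is conventionally computed from hit and false-alarm rates as d′ = z(H) − z(FA), where z is the inverse of the standard normal cumulative distribution function. The following is a minimal Python sketch assuming that standard signal-detection definition. The abstract does not say how the authors estimated d′, and the function name and the rates used here are illustrative only.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Sensitivity index d' = z(H) - z(FA), with z the inverse standard normal CDF.

    Both rates must lie strictly between 0 and 1, because the inverse CDF
    is undefined at the endpoints.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical rates, not taken from the paper.
attended = d_prime(0.90, 0.10)
unattended = d_prime(0.75, 0.25)
print(f"attended d' = {attended:.2f}, unattended d' = {unattended:.2f}")
```

Rates of exactly 0 or 1 need a correction before this function can be used. Which correction to apply is a design choice that this sketch leaves open.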
Language-queried target speech extraction using para-linguistic and non-linguistic prompts
Kentaro Seki, Nobutaka Ito, Kazuki Yamauchi, Yuki Okamoto, Kouei Yamao ...
Article ID: e25.27
Published: 2025
Advance online publication: June 24, 2025
DOI: https://doi.org/10.1250/ast.e25.27
JOURNAL | OPEN ACCESS | ADVANCE PUBLICATION
This paper proposes a new language-queried target speech extraction (TSE) task called para-linguistic and non-linguistic text prompts-based TSE (PNTP-TSE), which uses text prompts that describe para-linguistic and non-linguistic information. This framework addresses the limitations of conventional TSE methods, such as privacy concerns in voiceprint-based systems and dependency on dedicated microphone arrays or video cameras. To support this framework, we construct and provide a new dataset, PromptTSE, which is specifically designed to facilitate various types of language-queried TSE, including PNTP-TSE. We develop a baseline method for PNTP-TSE and conduct experimental evaluations. The experimental results show that PNTP-TSE overcomes the performance degradation issue of voiceprint-based systems caused by the gap in speaking style between enrollment speech and target speech.
Download PDF (961K)
Gamma-von-Mises restricted Boltzmann machine and its application to audio modeling
Toru Nakashika, Kohei Yatabe
Article ID: e24.95
Published: 2025
Advance online publication: June 20, 2025
DOI: https://doi.org/10.1250/ast.e24.95
JOURNAL | OPEN ACCESS | ADVANCE PUBLICATION
To bypass phase estimation, complex-valued generative models have been developed to directly handle spectra of audio signals. The complex-valued restricted Boltzmann machine (CRBM) is one such promising model proposed recently. However, similar to the other models, CRBM cannot account for the logarithmic nature of auditory perception, which is important for realizing a better model for audio applications. This is because CRBM handles complex values in rectangular coordinates (i.e., real and imaginary parts), which hinders applying the logarithmic transform to magnitude. To overcome this drawback of CRBM, we propose the gamma-von-Mises (GVM) RBM that models complex-valued spectra in polar coordinates (i.e., magnitude and phase). GVM RBM handles magnitude by the gamma distribution using the logarithmic function and phase by the von Mises distribution. Our objective and subjective experiments showed that GVM RBM outperformed the other models including CRBM and complex-valued variational autoencoder (CVAE).
Download PDF (666K)
Efficient Tikhonov Regularization Parameter Selection for Multi-Zone Sound Field Reproduction
Tong Zhou, Kazuya Yasueda, Akitoshi Kataoka
Article ID: e25.12
Published: 2025
Advance online publication: June 17, 2025
DOI: https://doi.org/10.1250/ast.e25.12
JOURNAL | OPEN ACCESS | ADVANCE PUBLICATION
This study introduces two efficient methods for selecting Tikhonov regularization parameters in acoustical inverse problems. The first approach employs a binary search (BS) algorithm to identify the regularization parameter that satisfies a predefined power constraint. Compared to traditional iterative searches over N candidate values, BS reduces the number of iterations from N to log₂ N. The second method, Adaptive Normalized Tikhonov (ANT), combines the conventional L-curve and Normalized Tikhonov techniques. By fitting the ratio of the inverse system matrix’s largest eigenvalue to an exponential decay function during preprocessing at a few sample frequencies, ANT determines the regularization parameter with a single calculation for other frequencies. Both methods were experimentally validated in a multi-zone sound field reproduction scenario using a measured reverberant room impulse response database. Results demonstrated that BS achieves a balance between reproduction accuracy and robustness while significantly improving efficiency. The ANT method provided the most stable system without iterative calculations. These improvements indicate that both approaches offer compelling solutions for real-time applications.
Download PDF (1358K)
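The binary search idea in the abstract above can be illustrated with a short sketch. It relies on the fact that the power of a Tikhonov-regularized solution decreases monotonically as the regularization parameter grows, so a sorted candidate list can be bisected instead of scanned. This is a minimal Python sketch under stated assumptions, not the authors' implementation: the function names, the random transfer matrix `G`, the target vector `p`, and the candidate grid are all hypothetical.

```python
import numpy as np


def tikhonov_solve(G, p, lam):
    """Regularized least squares: argmin_q ||G q - p||^2 + lam ||q||^2."""
    n = G.shape[1]
    return np.linalg.solve(G.conj().T @ G + lam * np.eye(n), G.conj().T @ p)


def solution_power(G, p, lam):
    """Squared norm of the regularized solution (the constrained quantity)."""
    q = tikhonov_solve(G, p, lam)
    return np.vdot(q, q).real


def binary_search_lambda(G, p, max_power, candidates):
    """Return (smallest candidate meeting the power constraint, solves used).

    `candidates` must be sorted in ascending order. Returns (None, solves)
    when no candidate satisfies the constraint.
    """
    lo, hi = 0, len(candidates) - 1
    best, solves = None, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        solves += 1
        if solution_power(G, p, candidates[mid]) <= max_power:
            best = mid      # feasible: try a smaller parameter
            hi = mid - 1
        else:
            lo = mid + 1    # infeasible: more regularization is needed
    return (None if best is None else candidates[best]), solves


# Hypothetical problem data, for illustration only.
rng = np.random.default_rng(0)
G = rng.standard_normal((8, 4)) + 1j * rng.standard_normal((8, 4))
p = rng.standard_normal(8) + 1j * rng.standard_normal(8)
candidates = np.logspace(-6, 2, 64)
max_power = solution_power(G, p, candidates[40])

lam, solves = binary_search_lambda(G, p, max_power, candidates)
print(lam, solves)
```

With 64 candidates the loop needs at most 7 linear solves, compared with up to 64 for a sequential scan.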
Favorable clarity, sound strength, and spaciousness as well as overall acoustical quality of concert halls measured in a 3D synthesized sound field
Takayuki Hidaka, Noriko Nishihara, Kazunori Suzuki, Takehiko Nakagawa
Article ID: e24.107
Published: 2025
Advance online publication: June 14, 2025
DOI: https://doi.org/10.1250/ast.e24.107
JOURNAL | OPEN ACCESS | ADVANCE PUBLICATION
This paper is a companion to “Reexamination of the favorable reverberation time of concert halls measured in a 3D synthesized sound field,” A.S.T., 45, 204–215 [2024]. Anechoic music sources were reproduced by a virtual orchestra set on concert hall stages and were recorded at audience seats. Four music excerpts were chosen. By an Ambisonics playback in the laboratory, a series of psychological experiments were conducted. Twenty-one music experts judged the clarity, sound strength, spaciousness, and overall acoustical quality of the presented sound. Adding the results from the previous paper, a regression analysis on the relationships between contributing subjective attributes and objective parameters found that EDT_M and C_80,3 contributed to clarity, G_M (or G_L) and RT_M to sound strength, and BQI and G_L to spaciousness. Here, subscripts “L,” “M,” and “3” denote octave band averages at 125 and 250 Hz, 500 and 1000 Hz, and at 500, 1000, and 2000 Hz, respectively, and “E” designates early sound, i.e., less than 80 msec. Favorable ranges of physical parameters for each subjective attribute were determined. Reverberance, spaciousness, and clarity were identified as significant subjective attributes contributing to overall acoustic quality, with the corresponding physical metrics being RT_M, G_L, and BQI.
Download PDF (831K)
All-pass filter simulating cochlear delay characteristics and its musical applications
Hiroki Iida, Kohei Yatabe
Article ID: e25.07
Published: 2025
Advance online publication: June 14, 2025
DOI: https://doi.org/10.1250/ast.e25.07
JOURNAL | OPEN ACCESS | ADVANCE PUBLICATION
This study explores the design and implementation of an IIR all-pass filter that simulates cochlear delay characteristics. This paper contains three topics: filter design, implementation, and musical evaluation. First, we designed an IIR all-pass filter to simulate cochlear delay characteristics by optimizing its zeros and poles to achieve the desired group delay. Additionally, the filter was implemented as a VST plug-in for real-time applications and is publicly available. Next, subjective evaluations were conducted to assess the musical impact of this filter. We applied the filter to snare drum, bass drum, bass guitar, and electric guitar to explore its musical applicability. Participants compared the filtered and original sounds. Percussion instruments received mixed feedback, with the filter sometimes described as “artificial.” In contrast, string instruments like bass guitar and electric guitar were rated as “impressive” and “attractive,” suggesting greater relevance for these sounds. Finally, we investigated the impact of the filter on guitar performance. Performance deviations from a metronome were measured under 10 different conditions by varying the number of filters and delay times. The results indicated that excessive delay introduced by the filter could disrupt synchronization during performances.
Download PDF (1737K)
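The defining property of an all-pass filter, as used in the abstract above, is that it leaves the magnitude spectrum untouched while shaping the group delay. The sketch below checks this for a single first-order real-coefficient section H(z) = (a + z⁻¹) / (1 + a z⁻¹), whose group delay has the closed form (1 − a²) / (1 + 2a cos ω + a²). It is a minimal Python illustration, not the authors' design: their filter optimizes the zeros and poles of a higher-order IIR structure, and the coefficient `a` and the function names here are hypothetical.

```python
import numpy as np


def allpass_response(a: float, w: np.ndarray) -> np.ndarray:
    """Frequency response of H(z) = (a + z^-1) / (1 + a z^-1) at w (rad/sample)."""
    z_inv = np.exp(-1j * w)
    return (a + z_inv) / (1 + a * z_inv)


def allpass_group_delay(a: float, w: np.ndarray) -> np.ndarray:
    """Closed-form group delay in samples: (1 - a^2) / (1 + 2 a cos w + a^2)."""
    return (1 - a**2) / (1 + 2 * a * np.cos(w) + a**2)


a = 0.5  # hypothetical coefficient; |a| < 1 keeps the pole inside the unit circle
w = np.linspace(0.01, np.pi - 0.01, 2000)
H = allpass_response(a, w)

# Numerical group delay: negative derivative of the unwrapped phase.
tau_numeric = -np.gradient(np.unwrap(np.angle(H)), w)
tau_closed = allpass_group_delay(a, w)

print("max |H| deviation from 1:", np.max(np.abs(np.abs(H) - 1.0)))
print("group delay near DC and near Nyquist:", tau_closed[0], tau_closed[-1])
```

Cascading several such sections adds their group delays, which is one common way to approximate a frequency-dependent target delay.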
Reconstruction of the reverberation theory in a diffuse sound field by using reflection orders
Toshiki Hanyu
Article ID: e24.98
Published: 2025
Advance online publication: June 13, 2025
DOI: https://doi.org/10.1250/ast.e24.98
JOURNAL | OPEN ACCESS | ADVANCE PUBLICATION
Room acoustics is mainly based on the reverberation theories of Sabine and Eyring. In Sabine's theory, however, the reverberation time does not reach zero, even if the condition of absolute absorption is fulfilled. Eyring revised reverberation theory to resolve this contradiction. However, Eyring's theory has an inconsistency between the formulations of the steady-state and decay processes. Therefore, the author revised Sabine's theory, taking a different approach from that of Eyring. This revised theory was constructed by introducing the concept of “reverberation of a direct sound.” In this study, a new mathematical model of reverberation using reflection orders is proposed. This is a reconstruction of the author’s revised theory. The new model includes the temporal energy distribution in each reflection order and uses the concept of “reverberation of a direct sound” for the entire reverberation process. It shows that the concept is also essential for the reflected sounds. In addition, the reverberation decay agrees with the revised theory previously proposed by the author. Overall, the new model showed good agreement with the simulation results.
Download PDF (4244K)
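The contradiction mentioned in the abstract above can be seen directly from the textbook forms of the two classical formulas. They are given here in SI units with the usual constant 0.161 s/m, for a room of volume V, total surface area S, and mean absorption coefficient ᾱ. These are the standard expressions, not the author's revised model, which the abstract does not state.

```latex
T_{\mathrm{Sabine}} = \frac{0.161\,V}{S\,\bar{\alpha}},
\qquad
T_{\mathrm{Eyring}} = \frac{0.161\,V}{-S\,\ln(1-\bar{\alpha})}

% Limit of total absorption:
\lim_{\bar{\alpha}\to 1} T_{\mathrm{Sabine}} = \frac{0.161\,V}{S} > 0,
\qquad
\lim_{\bar{\alpha}\to 1} T_{\mathrm{Eyring}} = 0
```

For small ᾱ the two agree, since −ln(1 − ᾱ) ≈ ᾱ.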
Acoustic characteristics of nonwood baseball bats following revised Japanese Product Standards (Safe Goods (SG) Standards)
Mari Ueda, Kohei Naito, Hiroshi Tanaka, Takahiro Miura
Article ID: e25.13
Published: 2025
Advance online publication: June 07, 2025
DOI: https://doi.org/10.1250/ast.e25.13
JOURNAL | OPEN ACCESS | ADVANCE PUBLICATION
In this study, we measured the acoustic characteristics of nonwood baseball bats modified according to the Revised Japanese Product Standards (hereafter, Safe Goods (SG) Standards) enforced in 2024. New standard bats showed peak frequencies approximately 500 Hz higher than previous models. During Spring Koshien 2024, players reported differences in bat sound and ball travel distance, with the onomatopoeic description changing from “kakkin” to “kyu-in” following the revision, according to various media. The results of acoustic measurements conducted in compliance with the SG Standards confirm the observations of the players, indicating a tonal shift in the bats after the SG Standards were revised.
Download PDF (464K)
Japanese, Māori, NZ English: Prosodic comparison using delexicalized speech
Hansjörg Mixdorff, Takayuki Arai
Article ID: e25.39
Published: 2025
Advance online publication: May 29, 2025
DOI: https://doi.org/10.1250/ast.e25.39
JOURNAL | OPEN ACCESS | ADVANCE PUBLICATION
In this study, we compare the prosody of Japanese with that of Māori and New Zealand English, the contact language, as impressionistic analysis of Māori and Japanese indicates prosodic similarities, despite many other differences. This may be due to the fact that proto-Japanese, just like Māori, stems from the Pacific region. However, Māori under the influence of English changed substantially. As an indirect way of comparing the prosody, we devised a perception experiment using delexicalized speech employing Japanese listeners. Most listeners were able to differentiate between Japanese and NZ English, but did not place Māori closer to Japanese than English.
Download PDF (824K)
Orthonormality of Spherical Basis Functions for Interior Problems of the Helmholtz Equation
Takahiro Iwami, Naohisa Inoue, Akira Omoto
Article ID: e25.10
Published: 2025
Advance online publication: May 23, 2025
DOI: https://doi.org/10.1250/ast.e25.10
JOURNAL | OPEN ACCESS | ADVANCE PUBLICATION
We construct an orthonormal basis for interior problems of the Helmholtz equation, based on the properties of a reproducing kernel Hilbert space defined by the spectral characteristics of interior sound fields. The constructed basis coincides with what is commonly known as spherical basis functions. Furthermore, leveraging the structure of this space, we derive the addition theorem in a compact form. This facilitates the conversion between reproducing kernel representations and spherical harmonic expansions and provides insights into estimating spherical harmonic coefficients from sampled measurements.
Download PDF (230K)
The J-AESOP Corpus: Design, Application, and Future Directions of a Japanese-English Bilingual Speech Corpus
Kakeru Yazawa, Takayuki Konishi, Mariko Kondo
Article ID: e25.06
Published: 2025
Advance online publication: May 13, 2025
DOI: https://doi.org/10.1250/ast.e25.06
JOURNAL | OPEN ACCESS | ADVANCE PUBLICATION
This paper presents the current design of the J-AESOP corpus, a learner speech corpus featuring Japanese speakers’ English. It has been developed as part of the Asian English Speech cOrpus Project (AESOP), an international and multi-institutional project to construct a collection of Asian English speech databases. While the recording procedures and speech materials are standardized in the AESOP project, the J-AESOP corpus incorporates additional features not found in other AESOP corpora, such as data from native English speakers, Japanese reading materials (Japanese version of “The North Wind and the Sun”), manual correction of automatic forced alignment, and perceptual ratings of accentedness/nativelikeness and comprehensibility. These unique features allow an in-depth investigation of Japanese-English bilingual speech, as exemplified by our exploratory investigation of the production of voiceless coronal fricatives in Japanese (i.e., [s, ɕ]) and English (i.e., /s, ʃ, θ/) reported in this paper. The paper also discusses directions for further development of the corpus, including improvements in data availability.
Download PDF (1992K)
Low-frequency one-third octave band near-perfect acoustic metasurface absorber with imperfect component resonator design
Yuki Kimura, Takeshi Okuzono
Article ID: e24.110
Published: 2025
Advance online publication: May 03, 2025
DOI: https://doi.org/10.1250/ast.e24.110
JOURNAL | OPEN ACCESS | ADVANCE PUBLICATION
In this paper, we propose an easily designable low-frequency acoustic metasurface (AMS) absorber composed of multiple imperfect microslit resonators designed to achieve near-perfect sound absorption within a one-third-octave-band. Some specific designs of one-third-octave-band near-perfect absorbers at 125, 250, and 500 Hz are presented. We have developed a robust and efficient user-friendly absorber design method combining the transfer matrix method and a unique geometry design rule of component resonators. To develop this design method, we conducted extensive numerical and experiment-based examinations by thermoviscous acoustic simulation and impedance tube measurements, particularly addressing the number of component resonators and their peak sound absorption coefficient. The numerical and experimental results demonstrated the importance of creating a coupled resonator with the appropriate number of imperfect component resonators, each with a lower sound absorptivity peak. These features are crucially important for achieving thin sound absorbers without compromising the desired sound absorption properties. Numerical sound absorptivity evaluation revealed that using more component resonators to create a coupled resonator enables individual component resonators to operate as resonators with lower sound absorptivity peaks. This simple operation achieves robust sound absorption characteristics with less degradation.
Download PDF (3030K)
Improvement of throat speech quality for speech recognition by using Bayes' theorem
Hisako Orimoto, Akira Ikuta
Article ID: e25.11
Published: 2025
Advance online publication: April 17, 2025
DOI: https://doi.org/10.1250/ast.e25.11
JOURNAL | OPEN ACCESS | ADVANCE PUBLICATION
In general, a speech signal can be measured by a microphone, such as a throat microphone. However, the speech signal measured by a microphone often contains surrounding noise. On the other hand, although a throat microphone is effective against surrounding noise, the speech signal it measures includes body-conducted internal noise. In this study, we propose a method for improving the sound quality of the speech signal measured by a throat microphone to achieve good speech recognition. The relationship between the original speech signal and the speech measured by the throat microphone is not clear. Therefore, we consider the relationship as a multiplicative and additive model of the original speech signal and noise components with unknown parameters. An algorithm is proposed to simultaneously estimate the original speech signal and the unknown parameters using Bayes’ theorem based on the speech signal measured by the throat microphone. Finally, a speech recognition experiment is conducted to confirm the effectiveness of the proposed algorithm.
Download PDF (667K)
Single-channel blind dereverberation based on sparse matrix recovery with reweighting and accelerated alternating direction method of multipliers
Fumiki Yohena, Kohei Yatabe
Article ID: e24.119
Published: 2025
Advance online publication: April 15, 2025
DOI: https://doi.org/10.1250/ast.e24.119
JOURNAL | OPEN ACCESS | ADVANCE PUBLICATION
Single-channel blind dereverberation aims to remove reverberation from a single-channel reverberant signal without using any prior knowledge. In acoustics, weighted prediction error (WPE), a method mainly used for a multi-channel signal, is often applied for this task. However, it is difficult to achieve effective dereverberation for a single-channel signal. In this paper, for better single-channel dereverberation, we propose to simultaneously estimate the source signal and the room impulse response (RIR) instead of only predicting reverberation. By modeling convolution using matrix lifting in the time-frequency domain, we formulate the dereverberation problem as a non-convex optimization problem of recovering a sparse rank-1 matrix. In sparse regularization, we introduce reweighting, enabling the improvement of sparse matrix recovery. The alternating direction method of multipliers (ADMM) with acceleration is applied to approximately solve the optimization problem, resulting in closed-form updates. In our experiments, we confirmed that the proposed method outperforms existing methods in several reverberant conditions and is capable of removing both early reflections and late reverberation. MATLAB code of the proposed method is available online (https://doi.org/10.24433/CO.3541617.v1).
Download PDF (1010K)
