RU2107950C1


Info

Publication number: RU2107950C1
Application number: RU96116251A
Authority: RU
Inventors: Николай Владимирович Байчаров; Игорь Петрович Карлин; Надежда Борисовна Кураченкова; Андрей Николаевич Линьков; Николай Федорович Попов; Юрий Иванович Савельев; Игорь Николаевич Тимофеев; Анатолий Владимирович Фесенко
Original assignee: Николай Владимирович Байчаров; Игорь Петрович Карлин; Надежда Борисовна Кураченкова; Андрей Николаевич Линьков; Николай Федорович Попов; Юрий Иванович Савельев; Игорь Николаевич Тимофеев; Анатолий Владимирович Фесенко
Priority date: 1996-08-08
Filing date: 1996-08-08
Publication date: 1998-03-27

Abstract

FIELD: crime investigation. SUBSTANCE: the method involves converting the speech signals of an unknown person and of a person under investigation into digital form, storing the digital images in computer memory, subjecting these images to acoustic and linguistic analysis, calculating the acoustic characteristics of both persons, and comparing the vectors of characteristics using the results of a preliminary analysis (signal-to-noise ratio of the components of the amplitude-frequency spectrum, and frequency band). This makes it possible to combine acoustic and linguistic analysis, to use statistical criteria for decision making, to tune flexibly to the quality and duration of the recordings under investigation, to tune decision rules for specific persons, and to use a computer database of sound standards in linguistic analysis. EFFECT: increased efficiency of identification in the presence of noise and distortion. 1 dwg

Description

Translated from Russian

The invention relates to the forensic examination of phonograms of oral Russian speech.

A method is known for identifying a person from phonograms by extracting the speaker's characteristic features from uttered phrases of the same type [2].

In this method the speech signal is filtered by a comb (bank) of 24 bandpass filters, then detected and smoothed. It is then fed through an analog-to-digital converter and a commutator into a digital processing device, where individualizing features associated with the integral spectrum of speech are automatically extracted and stored.

This method stops working on phonograms of oral speech recorded under increased distortion and interference, because its set of individualizing features is limited. It also has a high rate of refusals to reach an identification decision, since it requires phonograms of the unknown and the tested speaker with the same context.

A method is known in which keywords of the same type, extracted from phonograms of oral speech, are used to identify a person [3].

In this method the speech signal is subjected to short-time spectral analysis, after which contours of spectral features and of the fundamental tone are extracted as functions of time. The resulting contours are individualizing. The decision rule is based on comparing the contours obtained for the phonograms of the tested and the unknown speakers.

The disadvantage of this method is that the identification results depend on the quality of phonograms recorded under increased distortion and interference. It also has a high rate of refusals to reach an identification decision, since it requires phonograms of the unknown and the tested speaker containing the same words.

The prototype method of identifying a person is based on spectral-band-time analysis of speech with arbitrary context [1]. To make the identification results independent of the semantic content of the spoken text, voiced segments are extracted from the speech signal, and the energy values in each of 24 spectral filters in the region of the higher formants are averaged over the duration of those segments. The fundamental tone is determined by extracting the first harmonic of the signal. The speech rate is also determined. These parameters are used as individualizing features.

This method does not work on phonograms of oral speech recorded under increased distortion, because the set of individualizing features can no longer be extracted reliably.

The aim of the invention is to increase the efficiency of identifying persons from phonograms of their oral speech when the signals under study contain interference and distortion.

To this end, in the proposed method the speech phonograms of the unknown and the tested speakers are converted into digital form by an analog-to-digital converter, and their digitized images are stored in the memory of a PC.

The essence of the proposed invention is illustrated by the block diagram shown in the drawing.

In the PC memory, for each phonogram, the speech signal of the speaker of interest is separated from the speech of the other party and from impulse noise (block 1). The phonograms are then subjected to preliminary analysis (block 2). This analysis measures the signal-to-noise ratio of the components of the amplitude-frequency spectrum and the frequency range of the speech signals on the available recording. Both measurements are needed for the adaptive choice of the working band in which spectral features are measured. To keep the results of the identification examination comparable across the spectral acoustic features of the unknown and the tested person, a comparable frequency range is selected for both.
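The preliminary analysis can be illustrated with a minimal sketch. The patent does not say how the signal-to-noise ratio or the working band is computed, so everything below is an assumption: the function names `band_snr_db` and `working_band`, the use of a separate noise-only segment, 24 equal-width bands, and the 10 dB floor are all hypothetical choices made for illustration.

```python
import numpy as np

def band_snr_db(speech, noise, sample_rate, n_bands=24, frame=512):
    """Per-band SNR in dB from a speech segment and a noise-only segment.

    Assumption: a noise-only stretch of the same recording is available.
    """
    def band_power(x):
        # Average power spectrum over non-overlapping Hann-windowed frames.
        n_frames = len(x) // frame
        frames = x[: n_frames * frame].reshape(n_frames, frame) * np.hanning(frame)
        power = np.mean(np.abs(np.fft.rfft(frames, axis=1)) ** 2, axis=0)
        # Split the spectrum into equal-width bands and average within each.
        edges = np.linspace(0, len(power), n_bands + 1).astype(int)
        return np.array([power[a:b].mean() for a, b in zip(edges[:-1], edges[1:])])

    sp = band_power(np.asarray(speech, dtype=float))
    no = band_power(np.asarray(noise, dtype=float))
    # The speech segment holds speech plus noise, so subtract the noise power.
    snr = 10 * np.log10(np.maximum(sp - no, 1e-12) / np.maximum(no, 1e-12))
    # Approximate band edges in Hz (bin edges are rounded to whole bins above).
    band_edges_hz = np.linspace(0, sample_rate / 2, n_bands + 1)
    return snr, band_edges_hz

def working_band(snr_a, snr_b, band_edges_hz, min_snr_db=10.0):
    """Range from the lowest to the highest band whose SNR passes in BOTH recordings."""
    ok = (snr_a >= min_snr_db) & (snr_b >= min_snr_db)
    if not ok.any():
        return None
    idx = np.flatnonzero(ok)
    return band_edges_hz[idx[0]], band_edges_hz[idx[-1] + 1]
```

Requiring the floor to be met in both recordings is one way to obtain the "comparable frequency range" the text asks for; a stricter variant could keep only the longest contiguous run of passing bands.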

The deviation of the tape recorder's recording speed from its nominal value is also measured. If the recording speed deviates significantly from nominal, the phonogram is re-recorded with speed correction.
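For a digitized phonogram, speed correction amounts to resampling. The sketch below is not the patent's procedure (which re-records the tape and does not say how the deviation is measured). It assumes the speed ratio is already known, for example from a reference tone of known frequency, and uses plain linear interpolation; `correct_speed` and `speed_ratio` are hypothetical names.

```python
import numpy as np

def correct_speed(samples, speed_ratio):
    """Undo a playback speed error by resampling.

    speed_ratio > 1 means the recording plays too fast (all frequencies are
    multiplied by speed_ratio); speed_ratio < 1 means it plays too slow.
    """
    x = np.asarray(samples, dtype=float)
    n_out = int(round(len(x) * speed_ratio))
    # Output sample m is read from input position m / speed_ratio,
    # linearly interpolated between neighbouring input samples.
    positions = np.arange(n_out) / speed_ratio
    return np.interp(positions, np.arange(len(x)), x)
```

As a usage example, a tone that should be 440 Hz but is measured at 462 Hz gives `speed_ratio = 462 / 440 = 1.05`; after correction the signal is 5 % longer and the tone returns to 440 Hz. A production system would likely use a band-limited resampler rather than linear interpolation.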

The speech phonograms are then subjected to two types of analysis: acoustic and linguistic.

From the digitized speech images in the PC memory, acoustic integral features are calculated, together with features characterizing individual phrases and words (comparable context) and features characterizing individual sounds of the speakers under study (blocks 3 and 4).

The following groups of individualizing features are used as acoustic integral features and as features for comparable context. They are estimates of the parameters of the statistical distribution of the components of the current spectrum and of the fundamental tone over the analyzed segment of arbitrary speech:
the mean value of the spectrum;
the relative time the signal spends in each spectral band;
the median values of the speech spectrum in the bands;
the relative power of the speech spectrum in the bands;
the variation of the speech spectrum envelopes;
the cross-correlation coefficients of the spectral envelopes between spectral bands;
the components of the histogram of the distribution of fundamental tone period durations;
the components of the histogram of the distribution of fundamental tone frequency.
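The feature groups listed above can be sketched as simple statistics over band envelopes. This is only an illustration under stated assumptions: the input `env` is assumed to be a matrix of band energies per frame, the pitch track `f0_hz` is assumed to come from some external pitch estimator, and the exact definitions (for instance the "residence time" as the fraction of frames above a level floor) are guesses, since the patent gives no formulas. The names `integral_features` and `pitch_histograms` are hypothetical.

```python
import numpy as np

def integral_features(env, floor_db=-30.0):
    """Long-term statistics of band envelopes.

    env: array (n_frames, n_bands) of non-negative band energies per frame.
    """
    env = np.asarray(env, dtype=float)
    mean = env.mean(axis=0)
    # Level of every cell relative to the loudest cell, in dB.
    level_db = 10 * np.log10(np.maximum(env, 1e-12) / env.max())
    return {
        "mean": mean,
        "median": np.median(env, axis=0),
        # Share of the total energy that falls in each band.
        "relative_power": env.sum(axis=0) / env.sum(),
        # Fraction of frames in which a band is above the level floor.
        "residence_time": (level_db > floor_db).mean(axis=0),
        # Coefficient of variation of each band envelope.
        "variation": env.std(axis=0) / np.maximum(mean, 1e-12),
        # Correlation coefficients between envelopes of different bands.
        "cross_correlation": np.corrcoef(env, rowvar=False),
    }

def pitch_histograms(f0_hz, bins=10, f_range=(60.0, 400.0)):
    """Normalised histograms of pitch frequency and pitch period.

    f0_hz: per-frame pitch in Hz, with 0 marking unvoiced frames.
    """
    f0 = np.asarray(f0_hz, dtype=float)
    f0 = f0[f0 > 0]                      # keep voiced frames only
    n = max(len(f0), 1)
    freq_hist, _ = np.histogram(f0, bins=bins, range=f_range)
    period_hist, _ = np.histogram(
        1.0 / f0, bins=bins, range=(1.0 / f_range[1], 1.0 / f_range[0])
    )
    return freq_hist / n, period_hist / n
```

Because every statistic is taken over the whole segment rather than over particular words, the resulting vector does not depend on what was said, which is what lets the method work on arbitrary context.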

The following groups of individualizing features are used to characterize individual sounds of the speakers under study. They are estimates of the acoustic model of speech production for individual sounds:
the fundamental tone frequency on vowels;
the four formant frequencies of vowel sounds;
the duration of vowels;
the durations of the consonants surrounding a vowel.

The computed feature values are normalized so that they do not depend on the overall level of the speech signal or on the linear (frequency) distortions introduced when speech signals pass through real recording channels with different transfer characteristics.
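One common way to obtain this kind of invariance is to subtract the time average of the log spectrum in each band. The patent does not specify its normalization, so the sketch below is a stand-in, not the authors' procedure; `normalise_log_spectrum` is a hypothetical name.

```python
import numpy as np

def normalise_log_spectrum(band_energy):
    """Remove overall level and a fixed channel response from band energies.

    band_energy: array (n_frames, n_bands), strictly positive.
    """
    log_e = np.log(np.asarray(band_energy, dtype=float))
    # A fixed channel multiplies each band by a constant, which becomes an
    # additive per-band offset in the log domain. Subtracting each band's
    # time average cancels that offset together with any overall gain.
    return log_e - log_e.mean(axis=0, keepdims=True)
```

Note the trade-off: this also removes the speaker's own long-term average spectrum, so it suits features that describe variation around the mean rather than the mean itself. A scheme that must preserve the mean spectrum would need a different normalization.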

The differences between the values of corresponding acoustic features of the unknown and the tested speakers are then compared with decision thresholds. If the acoustic features are found to be similar, the phonograms are judged to belong to the same person; if the features do not match, they are judged to belong to different persons (block 7).
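The threshold comparison can be sketched as follows. How the per-feature comparisons are combined into one decision is not stated in the patent; the `min_fraction` parameter (the share of features that must fall within their thresholds) and the function name `same_speaker` are assumptions made for illustration.

```python
import numpy as np

def same_speaker(features_unknown, features_tested, thresholds, min_fraction=1.0):
    """Return True if enough feature differences fall within their thresholds."""
    diffs = np.abs(np.asarray(features_unknown, dtype=float)
                   - np.asarray(features_tested, dtype=float))
    within = diffs <= np.asarray(thresholds, dtype=float)
    # Hypothetical combination rule: require a minimum share of matching features.
    return bool(within.mean() >= min_fraction)
```

With `min_fraction=1.0` every feature must match; lowering it tolerates a few outlying features, which may be appropriate for noisy recordings.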

The decision thresholds are obtained by statistical processing of arrays of acoustic feature differences, computed on training sets of speech phonograms under the condition that they belong to the same speaker or to different speakers. Speech phonograms of the tested person are also included in the training set. The informativeness of each acoustic feature is evaluated, and the group of features most informative for the speech phonogram of the given tested person is selected (blocks 5 and 6). The decision threshold for this group of acoustic features is also estimated on the adaptively selected frequency band in which the speech of the tested and the unknown person is compared.
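A minimal sketch of this training step is given below. The patent names neither the informativeness measure nor the threshold rule, so the choices here are assumptions: informativeness is scored as the separation between same-speaker and different-speaker difference distributions in pooled standard deviations, and each threshold is placed midway between the two means. `train_thresholds` and `n_keep` are hypothetical names.

```python
import numpy as np

def train_thresholds(same_diffs, other_diffs, n_keep=3):
    """Pick the most informative features and a threshold for each.

    same_diffs, other_diffs: arrays (n_pairs, n_features) of absolute feature
    differences for same-speaker and different-speaker training pairs.
    """
    same = np.asarray(same_diffs, dtype=float)
    other = np.asarray(other_diffs, dtype=float)
    mu_same, mu_other = same.mean(axis=0), other.mean(axis=0)
    pooled_sd = np.sqrt(0.5 * (same.var(axis=0) + other.var(axis=0)))
    # Informativeness: how far apart the two distributions are.
    score = (mu_other - mu_same) / np.maximum(pooled_sd, 1e-12)
    keep = np.argsort(score)[::-1][:n_keep]       # best features first
    thresholds = 0.5 * (mu_same + mu_other)       # midpoint rule (assumption)
    return keep, thresholds[keep], score
```

Including recordings of the tested person in `same_diffs`, as the text describes, is what adapts the selected features and thresholds to that specific speaker.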

Linguistic examination of the speech phonograms of the unknown and the tested person is carried out by auditory analysis and is intended to identify timbral, intonational, temporal, phonetic, lexico-grammatical and other features of speech (blocks 8 and 9).

To exclude random errors by the expert and to make the expert's assessments more objective, a reference array of audible standards of dialectal, accent and defective features of Russian oral speech is created in the PC memory, giving quick and convenient access to various kinds of reference information (block 10).

The advantage of linguistic features is that an expert identifying them can catch the finest nuances of how speech sounds and assess differences that are not currently amenable to instrumental measurement. The expert also adapts relatively easily to the fairly high levels of interference and distortion that often occur in disputed phonograms.

The results of the separate linguistic examinations are presented as a list of features characterizing the analyzed phonogram. The list is stored in the PC memory, where a protocol is automatically generated to determine the matching and non-matching linguistic features of the speech of the unknown and the tested person (block 11).
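The automatically generated protocol can be pictured as a comparison of two feature lists. The sketch below assumes each list is stored as a mapping from a feature name to the value the expert assigned; the feature names in the usage example and the function name `linguistic_protocol` are hypothetical.

```python
def linguistic_protocol(unknown, tested):
    """Compare two expert feature lists.

    unknown, tested: dicts mapping a feature name to the expert-assigned value.
    """
    shared = sorted(set(unknown) & set(tested))
    return {
        "matching": [k for k in shared if unknown[k] == tested[k]],
        "non_matching": [k for k in shared if unknown[k] != tested[k]],
        # Features recorded for only one of the two phonograms.
        "not_comparable": sorted(set(unknown) ^ set(tested)),
    }
```

For example, `linguistic_protocol({"tempo": "fast", "dialect": "northern"}, {"tempo": "fast", "dialect": "southern", "stutter": "none"})` reports `tempo` as matching, `dialect` as non-matching, and `stutter` as not comparable.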

A decision on the comprehensive examination is made from the results of the acoustic and linguistic examinations.

The proposed method of identifying persons by oral speech was tested in the forensic laboratory of military unit 34435. The test results confirmed that the chosen approach is correct for achieving the stated aims.

Sources of information taken into account during the examination
1. Ramishvili G. S. Method of identifying a person by voice (description of the invention), cl. G 10 L 1/00, 1976.

2. Offenlegungsschrift DT 2431458. Verfahren zur automatischen Sprecherkennung, Bunge, Ernst, Int. Cl. G 10 L 1/04, OT: 05.02.76.

3. United States Patent Office 3,466,394. Voice verification system, Walter K. French, Montrose N.Y., Int. Cl. H04m 1/24, Patented Sept. 9, 1969.

Claims


A method of identifying a person from phonograms of arbitrary oral speech, based on spectral-band-time analysis of the speech signal, extraction of characteristics of the individuality of oral speech, and comparison of these characteristics with reference ones, characterized in that the characteristics of individuality of oral speech used are acoustic integral features that are estimates of the parameters of the statistical distribution of the components of the current spectrum and of the histograms of the distribution of the periods and frequencies of the fundamental tone, measured on speech phonograms with both arbitrary and fixed context, from among which adaptive retraining selects the features that are most informative for the speech of the given tested person and independent of the interference and distortion present in the compared phonograms, and in that linguistic features are also used, recorded by an expert during auditory analysis of the phonograms with the aid of an automated bank of reference sound standards of dialectal, accent and defective features of oral speech.