DE19857070A1


Info

Publication number: DE19857070A1
Application number: DE1998157070
Authority: DE
Inventors: Michael Mende; Ralph Ostwald; Thomas Pachunke
Original assignee: Individual
Current assignee: Individual
Priority date: 1998-12-10
Filing date: 1998-12-10
Publication date: 2000-06-15

Abstract

The method involves providing a recognizer vocabulary containing words and word parts as word constituents to be recognized; determining a sequence of constituent hypotheses by comparing the digital data with the recognizer vocabulary, so that the text is represented by the determined constituent hypothesis sequence; rule-based processing of the constituent hypothesis sequence to group it into individual words and so determine a word sequence hypothesis; determining a constituent of the word sequence hypothesis that requires correction; and providing at least one further constituent hypothesis for the constituent requiring correction. Independent claims are also included for an arrangement for determining an orthographic representation of a text and for a computer program product.

Description

Translated from German

The present invention relates to a method and an apparatus for determining an orthographic representation of a text, and in particular to an application of such a method in a system for automatic speech recognition (ASE).

ASE systems usually work by converting spoken text via a microphone into electronic signals, which are then stored and analyzed in order to determine an orthographic representation of the spoken text from these speech signals. This usually involves comparing the speech signal, or data derived from it, with a dictionary in which a so-called recognizer vocabulary is stored. Depending on the degree of similarity between the digital data and the contents of the recognizer vocabulary, terms are then selected from the vocabulary that are intended to represent the spoken text.

The problems with such an ASE system are extraordinarily varied, which is essentially due to the complexity, variety and high ambiguity of spoken language. For example, the German word "arbeiten" can be a verb or a noun (the noun is capitalized in German), which is practically impossible to tell from the spoken text but has consequences for the orthographic representation. A further problem is the large number of compounds in German, since it is extraordinarily difficult for an ASE system to distinguish whether two separate words or one compound are present.

Conventional ASE systems are word-form oriented. The smallest unit visible to the user in the recognition process is therefore the inflected or derived form of a lexeme, the latter being comparable to a conventional dictionary entry. This is consequently also the smallest unit accessible to the ASE user: the user must either accept a recognized form of a lexeme as correct or reject it as wrong. In particular, conventional systems offer no way of exploiting the fact that several word forms can belong to one lexeme, for example the word forms Wand, Wände, Wänden (and possibly Wandung) to the lexeme Wand ("wall"). Conventional ASE systems are limited to recognizing concrete word forms; they cannot deal flexibly with the fact that a recognized derived word form may be wrong while the associated non-derived lexeme has been recognized correctly.

A further reason for the partly unsatisfactory performance of conventional ASE systems is their fundamental restriction to a maximum set of recognizable word forms, referred to below as the word-form vocabulary (= WfVok). Even if such vocabularies comprise tens or even hundreds of thousands of word forms, they are never a superset of a human speaker's vocabulary. Besides the abundance of technical terms, neologisms (new coinages) and above all names, major contributors are inflection (grammatically regular change of the word ending) and, as a special feature of German, compounding (formation of words from sub-words). Compounding is a linguistically productive process that is subject to constant change. Compounds have not only arisen historically by joining (two or more) sub-words; a language user can also form them at any time. Such ad hoc compounds can usually be decoded immediately by other language users, but not by a word-form-oriented ASE system, since by their nature they cannot occur in a recognizer vocabulary.

Because of the described limitation of the vocabularies, and because recognition is stochastically oriented and therefore inherently error-prone, the user of an ASE system constantly needs to correct words. (A recognition rate of 98% must be regarded as optimal.) In current ASE systems this correction is, consistently, word-form oriented: for each recognized word form the user is presented on request with a list of the statistically most probable alternatives, from which the correct form can be selected. If the correct form is not in the list (for instance because it does not belong to the vocabulary), it must be typed in on the keyboard, even if it is contained in a background vocabulary that the ASE system usually has. In conventional ASE systems such a background vocabulary supplies only the phonetic transcription during correction, and only after the user has manually entered the orthographic form.

The actual speech recognition is conventionally performed by probabilistic methods, both with regard to the sequence of sounds or smaller units within a spoken word (i.e. comparing this determined sequence with ideal-typical pronunciations of the word forms of the recognizer vocabulary) and with regard to word sequences (i.e. comparing sequences of word candidates with word sequences from text corpora analyzed offline). Because of the very large number of combinatorially possible bigrams and trigrams of word forms, not all of them can be observed even in very large corpora. It is therefore impossible to exclude irregular word sequences from the outset on the basis of these methods, which means that a more or less extensive correction of a recognized text is always necessary.

It is therefore an object of the present invention to provide a method and an apparatus for determining an orthographic representation of a text in which, in particular, errors in the orthographic representation of the text can be corrected in a particularly simple, flexible and versatile manner.

A method and an apparatus according to the present invention are defined in the independent claims. The dependent claims define particular embodiments of the invention.

The object of the invention is essentially achieved by providing a recognizer vocabulary that consists of words and word parts as word constituents, and by determining, on the basis of this vocabulary, a constituent hypothesis sequence, which is then converted into a word sequence hypothesis by applying rules. Finally, a constituent of this word sequence hypothesis that requires correction is determined, and at least one further constituent hypothesis is supplied for it.

In a preferred embodiment, the further constituent hypotheses are supplied on the basis of a phonetic-acoustic similarity calculus. For this purpose the vocabulary, which may include the background vocabulary, is searched, for example, for entries whose phonetic transcription resembles that of the constituent hypothesis. This avoids having to keep the audio data of the recognition session stored in order to supply further similar hypotheses. Instead, such a search for further hypotheses can also be performed "offline", i.e. starting from the first recognition result as the underlying hypothesis and applying algorithms that deliver phonetically similar results and access only the acoustic representations of the constituents via their phonetic transcription.
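
The transcription-based search described above can be illustrated with a minimal sketch. The text does not specify the similarity calculus; plain Levenshtein distance over transcription strings is used here purely as a stand-in. The toy vocabulary, its simplified transcriptions and the names `edit_distance` and `similar_entries` are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two transcription strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similar_entries(hypothesis: str,
                    vocabulary: Dict[str, str],
                    limit: int = 3) -> List[Tuple[str, int]]:
    """Rank vocabulary entries by transcription distance to the hypothesis.

    Only transcriptions are consulted; no audio data is needed.
    """
    target = vocabulary[hypothesis]
    scored = [(word, edit_distance(target, transcription))
              for word, transcription in vocabulary.items()
              if word != hypothesis]
    scored.sort(key=lambda pair: (pair[1], pair[0]))
    return scored[:limit]


# Hypothetical vocabulary: orthographic form -> simplified transcription.
VOCABULARY = {
    "Wand": "vant",
    "Wind": "vInt",
    "Hand": "hant",
    "Haus": "haUs",
    "Wände": "vEnd@",
}

print(similar_entries("Wand", VOCABULARY, limit=2))
```

A real similarity calculus would presumably weight substitutions by acoustic closeness of the sounds involved rather than treating all edits as equal.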

Advantageously, further hypotheses can also be supplied by applying inflection and/or derivation paradigms, i.e. possible further inflections of the first hypothesis, for example, are supplied as further hypotheses. This exploits the fact that the lexeme "Haus" may have been recognized correctly, but erroneously in the nominative instead of the dative "Hause".
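
As a rough sketch of paradigm-based alternatives, assuming a simple lookup table from lexeme to word forms (the table `PARADIGMS` and both function names are hypothetical and not taken from the text):

```python
from typing import Dict, List, Optional

# Hypothetical paradigm table: lexeme -> word forms belonging to it.
PARADIGMS: Dict[str, List[str]] = {
    "Haus": ["Haus", "Hauses", "Hause", "Häuser", "Häusern"],
    "Wand": ["Wand", "Wände", "Wänden"],
}


def lexeme_of(word_form: str) -> Optional[str]:
    """Return the lexeme whose paradigm contains the word form, if any."""
    for lexeme, forms in PARADIGMS.items():
        if word_form in forms:
            return lexeme
    return None


def inflection_alternatives(word_form: str) -> List[str]:
    """Offer the other forms of the same lexeme as further hypotheses."""
    lexeme = lexeme_of(word_form)
    if lexeme is None:
        return []
    return [form for form in PARADIGMS[lexeme] if form != word_form]


print(inflection_alternatives("Haus"))
```

With this table, a wrongly recognized nominative "Haus" yields the dative "Hause" among its alternatives, so the user can correct the form without retyping the word.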

In a further advantageous embodiment, supplying further hypotheses can also comprise segmenting words of a word sequence hypothesis into their constituents, or vice versa, in order to split an incorrectly recognized compound into its components or to form, from its components, a word that was erroneously not recognized as a compound.
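
Both directions, splitting and joining, can be sketched as follows. This is a simplified illustration: the constituent set is a toy assumption, linking elements between German compound parts are ignored, and the names `segmentations` and `join_compound` are hypothetical.

```python
from typing import List, Set

# Hypothetical constituent vocabulary, stored in lower case.
CONSTITUENTS: Set[str] = {"haus", "wand", "steuer", "erklärung"}


def segmentations(word: str, constituents: Set[str]) -> List[List[str]]:
    """Return every way of splitting the word into known constituents."""
    word = word.lower()
    if not word:
        return [[]]
    results: List[List[str]] = []
    for end in range(1, len(word) + 1):
        head = word[:end]
        if head in constituents:
            for tail in segmentations(word[end:], constituents):
                results.append([head] + tail)
    return results


def join_compound(parts: List[str]) -> str:
    """Join constituents into one compound, capitalized as a noun."""
    return "".join(part.lower() for part in parts).capitalize()


print(segmentations("Hauswand", CONSTITUENTS))
print(join_compound(["Steuer", "Erklärung"]))
```

A word such as Vorwand yields no segmentation here because "vor" is deliberately absent from the constituent set, which keeps unwanted decompositions out.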

It is particularly advantageous if several ways of supplying a further hypothesis are provided and the user can choose between them, e.g. via menu control. Such a menu is advantageously structured hierarchically, so that on the first level the user can choose, for example, between phonetic similarity and inflection/derivation, while on a second level the menu item inflection/derivation is further subdivided, for example, into inflection, affixation and segmentation, where segmentation stands, for example, for the splitting or formation of compounds.

A further way of supplying further hypotheses comprises subdividing constituent hypotheses into words and/or morphs and/or syllables, with new hypotheses being supplied for each of these subunits. This makes it possible, for example, to focus the correction on individual incorrectly recognized syllables, while correctly recognized syllables (or words or morphs) are retained unchanged. Further syllable, word or morph hypotheses can in turn be supplied in various ways, for example on the basis of a phonetic-acoustic similarity calculus.

In a preferred embodiment, words or constituents requiring correction are determined by a rule-based evaluation of the n-grams of the constituent hypothesis sequence and/or of the word sequence hypothesis. Elements requiring correction can thus be determined automatically without user intervention. In a further advantageous embodiment, such a rule-based evaluation can be triggered by a low recognition probability, so that for a hypothesis sequence that was probably recognized correctly the rule-based evaluation of the n-grams is prevented from introducing errors.
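
One possible reading of this triggering scheme is sketched below for bigrams. The category labels, the rule set `IRREGULAR_BIGRAMS`, the threshold of 0.8 and the choice to flag the less probable element of an irregular pair are all assumptions made for illustration; the text does not specify them.

```python
from typing import List, Set, Tuple

# (token, category, recognition probability); all values are illustrative.
Hypothesis = Tuple[str, str, float]

# Hypothetical rule set: category bigrams treated as irregular.
IRREGULAR_BIGRAMS: Set[Tuple[str, str]] = {
    ("ART", "VERB_FIN"),
    ("PREP", "VERB_FIN"),
}


def flag_for_correction(sequence: List[Hypothesis],
                        threshold: float = 0.8) -> List[int]:
    """Return indices of hypotheses that the rules mark as needing correction.

    Rules are applied only to bigrams in which at least one element has a
    recognition probability below the threshold.
    """
    flagged: List[int] = []
    for i in range(1, len(sequence)):
        _, prev_cat, prev_prob = sequence[i - 1]
        _, cat, prob = sequence[i]
        if min(prev_prob, prob) >= threshold:
            continue  # probably correct: leave untouched
        if (prev_cat, cat) in IRREGULAR_BIGRAMS:
            weaker = i if prob <= prev_prob else i - 1
            if weaker not in flagged:
                flagged.append(weaker)
    return flagged


uncertain = [("die", "ART", 0.97), ("arbeiten", "VERB_FIN", 0.52),
             ("beginnen", "VERB_FIN", 0.91)]
print(flag_for_correction(uncertain))
```

Skipping confident bigrams is what keeps the rules from overriding a sequence that the recognizer most likely got right.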

Preferably, word parts are included as constituents in the recognizer vocabulary only if they have a certain minimum acoustic weight. This prevents the recognizer vocabulary from being "fragmented" into too many constituents, which would increase the number of possible permutations, raise the demands on recognition and thus lower its hit rate.

Advantageously, a background vocabulary is provided in addition to the recognizer vocabulary; it stores a large number of words with their transcriptions and can be consulted to find further hypotheses. Like the recognizer vocabulary, this background vocabulary is preferably built on a constituent basis, in particular so that hypotheses for constituents can be supplied.

It is further advantageous if an unrecognized word, word form or constituent that was found by applying one of the mechanisms for supplying further hypotheses can be adopted into the background vocabulary (or also into the recognizer vocabulary). The available vocabulary thus becomes dynamically extensible, and the limits on vocabulary size in conventional ASE systems can be extended considerably. Such a newly found entry can then itself become a constituent of yet another new compound.

It is particularly advantageous if all of the various elements that can serve to correct a hypothesis are implemented and if a new hypothesis found by correction in turn serves as the basis for a further correction. In this way the ultimately correct orthographic representation of the text can be determined or generated iteratively.

In a particularly preferred embodiment, the present invention consists of a method that automatically processes the orthographic output of a given automatic speech recognition (ASE) system for continuous dictation, which operates with a language model (= the totality of WfVok and word[-form] sequence statistics) built according to the invention, i.e. on a constituent basis. Independently of the speech recognition engine used and on the basis of linguistic word-context rules, the processing is such that (1) irregular word sequences are corrected, (2) any compounds that are not contained in the recognizer vocabulary but are contained in an arbitrarily large background vocabulary are recognized, and (3) the user is, if required, given a means of correction that, independently of any existing audio data and independently of correctly or incorrectly recognized word boundaries, offers word candidates or word-part candidates in a dynamic, approximative manner. These candidates are selected automatically from an arbitrarily large set of word forms or sub-word forms that is independent of the recognizer vocabulary used, namely the dynamically variable background vocabulary.

The invention is described in detail below on the basis of several exemplary embodiments with reference to the accompanying drawings, in which:

Fig. 1 shows the schematic structure of an ASE system or method according to the invention, which supplies a constituent hypothesis sequence, together with an associated correction module;

Fig. 2 schematically shows the structure of the correction module and the sequence of the correction method.

In an ASE system according to a first exemplary embodiment of the invention, the vocabulary to be recognized is based on linguistic units, here called constituents, which correspond to word forms or to parts of word forms. These constituents are the primary elements of the speech signal; joined together they form, for example, the compounds.

It necessarily follows that the system must be able to recognize the word boundaries of spoken language without discrete dictation. In ASE systems for discrete dictation the user must "render" the spaces between orthographic word forms by pausing in the flow of speech; in systems for continuous dictation, such as the one described here, this is not necessary. In the pause-free manner of speaking that is possible here, the resulting speech signal contains no segmental counterpart to word boundaries that could be detected by signal phonetics. The difference between a compound and the corresponding word group, e.g. Steuererklärung and Steuer-Erklärung, can be regarded as the result of purely orthographic convention and thus as irrelevant to spoken language. The components of the compounds, the constituents, are the primary elements of the speech signal on which the word sequence statistics of the base ASE are built.

Two prerequisites are preferably met for this. First, the underlying language model must be prepared so that it is based on constituents rather than on conventional word forms. Second, the constituents of the word forms to be recognized must be sensibly determined beforehand. A fully automatic decomposition of all compounds is not expedient, since meaningless decompositions (e.g. Verbraucher) must be reliably excluded, which cannot be achieved even with morphologically based grammars (see e.g. T. Pachunke et al., "Broad Coverage Automatic Morphological Segmentation of German Words", Proceedings of the Fifteenth International Conference on Computational Linguistics, Nantes, France, July 1992, Vol. IV, pp. 1218-122). If inadequate word-form decompositions were permitted, the corpus and hence the word[-form] sequence statistics would be distorted.

Constituents should have word character, i.e. they must be represented phonetically by at least one syllable and contain at least one stem morph that carries a secondary or primary word accent. Depending on the recognizer, each constituent should have a certain "acoustic weight", i.e. a certain expectable quantum of energy in the speech signal. In addition, it should be possible to interpret the constituents as forming syntagmas in the respective context. For example, the decomposition Vor-Wand for Vorwand ("pretext") is correct both orthographically and phonetically, but it leads to distorted word[-form] sequence statistics for the sub-words vor and Wand, since the word Vorwand is not based on them. Considering the frequent case of determinative compounds built hypotactically from a base word and a determining part, of the type Hauswand ("house wall") or Hinterwand ("rear wall"), it becomes clear that the context of the base word -wand will be syntactically comparable to that of Wand, but fundamentally different from that of the word Vorwand.

The constituent boundary is preferably always both a morph boundary and a syllable boundary; the word segments found are thus the smallest unit on which both morphological and phonetic, syllable-oriented mechanisms can operate.

After the 'decomposition-critical' compounds have been identified, i.e. those compounds for which a decomposition into their components appears sensible, the vocabulary is adapted accordingly, i.e. compounds of the form ab are replaced by the word sequence 'a-b' in the corpora used for the word [form] sequence statistics.
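The replacement of compounds of the form ab by 'a-b' in a corpus could be sketched roughly as follows. This is only an illustration: the table `SEGMENTATION`, its entries and the function `segment_corpus` are assumptions introduced here and are not taken from the description.

```python
import re

# Hypothetical table of decomposition-critical compounds and their
# constituent segmentation; the entries are illustrative only.
SEGMENTATION = {
    "Hauswand": ["Haus", "wand"],
    "Hinterwand": ["Hinter", "wand"],
}

def segment_corpus(text, segmentation=SEGMENTATION):
    """Replace each listed compound 'ab' by the constituent sequence 'a-b'."""
    def replace(match):
        word = match.group(0)
        parts = segmentation.get(word)
        return "-".join(parts) if parts else word
    # Whole word forms are matched, so an unlisted word such as 'Vorwand'
    # is never split just because it ends in 'wand'.
    return re.sub(r"\w+", replace, text)

print(segment_corpus("Die Hauswand ist kein Vorwand"))
```

Because only listed compounds are touched, a word such as Vorwand stays intact, which matches the requirement that senseless decompositions be excluded.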

A lexicon of word forms that are constituent-segmented according to the principles described above is used, which covers the corpora needed for vocabulary generation. Since almost any number of further compounds can be formed from the constituents used, a certain proportion of which are words in actual use, the user effectively has a larger vocabulary than with a word-form-oriented ASE system. This word-form lexicon, which is integrated into the ASE system, is designed to be extensible, both by new compounds formed from constituents already stored and by new constituents. The latter become available by being added to the constituent vocabulary. To the user these two different addition processes are not distinguishable; via a corresponding front end he operates only the dynamic word-form lexicon, which is built internally on a constituent basis.

Based on this environment, the ASE system first delivers a first hypothesis for a text to be recognized. This hypothesis usually requires correction; the correction is then carried out by a complex of rule systems and program modules that processes the first hypothesis further.

The actual complex of rule systems and program modules that post-processes the constituent output of the ASE, referred to below as the "post-processing module", is arranged downstream of the basic recognition, as shown in Fig. 1:

The individual components of the "post-processing module" are shown schematically in Fig. 2:

The input of the "post-processing module" is a sequence of n recognized word constituents, including the next-best hypotheses to which the ASE engine has assigned probability values. In an arbitrarily repeatable sequence of stages, this input (referred to below simply as "constituents") is modified and enriched with information from external sources. One potential but not necessary source is word correction by the user.

In a first (analysis) stage, the constituents are segmented into syllables. The result is a weighted analysis of the phonetic constituent structure. In the next step a morphological analysis of the constituents is added, which on the one hand performs the actual composition (combining constituents into compounds) and on the other hand supplies, for later steps, information about further inflected, affixed or derived forms as well as the part of speech.
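One conceivable way to perform the composition step is a greedy lexicon lookup over adjacent constituents. The following is a minimal sketch under that assumption; the description does not specify the rules actually used, and `LEXICON`, `compose` and `max_parts` are hypothetical names.

```python
# Hypothetical word-form lexicon; a real system would consult the
# constituent-segmented lexicon described in the text.
LEXICON = {"Hauswand", "Hinterwand"}

def compose(constituents, lexicon=LEXICON, max_parts=3):
    """Greedily join adjacent constituents into the longest known compound."""
    words, i = [], 0
    while i < len(constituents):
        # Try the longest span first; a single constituent is always accepted.
        for n in range(min(max_parts, len(constituents) - i), 0, -1):
            parts = constituents[i:i + n]
            candidate = parts[0] + "".join(p.lower() for p in parts[1:])
            if n == 1 or candidate in lexicon:
                words.append(candidate)
                i += n
                break
    return words

print(compose(["die", "Haus", "wand", "ist", "hoch"]))
```

A greedy longest-match strategy is chosen here only for brevity; a rule-based or probabilistic grouping would fit the same interface.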

In a syntactic stage, word order rules are applied to the n-gram by partial parsing, drawing among other things on the part-of-speech information obtained earlier. These rules can be triggered by "zones of low probability", i.e. they are processed only when the constituent sequence determined by the ASE engine is improbable, that is, when its probability lies below a certain threshold value. In an integrative-phonetic stage, hypotheses about the constituents are in turn formed by means of an engine-dependent, optimized similarity calculus.
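The triggering by "zones of low probability" might be sketched as follows. The sketch assumes that each best constituent hypothesis carries its own probability value and that a zone is a run of consecutive constituents below a threshold; the class `Hypothesis`, the function `low_probability_zones` and the threshold of 0.5 are illustrative assumptions, not part of the description.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    constituent: str
    probability: float  # value assigned by the recognizer engine (assumed scale 0..1)

def low_probability_zones(best, threshold=0.5):
    """Return (start, end) index ranges of consecutive hypotheses below threshold."""
    zones, start = [], None
    for i, hyp in enumerate(best):
        if hyp.probability < threshold:
            if start is None:
                start = i          # a new zone begins here
        elif start is not None:
            zones.append((start, i))  # zone ends before index i
            start = None
    if start is not None:
        zones.append((start, len(best)))
    return zones

best = [Hypothesis("die", 0.9), Hypothesis("Haus", 0.3),
        Hypothesis("wand", 0.4), Hypothesis("ist", 0.8)]
print(low_probability_zones(best))
```

The word order rules would then be run only on the returned index ranges, leaving confidently recognized stretches untouched.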

The correction module makes the results of the word, syllable and morph segmentation available to the user, so that he can carry out the following actions via suitable graphical interfaces:

- joining or separating constituents as desired
- changing upper and lower case
- inflection / derivation / prefixation

Through the interplay of these operations with a suitable presentation of the hypotheses, the user can perform word morphing: he selects a word-form or constituent hypothesis, which then takes the place of what was originally recognized, and, depending on his selection, receives newly generated, specific hypotheses, from which he again selects the most suitable. He can repeat this process until the desired word is reached. This provides a keyboard-free correction that can also be operated conveniently by voice, for example.
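The word morphing loop can be illustrated with a short sketch. The hypothesis generator is replaced here by a fixed table and the user by a simple function; `word_morphing`, `propose`, `choose` and `TABLE` are hypothetical names chosen for this example only.

```python
def word_morphing(initial, propose, choose, max_rounds=10):
    """Repeatedly replace the current word with the hypothesis the user selects."""
    current = initial
    for _ in range(max_rounds):
        candidates = propose(current)          # hypotheses specific to the current word
        selection = choose(current, candidates)
        if selection is None:                  # user accepts the current word
            break
        current = selection                    # selection takes the place of the old word
    return current

# Hypothetical generator: a fixed table stands in for the morphological
# and phonetic stages that would produce new hypotheses.
TABLE = {"Haus": ["Hauses", "Häuser"], "Häuser": ["Häusern"]}

def propose(word):
    return TABLE.get(word, [])

# Simulated user who is aiming for the word 'Häusern'.
def choose(current, candidates):
    if current == "Häusern" or not candidates:
        return None
    return "Häusern" if "Häusern" in candidates else candidates[-1]

print(word_morphing("Haus", propose, choose))
```

Each round needs only a selection among presented candidates, which is why such a loop lends itself to keyboard-free operation.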

Further modifications of the described embodiment will readily occur to the person skilled in the art. For example, the individual correction mechanisms can be carried out in a different order. Furthermore, the described method can be applied not only in an ASE system but also, for example, in a system for automatic spell checking and correction.

The described embodiment can be put into practice in various ways. For example, the modules described can be implemented by programs running on a computer and thus purely in software; equally, a hybrid implementation partly in software and partly in hardware is readily conceivable for the person skilled in the art.

For application to a speech recognition system, various hardware components are of course implicitly required, for example a microphone and corresponding components for converting speech into digital signals; however, providing and implementing these means poses no problems for the person skilled in the art. It is further understood that, besides being implemented as a method, the invention can also be implemented by a corresponding apparatus and by a data carrier containing program code that causes a computer to carry out the method according to the invention.