Deterministic finite automaton


"DFSA" redirects here. The term may also refer to drug-facilitated sexual assault.

An example of a deterministic finite automaton that accepts only binary numbers that are multiples of 3. The state S0 is both the start state and an accept state. For example, the string "1001" leads to the state sequence S0, S1, S2, S1, S0, and is hence accepted.

In the theory of computation, a branch of theoretical computer science, a deterministic finite automaton (DFA)—also known as deterministic finite acceptor (DFA), deterministic finite-state machine (DFSM), or deterministic finite-state automaton (DFSA)—is a finite-state machine that accepts or rejects a given string of symbols, by running through a state sequence uniquely determined by the string.[1] Deterministic refers to the uniqueness of the computation run. In search of the simplest models to capture finite-state machines, Warren McCulloch and Walter Pitts were among the first researchers to introduce a concept similar to finite automata in 1943.[2][3]

The figure illustrates a deterministic finite automaton using a state diagram. In this example automaton, there are three states: S0, S1, and S2 (denoted graphically by circles). The automaton takes a finite sequence of 0s and 1s as input. For each state, there is a transition arrow leading out to a next state for both 0 and 1. Upon reading a symbol, a DFA jumps deterministically from one state to another by following the transition arrow. For example, if the automaton is currently in state S0 and the current input symbol is 1, then it deterministically jumps to state S1. A DFA has a start state (denoted graphically by an arrow coming in from nowhere) where computations begin, and a set of accept states (denoted graphically by a double circle) which help define when a computation is successful.

A DFA is defined as an abstract mathematical concept, but is often implemented in hardware and software for solving various specific problems such as lexical analysis and pattern matching. For example, a DFA can model software that decides whether or not online user input such as an email address is syntactically valid.[4]

DFAs have been generalized to nondeterministic finite automata (NFA), which may have several arrows of the same label starting from a state. Using the powerset construction method, every NFA can be translated to a DFA that recognizes the same language. DFAs, and NFAs as well, recognize exactly the set of regular languages.[1]

Formal definition


A deterministic finite automaton M is a 5-tuple, (Q, Σ, δ, q0, F), consisting of

  • a finite set of states Q;
  • a finite set of input symbols called the alphabet Σ;
  • a transition function δ : Q × Σ → Q;
  • an initial (or start) state q0 ∈ Q; and
  • a set of accept (or final) states F ⊆ Q.

Let w = a1a2...an be a string over the alphabet Σ. The automaton M accepts the string w if a sequence of states, r0, r1, ..., rn, exists in Q with the following conditions:

  1. r0 = q0
  2. ri+1 = δ(ri, ai+1), for i = 0, ..., n − 1
  3. rn ∈ F.

In words, the first condition says that the machine starts in the start state q0. The second condition says that given each character of string w, the machine will transition from state to state according to the transition function δ. The last condition says that the machine accepts w if the last input of w causes the machine to halt in one of the accepting states. Otherwise, it is said that the automaton rejects the string. The set of strings that M accepts is the language recognized by M, and this language is denoted by L(M).
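
The acceptance condition can be checked directly by iterating the transition function over the input. The following is a minimal Python sketch (the dictionary encoding of δ and all names are illustrative):

    def dfa_accepts(delta, q0, accept_states, word):
        """Return True iff the DFA given by the transition dict `delta`,
        start state `q0` and accept states `accept_states` accepts `word`."""
        r = q0                          # condition 1: r0 = q0
        for symbol in word:
            r = delta[(r, symbol)]      # condition 2: r(i+1) = δ(ri, a(i+1))
        return r in accept_states       # condition 3: rn ∈ F

    # The automaton from the figure (binary multiples of 3), reconstructed from the caption:
    delta = {("S0", "0"): "S0", ("S0", "1"): "S1",
             ("S1", "0"): "S2", ("S1", "1"): "S0",
             ("S2", "0"): "S1", ("S2", "1"): "S2"}
    print(dfa_accepts(delta, "S0", {"S0"}, "1001"))  # True: the run visits S0, S1, S2, S1, S0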

A deterministic finite automaton without accept states and without a starting state is known as a transition system or semiautomaton.

For a more comprehensive introduction to the formal definition, see automata theory.

Example


The following example is of a DFA M, with a binary alphabet, which requires that the input contains an even number of 0s.

The state diagram for M

M = (Q, Σ, δ, q0, F) where

  • Q = {S1, S2};
  • Σ = {0, 1};
  • q0 = S1;
  • F = {S1}; and
  • δ is defined by the following state-transition table:

             0     1
      S1     S2    S1
      S2     S1    S2

The state S1 represents that there has been an even number of 0s in the input so far, while S2 signifies an odd number. A 1 in the input does not change the state of the automaton. When the input ends, the state will show whether the input contained an even number of 0s or not. If the input did contain an even number of 0s, M will finish in state S1, an accepting state, so the input string will be accepted.

The language recognized by M is the regular language given by the regular expression (1*) (0 (1*) 0 (1*))*, where * is the Kleene star, e.g., 1* denotes any number (possibly zero) of consecutive ones.
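
To connect the automaton with the regular expression, the following sketch encodes M from its state-transition table and cross-checks it against the expression above, written here in Python's re syntax (illustrative code with hypothetical helper names):

    import re

    # Transition table of M: S1 = "even number of 0s so far" (accepting), S2 = "odd".
    DELTA = {("S1", "0"): "S2", ("S1", "1"): "S1",
             ("S2", "0"): "S1", ("S2", "1"): "S2"}

    def m_accepts(word):
        state = "S1"                              # q0 = S1
        for symbol in word:
            state = DELTA[(state, symbol)]
        return state == "S1"                      # F = {S1}

    EVEN_ZEROS = re.compile(r"1*(01*01*)*")       # the regular expression from the text

    for w in ["", "1", "10", "1001", "000", "0110"]:
        assert m_accepts(w) == (EVEN_ZEROS.fullmatch(w) is not None)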

Variations


Complete and incomplete


According to the above definition, deterministic finite automata are always complete: they define from each state a transition for each input symbol.

While this is the most common definition, some authors use the term deterministic finite automaton for a slightly different notion: an automaton that defines at most one transition for each state and each input symbol; the transition function is allowed to be partial.[5] When no transition is defined, such an automaton halts.
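
For illustration, a partial transition function can be represented by simply omitting entries from the transition dictionary used above; the run then halts and rejects as soon as an undefined transition is encountered (a minimal sketch with illustrative names):

    def partial_dfa_accepts(delta, q0, accept_states, word):
        """Simulate a possibly incomplete DFA: reject if a transition is missing."""
        state = q0
        for symbol in word:
            nxt = delta.get((state, symbol))   # None means "no transition defined"
            if nxt is None:
                return False                   # the automaton halts and rejects
            state = nxt
        return state in accept_states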

Local automata


A local automaton is a DFA, not necessarily complete, for which all edges with the same label lead to a single vertex. Local automata accept the class of local languages, those for which membership of a word in the language is determined by a "sliding window" of length two on the word.[6][7]

A Myhill graph over an alphabet A is a directed graph with vertex set A and subsets of vertices labelled "start" and "finish". The language accepted by a Myhill graph is the set of directed paths from a start vertex to a finish vertex: the graph thus acts as an automaton.[6] The class of languages accepted by Myhill graphs is the class of local languages.[8]
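
As a hedged illustration of the "sliding window" characterization, membership in a local language can be tested by checking that the first letter, the last letter, and every factor of length two of a nonempty word are drawn from finite allowed sets (the sets below are illustrative; conventions for the empty word vary):

    def in_local_language(word, allowed_first, allowed_last, allowed_pairs):
        """Sliding-window test of length two for a local language."""
        if not word:
            return False
        if word[0] not in allowed_first or word[-1] not in allowed_last:
            return False
        # Slide a window of length two over the word.
        return all(word[i:i + 2] in allowed_pairs for i in range(len(word) - 1))

    # Example: words over {a, b} that start with a, end with b, and never repeat a letter.
    print(in_local_language("abab", {"a"}, {"b"}, {"ab", "ba"}))  # True
    print(in_local_language("aab", {"a"}, {"b"}, {"ab", "ba"}))   # False ("aa" is not allowed)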

Randomness


When the start state and accept states are ignored, a DFA of n states and an alphabet of size k can be seen as a digraph of n vertices in which all vertices have k out-arcs labeled 1, ..., k (a k-out digraph). It is known that when k ≥ 2 is a fixed integer, with high probability, the largest strongly connected component (SCC) in such a k-out digraph chosen uniformly at random is of linear size and it can be reached by all vertices.[9] It has also been proven that if k is allowed to increase as n increases, then the whole digraph has a phase transition for strong connectivity similar to the Erdős–Rényi model for connectivity.[10]

In a random DFA, the maximum number of vertices reachable from one vertex is very close to the number of vertices in the largest SCC with high probability.[9][11] This is also true for the largest induced sub-digraph of minimum in-degree one, which can be seen as a directed version of the 1-core.[10]
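
To make the digraph view concrete, the sketch below samples a uniformly random k-out digraph on n vertices and measures its largest strongly connected component; it assumes the third-party networkx library is available (any SCC routine would do):

    import random
    import networkx as nx  # assumed to be installed

    def random_k_out_digraph(n, k, seed=None):
        """Sample a DFA-shaped digraph: each of the n vertices gets k out-arcs
        whose targets are chosen independently and uniformly (labels are ignored)."""
        rng = random.Random(seed)
        g = nx.DiGraph()
        g.add_nodes_from(range(n))
        for v in range(n):
            for _ in range(k):
                g.add_edge(v, rng.randrange(n))
        return g

    g = random_k_out_digraph(n=10_000, k=2, seed=0)
    largest_scc = max(nx.strongly_connected_components(g), key=len)
    print(len(largest_scc) / g.number_of_nodes())  # for fixed k >= 2, typically a constant fraction of n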

Closure properties

The upper left automaton recognizes the language of all binary strings containing at least one occurrence of "00". The lower right automaton recognizes all binary strings with an even number of "1"s. The lower left automaton is obtained as the product of the former two; it recognizes the intersection of both languages.

If DFAs recognize the languages that are obtained by applying an operation on the DFA-recognizable languages, then DFAs are said to be closed under the operation. DFAs are closed under the following operations:

  • union
  • intersection (a product construction is sketched below)
  • complement (negation)
  • concatenation
  • Kleene star
  • reversal

For each operation, an optimal construction with respect to the number of states has been determined in state complexity research. Since DFAs are equivalent to nondeterministic finite automata (NFA), these closures may also be proved using closure properties of NFA.
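
As an illustration of one such construction, the product automaton for intersection pairs the states of two DFAs and accepts exactly when both components accept, as in the figure above (a minimal sketch; both automata are assumed complete over the same alphabet, and the dictionary encoding mirrors the earlier examples):

    from itertools import product

    def intersection_dfa(dfa1, dfa2):
        """Product construction. Each DFA is a tuple (states, alphabet, delta, q0, accept),
        with delta a dict keyed by (state, symbol)."""
        states1, alphabet, delta1, q01, accept1 = dfa1
        states2, _,        delta2, q02, accept2 = dfa2
        states = set(product(states1, states2))
        delta = {((p, q), a): (delta1[(p, a)], delta2[(q, a)])
                 for (p, q) in states for a in alphabet}
        accept = {(p, q) for (p, q) in states if p in accept1 and q in accept2}
        return states, alphabet, delta, (q01, q02), accept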

As a transition monoid


A run of a given DFA can be seen as a sequence of compositions of a very general formulation of the transition function with itself. Here we construct that function.

For a given input symbol a ∈ Σ, one may construct a transition function δa : Q → Q by defining δa(q) = δ(q, a) for all q ∈ Q. (This trick is called currying.) From this perspective, δa "acts" on a state in Q to yield another state. One may then consider the result of function composition repeatedly applied to the various functions δa, δb, and so on. Given a pair of letters a, b ∈ Σ, one may define a new function δ̂ab = δa ∘ δb, where ∘ denotes function composition.

Clearly, this process may be recursively continued, giving the following recursive definition of δ̂ : Q × Σ* → Q:

δ̂(q, ε) = q, where ε is the empty string, and
δ̂(q, wa) = δa(δ̂(q, w)), where w ∈ Σ*, a ∈ Σ, and q ∈ Q.

δ̂ is defined for all words w ∈ Σ*. A run of the DFA is a sequence of compositions of δ̂ with itself.

Repeated function composition forms a monoid. For the transition functions, this monoid is known as the transition monoid, or sometimes the transformation semigroup. The construction can also be reversed: given a δ̂, one can reconstruct a δ, and so the two descriptions are equivalent.
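
The currying step and the extended transition function δ̂ can be written compactly; the following sketch builds δa for each symbol and composes these functions along a word (illustrative code, reusing the transition table of M from the example section):

    from functools import reduce

    def curry(delta, a):
        """Return the function delta_a : Q -> Q obtained by fixing the input symbol a."""
        return lambda q: delta[(q, a)]

    def extended(delta, q, word):
        """The extended transition function: apply delta_a for each symbol of `word`
        in turn, starting from state q, so that extended(delta, q, "") == q."""
        return reduce(lambda state, a: curry(delta, a)(state), word, q)

    DELTA = {("S1", "0"): "S2", ("S1", "1"): "S1",
             ("S2", "0"): "S1", ("S2", "1"): "S2"}
    print(extended(DELTA, "S1", "1010"))  # "S1": two 0s leave M in its accepting state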

Advantages and disadvantages


DFAs are one of the most practical models of computation, since there is a trivial linear-time, constant-space, online algorithm to simulate a DFA on a stream of input (sketched after the list below). Also, there are efficient algorithms to find a DFA recognizing:

  • the complement of the language recognized by a given DFA.
  • the union/intersection of the languages recognized by two given DFAs.
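
The online simulation mentioned above needs only the current state, no matter how long the input is. A minimal sketch (illustrative names) that consumes any iterable of symbols in constant space and reports, after each symbol, whether the prefix read so far is accepted:

    def online_run(delta, q0, accept_states, stream):
        """Lazily report acceptance of each prefix; only the current state is stored."""
        state = q0
        for symbol in stream:          # works for generators, files, sockets, ...
            state = delta[(state, symbol)]
            yield state in accept_states

    # With the "even number of 0s" DFA M: the prefixes "0", "00", "001", "0011" give
    DELTA = {("S1", "0"): "S2", ("S1", "1"): "S1",
             ("S2", "0"): "S1", ("S2", "1"): "S2"}
    print(list(online_run(DELTA, "S1", {"S1"}, "0011")))  # [False, True, True, True]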

Because DFAs can be reduced to a canonical form (minimal DFAs), there are also efficient algorithms to determine:

  • whether a DFA accepts any strings (Emptiness Problem; a reachability check is sketched after this list)
  • whether a DFA accepts all strings (Universality Problem)
  • whether two DFAs recognize the same language (Equality Problem)
  • whether the language recognized by a DFA is included in the language recognized by a second DFA (Inclusion Problem)
  • the DFA with a minimum number of states for a particular regular language (Minimization Problem)
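
For instance, emptiness reduces to graph reachability: L(M) is nonempty exactly when some accept state is reachable from the start state. A short sketch of this check (illustrative names, dictionary encoding as before):

    from collections import deque

    def is_empty(delta, q0, accept_states):
        """Return True iff the DFA accepts no string (breadth-first search over reachable states)."""
        seen, queue = {q0}, deque([q0])
        while queue:
            state = queue.popleft()
            if state in accept_states:
                return False                  # a reachable accept state makes the language nonempty
            for (src, _symbol), dst in delta.items():
                if src == state and dst not in seen:
                    seen.add(dst)
                    queue.append(dst)
        return True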

DFAs are equivalent in computing power to nondeterministic finite automata (NFAs). This is because, firstly, any DFA is also an NFA, so an NFA can do what a DFA can do. Also, given an NFA, using the powerset construction one can build a DFA that recognizes the same language as the NFA, although the DFA could have an exponentially larger number of states than the NFA.[15][16] However, even though NFAs are computationally equivalent to DFAs, the above-mentioned problems are not necessarily solved efficiently for NFAs. The non-universality problem for NFAs is PSPACE-complete since there are small NFAs whose shortest rejecting word has exponential size. A DFA is universal if and only if all states are final states, but this does not hold for NFAs. The Equality, Inclusion and Minimization Problems are also PSPACE-complete since they require forming the complement of an NFA, which results in an exponential blow-up of size.[17]
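
The powerset construction itself can be sketched as follows (an illustration only; the NFA's transition function is taken to be a dictionary mapping (state, symbol) pairs to sets of states, with no ε-transitions, and DFA states are frozensets of NFA states):

    from collections import deque

    def powerset_construction(nfa_delta, alphabet, q0, nfa_accept):
        """Build an equivalent DFA from an NFA; only reachable subsets are created."""
        start = frozenset({q0})
        dfa_delta, dfa_accept = {}, set()
        seen, queue = {start}, deque([start])
        while queue:
            subset = queue.popleft()
            if subset & set(nfa_accept):
                dfa_accept.add(subset)        # accepting if it contains an NFA accept state
            for a in alphabet:
                target = frozenset(s for q in subset for s in nfa_delta.get((q, a), ()))
                dfa_delta[(subset, a)] = target
                if target not in seen:
                    seen.add(target)
                    queue.append(target)
        return dfa_delta, start, dfa_accept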

On the other hand, finite-state automata are of strictly limited power in the languages they can recognize; many simple languages, including any problem that requires more than constant space to solve, cannot be recognized by a DFA. The classic example of a simply described language that no DFA can recognize is the bracket or Dyck language, i.e., the language that consists of properly paired brackets such as the word "(()())". Intuitively, no DFA can recognize the Dyck language because DFAs are not capable of counting: a DFA-like automaton would need a state for every possible number of "currently open" parentheses, meaning it would need an unbounded number of states. Another simpler example is the language consisting of strings of the form anbn for some finite but arbitrary number of a's, followed by an equal number of b's.[18]

DFA identification from labeled words

Main article: Induction of regular languages

Given a set of positive words S+ ⊂ Σ* and a set of negative words S− ⊂ Σ*, one can construct a DFA that accepts all words from S+ and rejects all words from S−: this problem is called DFA identification (synthesis, learning). While some DFA consistent with S+ and S− can be constructed in linear time, the problem of identifying a DFA with the minimal number of states is NP-complete.[19] The first algorithm for minimal DFA identification was proposed by Trakhtenbrot and Barzdin[20] and is called the TB-algorithm. However, the TB-algorithm assumes that all words over Σ up to a given length are contained in either S+ or S−.

Later, K. Lang proposed an extension of the TB-algorithm that does not use any assumptions about S+ and S−, the Traxbar algorithm.[21] However, Traxbar does not guarantee the minimality of the constructed DFA. In his work,[19] E. M. Gold also proposed a heuristic algorithm for minimal DFA identification. Gold's algorithm assumes that S+ and S− contain a characteristic set of the regular language; otherwise, the constructed DFA will be inconsistent either with S+ or S−. Other notable DFA identification algorithms include the RPNI algorithm,[22] the Blue-Fringe evidence-driven state-merging algorithm,[23] and Windowed-EDSM.[24] Another research direction is the application of evolutionary algorithms: the smart state labeling evolutionary algorithm[25] made it possible to solve a modified DFA identification problem in which the training data (the sets S+ and S−) is noisy in the sense that some words are attributed to wrong classes.

Yet another step forward is due to the application of SAT solvers by Marijn J. H. Heule and S. Verwer: the minimal DFA identification problem is reduced to deciding the satisfiability of a Boolean formula.[26] The main idea is to build an augmented prefix-tree acceptor (a trie containing all input words with corresponding labels) based on the input sets and reduce the problem of finding a DFA with C states to coloring the tree vertices with C colors in such a way that when vertices with one color are merged into one state, the generated automaton is deterministic and complies with S+ and S−. Though this approach allows finding the minimal DFA, it suffers from an exponential blow-up of execution time when the size of the input data increases. Therefore, Heule and Verwer's initial algorithm has later been augmented with making several steps of the EDSM algorithm prior to SAT solver execution: the DFASAT algorithm.[27] This allows reducing the search space of the problem, but leads to the loss of the minimality guarantee. Another way of reducing the search space has been proposed by Ulyantsev et al.[28] by means of new symmetry breaking predicates based on the breadth-first search algorithm: the sought DFA's states are constrained to be numbered according to the BFS algorithm launched from the initial state. This approach reduces the search space by C! by eliminating isomorphic automata.
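
The augmented prefix-tree acceptor mentioned above is a trie whose nodes carry an accept/reject/unknown label. A minimal sketch of its construction (illustrative data structures, not Heule and Verwer's implementation):

    def build_apta(positive, negative):
        """Build an augmented prefix-tree acceptor as (children, labels):
        children maps (node, symbol) -> node; labels maps node -> '+', '-' or None."""
        children, labels = {}, {0: None}      # node 0 is the root (empty prefix)
        next_id = 1

        def insert(word, label):
            nonlocal next_id
            node = 0
            for symbol in word:
                if (node, symbol) not in children:
                    children[(node, symbol)] = next_id
                    labels[next_id] = None
                    next_id += 1
                node = children[(node, symbol)]
            labels[node] = label              # the node reached by the whole word gets its label

        for w in positive:
            insert(w, "+")
        for w in negative:
            insert(w, "-")
        return children, labels

    children, labels = build_apta(positive=["a", "ab"], negative=["b"])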

Equivalent models


Read-only right-moving Turing machines


Read-only right-moving Turing machines are a particular type of Turing machine that only moves right; these are almost exactly equivalent to DFAs.[29] The definition based on a singly infinite tape is a 7-tuple

M = ⟨Q, Γ, b, Σ, δ, q0, F⟩,

where

  • Q is a finite set of states;
  • Γ is a finite set of the tape alphabet/symbols;
  • b ∈ Γ is the blank symbol (the only symbol allowed to occur on the tape infinitely often at any step during the computation);
  • Σ, a subset of Γ not including b, is the set of input symbols;
  • δ : Q × Γ → Q × Γ × {R} is a function called the transition function, where R is a right movement (a right shift);
  • q0 ∈ Q is the initial state;
  • F ⊆ Q is the set of final or accepting states.

The machine always accepts a regular language. There must exist at least one element of the set F (a HALT state) for the language to be nonempty.
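
To illustrate the correspondence (ignoring what happens on the blank portion of the tape), a right-moving read-only machine never revisits a cell, so on the input portion its behaviour is captured by a DFA whose transition simply keeps the state component of the Turing machine's transition and discards the written symbol and the forced right move (a hedged sketch with hypothetical states and symbols):

    def tm_to_dfa_transition(tm_delta):
        """Given a right-moving read-only TM table mapping
        (state, symbol) -> (next_state, written_symbol, 'R'),
        keep only the next state to obtain a DFA-style transition table."""
        return {(q, a): nxt for (q, a), (nxt, _written, _move) in tm_delta.items()}

    tm_delta = {("A", "0"): ("B", "0", "R"), ("A", "1"): ("C", "1", "R")}
    print(tm_to_dfa_transition(tm_delta))  # {('A', '0'): 'B', ('A', '1'): 'C'}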

Example of a 3-state, 2-symbol read-only Turing machine

Tape symbol    Current state A         Current state B         Current state C
               Write  Move  Next       Write  Move  Next       Write  Move  Next
0              1      R     B          1      R     A          1      R     B
1              1      R     C          1      R     B          1      N     HALT
  • Q = {A, B, C, HALT};
  • Γ = {0, 1};
  • b = 0, the "blank" symbol;
  • Σ = ∅, the empty set;
  • δ = see the state table above;
  • q0 = A, the initial state;
  • F = {HALT}, the one-element set of final states.


Notes

  1. Hopcroft, Motwani & Ullman 2006.
  2. McCulloch & Pitts 1943.
  3. Rabin & Scott 1959.
  4. Bai, Gina R.; Clee, Brian; Shrestha, Nischal; Chapman, Carl; Wright, Cimone; Stolee, Kathryn T. (2019). "Exploring tools and strategies used during regular expression composition tasks". In Guéhéneuc, Yann-Gaël; Khomh, Foutse; Sarro, Federica (eds.). Proceedings of the 27th International Conference on Program Comprehension, ICPC 2019, Montreal, QC, Canada, May 25-31, 2019. IEEE / ACM. pp. 197–208. doi:10.1109/ICPC.2019.00039. ISBN 978-1-7281-1519-1.
  5. Mogensen, Torben Ægidius (2011). "Lexical Analysis". Introduction to Compiler Design. Undergraduate Topics in Computer Science. London: Springer. p. 12. doi:10.1007/978-0-85729-829-4_1. ISBN 978-0-85729-828-7.
  6. Lawson 2004, p. 129.
  7. Sakarovitch 2009, p. 228.
  8. Lawson 2004, p. 128.
  9. Grusho, A. A. (1973). "Limit distributions of certain characteristics of random automaton graphs". Mathematical Notes of the Academy of Sciences of the USSR. 4: 633–637. doi:10.1007/BF01095785. S2CID 121723743.
  10. Cai, Xing Shi; Devroye, Luc (October 2017). "The graph structure of a deterministic automaton chosen at random". Random Structures & Algorithms. 51 (3): 428–458. arXiv:1504.06238. doi:10.1002/rsa.20707. S2CID 13013344.
  11. Carayol, Arnaud; Nicaud, Cyril (February 2012). Distribution of the number of accessible states in a random deterministic automaton. STACS'12 (29th Symposium on Theoretical Aspects of Computer Science). Vol. 14. Paris, France. pp. 194–205.
  12. Hopcroft & Ullman 1979, pp. 59–60.
  13. Rose, Gene F. (1968). "Closures which Preserve Finiteness in Families of Languages". Journal of Computer and System Sciences. 2 (2): 148–168. doi:10.1016/S0022-0000(68)80029-7.
  14. Spanier, E. (1969). "Grammars and languages". American Mathematical Monthly. 76 (4): 335–342. doi:10.1080/00029890.1969.12000214. JSTOR 2316423. MR 0241205.
  15. Sakarovitch 2009, p. 105.
  16. Lawson 2004, p. 63.
  17. Esparza Estaun, Francisco Javier; Sickert, Salomon; Blondin, Michael (16 November 2016). "Operations and tests on sets: Implementation on DFAs" (PDF). Automata and Formal Languages 2017/18. Archived from the original (PDF) on 8 August 2018.
  18. Lawson 2004, p. 46.
  19. Gold, E. M. (1978). "Complexity of Automaton Identification from Given Data". Information and Control. 37 (3): 302–320. doi:10.1016/S0019-9958(78)90562-4.
  20. De Vries, A. (28 June 2014). Finite Automata: Behavior and Synthesis. Elsevier. ISBN 9781483297293.
  21. Lang, Kevin J. (1992). "Random DFA's can be approximately learned from sparse uniform examples". Proceedings of the fifth annual workshop on Computational learning theory – COLT '92. pp. 45–52. doi:10.1145/130385.130390. ISBN 089791497X. S2CID 7480497.
  22. Oncina, J.; García, P. (1992). "Inferring Regular Languages in Polynomial Updated Time". Pattern Recognition and Image Analysis. Series in Machine Perception and Artificial Intelligence. Vol. 1. pp. 49–61. doi:10.1142/9789812797902_0004. ISBN 978-981-02-0881-3.
  23. Lang, Kevin J.; Pearlmutter, Barak A.; Price, Rodney A. (1998). "Results of the Abbadingo one DFA learning competition and a new evidence-driven state merging algorithm". Grammatical Inference (PDF). Lecture Notes in Computer Science. Vol. 1433. pp. 1–12. doi:10.1007/BFb0054059. ISBN 978-3-540-64776-8.
  24. Adriaans, Pieter; Fernau, Henning; Zaanen, Menno van (23 September 2002). Beyond EDSM. Proceedings of the 6th International Colloquium on Grammatical Inference: Algorithms and Applications. Springer. pp. 37–48. ISBN 9783540442394.
  25. Lucas, S. M.; Reynolds, T. J. (2005). "Learning deterministic finite automata with a smart state labeling evolutionary algorithm". IEEE Transactions on Pattern Analysis and Machine Intelligence. 27 (7): 1063–1074. doi:10.1109/TPAMI.2005.143. PMID 16013754. S2CID 14062047.
  26. Heule, M. J. H. (2010). "Exact DFA Identification Using SAT Solvers". Grammatical Inference: Theoretical Results and Applications. ICGI 2010. Lecture Notes in Computer Science. Vol. 6339. pp. 66–79. doi:10.1007/978-3-642-15488-1_7. ISBN 978-3-642-15487-4.
  27. Heule, Marijn J. H.; Verwer, Sicco (2013). "Software model synthesis using satisfiability solvers". Empirical Software Engineering. 18 (4): 825–856. doi:10.1007/s10664-012-9222-z. hdl:2066/103766. S2CID 17865020.
  28. Ulyantsev, Vladimir; Zakirzyanov, Ilya; Shalyto, Anatoly (2015). "BFS-Based Symmetry Breaking Predicates for DFA Identification". Language and Automata Theory and Applications. Lecture Notes in Computer Science. Vol. 8977. pp. 611–622. doi:10.1007/978-3-319-15579-1_48. ISBN 978-3-319-15578-4.
  29. Davis, Martin; Ron Sigal; Elaine J. Weyuker (1994). Computability, Complexity, and Languages and Logic: Fundamentals of Theoretical Computer Science (2nd ed.). San Diego: Academic Press, Harcourt, Brace & Company. ISBN 0-12-206382-1.

References

Hopcroft, John E.; Motwani, Rajeev; Ullman, Jeffrey D. (2006). Introduction to Automata Theory, Languages, and Computation (3rd ed.). Addison-Wesley.
Hopcroft, John E.; Ullman, Jeffrey D. (1979). Introduction to Automata Theory, Languages, and Computation. Addison-Wesley.
Lawson, Mark V. (2004). Finite Automata. Chapman and Hall/CRC.
McCulloch, Warren S.; Pitts, Walter (1943). "A Logical Calculus of the Ideas Immanent in Nervous Activity". Bulletin of Mathematical Biophysics. 5: 115–133.
Rabin, Michael O.; Scott, Dana (1959). "Finite Automata and Their Decision Problems". IBM Journal of Research and Development. 3 (2): 114–125.
Sakarovitch, Jacques (2009). Elements of Automata Theory. Translated by Reuben Thomas. Cambridge University Press.
