@@ -438,3 +438,36 @@ BOS/BOL/EOS/EOL adjacent to the pre-state and post-state. So a finished
 NFA for a pattern without anchors or adjacent-character constraints will
 have pre-state outarcs for RAINBOW (all possible character colors) as well
 as BOS and BOL, and likewise post-state inarcs for RAINBOW, EOS, and EOL.
+Also note that LACON arcs will never connect to the pre-state
+or post-state.
+
+
+Look-around constraints (LACONs)
+--------------------------------
+
+The regex compiler doesn't have much intelligence about LACONs; it just
+constructs a sub-NFA representing the pattern that the constraint says to
+match or not match, and puts a LACON arc referencing that sub-NFA into the
+main NFA.  At runtime, the executor applies the sub-NFA at each point in
+the string where the constraint is relevant, and then traverses or doesn't
+traverse the arc.  ("Traversal" means including the arc's to-state in the
+set of NFA states that are considered active at the next character.)
+
+The actual basic matching cycle of the executor is
+1. Identify the color of the next input character, then advance over it.
+2. Apply the DFA to follow all the matching "plain" arcs of the NFA.
+   (Notionally, the previous DFA state represents the set of states the
+   NFA could have been in before the character, and the new DFA state
+   represents the set of states the NFA could be in after the character.)
+3. If there are any LACON arcs leading out of any of the new NFA states,
+   apply each LACON constraint starting from the new next input character
+   (while not actually consuming any input).  For each successful LACON,
+   add its to-state to the current set of NFA states.  If any such
+   to-state has outgoing LACON arcs, process those in the same way.
+   (Mathematically speaking, we compute the transitive closure of the
+   set of states reachable by successful LACONs.)
+
+Thus, LACONs are always checked immediately after consuming a character
+via a plain arc.  This is okay because the NFA's "pre" state only has
+plain out-arcs, so we can always consume a character (possibly a BOS
+pseudo-character as described above) before we need to worry about LACONs.
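
Illustration (not part of the patch): the three-step matching cycle described
in the added text can be sketched as a toy state-set simulation. This is only
a minimal sketch under stated assumptions. The names ToyNFA, lacon_closure,
step, and run are hypothetical and do not exist in the regex library. Each
character is treated as its own color, a LACON is modeled as a plain predicate
on (text, position) rather than a sub-NFA, and explicit sets stand in for the
DFA states. BOS/EOS pseudo-characters are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical stand-in for a LACON's sub-NFA: a predicate that inspects the
# input at a position and reports success without consuming anything.
LaconCheck = Callable[[str, int], bool]


@dataclass
class ToyNFA:
    # plain[(from_state, color)] -> set of to-states (assumed layout)
    plain: Dict[Tuple[int, str], Set[int]] = field(default_factory=dict)
    # lacons[from_state] -> list of (check, to_state) (assumed layout)
    lacons: Dict[int, List[Tuple[LaconCheck, int]]] = field(default_factory=dict)


def lacon_closure(nfa: ToyNFA, states: Set[int], text: str, pos: int) -> Set[int]:
    """Step 3: transitive closure over successful LACON arcs at position pos."""
    active = set(states)
    worklist = list(states)
    while worklist:
        state = worklist.pop()
        for check, to_state in nfa.lacons.get(state, []):
            # The check looks at the input from pos onward; pos is not advanced.
            if to_state not in active and check(text, pos):
                active.add(to_state)
                worklist.append(to_state)  # its own LACON out-arcs come next
    return active


def step(nfa: ToyNFA, states: Set[int], text: str, pos: int) -> Tuple[Set[int], int]:
    """One matching cycle: consume a character, follow plain arcs, then LACONs."""
    color = text[pos]  # step 1: each character is its own color in this toy
    pos += 1
    after: Set[int] = set()
    for state in states:  # step 2: follow all matching plain arcs
        after |= nfa.plain.get((state, color), set())
    return lacon_closure(nfa, after, text, pos), pos  # step 3


def run(nfa: ToyNFA, start: Set[int], text: str) -> Set[int]:
    """Repeat the cycle until the input or the active state set is exhausted."""
    states, pos = set(start), 0
    while pos < len(text) and states:
        states, pos = step(nfa, states, text, pos)
    return states


def next_is_b(text: str, pos: int) -> bool:
    return pos < len(text) and text[pos] == "b"


# Toy automaton: 0 --a--> 1 --LACON(next char is b)--> 2 --b--> 3
lookahead_nfa = ToyNFA(
    plain={(0, "a"): {1}, (2, "b"): {3}},
    lacons={1: [(next_is_b, 2)]},
)

# Chained LACONs: 1 --LACON--> 2 --LACON--> 3, both always successful
chained_nfa = ToyNFA(
    lacons={1: [(lambda t, p: True, 2)], 2: [(lambda t, p: True, 3)]},
)
```

In this toy, run(lookahead_nfa, {0}, "ab") ends in state 3, while the same call
on "ac" ends with no active states because the LACON fails after the "a" is
consumed. Calling lacon_closure(chained_nfa, {1}, "", 0) picks up states 2 and
3 as well, which is the transitive-closure behavior described in step 3.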