
Regular Expression Buffer Boundaries for ECMAScript

This proposal seeks to introduce `\A` and `\z` character escapes to Unicode-mode regular expressions as synonyms for `^` and `$` that are not affected by the `m` (multiline) flag.

Status

Stage: 2
Champion: Ron Buckton (@rbuckton)

For detailed status of this proposal see TODO, below.

Authors

Ron Buckton (@rbuckton)

Motivations

NOTE: See https://github.com/rbuckton/proposal-regexp-features for an overview of how this proposal fits into other possible future features for Regular Expressions.

Buffer Boundaries are a common feature across a wide array of regular expression engines that allow you to match the start or end of the entire input regardless of whether the `m` (multiline) flag has been set. Buffer Boundaries also allow you to match the start/end of a line and the start/end of the input in a single RegExp using the `m` flag.

While it's possible to emulate `\A` and `\z` using existing patterns, the alternatives are far harder to read, and require a more comprehensive working understanding of regular expressions to interpret.
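As a brief illustration of the flag behavior described above, the following sketch uses only syntax that current engines already support. The `input` string and the variable names are illustrative and not part of the proposal.

```js
const input = "foo\nbar";

// Without `m`, `^` and `$` match only at the start and end of the whole input.
const startWithoutM = /^bar/u.test(input); // false
const endWithoutM = /foo$/u.test(input);   // false

// With `m`, `^` and `$` also match at line boundaries, so there is no longer
// a simple way to anchor to the start or end of the entire input.
const startWithM = /^bar/mu.test(input);   // true
const endWithM = /foo$/mu.test(input);     // true
```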

For example, compare the following approaches:

```js
// emulate `m`-mode `^` outside of `m`-mode:
const a = /^foo|(?<=^|[\u000A\u000D\u2028\u2029])bar/u;

// emulate non-`m`-mode `^` inside of `m`-mode using modifiers (proposed):
const b = /(?-m:^)foo|^bar/mu;

// using `\A`:
const c = /\Afoo|^bar/mu;
```

In the example above, it is far less likely that a reader will readily understand the expression in example (a). Not only is the content of the regular expression much harder to read, but understanding its purpose requires interpreting how six different features of regular expressions interact: grouping, positive lookbehind, the `^` metacharacter, disjunctions, character classes, and Unicode escapes.

Example (b) is an improvement, but still requires the reader to visually balance the parentheses as well as to interpret how four different regular expression features interact: grouping, modifiers (proposed), the `m` flag, and the `^` metacharacter.

In comparison, example (c) is far easier to read. It uses a terse escape sequence of only two characters (`\A`), which makes it far easier to distinguish between special pattern syntax and plain text segments like `foo` and `bar`.

The `\A` and `\z` escapes have broad support across multiple other languages and regular expression engines. As a result, they benefit from extensive existing documentation online, including Wikipedia, numerous tutorial websites, as well as the documentation from other languages. This significantly lessens the learning curve for `\A` over its alternatives.
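Example (a) relies only on syntax that is already standard (lookbehind and Unicode escapes), so its behavior can be checked directly. The sketch below is one way to do that; the `results` object and the sample inputs are assumptions made for illustration.

```js
// Example (a): `foo` only at the start of the input, `bar` at the start of any line.
const a = /^foo|(?<=^|[\u000A\u000D\u2028\u2029])bar/u;

const results = {
  fooAtInputStart: a.test("foo"),    // `^foo` matches at index 0
  fooOnSecondLine: a.test("x\nfoo"), // `^` is not line-aware without `m`
  barAtInputStart: a.test("bar"),    // lookbehind is satisfied by `^`
  barOnSecondLine: a.test("x\nbar"), // lookbehind is satisfied by the line feed
  barMidLine: a.test("xbar"),        // neither `^` nor a line terminator precedes `bar`
};

console.log(results);
```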

Prior Art

See https://rbuckton.github.io/regexp-features/features/buffer-boundaries.html for additional information.

Syntax

Buffer boundaries are similar to the `^` and `$` anchors, except that they are not affected by the `m` (multiline) flag:

`\A`: Matches the start of the input.
`\z`: Matches the end of the input.
~~`\Z`: A zero-width assertion consisting of an optional newline at the end of the buffer. Equivalent to `(?=\R?\z)`.~~

NOTE: Requires the `u` or `v` flag, as `\A`, `\z`, and `\Z` are currently just escapes for `A`, `z` and `Z` without the `u` or `v` flag.

NOTE: Not supported inside of a character class.

NOTE: The `\Z` assertion is no longer being considered as part of this proposal as of December 15th, 2021, but has been reserved for possible future use.

For more information about the `v` flag, see https://github.com/tc39/proposal-regexp-set-notation.

~~For more information about the `\R` escape sequence, see https://github.com/tc39/proposal-regexp-r-escape.~~
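The flag requirement noted above suggests a simple way to probe for support at runtime. The following is a minimal sketch, not part of the proposal: `supportsBufferBoundaries` is a hypothetical helper name, and it assumes that an engine without support rejects `\A` in Unicode mode with a `SyntaxError`, as engines do for unrecognized identity escapes.

```js
// Without the `u` or `v` flag, `\A` and `\z` are identity escapes for the
// literal characters `A` and `z`.
const legacyA = /\A/.test("A"); // true
const legacyZ = /\z/.test("z"); // true

// Hypothetical helper: reports whether `\A` and `\z` behave as buffer
// boundaries in Unicode mode on the current engine.
function supportsBufferBoundaries() {
  try {
    const re = new RegExp("\\Afoo\\z", "um");
    // A buffer boundary must ignore the `m` flag, so `foo` on the second
    // line of the input must not match.
    return re.test("foo") && !re.test("bar\nfoo");
  } catch (e) {
    if (e instanceof SyntaxError) return false; // escape rejected: no support
    throw e;
  }
}

console.log(supportsBufferBoundaries());
```

Constructing the pattern with `new RegExp` rather than a literal keeps the probe from causing a parse error for the whole script on engines that do not recognize the escapes.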

Examples

```js
// without buffer boundaries
const pattern = String.raw`^foo$`;

const re1 = new RegExp(pattern, "u");
re1.test("foo"); // true
re1.test("foo\nbar"); // false

const re2 = new RegExp(pattern, "um");
re2.test("foo"); // true
re2.test("foo\nbar"); // true
```

```js
// with buffer boundaries (requires an engine that implements this proposal)
const pattern = String.raw`\Afoo\z`;

const re1 = new RegExp(pattern, "u");
re1.test("foo"); // true
re1.test("foo\nbar"); // false

const re2 = new RegExp(pattern, "um");
re2.test("foo"); // true
re2.test("foo\nbar"); // false
```

```js
// mixing buffer boundaries and anchors
const re = /\Afoo|^bar$|baz\z/um;
re.test("foo"); // true
re.test("foo\n"); // true
re.test("\nfoo"); // false
re.test("bar"); // true
re.test("bar\n"); // true
re.test("\nbar"); // true
re.test("baz"); // true
re.test("baz\n"); // false
re.test("\nbaz"); // true
```
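Until engines implement `\A` and `\z`, the mixed example can be approximated with lookarounds. The sketch below is an emulation under stated assumptions, not the proposal's semantics as specified: it substitutes `(?<![\s\S])` ("no character precedes this position") for `\A` and `(?![\s\S])` ("no character follows this position") for `\z`. The name `emulated` is illustrative.

```js
// `(?<![\s\S])` stands in for `\A`, `(?![\s\S])` stands in for `\z`.
// Neither lookaround is affected by the `m` flag.
const emulated = /(?<![\s\S])foo|^bar$|baz(?![\s\S])/um;

emulated.test("foo");   // true
emulated.test("foo\n"); // true
emulated.test("\nfoo"); // false
emulated.test("bar");   // true
emulated.test("bar\n"); // true
emulated.test("\nbar"); // true
emulated.test("baz");   // true
emulated.test("baz\n"); // false
emulated.test("\nbaz"); // true
```

The results line up with the expected results listed for the `\A` and `\z` version, which also illustrates the readability cost that the Motivations section describes.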

History

October 28, 2021 — Proposed for Stage 1 (slides)
- Outcome: Advanced to Stage 1
December 15, 2021 — Proposed for Stage 2 (slides)
- Outcome: `\A` and `\z` advanced to Stage 2 (`\Z` did not advance, but will be reserved)
- Stage 2 Reviewers: Richard Gibson, Waldemar Horwat

TODO

The following is a high-level list of tasks to progress through each stage of the TC39 proposal process:

Stage 1 Entrance Criteria

Identified a "champion" who will advance the addition.
Prose outlining the problem or need and the general shape of a solution.
Illustrative examples of usage.
~~High-level API.~~

Stage 2 Entrance Criteria

Initial specification text.
~~Transpiler support (Optional).~~

Stage 3 Entrance Criteria

Complete specification text.
Designated reviewers have signed off on the current spec text.
The ECMAScript editor has signed off on the current spec text.

Stage 4 Entrance Criteria

Test262 acceptance tests have been written for mainline usage scenarios and merged.
Two compatible implementations which pass the acceptance tests: [1], [2].
A pull request has been sent to tc39/ecma262 with the integrated spec text.
The ECMAScript editor has signed off on the pull request.

About

Regular Expression Buffer Boundaries for ECMAScript

tc39.es/proposal-regexp-buffer-boundaries

License

BSD-3-Clause license
