pyparsing/ai/best_practices.md
Lines changed: 30 additions & 23 deletions
@@ -8,61 +8,68 @@ when generating Python code using pyparsing.
8
8
- Before developing the pyparsing expressions, define a Backus-Naur Form definition and save this in docs/grammar.md. Update this document as changes are made in the parser.
9
9
10
10
## Implementing
11
-
- Import pyparsing using "import pyparsing as pp", and use that for all pyparsing references.
12
-
- If referencing anything from pyparsing.common, follow the pyparsing import with "ppc = pp.common" and use ppc as the namespace to access pyparsing.common; same for pyparsing.unicode.
13
-
- When writing parsers that contain recursive elements (using Forward() or infix_notation()), immediately enable packrat parsing for performance: `pp.ParserElement.enable_packrat()` (call this right after importing pyparsing). See https://pyparsing-docs.readthedocs.io/en/latest/HowToUsePyparsing.html.
11
+
- Import pyparsing using `import pyparsing as pp`, and use that for all pyparsing references.
12
+
- If referencing names from `pyparsing.common`, follow the pyparsing import with "ppc = pp.common" and use `ppc` as the namespace to access `pyparsing.common`.
13
+
- If referencing names from `pyparsing.unicode`, follow the pyparsing import with "ppu = pp.unicode" and use `ppu` as the namespace to access `pyparsing.unicode`.
14
+
- When writing parsers that contain recursive elements (using `Forward()` or `infix_notation()`), immediately enable packrat parsing for performance: `pp.ParserElement.enable_packrat()` (call this right after importing pyparsing). See https://pyparsing-docs.readthedocs.io/en/latest/HowToUsePyparsing.html.
14
15
- For recursive grammars, define placeholders with `pp.Forward()` and assign later using the `<<=` operator; give Forwards meaningful names with `set_name()` to improve errors.
15
-
- Use PEP8 method and argument names in the pyparsing API ("parse_string", not "parseString").
16
+
- Use PEP8 method and argument names in the pyparsing API (`parse_string`, not `parseString`).
16
17
- Do not include expressions for matching whitespace in the grammar. Pyparsing skips whitespace by default.
17
18
- For line-oriented grammars where newlines are significant, set skippable whitespace to just spaces/tabs early: `pp.ParserElement.set_default_whitespace_chars(" \t")`, and define `NL = pp.LineEnd().suppress()` to handle line ends explicitly.
18
19
- Prefer operator forms for readability: use +, |, ^, ~, etc., instead of explicit And/MatchFirst/Or/Not classes (see Usage notes in https://pyparsing-docs.readthedocs.io/en/latest/HowToUsePyparsing.html).
19
-
- Use set_name() on all major grammar elements to support railroad diagramming and better error/debug output.
20
+
- Use `set_name()` on all major grammar elements to support railroad diagramming and better error/debug output.
20
21
- The grammar should be independently testable, without pulling in separate modules for data structures, evaluation, or command execution.
21
22
- Use results names for robust access to parsed data fields; results names should be valid Python identifiers to support attribute-style access on returned ParseResults.
22
23
- Results names should take the place of numeric indexing into parsed results in most places.
23
-
- Define results names using call format not set_results_name(), example: full_name = Word(alphas)("first_name") + Word(alphas)("last_name").
24
-
- Prefer Keyword over Literal for reserved words to avoid partial matches (e.g., Keyword("for") will not match the leading "for" in "format").
24
+
- Define results names using call format, not `set_results_name()`; example: `full_name = Word(alphas)("first_name") + Word(alphas)("last_name")`.
25
+
- If adding a results name to an expression that contains one or more sub-expressions with results names, the expression must be enclosed in a `Group`.
26
+
- Prefer `Keyword` over `Literal` for reserved words to avoid partial matches (e.g., `Keyword("for")` will not match the leading "for" in "format").
25
27
- Use `pp.CaselessKeyword`/`pp.CaselessLiteral` when keywords should match regardless of case.
26
-
- When the full input must be consumed, call parse_string with parse_all=True.
27
-
- If the grammar must handle comments, define an expression for them and use the ignore() method to skip them.
28
+
- When the full input must be consumed, call `parse_string` with `parse_all=True`.
29
+
- If the grammar must handle comments, define an expression for them and use the `ignore()` method to skip them.
28
30
- Prefer built-ins like `pp.cpp_style_comment` and `pp.python_style_comment` for common comment syntaxes.
29
-
- Use pyparsing Groups to organize sub-expressions. Groups are also important for preserving results names when a sub-expression is used in a OneOrMore or ZeroOrMore expression.
31
+
- Use pyparsing `Group` to organize sub-expressions. Groups are also important for preserving results names when a sub-expression is used in a `OneOrMore` or `ZeroOrMore` expression.
30
32
- Suppress punctuation tokens to keep results clean; a convenient pattern is `LBRACK, RBRACK, LBRACE, RBRACE, COLON = pp.Suppress.using_each("[]{}:")`.
31
33
- For comma-separated sequences, prefer `pp.DelimitedList(...)`; wrap with `pp.Optional(...)` to allow empty lists or objects where appropriate.
32
34
- For helper sub-expressions used only to build larger expressions, consider `set_name(None)` to keep result dumps uncluttered.
33
-
- Use pyparsing Each() to define a list of elements that may occur in any order.
35
+
- Use pyparsing `Each()` to define a list of elements that may occur in any order.
34
36
- The '&' operator is the operator form of Each and is often more readable when combining order-independent parts.
35
37
- Use parse actions to do parse-time conversion of data from strings to useful data types.
36
38
- Use objects defined in pyparsing.common for common types like integer, real — these already have their conversion parse actions defined.
37
39
- For quoted strings, use `pp.dbl_quoted_string().set_parse_action(pp.remove_quotes)` to unquote automatically.
38
-
- Map reserved words to Python constants using `pp.Keyword("true").set_parse_action(pp.replace_with(True))` (and similarly for false/null/etc.).
40
+
- Map reserved words to Python constants, as in this example that converts "true" to a Python `True`: `pp.Keyword("true").set_parse_action(pp.replace_with(True))` (and similarly for false/null/etc.).
39
41
- When you want native Python containers from the parse, use `pp.Group(..., aslist=True)` for lists and `pp.Dict(..., asdict=True)` for dict-like data.
40
42
- Use "using_each" with a list of keywords to define keyword constants, instead of separate assignments.
41
43
- Choose the appropriate matching method:
42
-
- parse_string() parses from the start; search_string() searches anywhere in the text; scan_string() yields all matches with positions.
44
+
- `parse_string()` parses from the start
45
+
- `search_string()` searches anywhere in the text
46
+
- `scan_string()` yields all matches with positions
47
+
- `transform_string()` is a convenience wrapper around `scan_string` that applies filters or transforms defined in parse actions, for batch transforms or conversions of expressions within a larger body of text
43
48
- For line suffixes or directives, combine lookahead and slicing helpers: `pp.FollowedBy(...)` with `pp.rest_of_line`; when reusing a base expression with a different parse action, call `.copy()` before applying the new action to avoid side effects.
44
49
- When defining a parser to be used in a REPL:
45
-
- add pyparsing Tag() elements of the form `Tag("command", <command-name>)` to each command definition to support model construction from parsed commands.
50
+
- add pyparsing `Tag()` elements of the form `Tag("command", <command-name>)` to each command definition to support model construction from parsed commands.
46
51
- define model classes using dataclasses, and use the "command" attribute in the parsed results to identify which model class to create. The model classes can then be used to construct the model from the ParseResults returned by parse_string(). Define the models in a separate parser_models.py file.
47
52
- If defining the grammar as part of a Parser class, only the finished grammar needs to be implemented as an instance variable.
48
-
- ParseResults support "in" testing for results names. Use "in" tests for the existence of results names, not hasattr().
53
+
- `ParseResults` support "in" testing for results names. Use "in" tests for the existence of results names, not `hasattr()`.
49
54
- Avoid left recursion where possible. If you must support left-recursive grammars, enable it with `pp.ParserElement.enable_left_recursion()` and do not enable packrat at the same time (these modes are incompatible).
50
55
- Use `pp.SkipTo` as a skipping expression to skip over arbitrary content.
51
56
- For example, `pp.SkipTo(pp.LineEnd())` will skip over all content until the end of the line; add a `fail_on` argument to `SkipTo` to make the skip fail if a particular expression is found before the target is reached.
52
57
- Use `...` in place of a simple `SkipTo(expression)`
53
58
54
59
## Testing
55
-
- Use the pyparsing ParserElement.run_tests method to run mini validation tests.
56
-
- You can add comments starting with "#" within the string passed to run_tests to document the individual test cases.
57
-
- Prefer parse_string(parse_all=True) in tests that should consume the entire input.
60
+
- Use the pyparsing `ParserElement.run_tests` method to run mini validation tests.
61
+
- Pass a single multiline string to `run_tests` to test the parser on multiple test input strings; each line is a separate test.
62
+
- You can add comments starting with "#" within the string passed to `run_tests` to document the individual test cases.
63
+
- To pass test input strings that span multiple lines, pass the test input strings as a list of strings.
64
+
- Pass `parse_all=True` to `run_tests` to test that the entire input is consumed.
58
65
- When generating unit tests for the parser:
59
66
- generate tests that include presence and absence of optional elements
60
67
- use the methods in the mixin class pyparsing.testing.TestParseResultsAsserts to easily define expression, test input string, and expected results
61
68
- do not generate tests for invalid data
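
The testing practices above might look like the following in use. This is a sketch under stated assumptions: the `version` grammar, its results names, and the `VersionTests` class are hypothetical and exist only for illustration. It exercises `run_tests` with a commented multiline string and the `TestParseResultsAsserts` mixin, covering both presence and absence of an optional element.

```python
import unittest

import pyparsing as pp

ppc = pp.common

DOT = pp.Suppress(".")

# Hypothetical grammar: major.minor with an optional .patch suffix.
version = (
    ppc.integer("major")
    + DOT
    + ppc.integer("minor")
    + pp.Optional(DOT + ppc.integer("patch"))
).set_name("version")

# Each non-comment line is a separate test; "#" lines document the cases.
success, results = version.run_tests(
    """
    # required fields only
    1.2
    # optional patch field present
    1.2.3
    10.0.7
    """,
    parse_all=True,
)


class VersionTests(pp.testing.TestParseResultsAsserts, unittest.TestCase):
    def test_without_patch(self):
        # Optional element absent: only the required names are present.
        self.assertParseAndCheckDict(version, "1.2", {"major": 1, "minor": 2})

    def test_with_patch(self):
        # Optional element present: all three integers are returned.
        self.assertParseAndCheckList(version, "1.2.3", [1, 2, 3])


# Run the test case explicitly so the sketch works as a plain script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(VersionTests)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a real test module, `unittest.main()` or a test runner such as pytest would normally take the place of the explicit suite at the end.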
62
69
63
70
## Debugging
64
-
- If troubleshooting parse actions, use pyparsing's trace_parse_action decorator to echo arguments and return value
65
-
- During development, call `pp.autoname_elements()` to auto-assign names to unnamed expressions to improve dump() and error messages.
66
-
- Sub-expressions can be tested in isolation using ParserElement.matches()
67
-
- When defined out of order, Literals can mistakenly match fragments: Literal("for") will match the leading "for" in "format". Can be corrected by using Keyword instead of Literal.
68
-
- Dump the parsed results using ParseResults.dump(), ParseResults.pprint(), or repr(ParseResults).
71
+
- If troubleshooting parse actions, use pyparsing's `trace_parse_action` decorator to echo arguments and return value
72
+
- During development, call `pp.autoname_elements()` to auto-assign names to unnamed expressions to improve `dump()` and error messages.
73
+
- Sub-expressions can be tested in isolation using `ParserElement.matches()`
74
+
- When defined out of order, Literals can mistakenly match fragments: `Literal("for")` will match the leading "for" in "format". This can be corrected by using `Keyword` instead of `Literal`.
75
+
- Dump the parsed results using `ParseResults.dump()`, `ParseResults.pprint()`, or `repr(ParseResults)`.
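
The debugging aids listed above could be combined as in this sketch. The `setting` and `pair` expressions and the `to_upper` parse action are hypothetical names chosen for illustration; the trace output from `trace_parse_action` goes to stderr.

```python
import pyparsing as pp

ppc = pp.common


# The decorator echoes the arguments and return value of the parse action.
@pp.trace_parse_action
def to_upper(s, loc, tokens):
    return tokens[0].upper()


# Hypothetical grammar: "name = integer", with the name upper-cased.
setting = pp.Word(pp.alphas).set_parse_action(to_upper)
pair = setting("key") + pp.Suppress("=") + ppc.integer("value")

# Give unnamed expressions the names of the variables that hold them.
pp.autoname_elements()

# Test a sub-expression in isolation.
pair_ok = pair.matches("port = 8080")

# Literal matches the leading fragment of "format"; Keyword does not.
literal_hit = pp.Literal("for").matches("format", parse_all=False)
keyword_hit = pp.Keyword("for").matches("format", parse_all=False)

result = pair.parse_string("port = 8080", parse_all=True)

# dump() shows the token list followed by the named results.
print(result.dump())
```

Once the parse action behaves as expected, the decorator can be removed without changing the grammar.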