I wrote a toy s-expression parser, and I'd like to know if I can make it more Rusty.I'm not terribly worried about the functionality. It's only a Rust exercise for me.
enum Token { Number(u64), Identifier(String), ParOpen, ParClose,}fn tokenize (source: &str) -> Result<Vec<Token>, String> { let mut tokens: Vec<Token> = Vec::new(); let regex = Regex::new(r"[0-9][-+*?!_a-zA-Z0-9]*|[-+*?!_a-zA-Z][-+*?!_a-zA-Z0-9]*|\(|\)").unwrap(); for candidate_match in regex.find_iter(source) { let candidate_string = candidate_match.as_str(); match candidate_string.chars().next().unwrap() { '-' | '+' | '*' | '?' | '!' | '_' | 'a'..='z' | 'A'..='Z' => tokens.push(Token::Identifier(candidate_string.to_owned())), '0'..='9' => { match candidate_string.parse::<u64>() { Ok(n) => tokens.push(Token::Number(n)), Err(_) => return Err(format!("Invalid number: {}", candidate_string)), } }, '(' => tokens.push(Token::ParOpen), ')' => tokens.push(Token::ParClose), _ => (), } } Ok(tokens)}enum Node { Number(u64), Identifier(String), List(Vec<Node>),}fn parse (tokens: &[Token]) -> Result<Node, String> { let mut index: usize = 0; let mut stack: Vec<Vec<Node>> = vec![Vec::new()]; loop { match &tokens[index] { Token::Number(value) => { stack.last_mut().unwrap().push(Node::Number(value.clone())); }, Token::Identifier(name) => { stack.last_mut().unwrap().push(Node::Identifier(name.clone())); } Token::ParOpen => { stack.push(Vec::new()); }, Token::ParClose => { if stack.len() <= 1 { return Err("Unexpected ')'".to_owned()); } let list = Node::List(stack.pop().unwrap()); stack.last_mut().unwrap().push(list); }, } if index >= tokens.len() - 1 { if stack.len() > 1 { return Err("Expected ')'".to_owned()); } return Ok(Node::List(stack.pop().unwrap())); } index += 1; }}fn stringify (node: &Node) -> String { fn stringify (node: &Node) -> String { match node { Node::Number(value) => value.to_string(), Node::Identifier(name) => name.clone(), Node::List(entries) => format!("({})", entries.iter().map(stringify).collect::<Vec<String>>().join(" ")), } } match node { Node::Number(value) => value.to_string(), Node::Identifier(name) => name.clone(), Node::List(entries) => entries.iter().map(stringify).collect::<Vec<String>>().join(" "), }}https://godbolt.org/z/fP6fd41Tr
I am not concerned whether this is a good s-expression parser or not. This is just Rust exercise for me, not a parsing one. I am interested in Rust-related feedback.
I'm a bit new to the language, and my code looks verbose to me. Perhaps I can simplify this? Any of it? Am I copying data around without good reason? Is there a nice way to compile that regex once and store it somewhere?
1 Answer1
Your questions
Is there a nice way to compile that regex once and store it somewhere?
Yep,OnceLock:
use std::sync::OnceLock;static REGEX: OnceLock<Regex> = OnceLock::new();fn tokenize (source: &str) -> Result<Vec<Token>, String> { let mut tokens: Vec<Token> = Vec::new(); let regex = REGEX.get_or_init( || Regex::new(r"[0-9][-+*?!_a-zA-Z0-9]*|[-+*?!_a-zA-Z][-+*?!_a-zA-Z0-9]*|\(|\)").unwrap() ); // ... Ok(vec![])}Parsing approach
You chose regex as a way to find the relevant symbols. It may fit your design well, we don't know the context, but it's quite unusual for a parser to simply discard anything it does not understand.
For example, you will parsea & b as two identifiersa andb, ignoring the ampersand altogether. Again, maybe that's what you really want, but it would be confusing for me.
The more standard approach to manual (without defining a grammar) parsing would be to read characters one by one, rejecting invalid ones immediately and classifying the rest as expected. You'll have to record current state to know, for example, whether a symbola continues a previous identifier or starts a new one, but that can be implemented wit a simple enum, quite doable for very simple grammars.
Strings as errors
Your parser returns errors as human-readable strings. It would make interfacing with your parser extremely challenging for other programs. Consider defining aParseError enum instead and implementingDisplay for it for human-readable output. Or just usethiserror to to the same in declarative fashion, reducing boilerplate to the minimum.
Error context
It would be nice to say "Unexpected ')' at char 12" instead of just "Unexpected ')'" - precise location can help your users find the unbalanced parenthesis much easier.
Run some checks
Please consider writing some basic tests for your parser. Rust ships with a "testing framework" out of the box, so you can just do
#[test]fn test_something() { // ...}and thencargo test.
Use your tools
Cargo ships with two amazing tools:rustfmt andclippy. Your code is obviously manually formatted, and this will be a problem when many developers work on a codebase. Aim for a standard style that can be easily enforced, this will reduce your merge conflicts and make it easy for everyone to follow the code.
clippy is a linter - built-in yet powerful. Running it on your Godbolt example (you can even do that online in the official playground at play.rust-lang.org), I get
warning: using `clone` on type `u64` which implements the `Copy` trait --> src/main.rs:51:61 |51 | stack.last_mut().unwrap().push(Node::Number(value.clone())); | ^^^^^^^^^^^^^ help: try dereferencing it: `*value` | = help: for further information visit https://rust-lang.github.io/rust-clippy/rust-1.91.0/index.html#clone_on_copy = note: `#[warn(clippy::clone_on_copy)]` on by defaultwarning: this expression creates a reference which is immediately dereferenced by the compiler --> src/main.rs:116:20 |116 | match tokenize(&s1) { | ^^^ help: change this to: `s1` | = help: for further information visit https://rust-lang.github.io/rust-clippy/rust-1.91.0/index.html#needless_borrow = note: `#[warn(clippy::needless_borrow)]` on by defaultconsider addressing both warnings (using*value and removing the&, resp.). You can also make clippymuch stricter with smth like#![warn(clippy::all, clippy::pedantic)], that will report a few more style-ish issues:
warning: consider adding a `;` to the last statement for consistent formatting --> src/main.rs:25:17 |25 | tokens.push(Token::Identifier(candidate_string.to_owned())) | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: add a `;` here: `tokens.push(Token::Identifier(candidate_string.to_owned()));` | = help: for further information visit https://rust-lang.github.io/rust-clippy/rust-1.91.0/index.html#semicolon_if_nothing_returnednote: the lint level is defined here --> src/main.rs:1:22 | 1 | #![warn(clippy::all, clippy::pedantic)] | ^^^^^^^^^^^^^^^^ = note: `#[warn(clippy::semicolon_if_nothing_returned)]` implied by `#[warn(clippy::pedantic)]`warning: variables can be used directly in the `format!` string --> src/main.rs:29:38 |29 | Err(_) => return Err(format!("Invalid number: {}", candidate_string)), | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | = help: for further information visit https://rust-lang.github.io/rust-clippy/rust-1.91.0/index.html#uninlined_format_args = note: `#[warn(clippy::uninlined_format_args)]` implied by `#[warn(clippy::pedantic)]`help: change this to |29 - Err(_) => return Err(format!("Invalid number: {}", candidate_string)),29 + Err(_) => return Err(format!("Invalid number: {candidate_string}")), |warning: variables can be used directly in the `format!` string --> src/main.rs:123:28 |123 | Err(reason) => println!("Error: {}", reason), | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | = help: for further information visit https://rust-lang.github.io/rust-clippy/rust-1.91.0/index.html#uninlined_format_argshelp: change this to |123 - Err(reason) => println!("Error: {}", reason),123 + Err(reason) => println!("Error: {reason}"), |warning: variables can be used directly in the `format!` string --> src/main.rs:125:24 |125 | Err(reason) => println!("Error: {}", reason), | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | = help: for further information visit https://rust-lang.github.io/rust-clippy/rust-1.91.0/index.html#uninlined_format_argshelp: change this to |125 - Err(reason) => println!("Error: {}", reason),125 + Err(reason) => println!("Error: {reason}"), |```- 1\$\begingroup\$+1 for OnceLock thanks! I'm not trying to make a good s-expression parser, just good <anything> Rust code.\$\endgroup\$adrianton3– adrianton32025-10-30 19:43:07 +00:00CommentedOct 30 at 19:43
You mustlog in to answer this question.
Explore related questions
See similar questions with these tags.
