Logos vs Nom: Choosing a Lexical Analyzer for Rust
Logos is a high-performance lexical analyzer library for Rust, designed to tokenize input streams with minimal overhead and maximum clarity. It leverages Rust’s type system and macro expansion to generate efficient, compile-time optimized tokenizers using a declarative syntax.
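At its core, Logos defines token types as Rust enums whose variants are annotated with attributes that specify their matching patterns. A pattern can be a literal string (#[token]) or a regular expression (#[regex]), optionally paired with a callback such as logos::skip, which discards whatever it matches and is the usual way to ignore whitespace. The derive macro compiles all patterns into a single Deterministic Finite Automaton (DFA) at compile time, so the lexer scans the input in one forward pass with no runtime regex compilation, which keeps both speed and memory usage predictable.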
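To make the DFA idea concrete, here is a hand-written scanner for the same token set as the arithmetic example that follows. It is only a sketch and not the code Logos generates: Logos derives its state machine from the enum attributes, whereas this version hard-codes the transitions as a match on the current byte. The Tok enum and the scan function are illustrative names.

```rust
// Hand-written, DFA-style scanner for a tiny arithmetic token set.
// Each iteration inspects the current byte and picks a transition,
// which is the same decision a generated state machine encodes in its tables.
#[derive(Debug, PartialEq)]
enum Tok {
    Add,
    Subtract,
    Multiply,
    Divide,
    Integer,
    Invalid,
}

fn scan(src: &str) -> Vec<Tok> {
    let bytes = src.as_bytes();
    let mut out = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            // Whitespace: consume it and emit nothing (the role logos::skip plays).
            b' ' | b'\t' | b'\r' | b'\n' => i += 1,
            b'+' => { out.push(Tok::Add); i += 1; }
            b'-' => { out.push(Tok::Subtract); i += 1; }
            b'*' => { out.push(Tok::Multiply); i += 1; }
            b'/' => { out.push(Tok::Divide); i += 1; }
            // Digit: stay in the "integer" state until a non-digit appears
            // (longest match), then emit a single Integer token.
            b'0'..=b'9' => {
                while i < bytes.len() && bytes[i].is_ascii_digit() {
                    i += 1;
                }
                out.push(Tok::Integer);
            }
            // No transition for this byte: report it and keep scanning.
            _ => { out.push(Tok::Invalid); i += 1; }
        }
    }
    out
}

fn main() {
    println!("{:?}", scan("10 + 25 * 3 - 8"));
}
```

Note that an unrecognized byte produces an Invalid token instead of stopping the scan, mirroring the error-tolerant behavior described for Logos.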
Basic Logos Example
Here’s a tokenizer for arithmetic expressions:
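// Targets the Logos 0.12 API. Logos 0.13 removed #[error]: there, drop the
// Invalid variant and expect the lexer to yield Result<Token, ()> items instead.
use logos::Logos;

#[derive(Logos, Debug, PartialEq)]
enum Token {
    // Single-character operators, matched literally
    #[token("+")]
    Add,
    #[token("-")]
    Subtract,
    #[token("*")]
    Multiply,
    #[token("/")]
    Divide,

    // One or more decimal digits
    #[regex(r"[0-9]+")]
    Integer,

    // Whitespace is matched, then discarded by the logos::skip callback
    #[regex(r"[ \t\r\n]+", logos::skip)]
    Whitespace,

    // Emitted for any input that no other pattern matches
    #[error]
    Invalid,
}

fn main() {
    let source = "10 + 25 * 3 - 8";
    let mut lexer = Token::lexer(source);
    // Each call to next() scans forward to the next token; None marks end of input
    while let Some(token) = lexer.next() {
        println!("{:?}", token);
    }
}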
Output:
Integer
Add
Integer
Multiply
Integer
Subtract
Integer
Notice how whitespace is discarded by the logos::skip callback, and input that matches no pattern yields the Invalid variant instead of causing a panic.
Comparison with Nom
Logos excels when you need a fast, predictable way to convert raw text into discrete tokens. Nom is a different kind of tool: a parser-combinator library in which you build parsers by composing small functions such as recognize, many1, or delimited. Because those combinators can operate directly on characters, Nom lets you write the whole parser in ordinary Rust code, with no separate lexer or grammar file.
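The combinator style can be sketched without the library. The functions below (tag, digit1, delimited, many1) are hypothetical from-scratch stand-ins that borrow Nom's names; Nom's real versions return IResult and carry structured errors, so treat this as an illustration of the idea rather than Nom's API.

```rust
// Hypothetical mini-combinators mirroring the *style* of nom (NOT nom's API).
// A parser here is any function from input to Option<(remaining_input, output)>.

/// Matches an exact string prefix.
fn tag<'a>(t: &'static str) -> impl Fn(&'a str) -> Option<(&'a str, &'a str)> {
    move |input: &'a str| input.strip_prefix(t).map(|rest| (rest, &input[..t.len()]))
}

/// Matches one or more ASCII digits.
fn digit1(input: &str) -> Option<(&str, &str)> {
    let end = input.find(|c: char| !c.is_ascii_digit()).unwrap_or(input.len());
    if end == 0 { None } else { Some((&input[end..], &input[..end])) }
}

/// Runs `open`, `inner`, `close` in sequence and keeps only `inner`'s output.
fn delimited<'a, A, B, C>(
    open: impl Fn(&'a str) -> Option<(&'a str, A)>,
    inner: impl Fn(&'a str) -> Option<(&'a str, B)>,
    close: impl Fn(&'a str) -> Option<(&'a str, C)>,
) -> impl Fn(&'a str) -> Option<(&'a str, B)> {
    move |input: &'a str| {
        let (rest, _) = open(input)?;
        let (rest, value) = inner(rest)?;
        let (rest, _) = close(rest)?;
        Some((rest, value))
    }
}

/// Applies `p` repeatedly, requiring at least one success.
fn many1<'a, T>(
    p: impl Fn(&'a str) -> Option<(&'a str, T)>,
) -> impl Fn(&'a str) -> Option<(&'a str, Vec<T>)> {
    move |mut input: &'a str| {
        let mut items = Vec::new();
        while let Some((rest, item)) = p(input) {
            // Stop if the parser succeeded without consuming input (avoids an infinite loop).
            if rest.len() == input.len() {
                break;
            }
            items.push(item);
            input = rest;
        }
        if items.is_empty() { None } else { Some((input, items)) }
    }
}

fn main() {
    let bracketed = delimited(tag("["), digit1, tag("]"));
    println!("{:?}", bracketed("[42] rest"));
    let list = many1(delimited(tag("("), digit1, tag(")")));
    println!("{:?}", list("(1)(22)(333)!"));
}
```

Each parser returns the unconsumed input alongside its output, which is what lets small parsers be chained into larger ones without a separate tokenization pass.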
Integration with LALRPOP
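LALRPOP is a parser generator: it reads a grammar file, usually at build time from a build script, and generates Rust code for an LR(1) parser (LALR(1) on request). It ships a simple built-in lexer, but it also accepts tokens from an external lexer, and Logos is a natural choice for that role because: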
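- Logos yields typed tokens together with their byte spans, which map directly onto the (start, token, end) triples LALRPOP expects from an external lexer.
- Input that matches no pattern surfaces as an error token rather than a panic, so it can be turned into a proper lexical error before it reaches the grammar layer.
- Both do their heavy lifting at build time (Logos through a derive macro, LALRPOP through generated code), so no regexes or grammar tables are compiled at runtime.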
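The first two points can be sketched with the standard library alone. LALRPOP's external-lexer interface consumes items of the form Result<(start, token, end), Error>; the function below converts (token, span) pairs, the shape the Logos 0.12 spanned() iterator produces, into that form and turns the error variant into an Err that records where it occurred. The Token stand-in, LexicalError, and to_triples are hypothetical names, and the spans are written by hand so the sketch runs without the logos crate.

```rust
use std::ops::Range;

// Stand-in for a Logos 0.12-style token enum with an Invalid error variant.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Add,
    Multiply,
    Integer(i32),
    Invalid,
}

// Hypothetical error type; LALRPOP lets the grammar choose it via `type Error`.
#[derive(Debug, PartialEq)]
struct LexicalError {
    span: Range<usize>,
}

/// Converts (token, span) pairs into Result<(start, token, end), LexicalError> items.
/// Invalid input becomes an Err carrying its byte range instead of a token.
fn to_triples(
    spanned: impl IntoIterator<Item = (Token, Range<usize>)>,
) -> impl Iterator<Item = Result<(usize, Token, usize), LexicalError>> {
    spanned.into_iter().map(|(tok, span)| match tok {
        Token::Invalid => Err(LexicalError { span }),
        tok => Ok((span.start, tok, span.end)),
    })
}

fn main() {
    // What a lexer might produce for "2 + 3 * $" (spans are byte offsets).
    let spanned = vec![
        (Token::Integer(2), 0..1),
        (Token::Add, 2..3),
        (Token::Integer(3), 4..5),
        (Token::Multiply, 6..7),
        (Token::Invalid, 8..9),
    ];
    for item in to_triples(spanned) {
        println!("{:?}", item);
    }
}
```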
A typical workflow:
- Define tokens using Logos (e.g., Ident, Number, Keyword).
- Reference the same token enum from LALRPOP's grammar file:
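use crate::Token;
grammar;

// Map grammar terminals onto Logos variants (assumes Integer(i32) carries its value)
extern {
    type Location = usize;
    type Error = ();
    enum Token { "+" => Token::Add, "*" => Token::Multiply, "int" => Token::Integer(<i32>) }
}

// One rule per precedence level avoids the ambiguity of Expr "+" Expr
pub Expr: i32 = { <l:Expr> "+" <r:Factor> => l + r, Factor };
Factor: i32 = { <l:Factor> "*" <r:Term> => l * r, Term };
Term: i32 = "int";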
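Then feed the Logos lexer's spanned output into the generated parser:
// LALRPOP's external-lexer interface consumes Result<(start, token, end), Error> items
// (ExprParser is assumed to be in scope, e.g. via lalrpop_util::lalrpop_mod!)
let tokens = Token::lexer(source)
    .spanned()
    .map(|(tok, span)| Ok::<_, ()>((span.start, tok, span.end)));
let result = ExprParser::new().parse(tokens).unwrap();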
When to Choose Which?
Use Logos when your goal is to separate concerns: tokenize cleanly, then pass tokens to a dedicated parser like LALRPOP or even a hand-written recursive descent parser. This separation improves maintainability and enables reuse across multiple parser backends.
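A hand-written recursive descent parser over the token stream can be quite small. The sketch below assumes a token enum in which Integer carries its value (in Logos that value would come from a callback on the #[regex] attribute) and builds the tokens by hand so it runs without the logos crate; Parser and evaluate are illustrative names. Precedence is encoded with one function per level, the same layering a grammar file would use.

```rust
// Stand-in token enum; Integer carries its parsed value.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token {
    Add,
    Subtract,
    Multiply,
    Divide,
    Integer(i64),
}

struct Parser<'t> {
    tokens: &'t [Token],
    pos: usize,
}

impl<'t> Parser<'t> {
    fn peek(&self) -> Option<Token> {
        self.tokens.get(self.pos).copied()
    }

    // expr := term (("+" | "-") term)*   (left-associative)
    fn expr(&mut self) -> Result<i64, String> {
        let mut value = self.term()?;
        while let Some(op @ (Token::Add | Token::Subtract)) = self.peek() {
            self.pos += 1;
            let rhs = self.term()?;
            value = if op == Token::Add { value + rhs } else { value - rhs };
        }
        Ok(value)
    }

    // term := atom (("*" | "/") atom)*   (binds tighter than expr)
    fn term(&mut self) -> Result<i64, String> {
        let mut value = self.atom()?;
        while let Some(op @ (Token::Multiply | Token::Divide)) = self.peek() {
            self.pos += 1;
            let rhs = self.atom()?;
            value = if op == Token::Multiply {
                value * rhs
            } else {
                // checked_div returns None on division by zero or overflow
                value.checked_div(rhs).ok_or("division by zero or overflow")?
            };
        }
        Ok(value)
    }

    // atom := Integer
    fn atom(&mut self) -> Result<i64, String> {
        match self.peek() {
            Some(Token::Integer(n)) => {
                self.pos += 1;
                Ok(n)
            }
            other => Err(format!("expected integer at token {}, found {:?}", self.pos, other)),
        }
    }
}

/// Parses and evaluates a full token slice, rejecting leftover tokens.
fn evaluate(tokens: &[Token]) -> Result<i64, String> {
    let mut parser = Parser { tokens, pos: 0 };
    let value = parser.expr()?;
    match parser.peek() {
        None => Ok(value),
        Some(tok) => Err(format!("unexpected trailing token {:?}", tok)),
    }
}

fn main() {
    // Tokens for "10 + 25 * 3 - 8"
    let tokens = [
        Token::Integer(10), Token::Add, Token::Integer(25), Token::Multiply,
        Token::Integer(3), Token::Subtract, Token::Integer(8),
    ];
    println!("{:?}", evaluate(&tokens)); // Ok(77)
}
```

Because the parser sees only Token values, the same lexer could feed this evaluator or a LALRPOP-generated parser without changes.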
Use Nom if you want to define the entire parsing pipeline — from character-by-character scanning to AST construction — in a single, cohesive codebase. Nom’s combinators are powerful for languages with context-sensitive syntax or when you need fine-grained control over backtracking and error messages.
For many embedded DSLs and configuration parsers, Logos + LALRPOP offers a good balance of speed, clarity, and separation of concerns. For complex programming languages, or where parsing logic is tightly coupled with lexical structure, Nom may be preferable.