Case-Insensitive SQL Keyword Lexer with Nom
When writing a SQL tokenizer with the nom parser-combinator library, you often need to recognize reserved words regardless of letter case. A naive lexer built on tag matches its literal exactly, so tag("SELECT") rejects inputs such as SeLeCt or from. Replacing each tag("LITERAL") with tag_no_case("LITERAL") makes the match case-insensitive, and wrapping the parser in value keeps the emitted token in canonical uppercase form.
use nom::{
    branch::alt,
    bytes::complete::tag_no_case,
    combinator::value,
    IResult,
};
// `value` clones its output on every successful match, so the token type must be Clone.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SqlToken {
    Reserved(String),
    Ident(String),
    Whitespace,
}
fn reserved_keyword(src: &str) -> IResult<&str, SqlToken> {
    alt((
        value(SqlToken::Reserved("SELECT".into()), tag_no_case("SELECT")),
        value(SqlToken::Reserved("FROM".into()), tag_no_case("FROM")),
        value(SqlToken::Reserved("WHERE".into()), tag_no_case("WHERE")),
        value(SqlToken::Reserved("ON".into()), tag_no_case("ON")),
        value(SqlToken::Reserved("ROWS".into()), tag_no_case("ROWS")),
        value(SqlToken::Reserved("COLUMNS".into()), tag_no_case("COLUMNS")),
    ))(src)
}
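The tag_no_case parser compares its literal against the start of the input character by character, ignoring case, so SELECT, select, and Select all succeed. On its own it returns the matched slice in the input's original casing; value discards that slice and substitutes the fixed token, so every variant yields SqlToken::Reserved("SELECT").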
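To make the matching rule concrete, here is a minimal std-only sketch of a case-insensitive prefix match. It is a model for illustration, not nom's implementation: the names match_no_case and canonical_keyword are hypothetical, and comparing lowercase forms character by character is an assumption of this sketch.

```rust
/// Hypothetical std-only model of a case-insensitive prefix match.
/// Returns `(rest, matched)` on success; `matched` keeps the input's casing.
fn match_no_case<'a>(input: &'a str, keyword: &str) -> Option<(&'a str, &'a str)> {
    let mut consumed = 0; // bytes of `input` matched so far
    let mut chars = input.chars();
    for expected in keyword.chars() {
        let actual = chars.next()?; // input ended before the keyword did
        // Compare lowercase forms so that 'S' and 's' count as equal.
        if !actual.to_lowercase().eq(expected.to_lowercase()) {
            return None;
        }
        consumed += actual.len_utf8();
    }
    Some((&input[consumed..], &input[..consumed]))
}

/// Discards the matched slice and returns the canonical uppercase spelling,
/// mirroring the role a fixed-value wrapper plays around the matcher.
fn canonical_keyword<'a>(input: &'a str, keyword: &str) -> Option<(&'a str, String)> {
    let (rest, _matched) = match_no_case(input, keyword)?;
    Some((rest, keyword.to_uppercase()))
}
```

For example, match_no_case("SeLeCt * FROM t", "SELECT") returns the remainder " * FROM t" together with the matched text "SeLeCt", while canonical_keyword replaces that matched text with "SELECT".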
To keep the lexer tidy, wrap each keyword in a helper macro that derives the uppercase token text from the literal:
// Builds a parser that matches $kw in any casing and yields its uppercase form.
macro_rules! kw {
    ($kw:expr) => {
        value(SqlToken::Reserved($kw.to_uppercase()), tag_no_case($kw))
    };
}
fn reserved_keyword(src: &str) -> IResult<&str, SqlToken> {
    alt((
        kw!("SELECT"),
        kw!("FROM"),
        kw!("WHERE"),
        kw!("ON"),
        kw!("ROWS"),
        kw!("COLUMNS"),
    ))(src) // apply the combined parser to the input
}
With this change, the lexer accepts any mixture of upper- and lowercase letters for SQL keywords while producing consistent tokens downstream.
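One caveat worth checking is word boundaries: a bare case-insensitive prefix match on ON would also succeed at the start of an identifier such as online. The std-only sketch below shows one possible guard, which rejects a keyword that is immediately followed by an identifier character. The helper names is_ident_char and keyword_at_boundary are hypothetical, the identifier rule (alphanumerics plus underscore) is an assumption rather than any particular SQL dialect's, and the matching is ASCII-only; none of this is part of nom.

```rust
/// Hypothetical helper: true if `c` could continue an identifier.
/// The rule (alphanumeric or underscore) is an assumption for this sketch.
fn is_ident_char(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

/// Matches `keyword` at the start of `input`, ignoring ASCII case, but only
/// when the match is not immediately followed by an identifier character.
fn keyword_at_boundary<'a>(input: &'a str, keyword: &str) -> Option<(&'a str, String)> {
    // `get` yields None if the input is too short or the cut is not on a char boundary.
    let head = input.get(..keyword.len())?;
    if !head.eq_ignore_ascii_case(keyword) {
        return None;
    }
    let rest = &input[keyword.len()..];
    match rest.chars().next() {
        Some(c) if is_ident_char(c) => None, // e.g. "ON" at the start of "online"
        _ => Some((rest, keyword.to_ascii_uppercase())),
    }
}
```

With this guard, on t.id is still recognized as ON, while online and rows_total are left for an identifier rule to handle.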