[antlr-interest] Parsing quoted phrases and non-quoted keywords
Gavin Lambert
antlr at mirality.co.nz
Fri Jul 31 13:21:14 PDT 2009
At 04:31 1/08/2009, Scott Van Wart wrote:
>1) When antlr gives me the quoted string, I lose the whitespace
>associated with it, which is significant for me only in a quoted
>string. "foo bar" becomes <">, <foo>, <bar> and <">. So if
>I'm searching, say, a database, and the amount of whitespace is
>significant in a column (not that this isn't a silly idea), then
>I'm out of luck.
[...]
> DOUBLE_QUOTE='"';
Remove this.
> quoted_value : DOUBLE_QUOTE ( options {greedy=false;} : . )* DOUBLE_QUOTE ;
Make this a lexer rule (QUOTED_VALUE). See the example string
rule in the wiki.
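
As a rough sketch of that change (ANTLR 3 syntax), the original parser rule's body can be moved into a lexer rule with the quote literal inlined. This is only an illustration of the idea, not the wiki's example rule. Because the whole string is matched inside one lexer rule, the whitespace between the quotes stays in the token text:

```antlr
// Sketch only: the quoted string becomes a single token, so the
// whitespace inside it is preserved in the token text.
// Note that the token text still includes the surrounding quotes.
QUOTED_VALUE
    : '"' ( options {greedy=false;} : . )* '"'
    ;
```

The parser then refers to QUOTED_VALUE as a single token rather than assembling the string from separate pieces.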
> NQUOTED_VALUE : ~( INCLUSION | EXCLUSION | DOUBLE_QUOTE | LEFT_SQB
>     | RIGHT_SQB | ' ' | '\r' | '\t' | '\u000C' | '\n' )* ;
You must at least use + here, not *. (It's very, very bad to
create a lexer rule that can successfully match zero characters:
the lexer can then emit empty tokens without consuming any input.)
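
A minimal sketch of the corrected rule, assuming INCLUSION, EXCLUSION, LEFT_SQB and RIGHT_SQB are still defined as single-character tokens in the part of the grammar not shown here, and that the removed DOUBLE_QUOTE reference is replaced by the literal:

```antlr
// Sketch only: same excluded set as the original rule, but with +
// so the rule always consumes at least one character.
// INCLUSION, EXCLUSION, LEFT_SQB and RIGHT_SQB come from the
// original grammar (definitions not shown in this thread).
NQUOTED_VALUE
    : ~( INCLUSION | EXCLUSION | '"' | LEFT_SQB | RIGHT_SQB
       | ' ' | '\r' | '\t' | '\u000C' | '\n' )+
    ;
```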
Another alternative here is to just use this instead:
OTHER: . ;
You can't use a loop, though (without doing something similar to
what you already had), otherwise it will consume things that you
want as other tokens as well. The downside of this is that it
will generate a token for each character rather than grouping
them.
You could mitigate this by defining more tokens for specific types
of things you're expecting (operators, sequences of alphanumeric
characters, etc).
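
For illustration only, the combination might look like the sketch below. WORD is a hypothetical token name and its character set is an assumption; adjust both to whatever your input actually contains. OTHER is listed last on the assumption that ANTLR 3 prefers the earlier rule when two lexer rules can match the same input:

```antlr
// Sketch only: WORD is a hypothetical name, not from the original grammar.
// Groups runs of alphanumeric characters into one token.
WORD
    : ( 'a'..'z' | 'A'..'Z' | '0'..'9' )+
    ;

// Fallback: one token per character for anything not matched above.
OTHER
    : .
    ;
```

With rules like these, only the characters that no specific rule covers end up as single-character OTHER tokens.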