[antlr-interest] Parsing free text
Curtis Clauson
NOSPAM at TheSnakePitDev.com
Thu Nov 8 15:47:06 PST 2007
In order to snarf up all of the characters, you must make the wildcard
loop "greedy". Apparently, the greedy option is not on by default in all
circumstances. The following lexer rules work with ANTLR v3.0.1 and
ANTLRWorks v1.1.4 for both of your examples.
----------
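// 'ERR' or 'WRN', then whitespace, then everything up to EOF as one token.
ErrorStatement   : 'ERR' WS (options {greedy = true;} : .)*;
WarningStatement : 'WRN' WS (options {greedy = true;} : .)*;
Identifier       : ('a'..'z' | 'A'..'Z' | '0'..'9' | '_')+;
// Whitespace between tokens goes to the hidden channel.
Space            : WS {$channel = HIDDEN;};
fragment WS      : (' ' | '\t')+;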
----------
If you do not set greedy = true, or you set it to false, you get
"unreachable alternative" warnings.
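As a rough analogy only, here is what greedy versus non-greedy means for a
wildcard loop, sketched with Python regular expressions. This is not how
ANTLR's lexer works internally, and the variable names and sample string
are my own, chosen for illustration.

```python
import re

# Sample input: the tail of the first example in the question.
text = "ERR Not in stock"

# Greedy wildcard: consumes every remaining character, which is the
# behaviour the (options {greedy = true;} : .)* loop is asked for.
greedy = re.compile(r"ERR[ \t]+(.*)", re.DOTALL)

# Non-greedy wildcard: stops as early as it can, so with nothing
# required after it, it matches no characters at all.
lazy = re.compile(r"ERR[ \t]+(.*?)", re.DOTALL)

greedy_message = greedy.match(text).group(1)
lazy_message = lazy.match(text).group(1)
```

The greedy pattern captures the whole message, while the non-greedy one
captures an empty string because nothing forces it to go further.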
Combined with appropriate parser rules for the conditional and
expression statements, both of your examples parse correctly.
The only thing that still bothers me is that the greedy loop will also
snarf up any trailing whitespace, so it ends up inside the message token.
You might want to trim your input of trailing whitespace before you parse.
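A minimal sketch of that trimming step, assuming the whole input is
available as a string before it is handed to the lexer. Python is used
only for brevity, and the helper name strip_trailing_whitespace is
hypothetical, not part of ANTLR.

```python
def strip_trailing_whitespace(source: str) -> str:
    """Return source without trailing spaces, tabs, or line breaks."""
    return source.rstrip(" \t\r\n")


# Hypothetical input with stray whitespace after the message text.
raw = "IF color=white AND size=big THEN ERR Not in stock \t\n"
cleaned = strip_trailing_whitespace(raw)
```

Only the end of the input is touched, so whitespace inside the message
itself is left alone.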
I hope that helps.
-- Curtis
Bolek Vrany wrote:
> How do I create either a lexer or parser rule that would read all text
> starting with WRN or ERR until the end of file into a single token? The
> language is case sensitive. For example:
>
> IF color=white AND size=big THEN ERR Not in stock
> IF color=white AND size=big THEN WRN [[43:WR12345]]
>
> Both identifiers and the text after WRN or ERR can be arbitrarily long.
> Identifiers can contain 'a'..'z'|'A'..'Z'|'_'|'0'..'9' (numeric literals
> are enclosed in $$, i.e. $50.0$). WRN [[43:WR12345]] means look up the
> text of warning [[43:WR12345]] in a database and display it, while the
> first form simply displays 'Not in stock'. The message is delimited only
> by ERR and EOF.