[antlr-interest] Parsing free text

Curtis Clauson NOSPAM at TheSnakePitDev.com
Thu Nov 8 15:47:06 PST 2007


To snarf up all of the remaining characters, you must make the wildcard 
loop explicitly "greedy". Apparently the greedy option is not on by 
default in all circumstances. The following lexer rules work with 
ANTLR v3.0.1 and ANTLRWorks v1.1.4 for both of your examples.

----------
ErrorStatement  : 'ERR' WS (options {greedy = true;} : .)*;
WarningStatement: 'WRN' WS (options {greedy = true;} : .)*;
Identifier      : ('a'..'z' | 'A'..'Z' | '0'..'9' | '_')+;
Space           : WS {$channel = HIDDEN;};

fragment WS: (' ' | '\t')+;
----------

If you omit greedy = true, or set it to false, you get 
Unreachable Alternative warnings.
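
As a loose analogy only (this uses java.util.regex, not ANTLR's lexer 
machinery), the difference between a greedy and a non-greedy wildcard 
loop with nothing after it can be sketched as follows. The class and 
method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {
    // Returns the text matched by the first occurrence of the regex, or null if none.
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "IF color=white AND size=big THEN ERR Not in stock";

        // Greedy loop: keeps consuming characters until the input runs out.
        System.out.println("[" + firstMatch("ERR[ \\t]+.*", line) + "]");

        // Reluctant loop: exits at the first opportunity, so the wildcard
        // consumes nothing and the message text is left behind.
        System.out.println("[" + firstMatch("ERR[ \\t]+.*?", line) + "]");
    }
}
```

The first call returns the whole message ("ERR Not in stock"), while the 
second stops right after the whitespace ("ERR "). That is roughly why a 
non-greedy loop at the end of a rule never gets to use its wildcard.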

Combined with appropriate parser rules for the conditional and 
expression statements, both of your examples parse correctly.

The only thing that still bothers me is that the message token will also 
snarf up any trailing whitespace. You might want to trim trailing 
whitespace from your input before you parse.
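
A minimal sketch of that trimming step, assuming the Java target and 
that the whole input is available as a String before it is handed to 
the generated lexer. The class and method names here are hypothetical.

```java
public class TrimInput {
    // Removes trailing whitespace (spaces, tabs, line breaks) and leaves
    // the rest of the input untouched, including leading whitespace.
    public static String trimTrailing(String input) {
        int end = input.length();
        while (end > 0 && Character.isWhitespace(input.charAt(end - 1))) {
            end--;
        }
        return input.substring(0, end);
    }

    public static void main(String[] args) {
        String raw = "IF color=white AND size=big THEN ERR Not in stock  \t\n";
        String cleaned = trimTrailing(raw);
        // The cleaned string is what would be passed on to the lexer.
        System.out.println("[" + cleaned + "]");
    }
}
```

Only the end of the input is touched, so whitespace inside the message 
text is preserved.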

I hope that helps.
-- Curtis


Bolek Vrany wrote:
> How do I create either a lexer or parser rule that would read all text 
> starting with WRN or ERR until the end of file into a single token? The 
> language is case sensitive. For example:
> 
> IF color=white AND size=big THEN ERR Not in stock
> IF color=white AND size=big THEN WRN [[43:WR12345]]
> 
> Both identifiers and the text after WRN or ERR can be arbitrarily long. 
> Identifiers can contain 'a'..'z'|'A'..'Z'|'_'|'0'..'9' (numeric literals 
> are enclosed in $$, i.e. $50.0$). WRN [[43:WR12345]] means look up the 
> text of warning [[43:WR12345]] in a database and display it, while the 
> first form simply displays 'Not in stock'. The message is delimited only 
> by ERR and EOF.


