[antlr-interest] grammar for folded lines
Mark Eggers
mdeggers at gmail.com
Fri Dec 4 15:27:00 PST 2009
I'm just getting started with Antlr after hitting a wall trying to
implement a DSL with a state pattern and regular expressions.
I have the first Antlr book, which has been quite helpful so far.
One problem that I've run into is folded lines. The specification that
I'm trying to write a grammar for says in part:
    Any sequence of CRLF followed immediately by a single linear white space
    character is ignored (i.e., removed) when processing the content type.

    When parsing a content line, folded lines MUST first be unfolded
    according to the unfolding procedure described above.
So, as I read it, a folding token, (' '|'\t') CRLF, can appear anywhere
in the input stream and must be discarded before processing.
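To pin down the expected result, here is a minimal sketch of the unfolding procedure as a plain text substitution. It follows the quoted wording literally (CRLF first, then one space or tab). The class name Unfold and the method unfold are hypothetical names chosen for illustration, and this is the pre-filtering approach, shown only to make the target behavior concrete, not a grammar-level solution.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Antlr or of the specification.
final class Unfold {
    // A fold, per the quoted text: CRLF followed by exactly one space or tab.
    private static final Pattern FOLD = Pattern.compile("\r\n[ \t]");

    // Removes every fold so each logical content line sits on one physical line.
    static String unfold(String text) {
        return FOLD.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // "ca" CRLF SPACE "t=dog;" should unfold to "cat=dog;"
        String folded = "ca\r\n t=dog;\r\n";
        System.out.print(unfold(folded));
    }
}
```

Only the single whitespace character directly after the CRLF is removed, so any further leading whitespace on the continuation line stays in the content. A CRLF that is not followed by a space or tab is left alone and still ends the content line.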
I did the following to discard a folding token between other tokens in a
parsing rule.
    id  : (FOLD)=>
        | ID '=' ID ';' NEWLINE
        | NEWLINE
        ;

    FOLD    : (' '|'\t') NEWLINE {skip();} ;
    NEWLINE : '\r'? '\n' ;
    ID      : ('a' .. 'z' | 'A' .. 'Z')+ ;
    WS      : (' '|'\t'|'\r'|'\n')+ {skip();} ;
This works fine when typing in:
cat=dog;
cat = dog;
cat
= dog;
It fails when typing in:
ca
t=dog;
I'm trying to get two ID tokens out of the last entry.
I'm obviously not understanding something fundamental. Hopefully I can
accomplish this without filtering the input before the Antlr-generated
code is used.
Pointers welcome.
Thanks in advance - /mde/