[antlr-interest] Trouble parsing a language where '{' has too many meanings
Felix Schmid
felix at belugalounge.net
Sat Jul 7 03:42:09 PDT 2007
Richard, thanks for the reply. However, why should this prevent the
lexer from seeing
    {
        Type = Hash
        ShortHelp = "A short comment"
        LongHelp = {
            Some other comment ending with a dot.
        }.
as a comment (where the input is
blubber {
    Type = Hash
    ShortHelp = "A short comment"
    LongHelp = {
        Some other comment ending with a dot.
    }.
    Items {
        FirstName {
            Type = String, ShortHelp = "Hallo"
            LongHelp = {
                Long Explanatory test spanning
                over multiple lines
            }.
        }
        LastName {
            Type = String
            Default = "Blah"
            ShortHelp = "(not so) interesting comment"
        }
    }
}
)?
I think my problem is that I have to match the '{' in the parser rules
because it can occur in so many situations.
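
One possible reading of this behaviour, sketched in Python rather than ANTLR: a pattern shaped like ML_TEXT ('{', then anything matched non-greedily, then '}.') cannot know which '{' opens a comment, so when scanning left to right it can start at the outer '{' after blubber and run to the first '}.'. The regular expression below is only an assumed stand-in for the lexer rule, and the input is a shortened copy of the one above; ANTLR's own prediction may differ in detail.

```python
import re

# Assumed stand-in for ML_TEXT: '{', any characters (non-greedy), then '}.'
ML_TEXT = re.compile(r"\{.*?\}\.", re.DOTALL)

# Shortened copy of the input from this message.
source = """blubber {
Type = Hash
ShortHelp = "A short comment"
LongHelp = {
Some other comment ending with a dot.
}.
Items {
}
}
"""


def first_ml_text(text):
    """Return the first ML_TEXT-like match found scanning left to right."""
    m = ML_TEXT.search(text)
    return m.group(0) if m else None


match = first_ml_text(source)

# The match begins at the brace after "blubber", not at the one after "LongHelp =".
starts_at_outer_brace = match.startswith("{\nType = Hash")
```

In this sketch the match swallows the Type and ShortHelp lines together with the LongHelp text, which fits the idea that the lexer alone cannot tell the two uses of '{' apart.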
What would help is a predicate in the lexer whose value I could set
from a parser rule. Does ANTLR support this directly?
felix
Richard Clark wrote:
> Try changing the definition for ML_TEXT to put the closing element in
> a single string.
>
> ML_TEXT
> : '{'
> ( options {greedy=false;} : . )*
> '}.'
> ;
>
> The lexer doesn't do backtracking, so under the old definition it
> would see {...} and match it before seeing the "." Automatic error
> recovery would throw away the dot as unrecognized (and give an
> error.)
>
> Pulling the closing bracket and dot together '}.' means they'll be
> recognized as a unit.
>
> Run the following in ANTLRWorks' debugger to see it working:
>
> grammar multiBlock;
>
> top : (block | comment)* ;
>
> comment : ML_TEXT;
>
> block : BLOCK ;
>
> ML_TEXT
> : '{'
> ( options {greedy=false;} : . )*
> '}.'
> ;
>
> BLOCK : '{' ('A'..'Z'|'a'..'z'|' ')* '}' ;
>
>
> ...Richard
>
> P.S. Remember that the first rule to match in the lexer wins.
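
The effect Richard describes can be modelled roughly in Python. The old ML_TEXT definition is not shown in this message, so the sketch assumes it closed on '}' alone and left the '.' to be matched separately. The helper match_until is hypothetical and is not ANTLR's algorithm; it only scans forward to the first occurrence of the closing string and never backs up.

```python
def match_until(text, closer):
    """Match '{' ... closer from the start of text, scanning forward only.

    Returns (token, rest); token is None when no match is possible.
    """
    if not text.startswith("{"):
        return None, text
    end = text.find(closer, 1)
    if end == -1:
        return None, text
    cut = end + len(closer)
    return text[:cut], text[cut:]


sample = "{\nSome other comment ending with a dot.\n}.\nItems"

# Assumed old behaviour: the token ends at '}', leaving a stray '.' behind.
old_token, old_rest = match_until(sample, "}")

# Closing on the two-character string '}.' consumes the dot with the brace.
new_token, new_rest = match_until(sample, "}.")
```

With the single-character closer the remaining input starts with the orphaned '.', which is the character error recovery would then discard; with '}.' the dot is part of the token.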