[antlr-interest] Empty ifs in Java
Bart Kiers
bkiers at gmail.com
Sat Nov 5 08:56:56 PDT 2011
Hi,
On Sat, Nov 5, 2011 at 4:16 PM, Patrick Zimmermann <patrick at zakweb.de> wrote:
> Hi,
>
> thank you a lot.
> Using a lexer rule does in fact solve this problem.
>
> And now I am already on the next:
> stripped down to:
>
> start : ('{' 'ab' '}')* '{a}';
>
> using input:
> {ab}{a}
>
> Will not list '{ab' on the input stream in AntlrWorks and thus fails to
> parse
> the input. I suspect this is another "should be done with the lexer"-thing.
>
No, the literals in your parser rule are already implicit lexer rules,
although it's better to define explicit lexer rules instead of embedding
literals in your parser rules:
ABraced : '{a}';
OBrace : '{';
CBrace : '}';
AB : 'ab';
A : 'a';
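As a rough illustration of what the rules above do with the input from the question, here is a small Python model of a lexer that picks a rule from one character of lookahead and then commits to it. It is only a sketch under that assumption and not ANTLR's actual prediction algorithm; the names `predict`, `tokenize_committed` and `LexError` are made up for this example.

```python
# The five lexer rules from the reply, as rule name -> literal text.
RULES = {"ABraced": "{a}", "OBrace": "{", "CBrace": "}", "AB": "ab", "A": "a"}


class LexError(Exception):
    """Raised when the rule the lexer committed to does not match the input."""


def predict(text, pos):
    # Hypothetical, hand-written decision: one character of lookahead is
    # enough to tell the rules apart, so the choice is made that early.
    current = text[pos]
    following = text[pos + 1:pos + 2]  # "" at the end of the input
    if current == "{":
        return "ABraced" if following == "a" else "OBrace"
    if current == "}":
        return "CBrace"
    if current == "a":
        return "AB" if following == "b" else "A"
    raise LexError(f"no rule can start with {current!r} at index {pos}")


def tokenize_committed(text):
    tokens = []
    pos = 0
    while pos < len(text):
        rule = predict(text, pos)
        literal = RULES[rule]
        # Once a rule is chosen there is no fallback to another rule.
        for offset, expected in enumerate(literal):
            actual = text[pos + offset:pos + offset + 1]
            if actual != expected:
                raise LexError(
                    f"in {rule}: expected {expected!r} but found "
                    f"{actual!r} at index {pos + offset}"
                )
        tokens.append((rule, literal))
        pos += len(literal)
    return tokens
```

With this model, `tokenize_committed("{a}")` returns a single `ABraced` token, while `tokenize_committed("{ab}{a}")` raises `LexError` at the "b", because "{a" has already selected `ABraced`.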
When the lexer now tokenizes the input "{ab", it sees "{a", commits to
ABraced and expects a "}". There's a "b" instead, so an error is emitted:
the lexer does not go back and try OBrace followed by AB.
> I'm currently thinking about whether ANTLR is the right tool for my job:
>
> In many cases the input I have is character wise context sensitive. I have
> some areas (the free text area) where '(' and ')' have a specific meaning
> and
> others (the note area) where '(' ')' are simply normal text. Or whitespace
> which is important in the text and to be ignored in tags and similar
> constructs.
>
> If I'm not mistaken the lexer runs completely before the parser and
> constructs
> tokens. Those tokens are then matched by the parser. So if an input would
> match several tokens (e.g. text not containing parentheses) and the "wrong"
> one is chosen by the lexer the parser is screwed, right?
>
Yes, the parser has no control over what tokens the lexer produces.
> I currently realize that I am forced to use lexer rules for certain
> constructs
> (like ..) because I need character ranges to define the chars that are
> allowed
> (unicode, only certain languages).
>
>
> Do you think ANTLR is the right tool for this job and I'm just not
> seeing
> the point in how to do it, or should I better use something else? What?
>
You could let the lexer create only single-character tokens, and write
parser rules that match a sequence of those tokens (like the `ab` rule below):
start
  :  OBrace ab CBrace OBrace A CBrace EOF
  ;

ab
  :  A B
  ;

OBrace : '{';
CBrace : '}';
A      : 'a';
B      : 'b';
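The same idea can be sketched outside ANTLR. The Python below is only an illustration, assuming one token per character and a hand-written recursive-descent check of the `start` rule as given above; `tokenize_chars` and `parse_start` are hypothetical names, not generated ANTLR code.

```python
# One token type per character, mirroring OBrace, CBrace, A and B above.
SINGLE_CHAR_TOKENS = {"{": "OBrace", "}": "CBrace", "a": "A", "b": "B"}


def tokenize_chars(text):
    """Turn every character into its own token and append EOF."""
    types = []
    for index, ch in enumerate(text):
        if ch not in SINGLE_CHAR_TOKENS:
            raise ValueError(f"unexpected character {ch!r} at index {index}")
        types.append(SINGLE_CHAR_TOKENS[ch])
    types.append("EOF")
    return types


def parse_start(tokens):
    """start : OBrace ab CBrace OBrace A CBrace EOF ;"""
    pos = 0

    def match(expected):
        nonlocal pos
        if tokens[pos] != expected:
            raise SyntaxError(
                f"expected {expected} but found {tokens[pos]} at token {pos}"
            )
        pos += 1

    def ab():
        # ab : A B ;  The grouping happens in the parser, not in the lexer.
        match("A")
        match("B")

    match("OBrace")
    ab()
    match("CBrace")
    match("OBrace")
    match("A")
    match("CBrace")
    match("EOF")
    return True
```

Because the lexer never has to choose between "{a}" and "{" followed by "ab", `parse_start(tokenize_chars("{ab}{a}"))` succeeds. The decision about what a run of characters means is left to the parser rule.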
> Thanks so far,
> Patrick
Regards,
Bart.
PS. Could you please keep the discussion on the list?