[antlr-interest] Empty ifs in Java

Sat Nov 5 08:56:56 PDT 2011

Hi,

On Sat, Nov 5, 2011 at 4:16 PM, Patrick Zimmermann <patrick at zakweb.de> wrote:

> Hi,
>
> thank you a lot.
> Using a lexer rule does in fact solve this problem.
>
> And now I am already on the next:
> stripped down to:
>
> start   :       ('{' 'ab' '}')* '{a}';
>
> using input:
> {ab}{a}
>
> Will not list '{ab' on the input stream in ANTLRWorks and thus fails to parse
> the input. I suspect this is another "should be done with the lexer" thing.
>

No. The literals in your parser rule already are implicit lexer rules. It is
clearer, though, to define them as explicit lexer rules instead of mixing
literals into your parser rules:

ABraced : '{a}';
OBrace  : '{';
CBrace  : '}';
AB      : 'ab';
A       : 'a';

If the lexer now tries to tokenize the input "{ab", it sees "{a", settles on
ABraced and expects a "}". There is a "b" instead, and the lexer does not go
back to try OBrace followed by AB, so an error is emitted.
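
To make that failure concrete, here is a hand-written toy tokenizer in Java.
It is only a sketch, not the code ANTLR generates: the class name
CommittingLexerSketch and its tokenize method are made up for illustration,
and it hard-codes the assumption that seeing "{a" is enough to pick ABraced
with no way back.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy lexer for the five rules above; not ANTLR-generated code.
final class CommittingLexerSketch {

    // Returns the token names for the input, or throws when a chosen rule fails.
    static List<String> tokenize(String in) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < in.length()) {
            if (in.startsWith("{a", i)) {
                // Assumption: "{a" selects ABraced and the choice is final.
                if (!in.startsWith("{a}", i)) {
                    throw new IllegalStateException("expected '}' at index " + (i + 2));
                }
                tokens.add("ABraced");
                i += 3;
            } else if (in.charAt(i) == '{') {
                tokens.add("OBrace");
                i += 1;
            } else if (in.charAt(i) == '}') {
                tokens.add("CBrace");
                i += 1;
            } else if (in.startsWith("ab", i)) {
                tokens.add("AB");
                i += 2;
            } else if (in.charAt(i) == 'a') {
                tokens.add("A");
                i += 1;
            } else {
                throw new IllegalStateException(
                        "unexpected '" + in.charAt(i) + "' at index " + i);
            }
        }
        return tokens;
    }
}
```

Calling CommittingLexerSketch.tokenize("{a}") yields a single ABraced token,
while tokenize("{ab}{a}") throws at the "b", because the sketch never falls
back to OBrace followed by AB.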

> I'm currently thinking about whether ANTLR is the right tool for my job:
>
> In many cases the input I have is character-wise context sensitive. I have
> some areas (the free text area) where '(' and ')' have a specific meaning and
> others (the note area) where '(' ')' are simply normal text. Or whitespace
> which is important in the text and to be ignored in tags and similar
> constructs.
>
> If I'm not mistaken the lexer runs completely before the parser and
> constructs tokens. Those tokens are then matched by the parser. So if an
> input would match several tokens (e.g. text not containing parentheses) and
> the "wrong" one is chosen by the lexer the parser is screwed, right?
>

Yes, the parser has no control over what tokens the lexer produces.

> I currently realize that I am forced to use lexer rules for certain
> constructs (like ..) because I need character ranges to define the chars
> that are allowed (unicode, only certain languages).
>
>
> Do you think ANTLR is the right tool for this job and I'm just not seeing
> the point in how to do it, or should I better use something else? What?
>

You could let the lexer create only single-character tokens and write parser
rules that match a certain sequence of those tokens (like the `ab` rule below):

start
  :  OBrace ab CBrace OBrace A CBrace EOF
  ;

ab
  :  A B
  ;

OBrace  : '{';
CBrace  : '}';
A       : 'a';
B       : 'b';
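
The same idea can be sketched in plain Java, without ANTLR, to show where the
work moves. This is an illustration only: the class SingleCharTokenSketch and
its two methods are hypothetical names, the lexer emits exactly one token per
character, and the parser side checks the token sequence required by the
`start` rule as written above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of single-character tokens; not ANTLR-generated code.
final class SingleCharTokenSketch {

    // Lexer: one token per character, mirroring OBrace, CBrace, A and B.
    static List<String> tokenize(String in) {
        List<String> tokens = new ArrayList<>();
        for (char c : in.toCharArray()) {
            switch (c) {
                case '{': tokens.add("OBrace"); break;
                case '}': tokens.add("CBrace"); break;
                case 'a': tokens.add("A"); break;
                case 'b': tokens.add("B"); break;
                default:
                    throw new IllegalStateException("unexpected '" + c + "'");
            }
        }
        tokens.add("EOF");
        return tokens;
    }

    // Parser: start is OBrace ab CBrace OBrace A CBrace EOF, with ab expanded to A B.
    static boolean parseStart(List<String> tokens) {
        List<String> expected = Arrays.asList(
                "OBrace", "A", "B", "CBrace", "OBrace", "A", "CBrace", "EOF");
        return tokens.equals(expected);
    }
}
```

With this split the lexer never has to choose between "{a}" and "{" followed
by "ab": parseStart(tokenize("{ab}{a}")) succeeds because the grouping of A and
B happens on the parser side.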

> Thanks so far,
> Patrick

Regards,

Bart.

PS. Could you please keep the conversation on the list?