[antlr-interest] Parameters in fragment lexer rules

Jean-Christophe Bach jeanchristophe.bach at inria.fr
Thu Aug 5 01:55:56 PDT 2010


Hi,

* Jim Idle <jimi at temporal-wave.com> [04.08.2010. @10:43:31 -0700]:

> Your language is broken basically :-(
> 
> The lexer runs first and creates all the tokens THEN the parser runs against
> the tokens - you cannot tell the lexer what to do from the parser. Now,
> there is an unbuffered token stream by default in the latest source code,
> but you still cannot pass parameters to the lexer rules from the parser, you
> would have to set some flag in the lexer instance to tell it what to do.
> When you mention the LEXER rule in your parser, you are not calling the
> lexer rule, you are indicating that you expect a token of that type to occur
> at that point in the parser. You should look at the generated source code to
> see this.

Thank you for the explanation.
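
To illustrate the point about setting a flag in the lexer instance, here is a
minimal hand-written sketch in Java. It is not ANTLR-generated code, and it
assumes that any word starting with "keyword" opens a sub-construct. The names
ContextLexerSketch, expectIsland and ISLAND_LBRACE are hypothetical, chosen only
for this example. The lexer sets the flag itself when it sees the keyword, so
the next '{' gets a different token type without any help from the parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a lexer that decides the type of '{' from its own state.
class ContextLexerSketch {
    enum Type { KEYWORD, ID, LPAREN, RPAREN, LBRACE, ISLAND_LBRACE, RBRACE }

    static final class Token {
        final Type type;
        final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
    }

    // Assumed flag: true after a sub-construct keyword, until the next '{'.
    private boolean expectIsland = false;

    List<Token> tokenize(String input) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (Character.isLetter(c) || c == '%') {
                int start = i++;
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) i++;
                String word = input.substring(start, i);
                if (word.startsWith("keyword")) {
                    expectIsland = true; // remember the context for the next '{'
                    out.add(new Token(Type.KEYWORD, word));
                } else {
                    out.add(new Token(Type.ID, word));
                }
                continue;
            }
            switch (c) {
                case '(': out.add(new Token(Type.LPAREN, "(")); break;
                case ')': out.add(new Token(Type.RPAREN, ")")); break;
                case '{':
                    // The token type depends on lexer state, not on the parser.
                    out.add(new Token(expectIsland ? Type.ISLAND_LBRACE : Type.LBRACE, "{"));
                    expectIsland = false;
                    break;
                case '}': out.add(new Token(Type.RBRACE, "}")); break;
                default: throw new IllegalArgumentException("unexpected character: " + c);
            }
            i++;
        }
        return out;
    }

    // Convenience helper: the token type names for a given input.
    static List<String> types(String input) {
        List<String> names = new ArrayList<>();
        for (Token t : new ContextLexerSketch().tokenize(input)) names.add(t.type.name());
        return names;
    }
}
```

In a real island grammar, the action attached to the ISLAND_LBRACE case would
hand the input over to the other parser instead of just tagging the token.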

> However, if you can work out that you need to parse a java expression when
> you are parsing, then you should be able to work out the same when you are
> lexing. At some point, you must be able to distinguish the cases? However,
> it would be better to change the syntax (if you can) to use something like
> {{ }} if you have a Java expression.

Indeed, I am able to distinguish the cases, but not by looking at the character
just before the left brace. Our constructs are composed of sub-constructs that
follow this model:

construct : '%construct1' ID ID '(' args ')' '{' subconstruct* '}' ;

subconstructN : 'keywordN' '(' args ')' '{' <block parsed by another parser> '}';

Every left brace that appears in a subconstruct triggers another parser.
For that, I followed the official island grammar example, so the subconstruct
rules actually look like this:

subconstructN : 'keywordN' '(' args ')' LBRACE

LBRACE : '{' { <another parser is called here> } ;

The other '{' (for instance, those in construct rules) do not trigger a parser.
We only count them, as in the island grammar example. We also have
"double embedded blocks" for a few constructs (e.g. HostLanguage -> SubLanguage
-> HostLanguage).
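
The brace counting mentioned above can be sketched as follows. This is a
simplified illustration in Java rather than the code of the island grammar
example. BraceBlockSketch, findMatchingBrace and blockBody are hypothetical
names, and the sketch assumes that braces never appear inside string literals
or comments of the embedded language.

```java
// Hypothetical sketch: find the end of an embedded block by counting braces.
class BraceBlockSketch {
    /** Returns the index of the '}' matching the '{' at openIndex, or -1 if unbalanced. */
    static int findMatchingBrace(String input, int openIndex) {
        if (input.charAt(openIndex) != '{') {
            throw new IllegalArgumentException("no '{' at index " + openIndex);
        }
        int depth = 0;
        for (int i = openIndex; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '{') {
                depth++;            // entering a nested block
            } else if (c == '}') {
                depth--;            // leaving a block
                if (depth == 0) return i;
            }
        }
        return -1;                  // ran out of input before the block closed
    }

    /** Returns the text between the '{' at openIndex and its matching '}'. */
    static String blockBody(String input, int openIndex) {
        int close = findMatchingBrace(input, openIndex);
        if (close < 0) throw new IllegalArgumentException("unbalanced braces");
        return input.substring(openIndex + 1, close);
    }
}
```

The string returned by blockBody is what would be handed to the other parser,
while the outer lexer resumes after the matching '}'.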

Of course, modifying the syntax would be the easiest way and would solve this
problem immediately, but we would have a lot of backward-compatibility problems.
I am not sure what to do, but I will try to find another way.
Thanks a lot for your answers.

Regards,

JC

> > -----Original Message-----
> > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Jean-Christophe Bach
> > Sent: Wednesday, August 04, 2010 12:34 AM
> > To: antlr-interest at antlr.org
> > Subject: Re: [antlr-interest] Parameters in fragment lexer rules
> > 
> > Hi,
> > 
> > > You need to make the distinction in the lexer via island grammars.
> > > Your case looks like it will be easy enough but you might need to
> > > formulate your lexer rules to avoid ambiguous cases. Look at the
> > > example island grammar in the downloadable example set.
> > > For your lexer you will need something like this:
> > >
> > > ARROWLBRACE
> > >    : '->'
> > >       (
> > >              (WS* '{')=> WS* '{' SPECIFICPARSER { $type = SPECIFICBLOCK; }
> > >           |
> > >       )
> > >    ;
> > >
> > > COLON
> > >   : ':'
> > >        (
> > >              (WS* '{')=> WS* '{' DIFFERENTPARSER { $type = DIFFERENTBLOCK; }
> > >           |
> > >       )
> > >   ;
> > >
> > > LBRACE
> > >   : '{' // Either the parser that calls this lexer knows what to do
> > >         // with Java, or you call a java parser here
> > >   ;
> > 
> > Thank you for your answer. I have already used this example as you
> > advised, and it is OK.
> > 
> > Now, how would you handle the last case, which contains an ambiguity?
> > There are two different situations when a simple '{' is detected:
> > sometimes I have to call another parser, and sometimes I have to treat it
> > as a simple Java code block.
> > When I encounter such a situation, no specific character is detected just
> > before the left brace (contrary to the colon and arrow cases). Passing a
> > parameter to LBRACE (an int, for instance) would be great, but it does not
> > work when used in a parser rule:
> > 
> > anAmbiguousParserRule :
> >   ... <no specific character> LBRACE[3]... -> ...
> >   | ... <no specific character> LBRACE[4] ... -> ...
> >   ;
> > 
> > I obtain this error : "token reference LBRACE may not have parameters"
> > 
> > Regards,
> > 
> > JC
> > 
> > 
> > > > -----Original Message-----
> > > > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Jean-Christophe Bach
> > > > Sent: Tuesday, August 03, 2010 5:50 AM
> > > > To: antlr-interest at antlr.org
> > > > Subject: [antlr-interest] Parameters in fragment lexer rules
> > > >
> > > > Hi list,
> > > >
> > > > I am rewriting our old parser, and I use antlr3 for that.
> > > > Since I have a few problems handling the '{' and calling the
> > > > appropriate parser depending on the context, I am wondering if a
> > > > fragment lexer rule plus a parameter could help me.
> > > > There are many situations, but here are 3 cases:
> > > > ... '->' '{' ... : I need to call a specific parser (#1)
> > > > ... ':'  '{' ... : I need to call another specific parser (#2)
> > > > ...      '{' ... : I need to do a simple Java treatment
> > > >
> > > > I read a few articles and the antlr book, and I saw that it is
> > > > possible to give parameters to a fragment lexer rule. I am wondering
> > > > if something like this is OK:
> > > >
> > > > ARROWLBRACE : '->' LBRACE[2] ;
> > > > ...
> > > > <other rules with LBRACE[n]>
> > > > ...
> > > > fragment
> > > > LBRACE[int lbtype] : '{'
> > > >   {
> > > >   switch(lbtype) {
> > > >   case 1:
> > > >     <Java code1>
> > > >   case 2:
> > > >     <Java code2>
> > > >   case 3:
> > > >     <Java code3>
> > > >     ...
> > > >   }
> > > >   }
> > > >   ;
> > > >
> > > > But am I also allowed to write a parser rule containing a LBRACE[n],
> > > > or is it totally illegal? For example:
> > > >
> > > > myRule :
> > > >  ... LBRACE[1] ... -> ...
> > > >  |... ARROWLBRACE ... -> ...
> > > >  ;
> > > >
> > > > When attempting to do that, I have errors :
> > > > "token reference LBRACE may not have parameters"
> > > >
> > > > Is there any good way to solve this type of problem ?
> > > >
> > > > Thanks in advance,
> > > >
> > > > JC
> > > >
> > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > > > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address

