[antlr-interest] Parameters in fragment lexer rules

Thu Aug 5 09:28:04 PDT 2010

OK - it looks to me like you can keep a member variable in the lexer class
that is set to false at the start. When you see the '%xxxxxx' token, switch
it on; you then know that the next '{' opens a construct whose body is made
of subconstructs. If the flag is off when you see a '{', call the other
parser. Reset the flag both after you process the '{' and whenever you see
a '}', so that a syntax error cannot leave the flag set and throw things too
far out of whack.

@lexer::members {
boolean isConstruct = false;
}

CONSTRUCT: '%' ('A'..'Z'|'a'..'z')+ { isConstruct = true; } ;

LBRACE : '{'
         ( {isConstruct}?=> // gated predicate: whatever the construct needs
         |                  // the other case: call the other parser
         )
         { isConstruct = false; }
       ;

RBRACE : '}' { isConstruct = false;} ;
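
To see the flag logic in isolation, here is a minimal hand-written Java
sketch of the same idea. It is not what ANTLR generates, and the class name
BraceFlagSketch and the token labels CONSTRUCT_LBRACE and ISLAND_LBRACE are
made up for the illustration. It also assumes the other parser is only
noted, not actually invoked, so this scanner still sees the nested '}'.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the generated lexer; only the brace handling is modelled.
class BraceFlagSketch {
    // Plays the role of the isConstruct lexer member: false at the start.
    private boolean isConstruct = false;

    // Mirrors ('A'..'Z'|'a'..'z') from the CONSTRUCT rule.
    private static boolean isAsciiLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '%' && i + 1 < input.length() && isAsciiLetter(input.charAt(i + 1))) {
                // CONSTRUCT: '%' followed by one or more letters; switch the flag on.
                int start = i;
                i++;
                while (i < input.length() && isAsciiLetter(input.charAt(i))) {
                    i++;
                }
                tokens.add("CONSTRUCT(" + input.substring(start, i) + ")");
                isConstruct = true;
            } else if (c == '{') {
                // The flag decides what this brace means; reset it afterwards.
                tokens.add(isConstruct ? "CONSTRUCT_LBRACE" : "ISLAND_LBRACE");
                isConstruct = false;
                i++;
            } else if (c == '}') {
                // Any '}' also resets the flag, guarding against syntax errors.
                tokens.add("RBRACE");
                isConstruct = false;
                i++;
            } else {
                i++; // everything else is ignored in this sketch
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sample = "%construct a b (x) { kw (y) { body } }";
        System.out.println(new BraceFlagSketch().tokenize(sample));
    }
}
```

With that sample input, the first '{' comes out as CONSTRUCT_LBRACE and the
second as ISLAND_LBRACE, which is the point where the real lexer would hand
over to the other parser.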

Jim
> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Jean-Christophe Bach
> Sent: Thursday, August 05, 2010 1:56 AM
> To: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Parameters in fragment lexer rules
> 
> Hi,
> 
> * Jim Idle <jimi at temporal-wave.com> [04.08.2010. @10:43:31 -0700]:
> 
> > Your language is broken basically :-(
> >
> > The lexer runs first and creates all the tokens THEN the parser runs
> > against the tokens - you cannot tell the lexer what to do from the
> > parser. Now, there is an unbuffered token stream by default in the
> > latest source code, but you still cannot pass parameters to the lexer
> > rules from the parser; you would have to set some flag in the lexer
> > instance to tell it what to do.
> > When you mention the LEXER rule in your parser, you are not calling
> > the lexer rule, you are indicating that you expect a token of that
> > type to occur at that point in the parser. You should look at the
> > generated source code to see this.
> 
> Thank you for the explanation.
> 
> > However, if you can work out that you need to parse a java expression
> > when you are parsing, then you should be able to work out the same
> > when you are lexing. At some point, you must be able to distinguish
> > the cases? However, it would be better to change the syntax (if you
> > can) to use something like {{ }} if you have a Java expression.
> 
> Indeed, I am able to distinguish the cases, but not with a character just
> before the left brace. We have constructs composed of sub-constructs based
> on this model:
> 
> construct : '%construct1' ID ID '(' args ')' '{' subconstruct* '}' ;
> 
> subconstructN : 'keywordN' '(' args ')' '{' <block parsed by another parser> '}' ;
> 
> All left braces that appear in subconstructs trigger another parser.
> For that, I used the official island grammar example. Therefore the
> subconstruct rules actually look like this:
> 
> subconstructN : 'keywordN' '(' args ')' LBRACE ;
> 
> LBRACE : '{' { <another parser is called here> } ;
> 
> The other '{' (for instance those in construct rules) do not trigger a
> parser. We only count them, as in the island grammar example: we also have
> "double embedded blocks" for a few constructs (e.g. HostLanguage ->
> SubLanguage -> HostLanguage).
> 
> Of course, modifying the syntax would be the easiest way and would solve
> this problem immediately, but we would have a lot of backward-compatibility
> problems.
> I am not sure what to do, but I will try to find another way.
> Thanks a lot for your answers.
> 
> Regards,
> 
> JC
> 
> > > -----Original Message-----
> > > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> > > bounces at antlr.org] On Behalf Of Jean-Christophe Bach
> > > Sent: Wednesday, August 04, 2010 12:34 AM
> > > To: antlr-interest at antlr.org
> > > Subject: Re: [antlr-interest] Parameters in fragment lexer rules
> > >
> > > Hi,
> > >
> > > > You need to make the distinction in the lexer via island grammars.
> > > > Your case looks like it will be easy enough but you might need to
> > > > formulate your lexer rules to avoid ambiguous cases. Look at the
> > > > example island grammar in the downloadable example set.
> > > > For your lexer you will need something like this:
> > > >
> > > > ARROWLBRACE
> > > >    : '->'
> > > >       (
> > > >              (WS* '{')=> WS* '{' SPECIFICPARSER { $type = SPECIFICBLOCK; }
> > > >           |
> > > >       )
> > > >    ;
> > > >
> > > > COLON
> > > >   : ':'
> > > >        (
> > > >              (WS* '{')=> WS* '{' DIFFERENTPARSER { $type = DIFFERENTBLOCK; }
> > > >           |
> > > >       )
> > > >   ;
> > > >
> > > > LBRACE
> > > >   : '{' // Either the parser that calls this lexer knows what to
> > > >         // do with Java, or you call a Java parser here
> > > >   ;
> > >
> > > Thank you for your answer. I have already used this example as you
> > > advised, and it is OK.
> > >
> > > Now, how would you handle the last case, which contains an ambiguity?
> > > There are two different situations when a simple '{' is detected:
> > > sometimes I have to call another parser, sometimes I have to treat it
> > > as a simple Java code block.
> > > When I encounter such a situation, no specific character is detected
> > > just before the left brace (unlike the colon and arrow cases).
> > > Passing a parameter to LBRACE (an int, for instance) would be great,
> > > but it does not work when used in a parser rule:
> > >
> > > anAmbiguousParserRule :
> > >   ... <no specific character> LBRACE[3]... -> ...
> > >   | ... <no specific character> LBRACE[4] ... -> ...
> > >   ;
> > >
> > > I get this error: "token reference LBRACE may not have parameters"
> > >
> > > Regards,
> > >
> > > JC
> > >
> > >
> > > > > -----Original Message-----
> > > > > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> > > > > bounces at antlr.org] On Behalf Of Jean-Christophe Bach
> > > > > Sent: Tuesday, August 03, 2010 5:50 AM
> > > > > To: antlr-interest at antlr.org
> > > > > Subject: [antlr-interest] Parameters in fragment lexer rules
> > > > >
> > > > > Hi list,
> > > > >
> > > > > > I am rewriting our old parser and I use ANTLR 3 for that.
> > > > > > Since I have a few problems handling the '{' and calling the
> > > > > > appropriate parser depending on the context, I am wondering
> > > > > > whether a fragment lexer rule with a parameter could help me.
> > > > > > There are many situations, but here are 3 cases:
> > > > > > ... '->' '{' ... : I need to call a specific parser (#1)
> > > > > > ... ':'  '{' ... : I need to call another specific parser (#2)
> > > > > > ...      '{' ... : I need to do a simple Java treatment
> > > > > >
> > > > > > I read a few articles and the ANTLR book, and I saw that it is
> > > > > > possible to give parameters to a fragment lexer rule. I am
> > > > > > wondering if something like this is OK:
> > > > >
> > > > > ARROWLBRACE : '->' LBRACE[2] ;
> > > > > ...
> > > > > <other rules with LBRACE[n]>
> > > > > ...
> > > > > fragment
> > > > > LBRACE[int lbtype] : '{'
> > > > >   {
> > > > > >   switch(lbtype) {
> > > > > >   case 1:
> > > > > >     <Java code1>
> > > > > >     break;
> > > > > >   case 2:
> > > > > >     <Java code2>
> > > > > >     break;
> > > > > >   case 3:
> > > > > >     <Java code3>
> > > > > >     break;
> > > > > >     ...
> > > > > >   }
> > > > >   }
> > > > >   ;
> > > > >
> > > > > > But am I also allowed to write a parser rule containing an
> > > > > > LBRACE[n], or is it totally illegal?
> > > > > > e.g.:
> > > > >
> > > > > myRule :
> > > > >  ... LBRACE[1] ... -> ...
> > > > >  |... ARROWLBRACE ... -> ...
> > > > >  ;
> > > > >
> > > > > > When attempting to do that, I get errors:
> > > > > > "token reference LBRACE may not have parameters"
> > > > > >
> > > > > > Is there any good way to solve this type of problem?
> > > > >
> > > > > Thanks in advance,
> > > > >
> > > > > JC
> > > > >
> > > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > > > > > Unsubscribe:
> > > > > > http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> > > >
> > > >
> > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > > > Unsubscribe:
> > > > http://www.antlr.org/mailman/options/antlr-interest/your-email-add
> > > > ress
> > >
> > > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > > Unsubscribe:
> > > http://www.antlr.org/mailman/options/antlr-interest/your-
> > > email-address
> >
> >
> > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > Unsubscribe:
> > http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> 
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-
> email-address