[antlr-interest] Lexer Rule matching member variable (Java)

Fri Aug 20 09:15:30 PDT 2010

You just need to stop using ANY_CHAR+. Once the predicate matches, your rule is
selected and it will do whatever you tell it. So instead of ANY_CHAR+, match a
single character and add some code that keeps consuming until you reach a
character you don't want in the token:

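TEXT: { lxstate==State.NORMAL && input.LA(1)!=curspecial }?=> .
{
  // Consume the rest of the run. Stop at the special character or at EOF,
  // otherwise the loop never terminates at the end of the input.
  while (input.LA(1) != curspecial && input.LA(1) != CharStream.EOF) { input.consume(); }
}
;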
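
To try the loop outside of ANTLR, here is a minimal standalone sketch in plain
Java. It does not use the ANTLR runtime: the class SpecialCharScanner and its
la1()/consume() helpers are hypothetical stand-ins for input.LA(1) and
input.consume(), the special character is fixed for the whole scan, and the
lxstate handling from the grammar is left out. The sketch also stops at end of
input so the inner loop always terminates.

```java
import java.util.ArrayList;
import java.util.List;

/** Standalone illustration only; not generated by ANTLR and not using its runtime. */
final class SpecialCharScanner {

    private static final int EOF = -1; // end-of-input sentinel for this sketch

    private final String text;
    private int pos = 0;

    private SpecialCharScanner(String text) {
        this.text = text;
    }

    /** Peek at the next character without consuming it (stand-in for input.LA(1)). */
    private int la1() {
        return pos < text.length() ? text.charAt(pos) : EOF;
    }

    /** Advance by one character (stand-in for input.consume()). */
    private void consume() {
        if (pos < text.length()) {
            pos++;
        }
    }

    /** Splits text into SPECIAL(c) and TEXT(...) tokens for one fixed special character. */
    static List<String> scan(String text, char curspecial) {
        SpecialCharScanner in = new SpecialCharScanner(text);
        List<String> tokens = new ArrayList<>();
        while (in.la1() != EOF) {
            int start = in.pos;
            if (in.la1() == curspecial) {
                in.consume();
                tokens.add("SPECIAL(" + text.substring(start, in.pos) + ")");
            } else {
                in.consume(); // the single character matched by '.' in the rule
                // The action: keep consuming until the special character or EOF.
                while (in.la1() != curspecial && in.la1() != EOF) {
                    in.consume();
                }
                tokens.add("TEXT(" + text.substring(start, in.pos) + ")");
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [TEXT(Text), SPECIAL(@), TEXT({test)]
        System.out.println(scan("Text@{test", '@'));
    }
}
```

With "Text@{test" and @ as the special character, the text before and after the
@ each end up in one TEXT token instead of one token per character.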

It is often instructive to look at the generated code and steal from it to
get where you need to.

Jim 

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org
> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Frederic Beister
> Sent: Friday, August 20, 2010 12:34 AM
> To: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Lexer Rule matching member variable (Java)
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Hi and thanks for the quick response!
> 
> I got that part to work. However, when I use these terminal rules:
> 
> fragment ANY_CHAR : .;
> 
> SPECIAL :
> 	{ lxstate==State.NORMAL && input.LA(1)==curspecial} ?=>
> 	ANY_CHAR
> 	{ lxstate = State.SPECIAL; } ;
> 
> TEXT :
> 	{ lxstate==State.NORMAL && input.LA(1)!=curspecial} ?=>
> 	ANY_CHAR ;
> 
> I get lots of TEXT-nodes with only one character. When I modify the second
> rule to match ANY_CHAR+, the TEXT-rule might consume something like
> "Text@{test" where @ is the current special character, which I don't want
> it to. (There is even a warning about that possibility when I run ANTLR.)
> 
> I'd need to express the matching of (~curspecial)+ as a predicate. Is there
> a possibility to do that? I'd also be willing to modify the generated code.
> I guess I'd have to insert some kind of loop where the ANY_CHAR is matched(?)
> 
> Greetings
> Frederic
> 
> 
> 
> On 16.08.2010 19:30, Jim Idle wrote:
> > You want a rule like this:
> >
> > DELIM : { input.LA(1) == currDelim}?=> . ;
> >
> > However, this could get a little complicated to get the matching order
> > of the rules correct when you start having a lot more rules. You will
> > need to experiment a little.
> >
> > You could also pre-process the input and substitute something
> > extremely unlikely to clash with the language, such as '\u0001' or
> > something like that. Then only look for that character in the main
> > lexer.
> >
> > Jim
> >
> >> -----Original Message-----
> >> From: antlr-interest-bounces at antlr.org
> >> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Frederic Beister
> >> Sent: Sunday, August 15, 2010 11:40 PM
> >> To: antlr-interest at antlr.org
> >> Subject: [antlr-interest] Lexer Rule matching member variable (Java)
> >>
> > Hello,
> >
> > I want to write an ANTLR-Lexer and -Parser for a Literate Programming
> > language. The idea is to embed code fragments in various languages in
> > a LaTeX document and generate source files on-the-fly.
> >
> > The language allows changing the special character used to denote
> > the beginning of a code snippet and the special character used inside
> > these snippets to denote inclusion of other snippets. This is needed
> > because some "guest" languages might need a pre-defined special
> > character themselves.
> >
> > The special character can be changed anywhere in the source text by
> > using <OLDSPECIAL>=<NEWSPECIAL> where <OLDSPECIAL> is the old special
> > character and <NEWSPECIAL> is the new special character that should be
> > active after that instruction.
> >
> > My idea was to modify the lexer such that it has a member variable
> > "char cur_special" that is set to the current special character and
> > match against it in a rule
> >
> > "fragment SCHAR : cur_special"
> >
> > such that the token stream abstracts from the different possible
> > special characters.
> >
> > At the moment, the only way I see to accomplish this is to manually
> > modify the generated lexer in many places.
> >
> > Is there perhaps a built-in functionality in ANTLR 3.2 I could use? I
> > couldn't find anything in the archives searching for "lexer match member".
> > I really don't need a full how-to, just a gentle nudge in the correct
> > direction.
> >
> > Thanks in advance
> > Frederic
> >
> > P.S.
> > Sorry if this is/becomes a repost. My first mail didn't make it through,
> > perhaps because it had a PGP signature attachment.
> >
> >
> >>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> 
> 
> 
> - --
> PGP Fingerprint = 782C 2BE7 0972 D632 8BDF 4A23 3811 174A 1530 64ED
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.10 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
> 
> iEYEARECAAYFAkxuL88ACgkQOBEXShUwZO221wCcDrMEYhlJ6nAc1qdFBP93hRyM
> p+wAn3ee90Bzytkpaw1cDvLp+Ne5Oc7s
> =NR0C
> -----END PGP SIGNATURE-----
> 