[antlr-interest] Lexer Rule matching member variable (Java)

Frederic Beister azamir at azamir.de
Fri Aug 20 00:33:35 PDT 2010


Hi and thanks for the quick response!

I got that part to work. However, when I use these terminal rules:

fragment ANY_CHAR : .;

SPECIAL :
	{ lxstate==State.NORMAL && input.LA(1)==curspecial }?=>
	ANY_CHAR
	{ lxstate = State.SPECIAL; } ;

TEXT :
	{ lxstate==State.NORMAL && input.LA(1)!=curspecial }?=>
	ANY_CHAR ;

I get lots of TEXT nodes with only one character each. When I modify the
second rule to match ANY_CHAR+, the TEXT rule may consume something
like "Text@{test", where @ is the current special character, which I
don't want (ANTLR even warns about that possibility when I run it).

I need to express the matching of (~curspecial)+ with a predicate. Is
there a way to do that? I'd also be willing to modify the generated
code. I guess I'd have to insert some kind of loop where the
ANY_CHAR is matched(?)
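
To illustrate the loop I have in mind, here is a minimal standalone Java
sketch, not ANTLR-generated code. The class SpecialCharScanner, its Token
type and tokenize() are hypothetical names made up for this example, and it
ignores lxstate entirely. It only shows the intended (~curspecial)+
behaviour: TEXT keeps consuming until the current special character or the
end of the input. An untested idea for the grammar itself would be to move
the gated predicate inside the loop, along the lines of
( {input.LA(1)!=curspecial}?=> ANY_CHAR )+, but I have not verified what
ANTLR 3.2 generates for that.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the lexer; it does not use the ANTLR runtime.
class SpecialCharScanner {

    // Token kinds mirroring the SPECIAL and TEXT rules of the grammar.
    enum Kind { SPECIAL, TEXT }

    static final class Token {
        final Kind kind;
        final String text;

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }

        @Override
        public String toString() {
            return kind + "(" + text + ")";
        }
    }

    // Splits the input into SPECIAL tokens (one character each) and
    // maximal TEXT runs that contain no occurrence of curSpecial.
    static List<Token> tokenize(String input, char curSpecial) {
        List<Token> tokens = new ArrayList<Token>();
        int i = 0;
        while (i < input.length()) {
            if (input.charAt(i) == curSpecial) {
                tokens.add(new Token(Kind.SPECIAL, String.valueOf(curSpecial)));
                i++;
            } else {
                int start = i;
                // (~curspecial)+ : re-check the condition before every character
                while (i < input.length() && input.charAt(i) != curSpecial) {
                    i++;
                }
                tokens.add(new Token(Kind.TEXT, input.substring(start, i)));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Text@{test", '@'));
    }
}
```

With '@' as the special character, "Text@{test" comes out as TEXT(Text),
SPECIAL(@), TEXT({test) instead of one TEXT node per character.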

Greetings
Frederic



On 16.08.2010 19:30, Jim Idle wrote:
> You want a rule like this:
> 
> DELIM : { input.LA(1) == currDelim}?=> . ;
> 
> However, this could get a little complicated to get the matching order of
> the rules correct when you start having a lot more rules. You will need to
> experiment a little.
> 
> You could also pre-process the input and substitute something extremely
> unlikely to clash with the language, such as '\u0001' or something like
> that. Then only look for that character in the main lexer.
> 
> Jim
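
The pre-processing route could look roughly like the sketch below. It rests
on several assumptions: SpecialCharPreprocessor and normalize() are
hypothetical names, the change instruction is taken to be exactly three
characters (current special, '=', new special), and the instruction itself
is dropped from the output. Every other occurrence of the active special
character becomes '\u0001', so the main lexer only has to look for that one
character.

```java
// Hypothetical pre-processing pass that runs before the ANTLR lexer.
class SpecialCharPreprocessor {

    // Placeholder assumed never to occur in real input.
    static final char PLACEHOLDER = '\u0001';

    // Replaces the currently active special character with PLACEHOLDER and
    // follows <OLDSPECIAL>=<NEWSPECIAL> instructions as they appear.
    static String normalize(String source, char initialSpecial) {
        StringBuilder out = new StringBuilder(source.length());
        char cur = initialSpecial;
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            boolean isChange = c == cur
                    && i + 2 < source.length()
                    && source.charAt(i + 1) == '=';
            if (isChange) {
                // Switch to the new special character and skip the instruction.
                cur = source.charAt(i + 2);
                i += 3;
            } else if (c == cur) {
                out.append(PLACEHOLDER);
                i++;
            } else {
                // Ordinary character, including a no longer active special.
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Show the placeholder as '|' so the result is readable on a terminal.
        System.out.println(normalize("a@b@=#c#d@e", '@').replace(PLACEHOLDER, '|'));
    }
}
```

One drawback of this sketch is that dropping the change instruction shifts
character positions, so line and column information in later error messages
would no longer match the original source.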
> 
>> -----Original Message-----
>> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Frederic Beister
>> Sent: Sunday, August 15, 2010 11:40 PM
>> To: antlr-interest at antlr.org
>> Subject: [antlr-interest] Lexer Rule matching member variable (Java)
>>
> Hello,
> 
> I want to write an ANTLR-Lexer and -Parser for a Literate Programming
> language. The idea is to embed code fragments in various languages in a
> LaTeX document and generate source files on-the-fly.
> 
> The language allows changing the special character used to denote the
> beginning of a code snippet and the special character used inside these
> snippets to denote inclusion of other snippets. This is needed because some
> "guest" languages might need a pre-defined special character themselves.
> 
> The special character can be changed anywhere in the source text by using
> <OLDSPECIAL>=<NEWSPECIAL> where <OLDSPECIAL> is the old special
> character and <NEWSPECIAL> is the new special character that should be
> active after that instruction.
> 
> My idea was to modify the lexer such that it has a member variable "char
> cur_special" that is set to the current special character and match
> against it in a rule
> 
> "fragment SCHAR : cur_special"
> 
> such that the token stream abstracts from the different possible special
> characters.
> 
> At the moment, the only way I see to accomplish this is to manually modify
> the generated lexer in many places.
> 
> Is there perhaps a built-in functionality in ANTLR 3.2 I could use? I
> couldn't find anything in the archives searching for "lexer match member".
> I really don't need a full how-to but a gentle nudge in the correct
> direction.
> 
> Thanks in advance
> Frederic
> 
> P.S.
> Sorry if this is/becomes a repost. My first mail didn't make it through
> - perhaps because it had a PGP signature attachment
> 
> 
>>
List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address

