[antlr-interest] Re: suggested ANTLR projects?

bogdan_mt bogdan_mt at yahoo.com
Fri Aug 15 08:22:58 PDT 2003


> As Marco says one way to solve it is to use state variables but this
> doesn't work in incremental lexing (at least in the netbeans
> implementation), you need some notion of non-restartable tokens so
> the state is properly updated, e.g. when you change "options"
> to "optios" it needs to relex the following tokens (left to right),
> to pick up what is now an action, when you delete the curly it
> needs ...

This will work, but you are reinventing the wheel. ANTLR has a
better solution for this: lexer multiplexing. The underlying problem
is that the options specification is an embedded language with its
own grammar. The "right" solution is to write two lexers that hand
off to one another when appropriate. See the documentation and the
examples in the ANTLR distribution for details. Ter was probably
too busy and used a quick hack.
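
As a rough illustration of the multiplexing idea, here is a small
Python sketch. It is not ANTLR's API (ANTLR 2 provides the
TokenStreamSelector class for this), and every name in it
(SharedInput, Selector, main_lexer, options_lexer, read_action) is
hypothetical. It also assumes no comments between "options" and "{"
and ignores braces inside strings or comments within actions.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    kind: str
    text: str


class SharedInput:
    """Character buffer and cursor shared by all lexers."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def at_end(self):
        return self.pos >= len(self.text)

    def match(self, pattern):
        # Try the pattern at the cursor; advance only on success.
        m = re.compile(pattern).match(self.text, self.pos)
        if m:
            self.pos = m.end()
        return m


class Selector:
    """Hypothetical selector: a stack of lexers, the top one is active."""

    def __init__(self, inp, start_lexer):
        self.inp = inp
        self.stack = [start_lexer]

    def push(self, lexer):
        self.stack.append(lexer)

    def pop(self):
        self.stack.pop()

    def next_token(self):
        return self.stack[-1](self.inp, self)


def read_action(inp):
    # Swallow a balanced {...} block as opaque text (simplified).
    start, depth = inp.pos, 0
    while not inp.at_end():
        ch = inp.text[inp.pos]
        inp.pos += 1
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                break
    return inp.text[start:inp.pos]


OPTION_TOKENS = [("LCURLY", r"\{"), ("RCURLY", r"\}"), ("ID", r"[A-Za-z_]\w*"),
                 ("INT", r"\d+"), ("ASSIGN", r"="), ("SEMI", r";")]


def options_lexer(inp, sel):
    # Lexer for the embedded options language; hands back on "}".
    inp.match(r"\s*")
    if inp.at_end():
        return Token("EOF", "")
    for kind, pattern in OPTION_TOKENS:
        m = inp.match(pattern)
        if m:
            if kind == "RCURLY":
                sel.pop()
            return Token(kind, m.group())
    raise SyntaxError("unexpected character in options block")


def main_lexer(inp, sel):
    # Lexer for the outer grammar; hands over when it sees "options {".
    inp.match(r"\s*")
    if inp.at_end():
        return Token("EOF", "")
    m = inp.match(r"options(?=\s*\{)")
    if m:
        sel.push(options_lexer)
        return Token("OPTIONS", m.group())
    m = inp.match(r"[A-Za-z_]\w*")
    if m:
        return Token("ID", m.group())
    if inp.text[inp.pos] == "{":
        return Token("ACTION", read_action(inp))
    ch = inp.text[inp.pos]
    inp.pos += 1
    return Token("CHAR", ch)


def tokenize(text):
    sel = Selector(SharedInput(text), main_lexer)
    out = []
    while True:
        tok = sel.next_token()
        if tok.kind == "EOF":
            return out
        out.append((tok.kind, tok.text))
```

With this arrangement neither lexer needs a state flag: a "{" seen by
the main lexer is always an action, and a "{" seen by the options
lexer is always the start of the options body.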

BTW, porting the Netbeans approach to ANTLR might not be a good
idea. They wanted something very general that works with any parser
generator, so they had to reimplement many features that ANTLR
already has.

Best regards,
Bogdan


--- In antlr-interest at yahoogroups.com, "tbrandonau" <tom at p...> wrote:
> Ter was right, there was a good reason. Basically the options
> section, tokens section and actions are horribly ambiguous, partly
> due to the opacity of actions. The rules are:
> OPTIONS: "options" (WS|COMMENT)* LCURLY; // Same for tokens
> ACTION: LCURLY (.*) RCURLY; // With extra stuff to handle RCURLY in
>                             // comment\string literal etc.
> So, if you see an LCURLY it's really hard to know what to do. Is it
> an action, where you want to swallow everything pretty
> indiscriminately, or the start of a tokens\options block, where you
> can actually parse what's inside?
> The solution used in Antlr is to match "options" (WS|COMMENT)* LCURLY
> in RULEDEF (lowercase starting identifiers).
> 
> As Marco says, one way to solve it is to use state variables, but
> this doesn't work in incremental lexing (at least in the Netbeans
> implementation). You need some notion of non-restartable tokens so
> the state is properly updated. E.g. when you change "options" to
> "optios" it needs to re-lex the following tokens (left to right) to
> pick up what is now an action; when you delete the curly it needs
> to re-lex "options" as a ruleDef, not an OPTIONS_BLOCK (left to
> right), etc. So what you really need to do is recognise it as a
> single block and record 'subtokens' for the various parts. That way
> the re-lexing stuff treats it as one token but you can pull the
> various parts out. Hence you want a way to return multiple tokens
> from a single rule. Or you can make a custom token class to store
> subtokens, but then you have a problem hooking into the incremental
> lexer: after lexing you need to unpack the subtokens for subsequent
> stuff and then repack them for the incremental lexer, meaning you
> need to hook into the lexer. I managed to hack the Netbeans lexer to
> support non-restartable tokens and that kinda worked. There was some
> problem in there (incremental and batch lexing were slightly
> different in a few cases) but it seemed to get the right stuff.
> 
> Ideally you might try to leave it to the parser, but the opacity of
> actions makes that impossible: there can be stuff in an action that
> is not lexable (unless you made a new Antlr lexer for every action
> language).
> 
> Tom.
> --- In antlr-interest at yahoogroups.com, Marco Ladermann 
> <ladermann at h...> wrote:
> > On Wednesday, 13 August 2003 at 04:57, Brian Smith wrote:
> > > tbrandonau wrote:
> > > > Ensemble section). In fact the Netbeans support could be improved
> > > > upon, incremental lexing gains from having a way to in effect return
> > > > multiple tokens at a time, to tell the incremental lexer not to try
> > > > and resume in the middle of a token (e.g. in Antlr you want to
> > > > return "options {" as two tokens: LITERAL_options and LCURLY but you
> > > > want to lex it in a single rule) so either non-restartable tokens or
> > >
> > > Please explain why "options {" is better lexed as a single rule? I
> > > noticed this kind of thing in ANTLR's antlr.g grammar and I simply could
> > > not understand why the grammar was written like that. I feel I must be
> > > overlooking something.
> > 
> > I'm just playing around with what Tom suggests - an ANTLR-Netbeans
> > module - and my first step was to transform the antlr.g into a tree
> > grammar. The matching of "options {" ("tokens {") as one token was
> > indeed a problem. The rationale behind this, I think, is that there
> > is a need to distinguish action code from the options/tokens
> > name-value pairs. My solution was to introduce a state variable and
> > semantic predicates to make the decision. This also makes it
> > possible to recognize the comments between "options" and "{", which
> > are simply ignored in the original antlr.g.
> > 
> > Marco
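
A rough sketch of the subtoken idea Tom describes in the quoted
message, in Python purely for illustration. LITERAL_options and LCURLY
come from the thread; BLOCK_HEADER, lex_block_header and unpack are
hypothetical names, and nothing here reflects the actual Netbeans
lexer API. The header is lexed as one composite token, which is the
only unit an incremental lexer would restart at, and is flattened into
its parts before the parser sees it.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Token:
    kind: str
    text: str
    start: int
    subtokens: list = field(default_factory=list)


# "options" or "tokens", optional whitespace, then the opening curly.
HEADER = re.compile(r"(options|tokens)(\s*)(\{)")


def lex_block_header(text, pos):
    """Lex e.g. 'options {' as ONE token that records its parts."""
    m = HEADER.match(text, pos)
    if m is None:
        return None
    parts = [Token("LITERAL_" + m.group(1), m.group(1), m.start(1))]
    if m.group(2):
        parts.append(Token("WS", m.group(2), m.start(2)))
    parts.append(Token("LCURLY", m.group(3), m.start(3)))
    return Token("BLOCK_HEADER", m.group(0), m.start(), parts)


def unpack(tokens):
    """Flatten composite tokens for the parser, dropping whitespace."""
    for tok in tokens:
        for part in tok.subtokens or [tok]:
            if part.kind != "WS":
                yield part
```

Because the composite token spans the keyword and the curly together,
editing either one invalidates the whole token, so "optios {" or
"options" without its curly is re-lexed from the start of the header.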


 
