[antlr-interest] Re: suggested ANTLR projects?

Wed Aug 13 17:52:17 PDT 2003

Terr was right, there was a good reason. Basically the options section,
the tokens section and actions are horribly ambiguous, partly due to the
opacity of actions. The rules are:
OPTIONS: "options" (WS|COMMENT)* LCURLY; // Same for tokens
ACTION: LCURLY (.*) RCURLY; // With extra stuff to handle RCURLY in comments, string literals etc.
So, if you see an LCURLY it's really hard to know what to do. Is it an
action, where you want to swallow everything pretty indiscriminately,
or the start of a tokens/options block, where you can actually parse
what's inside?
The solution used in Antlr is to match "options" (WS|COMMENT)* LCURLY
in RULEDEF (identifiers starting with a lowercase letter), so the
decision is made at the keyword rather than at the curly.
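
A rough sketch of that keyword-plus-curly decision, in Python rather than
ANTLR. The function name, the token type strings and the comment syntax
accepted between the keyword and the curly are assumptions made for
illustration; this is not the code in antlr.g.

```python
import re

# Hypothetical token type names; antlr.g defines its own constants.
OPTIONS, TOKENS, RULE_REF = "OPTIONS", "TOKENS", "RULE_REF"

# Identifier starting with a lowercase letter.
IDENT = re.compile(r"[a-z][A-Za-z0-9_]*")
# Whitespace, // line comments and /* block comments */ before the curly.
SKIP = re.compile(r"(?:\s+|//[^\n]*|/\*.*?\*/)*", re.DOTALL)


def lex_rule_ref(text, pos):
    """Lex a lowercase identifier at pos; return (type, end_offset) or None.

    If the identifier is "options" or "tokens" and the next significant
    character is '{', the whole span up to and including the curly
    becomes one OPTIONS/TOKENS token. Otherwise it is a plain RULE_REF.
    """
    m = IDENT.match(text, pos)
    if m is None:
        return None
    word, end = m.group(), m.end()
    if word in ("options", "tokens"):
        after = SKIP.match(text, end).end()  # SKIP always matches, maybe empty
        if after < len(text) and text[after] == "{":
            return (OPTIONS if word == "options" else TOKENS, after + 1)
    return (RULE_REF, end)
```

With this, "options /* c */ {" comes back as a single OPTIONS token, while
"optios {" is only a RULE_REF and the curly is left to start an action.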

As Marco says, one way to solve it is to use state variables, but this
doesn't work in incremental lexing (at least in the Netbeans
implementation). You need some notion of non-restartable tokens so
the state is properly updated. E.g. when you change "options"
to "optios" it needs to re-lex the following tokens (left to right)
to pick up what is now an action; when you delete the curly it needs
to re-lex "options" as a ruleDef, not an OPTIONS_BLOCK (left to right),
etc. So what you really need to do is recognise it as a single block
and record 'subtokens' for the various parts. That way the re-lexing
stuff treats it as one token but you can pull the various parts out.
Hence you want a way to return multiple tokens from a single rule.

Or you can make a custom token class to store subtokens, but then you
have a problem hooking into the incremental lexer: after lexing you
need to unpack the subtokens for subsequent stuff and then repack
them for the incremental lexer, meaning you need to hook into
the lexer. I managed to hack the Netbeans lexer to support
non-restartable tokens and that kinda worked. There was some problem in
there (incremental and batch lexing gave slightly different results in
a few cases) but it seemed to get the right stuff.
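
The subtoken idea can be sketched as follows, in Python. The Token class,
unpack() and relex_start() are hypothetical names for illustration and have
nothing to do with the actual Netbeans lexer API. A real incremental lexer
would also have to account for lookahead, which this ignores.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Token:
    type: str
    start: int  # offset of the first character
    end: int    # offset just past the last character
    subtokens: List["Token"] = field(default_factory=list)


def unpack(tokens):
    """Flatten composite tokens into their subtokens for the parser."""
    flat = []
    for tok in tokens:
        flat.extend(tok.subtokens if tok.subtokens else [tok])
    return flat


def relex_start(tokens, edit_offset):
    """Offset at which re-lexing should restart after an edit.

    The restart point is the start of the top-level token containing
    the edit, so a composite token is never resumed in the middle.
    """
    for tok in tokens:
        if tok.start <= edit_offset < tok.end:
            return tok.start
    return edit_offset


# "options {" stored as one block with two subtokens.
block = Token("OPTIONS_BLOCK", 0, 9,
              [Token("LITERAL_options", 0, 7), Token("LCURLY", 8, 9)])
```

The incremental lexer only ever sees the top-level list, so an edit at the
curly (offset 8) restarts at offset 0 and "options" is re-lexed with it.
The parser sees the result of unpack().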

Ideally you might try to leave it to the parser, but the opacity of
actions makes that impossible: there can be stuff in an action that
is not lexable (unless you made a new Antlr lexer for every action
language).
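
For comparison, "swallowing" an action without lexing its contents might
look like the sketch below (Python). It assumes the action language uses
C/Java-style string and character literals and // and /* */ comments, which
is exactly the kind of assumption that breaks for other action languages.
scan_action is a hypothetical helper, not part of ANTLR.

```python
def scan_action(text, pos):
    """Return the offset just past the '}' closing the action at pos.

    Braces are counted, but braces inside string/char literals and
    comments are ignored. Nothing else in the action is interpreted.
    Raises ValueError if the action is malformed or unterminated.
    """
    if text[pos] != "{":
        raise ValueError("action must start with '{'")
    depth, i, n = 0, pos, len(text)
    while i < n:
        c = text[i]
        if c in "\"'":  # string or char literal
            i += 1
            while i < n and text[i] != c:
                i += 2 if text[i] == "\\" else 1  # skip escaped character
            i += 1  # step past the closing quote
        elif text.startswith("//", i):  # line comment
            nl = text.find("\n", i)
            i = n if nl < 0 else nl + 1
        elif text.startswith("/*", i):  # block comment
            close = text.find("*/", i + 2)
            if close < 0:
                break
            i = close + 2
        else:
            if c == "{":
                depth += 1
            elif c == "}":
                depth -= 1
                if depth == 0:
                    return i + 1
            i += 1
    raise ValueError("unterminated action")
```

The curly inside the string in '{ a = "}"; }' is skipped, so the action
ends at the final curly rather than at the one in the literal.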

Tom.
--- In antlr-interest at yahoogroups.com, Marco Ladermann 
<ladermann at h...> wrote:
> On Wednesday, 13 August 2003 04:57, Brian Smith wrote:
> > tbrandonau wrote:
> > > Ensemble section). In fact the Netbeans support could be improved
> > > upon, incremental lexing gains from having a way to in effect return
> > > multiple tokens at a time, to tell the incremental lexer not to try
> > > and resume in the middle of a token (e.g. in Antlr you want to
> > > return "options {" as two tokens: LITERAL_options and LCURLY but you
> > > want to lex it in a single rule) so either non-restartable tokens or
> >
> > Please explain why "options {" is better lexed as a single rule? I
> > noticed this kind of thing in ANTLR's antlr.g grammar and I simply could
> > not understand why the grammar was written like that. I feel I must be
> > overlooking something.
> 
> I'm just playing around with what Tom suggests - an ANTLR-Netbeans module -
> and my first step was to transform the antlr.g into a tree grammar. The
> matching of "options {" ("tokens {") as one token was indeed a problem. The
> rationale behind this, I think, is that there is a need to distinguish
> action code from the options/tokens name-value pairs. My solution was to
> introduce a state variable and semantic predicates to make the decision.
> This also allows recognizing the comments between "options" and "{", which
> are simply ignored in the original antlr.g.
> 
> Marco
