[antlr-interest] Re: suggested ANTLR projects?
tbrandonau
tom at psy.unsw.edu.au
Wed Aug 13 17:52:17 PDT 2003
Terr was right, there was a good reason. Basically the options section,
the tokens section and actions are horribly ambiguous to lex, partly
because actions are opaque. The rules are:
OPTIONS: "options" (WS|COMMENT)* LCURLY; // Same for tokens
ACTION: LCURLY (.*) RCURLY; // With extra stuff to handle RCURLY in
                            // comments, string literals, etc.
So, if you see an LCURLY it's really hard to know what to do. Is it an
action, where you want to swallow everything pretty indiscriminately,
or the start of a tokens/options block, where you can actually parse
what's inside?
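
To make the "swallow everything" case concrete, here is a minimal sketch
of an opaque action scanner. It is not the antlr.g implementation; the
name scan_action is hypothetical, and it assumes the action language uses
Java/C-style string literals and comments.

```python
def scan_action(text, start):
    """Return the index just past the '}' that closes the '{' at text[start].

    Assumption: the action language has Java/C-style strings and comments.
    """
    if text[start] != "{":
        raise ValueError("action must start with '{'")
    depth = 0
    i = start
    n = len(text)
    while i < n:
        ch = text[i]
        if text.startswith("//", i):
            # Line comment: braces inside it do not count.
            end = text.find("\n", i)
            i = n if end == -1 else end + 1
        elif text.startswith("/*", i):
            # Block comment: skip to the closing '*/'.
            end = text.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated block comment")
            i = end + 2
        elif ch in "\"'":
            # String or char literal: skip to the matching quote,
            # stepping over backslash escapes.
            i += 1
            while i < n and text[i] != ch:
                i += 2 if text[i] == "\\" else 1
            if i >= n:
                raise ValueError("unterminated literal")
            i += 1
        elif ch == "{":
            depth += 1
            i += 1
        elif ch == "}":
            depth -= 1
            i += 1
            if depth == 0:
                return i
        else:
            i += 1
    raise ValueError("unterminated action")
```

The scanner never tokenizes the action body; it only tracks enough
structure to find the closing curly, which is why the contents stay
opaque to everything downstream.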
The solution used in Antlr is to match "options" (WS|COMMENT)* LCURLY
inside RULEDEF (the rule for identifiers starting with a lowercase
letter), so the keyword and its curly are decided together.
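
A rough sketch of that idea, not the actual antlr.g code: the identifier
rule peeks past (WS|COMMENT)* for a curly before deciding the token type.
The function names, the CHAR catch-all token and the simplified brace
matching are all assumptions made for illustration.

```python
import re

# (WS|COMMENT)* between the keyword and the curly, as in the OPTIONS rule.
GAP = re.compile(r"(?:\s+|//[^\n]*|/\*.*?\*/)*", re.DOTALL)
# Identifiers that start with a lowercase letter.
IDENT = re.compile(r"[a-z][A-Za-z0-9_]*")


def skip_braces(text, start):
    """Simplified action skipper: counts braces only (no strings/comments)."""
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return i + 1
    return len(text)


def lex(text):
    """Return a list of (kind, text) pairs."""
    tokens = []
    i = 0
    while i < len(text):
        m = IDENT.match(text, i)
        if m:
            word = m.group()
            gap_end = GAP.match(text, m.end()).end()
            if word in ("options", "tokens") and text.startswith("{", gap_end):
                # Keyword plus curly matched in one rule: OPTIONS or TOKENS.
                tokens.append((word.upper(), text[i:gap_end + 1]))
                i = gap_end + 1
            else:
                tokens.append(("RULE_REF", word))
                i = m.end()
        elif text[i] == "{":
            # Any other curly starts an opaque action.
            end = skip_braces(text, i)
            tokens.append(("ACTION", text[i:end]))
            i = end
        elif text[i].isspace():
            i += 1
        else:
            tokens.append(("CHAR", text[i]))  # placeholder for other tokens
            i += 1
    return tokens
```

With this, lex('options { k = 2; }') starts with an OPTIONS token and
goes on to lex the contents, while lex('optios { k = 2; }') gives a
RULE_REF followed by a single ACTION.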
As Marco says, one way to solve it is to use state variables, but this
doesn't work in incremental lexing (at least in the Netbeans
implementation). You need some notion of non-restartable tokens so
that the state is properly updated. E.g. when you change "options"
to "optios" it needs to re-lex the following tokens (left to right)
to pick up what is now an action, and when you delete the curly it
needs to re-lex "options" as a ruleDef, not an OPTIONS_BLOCK (left to
right), etc.
So, what you really need to do is recognise it as a single block
and record 'subtokens' for the various parts. That way the re-lexing
stuff treats it as one token but you can pull the various parts out.
Hence you want a way to return multiple tokens from a single rule. Or
you can make a custom token class to store subtokens, but then you
have a problem hooking into the incremental lexer: after lexing you
need to unpack the subtokens for subsequent stuff and then repack
them for the incremental lexer, meaning you need to hook into the
lexer.
I managed to hack the Netbeans lexer to support non-restartable
tokens and that kinda worked. There was some problem in there
(incremental and batch lexing were slightly different in a few
cases) but it seemed to get the right stuff.
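
The subtoken idea can be sketched roughly as follows. This is an
illustration only, not the Netbeans lexer API: the Token class,
lex_header, unpack and restart_offset are hypothetical names, as is
TOKENS_BLOCK, and comments between the keyword and the curly are left
out for brevity.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Token:
    kind: str
    text: str
    start: int
    subtokens: list = field(default_factory=list)

    @property
    def end(self):
        return self.start + len(self.text)


HEADER = re.compile(r"(options|tokens)(\s*)\{")


def lex_header(text, pos):
    """Lex 'options {' as ONE token that records its parts as subtokens."""
    m = HEADER.match(text, pos)
    if m is None:
        return None
    kind = "OPTIONS_BLOCK" if m.group(1) == "options" else "TOKENS_BLOCK"
    block = Token(kind, m.group(), pos)
    block.subtokens = [
        Token("LITERAL_" + m.group(1), m.group(1), pos),
        Token("LCURLY", "{", m.end() - 1),
    ]
    return block


def unpack(tokens):
    """Flatten for the parser: a block token is replaced by its subtokens."""
    flat = []
    for tok in tokens:
        flat.extend(tok.subtokens if tok.subtokens else [tok])
    return flat


def restart_offset(tokens, edit_offset):
    """Offset at which re-lexing restarts after an edit.

    Restarting only at token boundaries means an edit anywhere inside
    'options {' re-lexes from the 'o', never from the middle of the block.
    """
    for tok in tokens:
        if tok.start <= edit_offset < tok.end:
            return tok.start
    return edit_offset
```

The incremental lexer only ever sees the block token, so deleting the
curly at offset 8 of "options {" forces a re-lex from offset 0, while
the parser works on the unpacked LITERAL_options and LCURLY.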
Ideally you might try to leave it to the parser, but the opacity of
actions makes that impossible: there can be stuff in an action that
is not lexable (unless you made a new Antlr lexer for every action
language).
Tom.
--- In antlr-interest at yahoogroups.com, Marco Ladermann
<ladermann at h...> wrote:
> On Wednesday, 13 August 2003 04:57, Brian Smith wrote:
> > tbrandonau wrote:
> > > Ensemble section). In fact the Netbeans support could be improved
> > > upon, incremental lexing gains from having a way to in effect
> > > return multiple tokens at a time, to tell the incremental lexer
> > > not to try and resume in the middle of a token (e.g. in Antlr you
> > > want to return "options {" as two tokens: LITERAL_options and
> > > LCURLY but you want to lex it in a single rule) so either
> > > non-restartable tokens or
> >
> > Please explain why "options {" is better lexed as a single rule? I
> > noticed this kind of thing in ANTLR's antlr.g grammar and I simply
> > could not understand why the grammar was written like that. I feel
> > I must be overlooking something.
>
> I'm just playing around with what Tom suggests - an ANTLR-Netbeans
> module - and my first step was to transform the antlr.g into a tree
> grammar. The matching of "options {" ("tokens {") as one token was
> indeed a problem. The rationale behind this, I think, is that there
> is a need to distinguish action code from the options/tokens
> name-value pairs. My solution was to introduce a state variable and
> semantic predicates to make the decision. This also allows
> recognizing the comments between "options" and "{", which are simply
> ignored in the original antlr.g.
>
> Marco