[antlr-interest] Re: suggested ANTLR projects?
bogdan_mt
bogdan_mt at yahoo.com
Fri Aug 15 08:22:58 PDT 2003
> As Marco says one way to solve it is to use state variables but this
> doesn't work in incremental lexing (at least in the netbeans
> implementation), you need some notion of non-restartable tokens so
> the state is properly updated, e.g. when you change "options"
> to "optios" it needs to relex the following tokens (left to right),
> to pick up what is now an action, when you delete the curly it needs ...
This will work, but you are reinventing the wheel. ANTLR has a
better solution for this: lexer multiplexing. The underlying problem
is that the options specification is an embedded language with its
own grammar. The "right" solution is to write two lexers that
call one another when appropriate. Read the documentation and the
examples in the ANTLR distribution for more details. Ter was probably
too busy and used a quick hack.
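To make the idea concrete, here is a minimal sketch of two lexers that
hand control to one another over a shared input. In ANTLR 2 the switching
is done with a TokenStreamSelector; the code below is not ANTLR code, only
a plain Python illustration. All class names, token names and the toy
ACTION pattern (no nested braces, strings or comments) are assumptions
made for this example.

```python
import re

WS = re.compile(r"\s+")
ID = re.compile(r"[A-Za-z_]\w*")
INT = re.compile(r"\d+")
# Simplification: a real action needs nested braces, strings and comments.
ACTION = re.compile(r"\{[^{}]*\}")


class SharedInput:
    """Character buffer shared by both lexers, so a switch resumes in place."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def at_end(self):
        return self.pos >= len(self.text)

    def match(self, pattern):
        m = pattern.match(self.text, self.pos)
        if m is None:
            return None
        self.pos = m.end()
        return m.group()


class Selector:
    """Stack of lexers; tokens always come from the one on top."""

    def __init__(self):
        self.stack = []

    def push(self, lexer):
        self.stack.append(lexer)

    def pop(self):
        self.stack.pop()

    def next_token(self):
        return self.stack[-1].next_token()


class OptionsLexer:
    """Lexes the embedded options language; pops itself at the closing curly."""

    SINGLE = {"{": "LCURLY", "}": "RCURLY", "=": "ASSIGN", ";": "SEMI"}

    def __init__(self, inp, selector):
        self.inp, self.selector = inp, selector

    def next_token(self):
        inp = self.inp
        inp.match(WS)
        if inp.at_end():
            return None
        ch = inp.text[inp.pos]
        if ch in self.SINGLE:
            inp.pos += 1
            if ch == "}":
                self.selector.pop()  # back to the grammar lexer
            return (self.SINGLE[ch], ch)
        word = inp.match(ID)
        if word is not None:
            return ("ID", word)
        num = inp.match(INT)
        if num is not None:
            return ("INT", num)
        raise SyntaxError(f"unexpected {ch!r} in options block")


class GrammarLexer:
    """Main lexer: any curly block it sees is swallowed as an opaque ACTION."""

    def __init__(self, inp, selector):
        self.inp, self.selector = inp, selector
        self.options_lexer = OptionsLexer(inp, selector)

    def next_token(self):
        inp = self.inp
        inp.match(WS)
        if inp.at_end():
            return None
        word = inp.match(ID)
        if word == "options":
            self.selector.push(self.options_lexer)  # switch lexers
            return ("LITERAL_options", word)
        if word is not None:
            return ("ID", word)
        action = inp.match(ACTION)
        if action is not None:
            return ("ACTION", action)
        ch = inp.text[inp.pos]
        if ch in ":;|":
            inp.pos += 1
            return ("PUNCT", ch)
        raise SyntaxError(f"unexpected {ch!r} in grammar")


def lex(text):
    inp, selector = SharedInput(text), Selector()
    selector.push(GrammarLexer(inp, selector))
    tokens = []
    tok = selector.next_token()
    while tok is not None:
        tokens.append(tok)
        tok = selector.next_token()
    return tokens
```

With this arrangement lex('options { k = 2; }') yields separate
LITERAL_options, LCURLY, ... RCURLY tokens, while lex('optios { k = 2; }')
yields an ID followed by a single ACTION token. Neither lexer needs a
state flag; which lexer is on top of the stack is the state. The sketch
does not address incremental relexing.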
BTW, porting the Netbeans approach to ANTLR might not be a good
idea. They wanted something very general that works with any parser
generator, and had to reimplement many features that ANTLR already
has.
Best regards,
Bogdan
--- In antlr-interest at yahoogroups.com, "tbrandonau" <tom at p...> wrote:
> Ter was right, there was a good reason. Basically the options
> section, tokens section and actions are horribly ambiguous, partly
> due to the opacity of actions. The rules are:
> OPTIONS: "options" (WS|COMMENT)* LCURLY; // Same for tokens
> ACTION: LCURLY (.*) RCURLY; // With extra stuff to handle RCURLY
>                             // in comment/string literal etc.
> So, if you see a LCURLY it's really hard to know what to do. Is it
> an action where you want to swallow everything pretty
> indiscriminately, or the start of a tokens/options block where you
> can actually parse what's inside?
> The solution used in Antlr is to match "options" (WS|COMMENT)* LCURLY
> in RULEDEF (lowercase starting identifiers).
>
> As Marco says one way to solve it is to use state variables but this
> doesn't work in incremental lexing (at least in the netbeans
> implementation), you need some notion of non-restartable tokens so
> the state is properly updated, e.g. when you change "options"
> to "optios" it needs to relex the following tokens (left to right),
> to pick up what is now an action, when you delete the curly it needs
> to re-lex "options" as a ruleDef not an OPTIONS_BLOCK (left to right)
> etc. So, what you really need to do is recognise it as a single
> block and record 'subtokens' for the various parts. That way the
> re-lexing stuff treats it as one token but you can pull the various
> parts out. Hence you want a way to return multiple tokens from a
> single rule. Or you can make a custom token class to store
> subtokens, but then you have a problem hooking into the incremental
> lexer. After lexing you need to unpack the subtokens for subsequent
> stuff and then repack them for the incremental lexer, meaning you
> need to hook into the lexer. I managed to hack the Netbeans lexer to
> support non-restartable tokens and that kinda worked. There was some
> problem in there (incremental and batch lexing were slightly
> different in a few cases) but it seemed to get the right stuff.
>
> Ideally you might try and leave it to the parser, but the opacity
> of actions makes that not possible, there can be stuff in an action
> that is not lexable (unless you made a new Antlr lexer for every
> action language).
>
> Tom.
> --- In antlr-interest at yahoogroups.com, Marco Ladermann
> <ladermann at h...> wrote:
> > On Wednesday, 13 August 2003 04:57, Brian Smith wrote:
> > > tbrandonau wrote:
> > > > Ensemble section). In fact the Netbeans support could be
> > > > improved upon, incremental lexing gains from having a way to in
> > > > effect return multiple tokens at a time, to tell the incremental
> > > > lexer not to try and resume in the middle of a token (e.g. in
> > > > Antlr you want to return "options {" as two tokens:
> > > > LITERAL_options and LCURLY but you want to lex it in a single
> > > > rule) so either non-restartable tokens or
> > >
> > > Please explain why "options {" is better lexed as a single rule?
> > > I noticed this kind of thing in ANTLR's antlr.g grammar and I
> > > simply could not understand why the grammar was written like
> > > that. I feel I must be overlooking something.
> >
> > I'm just playing around with what Tom suggests - an ANTLR-Netbeans
> > module - and my first step was to transform the antlr.g into a
> > tree grammar. The matching of "options {" ("tokens {") as one
> > token was indeed a problem. The rationale behind this, I think, is
> > that there is a need to distinguish action code from the
> > options/tokens name-value pairs. My solution was to introduce a
> > state variable and semantic predicates to make the decision. This
> > also allows the comments between "options" and "{" to be
> > recognized, which are simply ignored in the original antlr.g.
> >
> > Marco