[antlr-interest] Grammar Perplexity in v3.0 (More)

Loring Craymer lgcraymer at yahoo.com
Sun Nov 12 12:05:31 PST 2006


Randy--

The "combined" nature of combined grammars is pretty
much just syntactic sugar: the parser and lexer are
still generated as distinct classes (there are some
advantages for token naming and synchronization of
token types between lexer and parser, but these are
minor).  Lexers are still behaviorally different from
parsers in that there is a master mTokens rule that
recognizes tokens; its alternatives are all of the
non-fragment lexer rules.  Parser rules are called in
context; lexer rules have no context.  Just because
you have matched an "A" token in the lexer, there is
no reason to expect "B" or any other token next; in
the parser, A is matched as part of a sequence (A D E,
perhaps) and not in isolation.

It is probably best to think of the lexer as a
separate entity--that will help you avoid these
particular traps.
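
To make the point concrete, here is a toy Python model of a
context-free lexer.  It is only a sketch, not what ANTLR
generates: the regex patterns, the tokenize() helper, and the
"first listed rule wins" tie-break are all assumptions made for
illustration.  Every non-fragment rule is tried at each input
position, so two rules that match the same text can never both
be produced.

```python
import re

# Toy rule table: (token name, regex, is_fragment).
# The patterns are hypothetical; the original grammar's LowerWord
# definition is not shown in this thread.
RULES = [
    ("LowerWord", r"[a-z][A-Za-z0-9_]*", False),
    ("AtomicWord", r"[a-z][A-Za-z0-9_]*", False),  # same text as LowerWord
    ("LParen", r"\(", False),
    ("RParen", r"\)", False),
]

# Same rules, but LowerWord is marked as a fragment (helper only),
# so it is no longer an alternative of the master token rule.
FIXED = [(name, pat, name == "LowerWord") for name, pat, _ in RULES]


def tokenize(text, rules):
    """Try every non-fragment rule at each position, with no context."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        for name, pattern, is_fragment in rules:
            if is_fragment:
                continue  # fragments never produce tokens on their own
            match = re.compile(pattern).match(text, pos)
            if match:
                # Assumed tie-break for this sketch: first listed rule wins.
                tokens.append((name, match.group()))
                pos = match.end()
                break
        else:
            raise ValueError("no rule matches at position %d" % pos)
    return tokens


print(tokenize("f(x)", RULES))
print(tokenize("f(x)", FIXED))
```

With RULES, "f" and "x" both come out as LowerWord and no
AtomicWord token is ever produced, so a parser rule that asks
for AtomicWord is stranded.  With FIXED, the same words come
out as AtomicWord.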

--Loring

--- Randall R Schulz <rschulz at sonic.net> wrote:

> Terence,
> 
> On Sunday 12 November 2006 08:46, Terence Parr wrote:
> > On Nov 12, 2006, at 8:44 AM, Randall R Schulz wrote:
> > > plainTerm
> > >
> > >     :    AtomicWord ( '(' arguments ')' ) ?
> > >
> > > AtomicWord
> > >
> > >     :   LowerWord
> > >
> > >     ;
> >
> > These rules are a problem.  AtomicWord is
> > unreachable as both rules can match its input.
> > You will never see it in the parser.
> > Ter
> 
> Oh, I get it. You cannot (meaningfully) have lexical
> rules like
> 
> AtomicWord
>     :    LowerWord
>     ;
> 
> 
> because the replacement (or one alternative, anyway)
> is indistinguishable from the rule head. The lexer
> generator has to pick one token type to generate and,
> in this case, LowerWord was chosen, essentially
> "stranding" any parser rule that refers to AtomicWord.
> 
> Out of curiosity, why do productions such as this
> work for syntax rules but not for lexical rules?
> 
> 
> I've noticed that when I have a lexical rule like
> this:
> 
> Dot: '.';
> 
> in addition to literal references to '.' in the
> grammar, ANTLRworks displays the literal '.'
> instances as the named lexical rule "Dot."
> 
> Perhaps this identification can be used to collapse
> lexer rules such as my ill-formed ones?
> 
> 
> Randy
> 



 

