[antlr-interest] how to let parser control lexer state.

Fri Apr 27 23:31:45 PDT 2007

You are correct; I hadn't been aware of heredoc.  It
is still not a problem that needs tying the parser and
lexer together.  Instead, you need a lexer rule that
either matches a heredoc or returns "<<".  Something
like this could be made to work (ANTLR 2 version):

SHIFT
    :
    "<<"
    (  (HEREDOC)=> HEREDOC { _ttype = HEREDOC; }
    |
    )
    ;

protected
HEREDOC
    :
    DOCTEXT
    { capture doc string here in function returning true }?
    ( ~'\n' )* '\n'
    { mark start of string here in function returning true }?
    LINES
    ;

protected
LINES
    :
    (ENDTEXT)=> DOCTEXT
    |  (~'\n')* '\n' LINES
    ;

protected
ENDTEXT
    :
    { mark text start in function returning true }?
    DOCTEXT
    { check if text matches text in function }?
    '\n'
    ;

protected
DOCTEXT
    :
    ('a' .. 'z' | 'A' .. 'Z')+
    ;

Very ugly; most of the sempreds are really actions,
written as predicates only because they must still be
evaluated inside of a synpred, where ordinary actions
are skipped.  Alternatively, all of the HEREDOC mechanism
could be in a handwritten parse routine (returning
true or false, and setting the token type if true is
returned):

SHIFT
    :
    "<<"
    (    { heredoc() }? .
    |
    )
    ;

This second approach should be workable in ANTLR 3;
note that heredoc() unwinds one character so that
ANTLR can match it as a wildcard--that may be
necessary.
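
For illustration, here is a rough sketch of the scanning
that a handwritten heredoc() would have to do.  It is
not tied to the ANTLR runtime: the class and method
names (HeredocSketch, matchHeredoc) are made up for this
example, it works on a plain String rather than the
lexer's input buffer, and it returns an index instead of
true/false.  A real heredoc() would also set the token
type and back up one character as described above.

```java
// Hypothetical stand-alone sketch; not part of ANTLR or its runtime.
class HeredocSketch {

    /**
     * Tries to match a heredoc whose tag starts at pos, the index just
     * past "<<".  Returns the index just past the terminating tag (the
     * final newline is left unconsumed), or -1 if no heredoc starts here,
     * in which case the caller treats "<<" as a shift operator.
     */
    static int matchHeredoc(String in, int pos) {
        // DOCTEXT: one or more ASCII letters, as in the grammar sketch.
        int i = pos;
        while (i < in.length() && isLetter(in.charAt(i))) {
            i++;
        }
        if (i == pos) {
            return -1;                       // no tag directly after "<<"
        }
        String tag = in.substring(pos, i);

        // Skip whatever follows the tag on the same line.
        int nl = in.indexOf('\n', i);
        if (nl < 0) {
            return -1;                       // input ends before the body starts
        }

        // LINES: scan line by line until a line consisting only of the tag.
        int lineStart = nl + 1;
        while (lineStart < in.length()) {
            int lineEnd = in.indexOf('\n', lineStart);
            if (lineEnd < 0) {
                return -1;                   // terminator must be followed by '\n'
            }
            if (in.substring(lineStart, lineEnd).equals(tag)) {
                return lineEnd;              // just past the terminating tag
            }
            lineStart = lineEnd + 1;
        }
        return -1;                           // ran out of input, no terminator
    }

    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }
}
```

With "x << 1\ntest\n1\n" and pos pointing just past the
"<<", the next character is a space, so the routine
reports no heredoc and "<<" stays a shift operator.
Note that this decides purely on the characters that
follow "<<"; it does not consult a symbol table.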

Nasty problem.  It is amazing how often really bad
ideas are adopted.

--Loring

--- David Holroyd <dave at badgers-in-foil.co.uk> wrote:

> On Fri, Apr 27, 2007 at 10:42:24AM -0700, Loring Craymer wrote:
> > That's what semantic predicates (in the parser) are
> > for--there are no lexical issues involved in this
> > case.  You want to do the symbol table lookup and test
> > in a predicate.
> 
> Here-docs are a lexing problem, I think -- content on subsequent lines
> must not be interpreted with normal lexer rules until the document
> terminator is seen..?
> 
> > --- femtowin1 <femtowin1 at gmail.com> wrote:
> > 
> > > Hi all, in antlr3, can parser control lexer state
> > > and decide how lexer lexing? some grammar has ambiguity
> > > decided upon by parser knowledge.
> > >   for ruby grammar <<
> > > x << 1
> > > test
> > > 1
> > > if x is a variable, then << is shift operator,
> > > otherwise it is a heredoc. so lexing must know
> > > from the symbol table whether x has been defined
> > > beforehand. But current antlrv3 implementation,
> > > lexer lexing to a constant token stream, and feed
> > > it into parser, so can't achieve this effect.
> 
> -- 
> http://david.holroyd.me.uk/
> 
