[antlr-interest] how to let parser control lexer state.
Loring Craymer
lgcraymer at yahoo.com
Fri Apr 27 23:31:45 PDT 2007
You are correct; I hadn't been aware of heredocs. It
is still not a problem that requires tying the parser and
lexer together. Instead, you need a lexer rule that
either matches a heredoc or returns "<<". Something
like the following (ANTLR 2 syntax) could be made to work:
SHIFT
    :   "<<"
        (   (HEREDOC)=> HEREDOC { _ttype = HEREDOC; }
        |
        )
    ;

protected
HEREDOC
    :   DOCTEXT
        { capture doc string here in function returning true }?
        ( ~'\n' )+ '\n'
        { mark start of string here in function returning true }?
        LINES
    ;

protected
LINES
    :   (ENDTEXT)=> DOCTEXT
    |   (~'\n')+ '\n' LINES
    ;

ENDTEXT
    :   { mark text start in function returning true }?
        DOCTEXT
        { check if text matches text in function }?
        '\n'
    ;

protected
DOCTEXT
    :   ('a' .. 'z' | 'A' .. 'Z')+
    ;
Very ugly; most of the sempreds are really actions that
must be evaluated, but would not otherwise be executed
inside of a synpred. Alternatively, all of the HEREDOC
mechanism could be in a handwritten parse routine
(returning true or false, and setting the token type if
true is returned):
SHIFT
    :   "<<"
        (   { heredoc() }? .
        |
        )
    ;
This second approach should be workable in ANTLR 3;
note that heredoc() unwinds one character so that
ANTLR can match it as a wildcard--that may be
necessary.
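
To make the handwritten routine a little more concrete, here is a
rough Java sketch of the scanning logic that something like heredoc()
might wrap. Everything in it is an assumption for illustration: the
class name HeredocSketch, the method scan(), and the choice to work on
a plain String rather than on the lexer's input stream are all
hypothetical and not part of ANTLR. It follows the grammar above in
treating the terminator as letters only, but unlike that grammar it
also accepts an empty remainder on the introducing line.

```java
/** Hypothetical hand-written heredoc recognizer; not part of ANTLR. */
final class HeredocSketch {
    private HeredocSketch() {}

    /**
     * Tries to recognize a heredoc whose "<<" ends just before index start.
     * Returns the index just past the closing terminator word, or -1 if the
     * text at start is not a heredoc (the caller then emits a plain "<<").
     */
    static int scan(String input, int start) {
        int pos = start;

        // DOCTEXT: the terminator word, letters only as in the grammar above.
        while (pos < input.length() && isLetter(input.charAt(pos))) {
            pos++;
        }
        if (pos == start) {
            return -1; // no terminator word, so this is an ordinary shift
        }
        String terminator = input.substring(start, pos);

        // Skip the rest of the line that introduced the heredoc.
        int eol = input.indexOf('\n', pos);
        if (eol < 0) {
            return -1;
        }
        pos = eol + 1;

        // LINES: consume whole lines until one is exactly the terminator.
        while (pos < input.length()) {
            eol = input.indexOf('\n', pos);
            int lineEnd = (eol < 0) ? input.length() : eol;
            if (input.substring(pos, lineEnd).equals(terminator)) {
                return lineEnd;
            }
            if (eol < 0) {
                break; // last line reached without finding the terminator
            }
            pos = eol + 1;
        }
        return -1; // unterminated: treat "<<" as an operator
    }

    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }
}
```

A heredoc() predicate built on this would presumably call scan() with
the position just after "<<", and on success move the lexer's input to
one character before the returned index so that the wildcard in the
rule above has something left to match.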
Nasty problem. It is amazing how often really bad
ideas are adopted.
--Loring
--- David Holroyd <dave at badgers-in-foil.co.uk> wrote:
> On Fri, Apr 27, 2007 at 10:42:24AM -0700, Loring Craymer wrote:
> > That's what semantic predicates (in the parser) are
> > for--there are no lexical issues involved in this
> > case. You want to do the symbol table lookup and test
> > in a predicate.
>
> Here-docs are a lexing problem, I think -- content on subsequent
> lines must not be interpreted with normal lexer rules until the
> document terminator is seen..?
>
> > --- femtowin1 <femtowin1 at gmail.com> wrote:
> >
> > > Hi all, in antlr3, can parser control lexer state
> > > and decide how lexer lexing? some grammar has
> > > ambiguity decided upon by parser knowledge.
> > > for ruby grammar <<
> > > x << 1
> > > test
> > > 1
> > > if x is a variable, then << is shift operator,
> > > otherwise it is a heredoc. so lexing must know
> > > from the symbol table whether x has been defined
> > > beforehand. But current antlrv3 implementation,
> > > lexer lexing to a constant token stream, and feed
> > > it into parser, so can't achieve this effect.
>
> --
> http://david.holroyd.me.uk/
>