[antlr-interest] lexer woes
Matt Benson
gudnabrsam at yahoo.com
Tue Mar 4 14:55:09 PST 2008
--- Loring Craymer <lgcraymer at yahoo.com> wrote:
> 1.) Yes--see calls to prefixWithSynPred() in
> antlr.g
Hmm. The reason I asked is that I continue to get
NPEs whenever I turn on backtracking in my lexer
grammar and run Tool against it.
> 2.) ANTLR 3 defaults to k=*; the best approach is
> to leave k alone. For ANTLR 2, k was to find a
> minimum value that removed ambiguities; for ANTLR 3,
> a fixed k is the maximum value investigated for any
> decision and so weakens the analysis relative to
> k=*.
Again, if I don't set k=2 for my lexer grammar, it
disables rules that I don't want disabled. As this
grammar is intended for OSS anyway, I've posted it at:
http://people.apache.org/~mbenson/sharedfiles/BantamLexer.g3
if anyone feels like playing with it.
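
For anyone who would rather not download the file: the arrangement described further down in this thread (default k set back to 2, with SL_COMMENT and ML_COMMENT as ordinary rules placed before Token) would look roughly like the sketch below. It is assembled from the LineComment and Comment fragments in the original post and is not a copy of BantamLexer.g3. The grammar name is reused from that post, and Token with its supporting fragments is assumed to follow, minus its '//' and '/*' alternatives.

```antlr
// Sketch only: comment rules factored out of Token, as discussed in the thread.
lexer grammar Loose;

// Fixed lookahead depth of 2, as in the thread; without this option
// ANTLR 3 analyzes with k=*.
options { k=2; }

// Single-line comment: a real (non-fragment) rule listed ahead of Token.
SL_COMMENT
    : '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
    ;

// Multi-line comment: non-greedy match up to the first closing '*/'.
ML_COMMENT
    : '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
    ;

// Assumption: Token and the fragment rules it uses would follow here.
```

With the comments written as rules of their own, SL_COMMENT and ML_COMMENT should no longer need to be declared in the tokens section, and Token no longer has to reassign $type for them.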
-Matt
>
> --Loring
>
> ----- Original Message ----
> > From: Matt Benson <gudnabrsam at yahoo.com>
> > To: Antlr List <antlr-interest at antlr.org>
> > Sent: Tuesday, March 4, 2008 2:05:01 PM
> > Subject: Re: [antlr-interest] lexer woes
> >
> > Lest my other questions be lost in the noise, I am
> > still confused as to:
> >
> > 1) Whether backtracking mode is supported for lexers, and
> > 2) How to specify lexer options (particularly "global" k)
> > in a combined grammar.
> >
> > -Matt
> >
> > --- Matt Benson wrote:
> >
> > >
> > > --- Loring Craymer wrote:
> > >
> > > > This one's easy--unfortunately. Ter does not yet
> > > > use FOLLOW sets in the lexer, and that tends to
> > > > cause havoc with your nicely factored grammar.
> > > > Also, you have gone overboard on using fragment
> > > > rules where they are not particularly appropriate
> > > > (all of your comments, for example).
> > > >
> > > > Can comments really be turned into tokens if
> > > > followed by odd characters? This seems really
> > > > strange.
> > > >
> > >
> > > No, that wasn't my intention. Ugh, I had my comment
> > > rules factored out properly but kept getting told they
> > > were unreachable, despite my awareness of
> > > order-of-rules issues, etc. However, I just changed
> > > my default k back to 2, put SL_COMMENT and ML_COMMENT
> > > before Token, and now it seems the Tool wants to
> > > disable Token for // and /* as is proper. Not sure
> > > why I couldn't get it working before, but that problem
> > > appears to be solved. That said, I guess I should keep
> > > playing around for a while here...
> > >
> > > > Anyway, I would suggest factoring out a comment rule
> > > > and either inlining most of the fragments or waiting
> > > > until Ter adds in FOLLOW set usage.
> > > >
> > >
> > > Is that in the plan? I don't pretend to understand
> > > the whole follow set thing, but Google tells me it has
> > > lots of stuff for me to read and I'm still working my
> > > way through the Dragon book which I imagine probably
> > > contains some relevant info as well.
> > >
> > > Thanks, Loring.
> > >
> > > > --Loring
> > > >
> > > > ----- Original Message ----
> > > > > From: Matt Benson
> > > > > To: Antlr List
> > > > > Sent: Monday, March 3, 2008 12:53:54 PM
> > > > > Subject: [antlr-interest] lexer woes
> > > > >
> > > > > > I am working on a language with a fairly loose lexing
> > > > > > scheme. I am running into all sorts of problems
> > > > > > specifying my lexer: in particular I can't find any
> > > > > > evidence that backtracking works for lexer grammars.
> > > > > > I tend to get NPEs building the NFAs when combining
> > > > > > synpreds, lexer grammars, and backtrack=true,
> > > > > > whether I use ANTLR 3.0.1 or a fairly recent 3.1
> > > > > > build. I have had to use a strategy whereby any
> > > > > > possibly confusing tokens are generated from a single
> > > > > > lexer rule. I'll include my current lexer grammar
> > > > > > that passes Tool generation; if anyone has the
> > > > > > time/inclination/interest to offer ideas how I could
> > > > > > have done things more cleanly I'd be glad to hear
> > > > > > about it.
> > > > >
> > > > > Thanks (or not),
> > > > > Matt
> > > > >
> > > > > lexer grammar Loose;
> > > > > options {k=1;}
> > > > > > tokens { Identifier; SEMI; SL_COMMENT; ML_COMMENT;}
> > > > >
> > > > > EQUALS : '=';
> > > > >
> > > > > StringLiteral
> > > > > : '"' ( EscapeSequence | ~('\\'|'"')
> )*
> > > '"'
> > > > > ;
> > > > >
> > > > > fragment
> > > > > EscapeSequence
> > > > > : '\\'
> > > > > (
> > > ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
> > > > > | Unicode
> > > > > | Octal
> > > > > )
> > > > > ;
> > > > >
> > > > > fragment
> > > > > Octal
> > > > > options {k=3;}
> > > > > : ('0'..'3') ('0'..'7') ('0'..'7')
> > > > > | ('0'..'7') ('0'..'7')?
> > > > > ;
> > > > >
> > > > > fragment
> > > > > Unicode
> > > > > > : 'u' HexDigit HexDigit HexDigit HexDigit
> > > > > ;
> > > > >
> > > > > fragment
> > > > > HexDigit
> > > > > : ('0'..'9'|'a'..'f'|'A'..'F')
> > > > > ;
> > > > >
> > > > > WS : (WsChar)+ {$channel=HIDDEN;}
> > > > > ;
> > > > >
> > > > > fragment
> > > > > WsChar
> > > > > : ' '|'\r'|'\t'|'\u000C'|'\n'
> > > > > ;
> > > > >
> > > > > Token
> > > > > > : (';' WsChar)=>';' {$type=SEMI;}
> > > > > > | ('//')=>LineComment {$type=SL_COMMENT;}
> > > > > > | ('/*')=>Comment {$type=ML_COMMENT;}
> > > > > > | (TokenMark)=>TokenTail {$type=Token;}
> > > > > > | ( (Letter)=>Ident {$type=Identifier;}
> > > > > > | IDDigit (Letter|IDDigit)*
> > > > > > )
> > > > > > // the presence of a token tail overrides any
> > > > > > // previously assigned token type:
> > > > > > (TokenTail {$type=Token;})?
> > > > > ;
> > > > >
> > > > > fragment
> > > > > LineComment
> > > > > : '//' ~('\n'|'\r')* '\r'? '\n'
> > > > {$channel=HIDDEN;}
> > > > > ;
> > > > >
> > > > > fragment
> > > > > Comment
> > > > > > : '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
> > > > > ;
> > > > >
> > > > > fragment
> > > > > TokenTail
> > > > > > : TokenMark+ ((Letter|IDDigit)+ TokenTail?)?
> > > > > ;
> > > > >
> > > > > fragment
> > > > > TokenMark
> > > > > options {k=2;}
> > > > > : EscapeSequence
> > > > > > | (';' ~(WsChar))=>';' // do not accept semicolon if followed by WS
> > > > > > | ~(Letter|IDDigit|WsChar|';'|'"'|EQUALS|'/')
> > > > > > | ('/' ~('/'|'*'))=>'/' // do not accept '/' if LA finds an upcoming SL/ML comment
> > > > > ;
> > > > >
> > > > > fragment
> > > > > Ident
> > > > > : Letter (Letter|IDDigit)*
> > > > > ;
> > > > >
> > > > > fragment
> > > > > Letter
> > > > > : '\u0024'
> > > > > | '\u0041'..'\u005a'
> > > > > | '\u005f'
> > > > > | '\u0061'..'\u007a'
> > > > > | '\u00c0'..'\u00d6'
> > > > > | '\u00d8'..'\u00f6'
> > > > > | '\u00f8'..'\u00ff'
> > > > > | '\u0100'..'\u1fff'
> > > > > | '\u3040'..'\u318f'
> > > > > | '\u3300'..'\u337f'
> > > > > | '\u3400'..'\u3d2d'
> > > > > | '\u4e00'..'\u9fff'
> > > > > | '\uf900'..'\ufaff'
> > > > > ;
> > > > >
> > > > > fragment
> > > > > IDDigit
> > > > > : '\u0030'..'\u0039'
> > > > > | '\u0660'..'\u0669'
> > > > > | '\u06f0'..'\u06f9'
> > > > > | '\u0966'..'\u096f'
> > > > > | '\u09e6'..'\u09ef'
> > > > > | '\u0a66'..'\u0a6f'
> > > > > | '\u0ae6'..'\u0aef'
> > > > > | '\u0b66'..'\u0b6f'
> > > > > | '\u0be7'..'\u0bef'
> > > > > | '\u0c66'..'\u0c6f'
> > > > > | '\u0ce6'..'\u0cef'
> > > > > | '\u0d66'..'\u0d6f'
> > > > > | '\u0e50'..'\u0e59'
> > > > > | '\u0ed0'..'\u0ed9'
> > > > > | '\u1040'..'\u1049'
> > > > > ;
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>
____________________________________________________________________________________
> > > > > Looking for last minute shopping deals?
> > > > > Find them fast with Yahoo! Search.
> > > > >
> > > >
> > >
> >
>
http://tools.search.yahoo.com/newsearch/category.php?category=shopping
> > > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >
>
____________________________________________________________________________________
> > > > Be a better friend, newshound, and
> > > > know-it-all with Yahoo! Mobile. Try it now.
> > > >
> > >
> >
>
http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ
> > > >
> > > >
> > > >
> > >
> > >
> > >
> > >
> > >
> >
>
____________________________________________________________________________________
> > > Looking for last minute shopping deals?
> > > Find them fast with Yahoo! Search.
> > >
> >
>
http://tools.search.yahoo.com/newsearch/category.php?category=shopping
> > >
> >
> >
> >
> >
> >
>
____________________________________________________________________________________
> > Never miss a thing. Make Yahoo your home page.
> > http://www.yahoo.com/r/hs
> >
>
>
>
>
>
>
____________________________________________________________________________________
> Looking for last minute shopping deals?
> Find them fast with Yahoo! Search.
>
http://tools.search.yahoo.com/newsearch/category.php?category=shopping
>
____________________________________________________________________________________
Be a better friend, newshound, and
know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ