[antlr-interest] lexer woes
Loring Craymer
lgcraymer at yahoo.com
Tue Mar 4 14:35:48 PST 2008
1.) Yes--see calls to prefixWithSynPred() in antlr.g
2.) ANTLR 3 defaults to k=*; the best approach is to leave k alone. In ANTLR 2, the goal was to find the minimum k that removed ambiguities. In ANTLR 3, a fixed k is the maximum lookahead depth investigated for any decision, so setting it weakens the analysis relative to k=*.
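A rough sketch of what leaving k alone could look like, assuming the comment rules from the quoted grammar are promoted from fragments to ordinary token rules (an illustration only, not run through the Tool): the grammar-level options block is simply omitted so the k=* default applies. The rule names SL_COMMENT and ML_COMMENT are taken from the quoted tokens section.

```antlr
lexer grammar Loose;
// No grammar-level "options {k=...;}" block, so ANTLR 3 uses its default k=*.

// Rule bodies copied from the LineComment and Comment fragments in the
// quoted grammar, written here as ordinary (non-fragment) token rules.
SL_COMMENT
    : '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
    ;

ML_COMMENT
    : '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
    ;
```

Whether the rest of the lexer then passes analysis cleanly depends on the other rules, which this sketch leaves out.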
--Loring
----- Original Message ----
> From: Matt Benson <gudnabrsam at yahoo.com>
> To: Antlr List <antlr-interest at antlr.org>
> Sent: Tuesday, March 4, 2008 2:05:01 PM
> Subject: Re: [antlr-interest] lexer woes
>
> Lest my other questions be lost in the noise, I am
> still confused as to:
>
> 1) Whether backtracking mode is supported for lexers,
> and
> 2) How to specify lexer options (particularly "global"
> k) in a combined grammar.
>
> -Matt
>
> --- Matt Benson wrote:
>
> >
> > --- Loring Craymer wrote:
> >
> > > This one's easy--unfortunately. Ter does not yet
> > > use FOLLOW sets in the lexer, and that tends to
> > > cause havoc with your nicely factored grammar.
> > > Also, you have gone overboard on using fragment
> > > rules where they are not particularly appropriate
> > > > (all of your comments, for example).
> > >
> > > Can comments really be turned into tokens if
> > > followed by odd characters? This seems really
> > > strange.
> > >
> >
> > > No, that wasn't my intention. Ugh, I had my comment
> > > rules factored out properly but kept getting told they
> > > were unreachable, despite my awareness of
> > > order-of-rules issues, etc. However, I just changed
> > > my default k back to 2, put SL_COMMENT and ML_COMMENT
> > > before Token, and now it seems the Tool wants to
> > > disable Token for // and /* as is proper. Not sure
> > > why I couldn't get it working before but that problem
> > > appears to be solved. That said I guess I should keep
> > > playing around for a while here...
> >
> > > > Anyway, I would suggest factoring out a comment rule
> > > > and either inlining most of the fragments or waiting
> > > > until Ter adds in FOLLOW set usage.
> > >
> >
> > > Is that in the plan? I don't pretend to understand
> > > the whole follow set thing, but Google tells me it has
> > > lots of stuff for me to read and I'm still working my
> > > way through the Dragon book which I imagine probably
> > > contains some relevant info as well.
> >
> > Thanks, Loring.
> >
> > > --Loring
> > >
> > > ----- Original Message ----
> > > > From: Matt Benson
> > > > To: Antlr List
> > > > Sent: Monday, March 3, 2008 12:53:54 PM
> > > > Subject: [antlr-interest] lexer woes
> > > >
> > > > > I am working on a language with a fairly loose lexing
> > > > > scheme. I am running into all sorts of problems
> > > > > specifying my lexer: in particular I can't find any
> > > > > evidence that backtracking works for lexer grammars.
> > > > > I tend to get NPEs building the NFAs when combining
> > > > > synpreds, lexer grammars, and backtrack=true,
> > > > > whether I use ANTLR 3.0.1 or a fairly recent 3.1
> > > > > build. I have had to use a strategy whereby any
> > > > > possibly confusing tokens are generated from a single
> > > > > lexer rule. I'll include my current lexer grammar
> > > > > that passes Tool generation; if anyone has the
> > > > > time/inclination/interest to offer ideas on how I could
> > > > > have done things more cleanly I'd be glad to hear
> > > > > about it.
> > > >
> > > > Thanks (or not),
> > > > Matt
> > > >
> > > > lexer grammar Loose;
> > > > options {k=1;}
> > > > > tokens { Identifier; SEMI; SL_COMMENT; ML_COMMENT;}
> > > >
> > > > EQUALS : '=';
> > > >
> > > > StringLiteral
> > > > : '"' ( EscapeSequence | ~('\\'|'"') )*
> > '"'
> > > > ;
> > > >
> > > > fragment
> > > > EscapeSequence
> > > > : '\\'
> > > > (
> > ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
> > > > | Unicode
> > > > | Octal
> > > > )
> > > > ;
> > > >
> > > > fragment
> > > > Octal
> > > > options {k=3;}
> > > > : ('0'..'3') ('0'..'7') ('0'..'7')
> > > > | ('0'..'7') ('0'..'7')?
> > > > ;
> > > >
> > > > fragment
> > > > Unicode
> > > > : 'u' HexDigit HexDigit HexDigit HexDigit
> > > > ;
> > > >
> > > > fragment
> > > > HexDigit
> > > > : ('0'..'9'|'a'..'f'|'A'..'F')
> > > > ;
> > > >
> > > > WS : (WsChar)+ {$channel=HIDDEN;}
> > > > ;
> > > >
> > > > fragment
> > > > WsChar
> > > > : ' '|'\r'|'\t'|'\u000C'|'\n'
> > > > ;
> > > >
> > > > Token
> > > > : (';' WsChar)=>';' {$type=SEMI;}
> > > > | ('//')=>LineComment {$type=SL_COMMENT;}
> > > > | ('/*')=>Comment {$type=ML_COMMENT;}
> > > > | (TokenMark)=>TokenTail {$type=Token;}
> > > > > | ( (Letter)=>Ident {$type=Identifier;}
> > > > > | IDDigit (Letter|IDDigit)*
> > > > > )
> > > > > //the presence of a token tail overrides any previously assigned token type:
> > > > > (TokenTail {$type=Token;})?
> > > > ;
> > > >
> > > > fragment
> > > > LineComment
> > > > : '//' ~('\n'|'\r')* '\r'? '\n'
> > > {$channel=HIDDEN;}
> > > > ;
> > > >
> > > > fragment
> > > > Comment
> > > > : '/*' ( options {greedy=false;} : . )*
> > > '*/'
> > > > {$channel=HIDDEN;}
> > > > ;
> > > >
> > > > fragment
> > > > TokenTail
> > > > > : TokenMark+ ((Letter|IDDigit)+ TokenTail?)?
> > > > ;
> > > >
> > > > fragment
> > > > TokenMark
> > > > options {k=2;}
> > > > : EscapeSequence
> > > > | (';' ~(WsChar))=>';'//do not accept
> > > semicolon if
> > > > followed by WS
> > > > |
> > > ~(Letter|IDDigit|WsChar|';'|'"'|EQUALS|'/')
> > > > | ('/' ~('/'|'*'))=>'/'//do not accept
> > '/'
> > > if LA
> > > > finds an upcoming SL/ML comment
> > > > ;
> > > >
> > > > fragment
> > > > Ident
> > > > : Letter (Letter|IDDigit)*
> > > > ;
> > > >
> > > > fragment
> > > > Letter
> > > > : '\u0024'
> > > > | '\u0041'..'\u005a'
> > > > | '\u005f'
> > > > | '\u0061'..'\u007a'
> > > > | '\u00c0'..'\u00d6'
> > > > | '\u00d8'..'\u00f6'
> > > > | '\u00f8'..'\u00ff'
> > > > | '\u0100'..'\u1fff'
> > > > | '\u3040'..'\u318f'
> > > > | '\u3300'..'\u337f'
> > > > | '\u3400'..'\u3d2d'
> > > > | '\u4e00'..'\u9fff'
> > > > | '\uf900'..'\ufaff'
> > > > ;
> > > >
> > > > fragment
> > > > IDDigit
> > > > : '\u0030'..'\u0039'
> > > > | '\u0660'..'\u0669'
> > > > | '\u06f0'..'\u06f9'
> > > > | '\u0966'..'\u096f'
> > > > | '\u09e6'..'\u09ef'
> > > > | '\u0a66'..'\u0a6f'
> > > > | '\u0ae6'..'\u0aef'
> > > > | '\u0b66'..'\u0b6f'
> > > > | '\u0be7'..'\u0bef'
> > > > | '\u0c66'..'\u0c6f'
> > > > | '\u0ce6'..'\u0cef'
> > > > | '\u0d66'..'\u0d6f'
> > > > | '\u0e50'..'\u0e59'
> > > > | '\u0ed0'..'\u0ed9'
> > > > | '\u1040'..'\u1049'
> > > > ;
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >
> ____________________________________________________________________________________
> > > > Looking for last minute shopping deals?
> > > > Find them fast with Yahoo! Search.
> > > >
> > >
> >
> http://tools.search.yahoo.com/newsearch/category.php?category=shopping
> > > >
> > >
> > >
> > >
> > >
> > >
> > >
> >
> ____________________________________________________________________________________
> > > Be a better friend, newshound, and
> > > know-it-all with Yahoo! Mobile. Try it now.
> > >
> >
> http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ
> > >
> > >
> > >
> >
> >
> >
> >
> >
> ____________________________________________________________________________________
> > Looking for last minute shopping deals?
> > Find them fast with Yahoo! Search.
> >
> http://tools.search.yahoo.com/newsearch/category.php?category=shopping
> >
>
>
>
>
> ____________________________________________________________________________________
> Never miss a thing. Make Yahoo your home page.
> http://www.yahoo.com/r/hs
>