[antlr-interest] lexer woes

Tue Mar 4 14:35:48 PST 2008

1.)  Yes--see calls to prefixWithSynPred() in antlr.g
2.)  ANTLR 3 defaults to k=*; the best approach is to leave k alone.  In ANTLR 2, the goal was to find the minimum k that removed ambiguities; in ANTLR 3, a fixed k caps the lookahead depth investigated for every decision, and so weakens the analysis relative to k=*.
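
As a rough sketch of what leaving k alone looks like (assuming the comment rules are factored out as ordinary token rules, as suggested earlier in this thread; the grammar name LooseSketch and the DIV rule are made up for illustration), a lexer grammar with no options block simply gets the k=* default:

```antlr
lexer grammar LooseSketch;   // hypothetical name; no options {k=...;}, so the k=* default applies

// Comments as ordinary (non-fragment) rules, sent to the hidden channel.
SL_COMMENT
    :    '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
    ;

ML_COMMENT
    :    '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
    ;

// Illustrative only: a lone '/' that does not start a comment.
DIV :    '/' ;
```

This has not been run through the Tool, so treat it as a starting point rather than a tested grammar.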

--Loring

----- Original Message ----
> From: Matt Benson <gudnabrsam at yahoo.com>
> To: Antlr List <antlr-interest at antlr.org>
> Sent: Tuesday, March 4, 2008 2:05:01 PM
> Subject: Re: [antlr-interest] lexer woes
> 
> Lest my other questions be lost in the noise, I am
> still confused as to:
> 
> 1) Whether backtracking mode is supported for lexers,
> and
> 2) How to specify lexer options (particularly "global"
> k) in a combined grammar.
> 
> -Matt
> 
> --- Matt Benson  wrote:
> 
> > 
> > --- Loring Craymer  wrote:
> > 
> > > This one's easy--unfortunately.  Ter does not yet
> > > use FOLLOW sets in the lexer, and that tends to
> > > cause havoc with your nicely factored grammar. 
> > > Also, you have gone overboard on using fragment
> > > rules where they are not particularly appropriate
> > > (all of your comments, for example).
> > > 
> > > Can comments really be turned into tokens if
> > > followed by odd characters?  This seems really
> > > strange.
> > > 
> > 
> > No, that wasn't my intention.  Ugh, I had my comment
> > rules factored out properly but kept getting told they
> > were unreachable, despite my awareness of
> > order-of-rules issues, etc.  However, I just changed
> > my default k back to 2, put SL_COMMENT and ML_COMMENT
> > before Token, and now it seems the Tool wants to
> > disable Token for // and /* as is proper.  Not sure
> > why I couldn't get it working before but that problem
> > appears to be solved.  That said I guess I should keep
> > playing around for a while here...
> > 
> > > Anyway, I would suggest factoring out a comment rule
> > > and either inline most of the fragments or wait
> > > until Ter adds in FOLLOW set usage.
> > > 
> > 
> > Is that in the plan?  I don't pretend to understand
> > the whole follow set thing, but Google tells me it has
> > lots of stuff for me to read and I'm still working my
> > way through the Dragon book which I imagine probably
> > contains some relevant info as well.
> > 
> > Thanks, Loring.
> > 
> > > --Loring
> > > 
> > > ----- Original Message ----
> > > > From: Matt Benson 
> > > > To: Antlr List 
> > > > Sent: Monday, March 3, 2008 12:53:54 PM
> > > > Subject: [antlr-interest] lexer woes
> > > > 
> > > > I am working on a language with a fairly loose lexing
> > > > scheme.  I am running into all sorts of problems
> > > > specifying my lexer:  in particular I can't find any
> > > > evidence that backtracking works for lexer grammars.
> > > > I tend to get NPEs building the NFAs when combining
> > > > synpreds, lexer grammars, and backtracking=true,
> > > > whether I use ANTLR 3.0.1 or a fairly recent 3.1
> > > > build.  I have had to use a strategy whereby any
> > > > possibly confusing tokens are generated from a single
> > > > lexer rule.  I'll include my current lexer grammar
> > > > that passes Tool generation; if anyone has the
> > > > time/inclination/interest to offer ideas how I could
> > > > have done things more cleanly I'd be glad to hear
> > > > about it.
> > > > 
> > > > Thanks (or not),
> > > > Matt
> > > > 
> > > > lexer grammar Loose;
> > > > options {k=1;}
> > > > tokens { Identifier; SEMI; SL_COMMENT; ML_COMMENT;}
> > > > 
> > > > EQUALS    :    '=';
> > > > 
> > > > StringLiteral
> > > >     :    '"' ( EscapeSequence | ~('\\'|'"') )* '"'
> > > >     ;
> > > > 
> > > > fragment
> > > > EscapeSequence
> > > >     :    '\\'
> > > >         (    ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
> > > >         |    Unicode
> > > >         |    Octal
> > > >         )
> > > >     ;
> > > > 
> > > > fragment
> > > > Octal
> > > > options {k=3;}
> > > >     :   ('0'..'3') ('0'..'7') ('0'..'7')
> > > >     |    ('0'..'7') ('0'..'7')?
> > > >     ;
> > > > 
> > > > fragment
> > > > Unicode
> > > >     :    'u' HexDigit HexDigit HexDigit HexDigit
> > > >     ;
> > > > 
> > > > fragment
> > > > HexDigit
> > > >     :    ('0'..'9'|'a'..'f'|'A'..'F')
> > > >     ;
> > > > 
> > > > WS    :    (WsChar)+ {$channel=HIDDEN;}
> > > >     ;
> > > > 
> > > > fragment
> > > > WsChar
> > > >     :    ' '|'\r'|'\t'|'\u000C'|'\n'
> > > >     ;
> > > > 
> > > > Token
> > > >     :    (';' WsChar)=>';' {$type=SEMI;}
> > > >     |    ('//')=>LineComment {$type=SL_COMMENT;}
> > > >     |    ('/*')=>Comment {$type=ML_COMMENT;}
> > > >     |    (TokenMark)=>TokenTail {$type=Token;}
> > > >     |    (    (Letter)=>Ident {$type=Identifier;}
> > > >         |    IDDigit (Letter|IDDigit)*
> > > >         )
> > > >         //the presence of a token tail overrides any previously assigned token type:
> > > >         (TokenTail {$type=Token;})?
> > > >     ;
> > > > 
> > > > fragment
> > > > LineComment
> > > >     :    '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
> > > >     ;
> > > > 
> > > > fragment
> > > > Comment
> > > >     :    '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
> > > >     ;
> > > > 
> > > > fragment
> > > > TokenTail
> > > >     :    TokenMark+ ((Letter|IDDigit)+ TokenTail?)?
> > > >     ;
> > > > 
> > > > fragment
> > > > TokenMark
> > > > options {k=2;}
> > > >     :    EscapeSequence
> > > >     |    (';' ~(WsChar))=>';'//do not accept semicolon if followed by WS
> > > >     |    ~(Letter|IDDigit|WsChar|';'|'"'|EQUALS|'/')
> > > >     |    ('/' ~('/'|'*'))=>'/'//do not accept '/' if LA finds an upcoming SL/ML comment
> > > >     ;
> > > > 
> > > > fragment
> > > > Ident
> > > >     :    Letter (Letter|IDDigit)*
> > > >     ;
> > > > 
> > > > fragment
> > > > Letter
> > > >     :    '\u0024'
> > > >     |    '\u0041'..'\u005a'
> > > >     |    '\u005f'
> > > >     |    '\u0061'..'\u007a'
> > > >     |    '\u00c0'..'\u00d6'
> > > >     |    '\u00d8'..'\u00f6'
> > > >     |    '\u00f8'..'\u00ff'
> > > >     |    '\u0100'..'\u1fff'
> > > >     |    '\u3040'..'\u318f'
> > > >     |    '\u3300'..'\u337f'
> > > >     |    '\u3400'..'\u3d2d'
> > > >     |    '\u4e00'..'\u9fff'
> > > >     |    '\uf900'..'\ufaff'
> > > >     ;
> > > > 
> > > > fragment
> > > > IDDigit
> > > >     :    '\u0030'..'\u0039'
> > > >     |    '\u0660'..'\u0669'
> > > >     |    '\u06f0'..'\u06f9'
> > > >     |    '\u0966'..'\u096f'
> > > >     |    '\u09e6'..'\u09ef'
> > > >     |    '\u0a66'..'\u0a6f'
> > > >     |    '\u0ae6'..'\u0aef'
> > > >     |    '\u0b66'..'\u0b6f'
> > > >     |    '\u0be7'..'\u0bef'
> > > >     |    '\u0c66'..'\u0c6f'
> > > >     |    '\u0ce6'..'\u0cef'
> > > >     |    '\u0d66'..'\u0d6f'
> > > >     |    '\u0e50'..'\u0e59'
> > > >     |    '\u0ed0'..'\u0ed9'
> > > >     |    '\u1040'..'\u1049'
> > > >     ;
> > > > 
