[antlr-interest] lexer woes

Matt Benson gudnabrsam at yahoo.com
Mon Mar 3 14:15:17 PST 2008


--- Loring Craymer <lgcraymer at yahoo.com> wrote:

> This one's easy--unfortunately.  Ter does not yet
> use FOLLOW sets in the lexer, and that tends to
> cause havoc with your nicely factored grammar. 
> Also, you have gone overboard on using fragment
> rules where they are not particularly appropriate
> (all of your comments, for example).
> 
> Can comments really be turned into tokens if
> followed by odd characters?  This seems really
> strange.
> 

No, that wasn't my intention.  Ugh, I had my comment
rules factored out properly but kept getting told they
were unreachable, despite my awareness of
order-of-rules issues, etc.  However, I just changed
my default k back to 2, put SL_COMMENT and ML_COMMENT
before Token, and now it seems the Tool wants to
disable Token for // and /* as is proper.  Not sure
why I couldn't get it working before but that problem
appears to be solved.  That said, I guess I should keep
playing around for a while here...
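
A minimal sketch of the arrangement described above, not the exact
grammar.  It assumes the comment rules simply lose their fragment
keyword, take over the SL_COMMENT and ML_COMMENT names (which would
then presumably drop out of the tokens block), and keep their
original bodies:

```antlr
lexer grammar Loose;
options {k=2;}   // default lookahead set back to 2

// Assumption: SL_COMMENT and ML_COMMENT leave tokens {} now that
// they are defined as real lexer rules below.
tokens { Identifier; SEMI; }

// Ordinary (non-fragment) rules, listed before Token.
SL_COMMENT
    :    '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
    ;

ML_COMMENT
    :    '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
    ;

// Token would follow here, assumed to drop its ('//')=> and ('/*')=>
// alternatives; the remaining rules are unchanged.
```

With the comment rules placed first, input beginning with // or /*
should be claimed by them rather than by Token.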

> Anyway, I would suggest factoring out a comment rule
> and either inline most of the fragments or wait
> until Ter adds in FOLLOW set usage.
> 

Is that in the plan?  I don't pretend to understand
the whole follow set thing, but Google tells me it has
lots of stuff for me to read and I'm still working my
way through the Dragon book which I imagine probably
contains some relevant info as well.

Thanks, Loring.

> --Loring
> 
> ----- Original Message ----
> > From: Matt Benson <gudnabrsam at yahoo.com>
> > To: Antlr List <antlr-interest at antlr.org>
> > Sent: Monday, March 3, 2008 12:53:54 PM
> > Subject: [antlr-interest] lexer woes
> > 
> > I am working on a language with a fairly loose lexing
> > scheme.  I am running into all sorts of problems
> > specifying my lexer:  in particular I can't find any
> > evidence that backtracking works for lexer grammars.
> > I tend to get NPEs building the NFAs when combining
> > synpreds, lexer grammars, and backtracking=true,
> > whether I use ANTLR 3.0.1 or a fairly recent 3.1
> > build.  I have had to use a strategy whereby any
> > possibly confusing tokens are generated from a single
> > lexer rule.  I'll include my current lexer grammar
> > that passes Tool generation; if anyone has the
> > time/inclination/interest to offer ideas how I could
> > have done things more cleanly I'd be glad to hear
> > about it.
> > 
> > Thanks (or not),
> > Matt
> > 
> > lexer grammar Loose;
> > options {k=1;}
> > tokens { Identifier; SEMI; SL_COMMENT; ML_COMMENT;}
> > 
> > EQUALS    :    '=';
> > 
> > StringLiteral
> >     :    '"' ( EscapeSequence | ~('\\'|'"') )* '"'
> >     ;
> > 
> > fragment
> > EscapeSequence
> >     :    '\\'
> >         (    ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
> >         |    Unicode
> >         |    Octal
> >         )
> >     ;
> > 
> > fragment
> > Octal
> > options {k=3;}
> >     :   ('0'..'3') ('0'..'7') ('0'..'7')
> >     |    ('0'..'7') ('0'..'7')?
> >     ;
> > 
> > fragment
> > Unicode
> >     :    'u' HexDigit HexDigit HexDigit HexDigit
> >     ;
> > 
> > fragment
> > HexDigit
> >     :    ('0'..'9'|'a'..'f'|'A'..'F')
> >     ;
> > 
> > WS    :    (WsChar)+ {$channel=HIDDEN;}
> >     ;
> > 
> > fragment
> > WsChar
> >     :    ' '|'\r'|'\t'|'\u000C'|'\n'
> >     ;
> > 
> > Token
> >     :    (';' WsChar)=>';' {$type=SEMI;}
> >     |    ('//')=>LineComment {$type=SL_COMMENT;}
> >     |    ('/*')=>Comment {$type=ML_COMMENT;}
> >     |    (TokenMark)=>TokenTail {$type=Token;}
> >     |    (    (Letter)=>Ident {$type=Identifier;}
> >         |    IDDigit (Letter|IDDigit)*
> >         )
> >         //the presence of a token tail overrides any previously assigned token type:
> >         (TokenTail {$type=Token;})?
> >     ;
> > 
> > fragment
> > LineComment
> >     :    '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
> >     ;
> > 
> > fragment
> > Comment
> >     :    '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
> >     ;
> > 
> > fragment
> > TokenTail
> >     :    TokenMark+ ((Letter|IDDigit)+ TokenTail?)?
> >     ;
> > 
> > fragment
> > TokenMark
> > options {k=2;}
> >     :    EscapeSequence
> >     |    (';' ~(WsChar))=>';' //do not accept semicolon if followed by WS
> >     |    ~(Letter|IDDigit|WsChar|';'|'"'|EQUALS|'/')
> >     |    ('/' ~('/'|'*'))=>'/' //do not accept '/' if LA finds an upcoming SL/ML comment
> >     ;
> > 
> > fragment
> > Ident
> >     :    Letter (Letter|IDDigit)*
> >     ;
> > 
> > fragment
> > Letter
> >     :    '\u0024'
> >     |    '\u0041'..'\u005a'
> >     |    '\u005f'
> >     |    '\u0061'..'\u007a'
> >     |    '\u00c0'..'\u00d6'
> >     |    '\u00d8'..'\u00f6'
> >     |    '\u00f8'..'\u00ff'
> >     |    '\u0100'..'\u1fff'
> >     |    '\u3040'..'\u318f'
> >     |    '\u3300'..'\u337f'
> >     |    '\u3400'..'\u3d2d'
> >     |    '\u4e00'..'\u9fff'
> >     |    '\uf900'..'\ufaff'
> >     ;
> > 
> > fragment
> > IDDigit
> >     :    '\u0030'..'\u0039'
> >     |    '\u0660'..'\u0669'
> >     |    '\u06f0'..'\u06f9'
> >     |    '\u0966'..'\u096f'
> >     |    '\u09e6'..'\u09ef'
> >     |    '\u0a66'..'\u0a6f'
> >     |    '\u0ae6'..'\u0aef'
> >     |    '\u0b66'..'\u0b6f'
> >     |    '\u0be7'..'\u0bef'
> >     |    '\u0c66'..'\u0c6f'
> >     |    '\u0ce6'..'\u0cef'
> >     |    '\u0d66'..'\u0d6f'
> >     |    '\u0e50'..'\u0e59'
> >     |    '\u0ed0'..'\u0ed9'
> >     |    '\u1040'..'\u1049'
> >     ;
> > 


