[antlr-interest] ambiguous lexer tokens

Torsten Curdt tcurdt at vafer.org
Wed Jun 27 13:44:14 PDT 2007


I would like to write a grammar for the following output:

  drwxr-xr-x   23 tcurdt  tcurdt    782 Jun 24 22:54 ..
  -rw-r--r--    1 tcurdt  tcurdt  18545 Nov  1  2006 ASMContentHandler.Rule.html

My first naive try was:

  grammar test;

  prog	
	: (line)+ EOF
	;
	
  line
	: TYPE MODS WS INT WS NAME WS NAME WS INT WS NAME WS (HOUR | YEAR) WS NAME NEWLINE
	;
	
  TYPE
	: ['d' | '-' ]
	;

  MODS
	: (['r' | 'w' | 'x' | '-' ]){9}
	;
	
  INT
	: ['0'..'9']+
	;

  NAME
	: ['0'..'9' | 'a'..'z' | 'A'..'Z' | '.' | '-']+
	;

  HOUR
	: (INT){2} ':' (INT){2}
	;

  YEAR
	: (INT){4}
	;
	
  NEWLINE
	: '\r'? '\n'
	;

  WS
	: (' '|'\t'|'\n'|'\r')+ { skip(); }
	;

Of course, that means the lexer tokens (TYPE/MODS/INT/NAME/HOUR/YEAR)
are ambiguous: the same input text can match more than one rule.
What should such a grammar look like? Any pointers?
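
To make the ambiguity concrete, here is a minimal Python sketch (not ANTLR). It transliterates the token rules above into regular expressions and lists every rule that matches a given field. The regexes are my own approximation of what the rules are meant to express (for example, HOUR as two digits, a colon, two digits), and the helper name candidates is made up for this illustration.

```python
import re

# Token rules from the grammar above, transliterated to Python regexes.
# Assumption: these approximate the intent of the rules; they are not
# how ANTLR itself would interpret the (invalid) bracket syntax.
TOKENS = {
    "TYPE": r"[d-]",
    "MODS": r"[rwx-]{9}",
    "INT": r"[0-9]+",
    "NAME": r"[0-9a-zA-Z.-]+",
    "HOUR": r"[0-9]{2}:[0-9]{2}",
    "YEAR": r"[0-9]{4}",
}


def candidates(text):
    """Return every token type whose pattern matches the whole text."""
    return [name for name, pattern in TOKENS.items()
            if re.fullmatch(pattern, text)]


# Fields taken from the sample listing above.
for field in ["-", "rw-r--r--", "782", "2006", "22:54", "tcurdt"]:
    print(repr(field), candidates(field))
```

With these patterns, "2006" is matched by INT, NAME and YEAR at once, and "rw-r--r--" by both MODS and NAME, so the lexer alone has no way to pick the right token without knowing the position in the line.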

cheers
--
Torsten

