[antlr-interest] ambiguous lexer tokens
Torsten Curdt
tcurdt at vafer.org
Wed Jun 27 13:44:14 PDT 2007
I would like to write a grammar for the following output:
drwxr-xr-x 23 tcurdt tcurdt 782 Jun 24 22:54 ..
-rw-r--r-- 1 tcurdt tcurdt 18545 Nov 1 2006 ASMContentHandler.Rule.html
My first naive try was:
grammar test;
prog
: (line)+ EOF
;
line
: TYPE MODS WS INT WS NAME WS NAME WS INT WS NAME WS (HOUR | YEAR)
WS NAME NEWLINE
;
TYPE
: ['d' | '-' ]
;
MODS
: (['r' | 'w' | 'x' | '-' ]){9}
;
INT
: ['0'..'9']+
;
NAME
: ['0'..'9' | 'a'..'z' | 'A'..'Z' | '.' | '-']+
;
HOUR
: (INT){2} ':' (INT){2}
;
YEAR
: (INT){4}
;
NEWLINE
: '\r'? '\n'
;
WS
: (' '|'\t'|'\n'|'\r')+ { skip(); }
;
Of course, that means the lexer tokens (TYPE/MODS/INT/NAME/HOUR/YEAR)
are ambiguous, since several rules can match the same characters.
What should such a grammar look like? Any pointers?
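To make the overlap concrete, here is a small Python sketch (not ANTLR). The regular expressions are only my rough translations of the rules above, and I am assuming HOUR means two digits, a colon, two digits and YEAR means exactly four digits. The helper name matching_rules is made up for this illustration. It lists every rule that could match a given piece of text on its own:

```python
import re

# Rough regex translations of the lexer rules above.
# These are assumptions for illustration, not ANTLR syntax.
RULES = {
    "TYPE": r"[d-]",
    "MODS": r"[rwx-]{9}",
    "INT": r"[0-9]+",
    "NAME": r"[0-9a-zA-Z.-]+",
    "HOUR": r"[0-9]{2}:[0-9]{2}",  # assumed: two digits, colon, two digits
    "YEAR": r"[0-9]{4}",           # assumed: exactly four digits
}


def matching_rules(text):
    """Return the names of all rules whose pattern matches the whole text."""
    return [name for name, pattern in RULES.items()
            if re.fullmatch(pattern, text)]


# Print the candidate rules for a few fields taken from the sample output.
for sample in ["2006", "782", "-", "rwxr-xr-x", "tcurdt", "22:54"]:
    print(sample, matching_rules(sample))
```

Under these assumptions, "2006" is matched by INT, NAME and YEAR, and "rwxr-xr-x" by both MODS and NAME, so the lexer has no way to pick the intended token from the characters alone.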
cheers
--
Torsten