[antlr-interest] ambiguous lexer tokens
Randall R Schulz
rschulz at sonic.net
Wed Jun 27 15:58:56 PDT 2007
On Wednesday 27 June 2007 13:44, Torsten Curdt wrote:
> I would like to write a grammar for the following output:
>
> drwxr-xr-x 23 tcurdt tcurdt 782 Jun 24 22:54 ..
> -rw-r--r-- 1 tcurdt tcurdt 18545 Nov 1 2006 ASMContentHandler.Rule.html
>
> My first naive try was
>
> grammar test;
>
> prog
> : (line)+ EOF
> ;
>
> line
> : TYPE MODS WS INT WS NAME WS NAME WS INT WS NAME WS (HOUR | YEAR)
>   WS NAME NEWLINE
> ;
>
> TYPE
> : ['d' | '-' ]
> ;
There are several other file types:
- plain file
d directory
p pipe (named pipe / FIFO)
s socket
l symbolic link
b block special (e.g., a disk or disk partition)
c character special (e.g., a (pseudo-) tty or serial port)
> MODS
> : (['r' | 'w' | 'x' | '-' ]){9}
> ;
You can strengthen the portions that recognize the modes by observing
that they come in groups of three and that each position has either a
permission character (if granted) or a dash (if not). The owner and
group 'x' bits may be replaced by an 's' (or by a capital 'S' when the
execute bit itself is not set) to indicate set user or set group ID,
respectively.
Keep in mind, too, that the last character has an extra value beyond the
usual 'x' permission bit. Sticky executables (technically obsolescent)
or directories are displayed with a 't' in place of their world execute
bit.
On some systems that support ACLs, the presence of ACLs that don't fit
the classic Unix model will cause a plus sign ('+') to be appended to
the mode string.
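To make the structure described above concrete, here is a rough sketch
as a Python regular expression rather than ANTLR; the same shape could
probably be carried over to a single lexer rule built from fragments.
The names MODE_RE and parse_mode are made up for illustration. The
lowercase 's' and the capital 'T' variants are what the common "ls"
implementations print, not something taken from your sample, and your
"ls" may differ.

```python
import re

# One type character, three permission triads, optional ACL marker.
MODE_RE = re.compile(
    r"""
    (?P<type>[-dpslbc])          # file type, per the list above
    (?P<user>[-r][-w][-xsS])     # owner: s/S marks set-user-ID
    (?P<group>[-r][-w][-xsS])    # group: s/S marks set-group-ID
    (?P<other>[-r][-w][-xtT])    # world: t/T marks the sticky bit
    (?P<acl>\+?)                 # '+' appended on some ACL-aware systems
    """,
    re.VERBOSE,
)


def parse_mode(text):
    """Return the named parts of a mode string, or None if it is malformed."""
    match = MODE_RE.fullmatch(text)
    return match.groupdict() if match else None
```

Treating the whole mode string as one token this way sidesteps the
question of whether a lone '-' is a TYPE, part of MODS, or the start of
a NAME.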
> ...
> NAME
> : ['0'..'9' | 'a'-'z' | 'A'..'Z' | '.' | '-']+
> ;
Technically, on Unix (-like) systems, which this seems to be, the only
characters that may not be part of a file name are the slash and the
NUL byte. Perhaps more to the point, you'll have to know precisely how
the "ls" command(s) you're dealing with present file names, especially
those with non-ASCII or non-printing characters in their names, all of
which are possible.
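Since nearly anything can appear in a name, one way around a NAME token
is to treat the name as "whatever follows the first eight columns". A
minimal Python sketch of that idea follows. It assumes the nine-column
layout of your two sample lines and names without embedded newlines,
and split_ls_line is a hypothetical helper, not part of any library.

```python
def split_ls_line(line):
    """Split one 'ls -l' line into 9 fields; the last one is the file name."""
    # maxsplit=8 stops splitting after the eighth column, so any spaces
    # inside the name remain in the final field.
    fields = line.rstrip("\n").split(None, 8)
    if len(fields) != 9:
        return None  # e.g. a "total N" line or an unexpected layout
    return fields
```

The eighth field then holds either a time or a year, which can be told
apart afterwards (for instance by the presence of a colon) instead of in
the lexer. For symbolic links the last field would also carry the
"-> target" part.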
> ...
>
> Of course that means that the tokens (TYPE/MODS/INT/NAME/HOUR/YEAR)
> for the lexer are ambiguous.
>
> How should such a grammar look like? Pointers?
>
> cheers
> --
> Torsten
I'm not sure what your overall goal is, but perhaps using the "getfacl"
command, if available on your system, would present you with a more
tractable format?
Good luck.
Randall Schulz