[antlr-interest] File spec grammar

Mike Lischke lists at lischke-online.de
Sun Apr 11 03:12:12 PDT 2004


Hi John, 

> I haven't actually tried this using Antlr but how about:

Thank you for your example. I came up with something similar, but the problem is that with that grammar I don't get all
parts (e.g. the extension, if there is one). I know the file spec is ambiguous, because just from looking at:

/abc

you cannot tell whether this is a file name or a directory. However, one can say that the last part not terminated by a
path separator is a priori a file name unless proven wrong in the subsequent semantic phase. This is not a serious
problem in my eyes. My current grammar is similar to yours but a bit more general, as it allows both path separators and
Unicode file names:

  DRIVE_LETTER:        'a'..'z';
protected
  FILE_NAME_LETTER:    ~('\\' | '/' | ':' | '*' | '?' | '<' | '>' | '|');
protected
  FILE_NAME_SEPARATOR: '\\' | '/';
  PATH_PART:           FILE_NAME_SEPARATOR (FILE_NAME_LETTER)*;

file_name:
  (drive)? (PATH_PART)*
;
drive:
  DRIVE_LETTER COLON
;

This grammar suffers from the same limitations, though, and causes warnings about lexical nondeterminism, e.g.
between DIV (defined as '/') and PATH_PART. I'm not sure how to solve that problem. And I would really like to have the
file name already split in my AST (drive, path, name, extension) instead of adding another parse stage.
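
To make the intended split concrete, here is a minimal Java sketch, independent of ANTLR, of the four parts (drive,
path, name, extension) and of the "last part is a priori a file name" rule described above. The class name
FileSpecSplitter and the method split are hypothetical and exist only for this illustration. It assumes a drive is a
single lowercase letter followed by a colon (as in DRIVE_LETTER), that both '/' and '\' separate path parts, and that
an extension needs at least one character before the dot; the "~" alternative is not handled.

```java
import java.util.Arrays;

// Hypothetical helper, not part of ANTLR: shows the desired split of a file spec.
public final class FileSpecSplitter {
    private FileSpecSplitter() {}

    /** Returns {drive, path, name, extension}; parts that are absent are empty strings. */
    public static String[] split(String spec) {
        String drive = "";
        String rest = spec;

        // Assumption: a drive is one lowercase letter plus ':' (mirrors DRIVE_LETTER COLON).
        if (rest.length() >= 2 && rest.charAt(1) == ':'
                && rest.charAt(0) >= 'a' && rest.charAt(0) <= 'z') {
            drive = rest.substring(0, 1);
            rest = rest.substring(2);
        }

        // Both separators are accepted; everything up to and including the last one is the path.
        int lastSep = Math.max(rest.lastIndexOf('/'), rest.lastIndexOf('\\'));
        String path = rest.substring(0, lastSep + 1);

        // The part not terminated by a separator is treated as a file name a priori.
        String last = rest.substring(lastSep + 1);

        // Assumption: the extension follows the last dot, and the base name must not be empty.
        String name = last;
        String extension = "";
        int dot = last.lastIndexOf('.');
        if (dot > 0) {
            name = last.substring(0, dot);
            extension = last.substring(dot + 1);
        }
        return new String[] {drive, path, name, extension};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("c:\\dir\\sub\\file.txt")));
        System.out.println(Arrays.toString(split("/abc")));
    }
}
```

For "/abc" this yields an empty drive, the path "/", the name "abc" and an empty extension; whether "abc" is really a
directory is left to a later semantic check.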
 
My earlier attempt was this:

  FILE_NAME_LETTER:    ~('\\' | '/' | ':' | '*' | '?' | '<' | '>' | '|');
  EXTENSION_NAME_LETTER:    ~('\\' | '/' | ':' | '*' | '?' | '<' | '>' | '|' | '.');
  FILE_NAME_SEPARATOR: '\\' | '/';

// -- file specification
file_name:
  (drive)? (FILE_NAME_SEPARATOR)? (directory)* filename
;
	
  drive:
    "a".."z" COLON
    | "~"
  ;
  
  directory:
    basename FILE_NAME_SEPARATOR
  ;
  
  filename:
    basename ("." extension)?
  ;
  
  basename:
    (FILE_NAME_LETTER)+
  ;
  
  extension:
    (EXTENSION_NAME_LETTER)+
  ;

If this worked, I would get my file names nicely split. Unfortunately, it produces several nondeterminism
warnings because the file name letters conflict with other definitions in my grammar. Additionally, I get a Java error
for the "a".."z" range, which uses matchRange(String, String), an ANTLR function that is not accessible to the resulting
parser.

> and you did mean unix filenames, right?

I hoped to get both worlds into one grammar :-)

Mike
--
www.soft-gems.net



 