[antlr-interest] Global scopes for lexers

Andy Tripp antlr at jazillian.com
Wed Aug 1 13:46:22 PDT 2007


If we're listing places where lexer behavior depends on context, here's 
another one...
In COBOL, you can specify the set of valid input chars by saying
"ALPHABET IS ASCII" or "ALPHABET IS EBCDIC" within the file that
you're lexing. You can also list specific characters and ranges after 
"ALPHABET IS..."

Also, you can say "CURRENCY SIGN IS <char>", which has the effect of
using that character in place of '$' in the grammar. Similarly,
"DECIMAL-POINT IS COMMA" swaps the roles of ',' and '.'.

And, of course, there's the fixed-format issue: COBOL code can either be
"free-format" or "fixed-format" (where the first 7 chars of each line
are ignored). Though this is normally an option when running a COBOL
compiler, there are also non-standard constructs within a file itself
to say "this file is fixed format".
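
For the fixed-format case, a pre-pass along these lines might be enough.
This is only a sketch: strip_fixed_format is a made-up name, and it goes
slightly beyond what I said above by treating '*' or '/' in column 7 as
marking a comment line, which it blanks out entirely. Everything else in
the first 7 columns is replaced with spaces rather than deleted, so the
column numbers the lexer reports still match the original file.

```python
def strip_fixed_format(source, prefix_width=7):
    """Blank out the leading columns of fixed-format COBOL source.

    Hypothetical helper for illustration. Lines whose last prefix column
    (column 7 by default) holds '*' or '/' are treated as comments and
    emitted as empty lines, so line numbers are preserved.
    """
    out = []
    for line in source.splitlines():
        # Slicing (not indexing) so that short lines yield '' instead of raising.
        indicator = line[prefix_width - 1:prefix_width]
        if indicator in ("*", "/"):
            out.append("")
        else:
            # Pad with spaces so the remaining text keeps its original columns.
            padding = " " * min(len(line), prefix_width)
            out.append(padding + line[prefix_width:])
    return "\n".join(out)
```

Usage would be something like feeding strip_fixed_format(text) to the lexer
instead of the raw file contents whenever fixed format is selected.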

All of these are pretty easy to handle with a simple preprocessor.

