[antlr-interest] Can ANTLR build a COBOL lexer?

glindholm glindholm at yahoo.com
Sat Apr 13 15:00:12 PDT 2002


I'm working on a COBOL parser and trying to decide if I can use 
ANTLR to build the lexer or if I should just roll my own. 

I'm going to use this for language translation so I want to preserve 
all the COBOL "fluff" tokens like line-numbers and mod-codes as 
hidden tokens.

The problem is that COBOL has column-positional tokens. 
Everything in columns 1-6 is considered the line-number.
The character in column 7 is the comment or continuation character.
Columns 73-80 are the mod-code.
Everything in columns 8 to 72 is free format (mostly).
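
To make the column layout above concrete, here is a minimal Java sketch 
that slices one fixed-format source line into those four areas. It is 
only an illustration, not part of ANTLR: the class name CobolLine and 
its field names are hypothetical, and it assumes lines contain no tabs 
and that short lines can be padded with spaces to 80 columns.

```java
// Hypothetical helper (not an ANTLR class): splits one fixed-format
// COBOL source line into its column-based areas.
final class CobolLine {
    final String sequence;  // columns 1-6: the line number
    final char indicator;   // column 7: comment or continuation character
    final String body;      // columns 8-72: the (mostly) free-format text
    final String modCode;   // columns 73-80: the mod-code

    private CobolLine(String sequence, char indicator,
                      String body, String modCode) {
        this.sequence = sequence;
        this.indicator = indicator;
        this.body = body;
        this.modCode = modCode;
    }

    // Assumption: no tabs in the line. Lines shorter than 80 columns
    // are padded with spaces; anything past column 80 is ignored.
    static CobolLine split(String line) {
        String padded = pad(line, 80);
        return new CobolLine(
            padded.substring(0, 6),     // columns 1-6
            padded.charAt(6),           // column 7
            padded.substring(7, 72),    // columns 8-72
            padded.substring(72, 80));  // columns 73-80
    }

    private static String pad(String s, int width) {
        StringBuilder sb = new StringBuilder(s);
        while (sb.length() < width) {
            sb.append(' ');
        }
        return sb.toString();
    }
}
```

With a pre-pass like this, only the body field would need a real lexer; 
the sequence and modCode fields could be wrapped as hidden tokens by 
whatever hands tokens to the parser.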


So my first attempt (which of course failed) at getting the line 
number was:

LINENUM: {1==getColumn()}? . . . . . .;

This has a nondeterminism with every other token rule because 
'.' matches everything. The semantic predicate {1==getColumn()}? 
doesn't seem to help because it isn't checked until we're 
already in the rule, where it throws a SemanticException if it 
fails.

Question 1) Is the SemanticException supposed to be caught in 
nextToken() and the next rule tried?  That is, we went into the wrong 
rule, so let's try the next one?

Question 2) Is this what hoisting is all about? If hoisting were 
supported, would the {1==getColumn()}? be checked before going into 
the rule?

Question 3) Can this be made to work? Is there any facility in ANTLR 
that I can use for this or do I write my own lexer?


If I write my own lexer I know I need to implement TokenStream.  No 
problem.

Question 4)
What is the strategy for keeping the token vocabularies synchronized 
between the ANTLR parser and my non-ANTLR lexer?

Should I write the parser first so I can use xTokenTypes in my 
lexer? Or is there some reason I need to hand-code an xTokenTypes.txt 
file?

Any other tips or suggestions?

Thanks

Greg Lindholm

