[antlr-interest] Can ANTLR build a COBOL lexer?

Sun Apr 14 11:20:53 PDT 2002

Greg,

Lexers are not well suited to positional information such as left and
right margins. I would advise you to use a preprocessor to extract the
margins, and then merge the removed margins back into the syntax tree
afterwards.
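
A rough sketch of what such a preprocessor step could look like, in plain
Java with no ANTLR classes involved. The column boundaries are the ones Greg
lists in his message below (1-6 line number, 7 indicator, 8-72 program text,
73-80 mod-code). The names CobolMargins, SplitLine and split are made up for
illustration and are not part of any library. Only the text field would be
handed to the lexer; the other fields would be kept per line so they can be
re-attached later.

```java
/** Hypothetical helper: splits fixed-format COBOL lines into their areas. */
final class CobolMargins {

    /** One physical source line, split by column position. */
    static final class SplitLine {
        final String lineNumber; // columns 1-6
        final char indicator;    // column 7 (comment/continuation), ' ' if absent
        final String text;       // columns 8-72, the part the lexer should see
        final String modCode;    // columns 73-80

        SplitLine(String lineNumber, char indicator, String text, String modCode) {
            this.lineNumber = lineNumber;
            this.indicator = indicator;
            this.text = text;
            this.modCode = modCode;
        }
    }

    /** Splits one line; areas beyond the end of a short line come back empty. */
    static SplitLine split(String line) {
        String lineNumber = slice(line, 0, 6);
        char indicator = line.length() > 6 ? line.charAt(6) : ' ';
        String text = slice(line, 7, 72);
        String modCode = slice(line, 72, 80);
        return new SplitLine(lineNumber, indicator, text, modCode);
    }

    /** Substring from 0-based index 'from' up to 'to', clipped to the line length. */
    private static String slice(String s, int from, int to) {
        if (s.length() <= from) {
            return "";
        }
        return s.substring(from, Math.min(to, s.length()));
    }
}
```

Keeping the removed areas together with their line index is one way to make
the later merge into the tree possible; how they are attached (hidden tokens,
side table keyed by line) is a separate design choice.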

Silvain

P.S. Good luck writing the COBOL grammar; it can be done with ANTLR
       (I did it), but resolving the ambiguities introduced by ANSI '85
       can prove to be a challenge ;-)

----- Original Message -----
From: "glindholm" <glindholm at yahoo.com>
To: <antlr-interest at yahoogroups.com>
Sent: Sunday, April 14, 2002 0:00
Subject: [antlr-interest] Can ANTLR build a COBOL lexer?

> I'm working on a COBOL parser and trying to decide if I can use
> ANTLR to build the lexer or if I should just roll my own.
>
> I'm going to use this for language translation so I want to preserve
> all the COBOL "fluff" tokens like line-numbers and mod-codes as
> hidden tokens.
>
> The problem is that COBOL has column positional tokens.
> Everything in columns 1-6 is considered the line-number.
> The character in column 7 is the comment or continuation character.
> Columns 73-80 are the mod-code.
> Everything in columns 8 to 72 is free format (mostly).
>
>
> So my first attempt (which of course failed) at getting the line
> number was:
>
> LINENUM: {1==getColumn()}? . . . . . .;
>
> This has a nondeterminism with every other token rule because
> '.' matches everything. The semantic predicate {1==getColumn()}?
> doesn't seem to help because it isn't checked until we're
> already in the rule, where it throws a SemanticException if it
> fails.
>
> Question 1) Is the SemanticException supposed to be caught in
> nextToken() and the next rule tried?  I.e., we went into the wrong
> rule, so let's try the next one?
>
> Question 2) Is this what hoisting is all about? If hoisting were
> supported, would the {1==getColumn()}? be checked before going into
> the rule?
>
> Question 3) Can this be made to work? Is there any facility in ANTLR
> that I can use for this or do I write my own lexer?
>
>
> If I write my own lexer I know I need to implement TokenStream.  No
> problem.
>
> Question 4)
> What is the strategy for keeping the token vocabularies synchronized
> between the ANTLR parser and my non-ANTLR lexer?
>
> Should I write the parser first so I can use xTokenTypes in my
> lexer? Or is there some reason I need to hand code a xTokenTypes.txt
> file?
>
> Any other tips or suggestions?
>
> Thanks
>
> Greg Lindholm