[antlr-interest] COBOL

Sinan sinan.karasu at boeing.com
Wed May 29 11:40:57 PDT 2002


Balvinder Singh wrote:
> 
> Hi Sinan,
> 
> But If I will use your scheme, getting warning
> 
> warning:Syntactic predicate ignored for single alternative
> 
> what should I refactor?
> 
> balvinder
> 
> >From: Sinan <sinan.karasu at boeing.com>
> >Reply-To: antlr-interest at yahoogroups.com
> >To: antlr-interest at yahoogroups.com
> >Subject: Re: [antlr-interest] COBOL
> >Date: Wed, 29 May 2002 08:55:03 -0700
> >
> >Balvinder Singh wrote:
> > >
> > > Hi all,
> > >
> > >    I'm writing cobol parser only for WORKING STORAGE AREA of data
> >division.
> > > I'm using grammar rule and lexical rule for WORKING STORAGE AREA from VS
> > > COBOL II (http://adam.wins.uva.nl/~x/grammars/vs-cobol-ii/)
> > >
> > > I have converted lexical rule to ANTLR format, but I'm getting conflicts
> >for
> > > some of the rules, rules are as follows :
> > >
> > > Literal : NonNumeric | Numeric
> > >         ;
> > >
> > > protected
> > > NonNumeric : '"' ( (~'"') | '"' '"' )* '"'
> > >            | '\'' ( (~'\'') | '\'' '\'')* '\''
> > >            | ('X' 'x') '"' HexDigits '"'
> > >            | ('X' 'x') '\'' HexDigits '\''
> > >            ;
> >Factor this:
> >
> >NonNumeric : '"' ( (~'"') | '"' '"' )* '"'
> >             | '\'' ( (~'\'') | '\'' '\'')* '\''
> >             | ('X' 'x')( '"' HexDigits '"' | '\'' HexDigits '\'')
> >             ;
> >
> >
> >
> >
> > > AphabeticUserDefinedWord : (('0'.. '9')+ ('-')*)* ('0' .. '9')* ('A'
> >..
> > > 'Z' 'a' .. 'z') ('A' .. 'Z' 'a' .. 'z' '0' .. '9')* (('-')+ ('A' .. 'Z'
> >'a'
> > > .. 'z' '0' .. '9')+)*
> > >                           ;

Aha, I didn't notice that before.

(1):

AphabeticUserDefinedWord : (('0'.. '9')+ ('-')*)* ('0' .. '9')*
('A'..'Z' 'a' .. 'z') ('A' .. 'Z' 'a' .. 'z' '0' .. '9')* (('-')+ ('A'
.. 'Z' 'a'.. 'z' '0' .. '9')+)*
                          ;
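This has a genuine ambiguity, in the prefix:

(('0'.. '9')+ ('-')*)* ('0' .. '9')*

The problem is with ('-')*. Because it can match zero hyphens, nothing
marks where one pass through the outer loop ends and the next begins.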

so you can have

 99

matched by a single ('0'.. '9')+

or split in some other way between the loop and the trailing digits of

(('0'.. '9')+ ('-')*)* ('0' .. '9')*
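
The size of the ambiguity can be checked by counting derivations. The
sketch below is plain Python, not ANTLR. count_parses is a hypothetical
helper written for this illustration, and it assumes the prefix behaves
like the regular expression (D+ '-'*)* D* with D standing for a digit.

```python
from functools import lru_cache

DIGITS = "0123456789"


def count_parses(s: str) -> int:
    """Count the distinct ways s can be derived from (D+ '-'*)* D*."""
    n = len(s)

    @lru_cache(maxsize=None)
    def ways(i: int) -> int:
        # Option A: leave the loop here and let the trailing D* take the rest.
        total = 1 if all(c in DIGITS for c in s[i:]) else 0
        # Option B: run one more loop iteration D+ '-'* starting at i.
        for j in range(i + 1, n + 1):
            if s[j - 1] not in DIGITS:
                break
            # s[i:j] is a non-empty digit run; the hyphens after it must all
            # be consumed here, because nothing else in the rule matches '-'.
            k = j
            while k < n and s[k] == "-":
                k += 1
            total += ways(k)
        return total

    return ways(0)
```

Under these assumptions count_parses("99") is 4: the two digits can go
entirely to the trailing digits, entirely to one loop iteration, to two
iterations, or one to each part. Any count above 1 means the lexer has
no basis for choosing.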

Don't you really want

(('0'.. '9')+ ('-' ('0' .. '9')*)* )* 



in which case the solution (if you still get ambiguities) would be a
syntactic predicate:

(('0'.. '9')+ (('-')=> '-' ('0' .. '9')*)* )* 

This is probably not correct either, but then the rule you have has
another ambiguity, namely

(('0'.. '9')+ ('-')*)* ('0' .. '9')* ('A'..'Z' 'a' .. 'z') ('A' .. 'Z'
'a' .. 'z' '0' .. '9')*

is equivalent to:

(('0'.. '9')+ ('-')*)*  ('A' .. 'Z' 'a' .. 'z' '0' .. '9')+


So we get:


AphabeticUserDefinedWord : (('0'.. '9')+ ('-')*)* ('A' .. 'Z' 'a' .. 'z'
'0' .. '9')+ (('-')+ ('A' .. 'Z' 'a'.. 'z' '0' .. '9')+)*;

but then your rule really collapses to:


(2)

AlphabeticUserDefinedWord : ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9')+ (
('-')+ ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9')+ )*;

Check it.


It appears that whatever satisfies (1) also satisfies (2).
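
One way to check this is to enumerate short strings and compare the two
rules. The sketch below uses Python regular expressions as a stand-in
for the lexer rules, under two assumptions that are not in the grammar
as posted: adjacent ranges such as 'A'..'Z' 'a'..'z' are read as
alternatives, and the hyphen group of (2) is read as repeatable with one
or more characters after the hyphens. The names RULE_1, RULE_2 and words
are made up for this example.

```python
import re
from itertools import product

# Rule (1), with adjacent ranges read as alternatives (assumption).
RULE_1 = re.compile(
    r"(?:[0-9]+-*)*[0-9]*[A-Za-z][A-Za-z0-9]*(?:-+[A-Za-z0-9]+)*"
)
# Rule (2), with the hyphen group read as ( '-'+ alnum+ )* (assumption).
RULE_2 = re.compile(r"[A-Za-z0-9]+(?:-+[A-Za-z0-9]+)*")


def words(alphabet: str, max_len: int):
    """Yield every non-empty string over alphabet up to max_len characters."""
    for n in range(1, max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)


# One letter, one digit and the hyphen are enough to exercise every branch.
samples = list(words("A9-", 6))

only_rule_1 = [w for w in samples
               if RULE_1.fullmatch(w) and not RULE_2.fullmatch(w)]
only_rule_2 = [w for w in samples
               if RULE_2.fullmatch(w) and not RULE_1.fullmatch(w)]
```

On this sample only_rule_1 is empty, so nothing accepted by (1) is
rejected by (2). The reverse does not hold: only_rule_2 contains strings
such as 99, which (1) rejects because it requires at least one letter.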

It appears that (1) tries to make some tokens start with a digit, but it
fails...

A9--A9

will skip (('0'.. '9')+ ('-')*)* ('0' .. '9')*

and be parsed (if it were not ambiguous).

9A-9A will be parsed by:
('0' .. '9')* ('A'..'Z' 'a' .. 'z') ('A' .. 'Z' 'a' .. 'z' '0' .. '9')*
(('-')+ ('A' .. 'Z' 'a'.. 'z' '0' .. '9')+)*

So what you said is what you got; it is just that what you got is not
what you meant.

Try starting with rule (2), and fix it as you go along.

If you specify case-insensitive literals, then

AlphabeticUserDefinedWord : ('a' .. 'z' | '0' .. '9')+ ( ('-')+ ('a' ..
'z' |  '0' .. '9')+ )*;

BTW, do you really not want the | between the character ranges? I am
confused...
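
The difference the | makes can be seen with a small sketch, again using
Python regular expressions as a stand-in rather than ANTLR itself.
SEQUENCE, ALTERNATION and accepts are names invented for this example.

```python
import re

# ('A'..'Z' 'a'..'z') without | is a sequence: one upper-case letter
# followed by one lower-case letter.
SEQUENCE = re.compile(r"[A-Z][a-z]")
# ('A'..'Z' | 'a'..'z') is an alternative: exactly one letter of either case.
ALTERNATION = re.compile(r"[A-Za-z]")


def accepts(pattern: re.Pattern, text: str) -> bool:
    """Return True when the whole text is matched by pattern."""
    return pattern.fullmatch(text) is not None
```

Here accepts(SEQUENCE, "A") is False while accepts(ALTERNATION, "A") is
True, so a rule written without the | would reject a one-letter word.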

Sinan
