[antlr-interest] Antlr 3.0b6 Error Issue (ANTLRWorks 1.0b8)

Fri Feb 9 12:07:46 PST 2007

Bert,

Looks like nobody answered this one, so:

First, it looks like you are confusing parser and lexer rules a bit.
Parser rules should start with a lower-case letter and lexer rules
should start with an upper-case letter. Fixing that should get rid of
a lot of your warnings.
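
To illustrate the naming convention, here is a minimal sketch. The
grammar name and both rule names are made up for the example and are not
taken from your grammar:

```antlr
grammar Naming;

// Parser rule: the name starts with a lower-case letter, and the rule
// is written in terms of tokens produced by the lexer.
branch : CHAR+ ;

// Lexer rule: the name starts with an upper-case letter, and the rule
// is written in terms of input characters.
CHAR   : 'a'..'z' | 'A'..'Z' | '0'..'9' ;
```

ANTLR decides whether a rule belongs to the parser or the lexer purely
from the case of the first letter of its name.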

When I told you about changing the lexer rules, I assumed that you were
using them as part of bigger elements. If this is your full grammar, then
you are not, and the COLL_ELEM lexer rule will never fire because
ALPHA or DIGIT will fire first. Instead, get rid of COLL_ELEM and create a
parser rule col_elem : ALPHA | DIGIT ; However, if this can occur where
you would otherwise be able to have an ALPHA or DIGIT, then you will just
create ambiguities. You probably just want to recognize ALPHA and DIGIT
and decide what their context is afterwards. The lexer doesn't know what
you want at any particular point; it just says "Here are some digits".
Try to construct the minimal result that makes a syntactically possible
tree, then use the tree to work out what the things are.
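
As a rough sketch of that suggestion, assuming nothing else in the
grammar needs to tell letters and digits apart at the lexer level (the
grammar name Elem is made up for the example):

```antlr
grammar Elem;

// Parser rule standing in for the old lexer rule: the parser decides
// from context that a letter or a digit is acting as an element.
col_elem : ALPHA | DIGIT ;

// Lexer rules that do not overlap: every input character matches at
// most one of them, so neither rule can hide the other.
ALPHA : 'a'..'z' | 'A'..'Z' ;
DIGIT : '0'..'9' ;
```

With this split the lexer only reports "a letter" or "a digit", and
whatever those mean in a given position is left to the parser rules.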

I suggest that you forget about what you have and build things up in
very gradual steps, eliminating errors and so on at each step. 

Jim

From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Bert Williams
Sent: Wednesday, February 07, 2007 10:01 AM
To: antlr-interest at antlr.org
Subject: [antlr-interest] Antlr 3.0b6 Error Issue (ANTLRWorks 1.0b8)

Greetings:

After a few download issues, I have successfully installed and run the
Antlr 3.0 beta software.

This is a great improvement over the 2.7.7 I was previously trying.  At
least I appear to be getting further along.  The automatic left
recursion removal is a very nice feature, especially for someone who
doesn't write grammars every day.

When I try to run the grammar below (a subset of POSIX Extended Regular
Expressions) using the ANTLRWorks 1.0b8 GUI, I get the following
warnings/errors.  I also get occasional internal errors as shown below.
Whenever I make a change that fails the "Check Grammar" operation, I
seem to get this and other internal errors.

My question is:

With respect to the warning (200), how does one specify that the parser
be generated?  I really have just one alternative (and I did try e (e)*
instead of (e)+)

Is there a "greedy=false" option?  I don't see anything wrong with the
grammar, but perhaps there is something subtle I am missing.

Thanks for your helpful reply.

Bert Williams.

------------------------------Warnings/Errors------------------------------

[12:36:11] error(106): regexshort.g:9:23: reference to undefined rule: one_character_ERE

[12:36:11] error(106): regexshort.g:9:59: reference to undefined rule: extended_reg_exp

[12:36:16] Checking Grammar...

[12:36:17] warning(200): regexshort.g:7:37: Decision can match input such as "<EOT>" using multiple alternatives: 1, 2

As a result, alternative(s) 2 were disabled for that input

[12:36:17] warning(201): regexshort.g:7:37: The following alternatives are unreachable: 2

[12:36:17] warning(208): regexshort.g:136:1: The following token definitions are unreachable: ERE_expression,ERE_dupl_symbol,DIGITSET,ALPHASET,ALPHA,DIGIT,COLL_ELEM,PLUS,DOLLAR,STAR,QUESTION,CARET

------------------------------Antlr Internal Error------------------------------

[12:46:58] error(10):  internal error: org.antlr.tool.Message.toString(Message.java:124): Assertion failed! Message ID 106 created but is not present in errorMsgIDs or warningMsgIDs.

------------------------------ANTLR 3.0 Grammar (regexshort.g)------------------------------

grammar regexshort;

extended_reg_exp   : (ERE_branch) ('|' ERE_branch)* (EOF)

                   ;

ERE_branch         : (ERE_expression)+

                   ;

ERE_expression     : (one_character_ERE | '^' | '$' | '('
extended_reg_exp ')') (ERE_dupl_symbol)?

                   ;

one_character_ERE  : ORD_CHAR

                   | QUOTED_CHAR

                   | '.'

                   | bracket_expression

                   ;

ERE_dupl_symbol    : '*'

                   | '+'

                   | '?'

                   | '{' DUP_COUNT               '}'

                   | '{' DUP_COUNT ','           '}'

                   | '{' DUP_COUNT ',' DUP_COUNT '}'

                   ;

DUP_COUNT 

            :           DIGIT ;                   

QUOTED_CHAR

            :           '\\' ORD_CHAR;

ORD_CHAR 

            :           COLL_ELEM

            |           DOLLAR

            ;

bracket_expression : '[' matching_list    ']'

               | '[' nonmatching_list ']'

               ;

matching_list  : bracket_list

               ;

nonmatching_list : '^' bracket_list

               ;

bracket_list   : follow_list

               | follow_list '-'

               ;

follow_list    :  (expression_term) (expression_term)*

               ;

expression_term : single_expression

               | range_expression

               ;

single_expression : end_range

               ;

range_expression : start_range end_range

               | start_range '-'

               ;

start_range    : end_range '-'

               ;

end_range      : COLL_ELEM

//               | collating_symbol

               ;

protected

DIGITSET : '0'..'9' ;

protected

ALPHASET :     ('a'..'z'|'A'..'Z' ) ;

ALPHA : ALPHASET;

DIGIT    : DIGITSET ;

COLL_ELEM

      :     ALPHASET

      |     DIGITSET

      ;

PLUS

            :           '+'

            ;

COMMA

            :           ','

            ;           

DOLLAR

            :           '$'

            ;

STAR

            :           '*'

            ;

QUESTION

            :           '?'

            ;

LPAREN

            :           '('

            ;           

RPAREN

            :           ')'

            ;           

ALTERNATION

            :           '|'

            ;

LBRACE

            :           '{'

            ;

RBRACE

            :           '}'

            ;           

LBRACKET

            :           '['

            ;

RBRACKET

            :           ']'

            ;           

PERIOD

            :           '.'

            ;

DASH

            :           '-'

            ;

BACKSLASH 

            :           '\\'

            ;

CARET

            :           '^'

            ;

WS       :           ' '

            |           '\t'

            |           '\n'

            |           '\r'

            ;

SEMI:   ';'

            ;
