[antlr-interest] Learning the basics of ANTLR

Evan Metheny evanpmeth at gmail.com
Tue Oct 13 10:46:23 PDT 2009


I am currently trying to learn ANTLR from the Definitive ANTLR
Reference book. My question concerns the XML grammar below. As an
exercise to help me understand ANTLR, I am trying to rewrite the
grammar from the XMLLexer.g example as a combined parser and lexer
grammar.

When debugging under ANTLRWorks 1.3, I get a missing token exception
on GENERIC_ID within the "attribute" parser rule. I tried to solve the
problem by changing GENERIC_ID to a non-fragment lexer rule, and to a
parser rule, but this causes the XML declaration at the beginning to
break. I can't understand why it would break the recognition of "XML"
when that comes before the attribute call.
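
For reference, the non-fragment variant I tried looked roughly like
this. It is only a sketch of the one changed rule, assuming everything
else in the grammar below stays the same:

    // GENERIC_ID with the 'fragment' keyword dropped, so the lexer
    // emits it as a token of its own (rule body otherwise unchanged)
    GENERIC_ID
        : ( LETTER | '_' | ':' )
          ( options {greedy=true;} :
            LETTER | '0'..'9' | '.' | '-' | '_' | ':' )*
        ;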

Any help in understanding this situation better would be much appreciated.


XML.g:
-----------------------------------------------------------------

grammar XML;

options {
backtrack = true;
}

document
	:	xmldecl WS? doctype
	;

doctype
    :
        '<!DOCTYPE' WS? GENERIC_ID

        WS?
        (
            ( 'SYSTEM' WS? VALUE
            | 'PUBLIC' WS? VALUE WS? VALUE
            )
            ( WS )?
        )?
        ( INTERNAL_DTD )?
        '>'
    ;

INTERNAL_DTD : '[' (options {greedy=false;} : .)* ']' ;

pi :
        '<?' GENERIC_ID WS?

        ( attribute WS? )*  '?>'
	;

xmldecl :
        '<?' ('x'|'X') ('m'|'M') ('l'|'L') WS?

        attribute  '?>'
	;


element
    : ( start_tag
            (element
            | PCDATA

            | cdata

            | comment

            | pi
            )*
            end_tag
        | emptyelement
        )
    ;

start_tag
    : '<' WS? GENERIC_ID WS?

        ( attribute WS? )* '>'
    ;

emptyelement
    : '<' WS? GENERIC_ID WS?

        ( attribute WS? )* '/>'
    ;

attribute
    : GENERIC_ID WS? '=' WS? VALUE

    ;

end_tag
    : '</' WS? GENERIC_ID WS? '>'

    ;

comment
	:	'<!--' (options {greedy=false;} : .)* '-->'
	;

cdata
	:	'<![CDATA[' (options {greedy=false;} : .)* ']]>'
	;



fragment GENERIC_ID
    : ( LETTER | '_' | ':')
        ( options {greedy=true;} :
        LETTER | '0'..'9' | '.' | '-' | '_' | ':' )*
	;

fragment LETTER
	: 'a'..'z'
	| 'A'..'Z'
	;


WS  :
        (   ' '
        |   '\t'
        |  ( '\n'
            |	'\r\n'
            |	'\r'
            )
        )+
    ;

fragment PCDATA : (~'<')+ ;

fragment VALUE :
        ( '\"' (~'\"')* '\"'
        | '\'' (~'\'')* '\''
        )
	;

