[antlr-interest] (newbie) very basic grammar for simple text and integer

Johannes Luber jaluber at gmx.de
Mon Aug 6 03:30:58 PDT 2007


ali azimi wrote:
> Hi,
>  
> Thank you very much for your advice. I appreciate it a lot. Could you also
> advise me on the following?
>  
> I need to change my grammar slightly. It needs to recognise two integers,
> parentheses and a comma in this form:
>  
> ( INT , INT ). I need my parser to create different tokens for ‘(’, ‘)’,
> the comma and the INTs.
>  
> The suggested grammar will not work properly for this purpose, since
> both the AlphaNumeric and Integer rules use the Decimaldigit rule. As a
> result the parser uses the Text rule to parse something like ( INT , INT ),
> so I will not get the separate tokens for the INTs that I want. (I eventually
> need to extract the INTs from the AST.) In other words, the parser will parse
> something like ( 2 , 4 ) not as ‘(’ ‘2’ ‘,’ ‘4’ ‘)’ but as ‘( 2 , 4 )’,
> a single token (as text). How can I tell the parser to use the rule
> LPAR INT COMMA INT RPAR, not the rule Text, to parse something like (2,4)?

I had to remove '(', ')' and ',' from Special, so Text can no longer
match these integer pairs. The restriction is that the integer pair may
not contain any spaces. If spaces must be allowed there, you have to
remove Space from Text entirely. If you can't avoid that by using
CHARACTERSTRING instead, then you have to test whether the result of
getText() on a Text token is an Integer (with trailing spaces) and
change the token type in that case.
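
A minimal sketch of that last check, written in plain Java so it can be tried without ANTLR. The class and method names are made up for illustration and are not part of the grammar or of the ANTLR runtime. It only decides whether a token's text is an Integer followed by optional spaces; how the token type is then changed depends on the ANTLR version and target and is not shown here.

```java
// Hypothetical helper: none of these names come from ANTLR or the grammar.
final class TextTokenCheck {
    private TextTokenCheck() {}

    /**
     * Returns true if text consists of one or more decimal digits
     * followed by zero or more space characters, and nothing else.
     */
    static boolean isIntegerWithTrailingSpaces(String text) {
        int i = 0;
        // Consume the leading run of decimal digits ('0'..'9').
        while (i < text.length() && text.charAt(i) >= '0' && text.charAt(i) <= '9') {
            i++;
        }
        if (i == 0) {
            return false; // no digits at all, so not an Integer
        }
        // Consume any trailing spaces.
        while (i < text.length() && text.charAt(i) == ' ') {
            i++;
        }
        // Anything left over means the text is ordinary Text.
        return i == text.length();
    }
}
```

The string passed in would be whatever getText() returns for a Text token; when the method returns true, the token would be the candidate for retyping as Integer.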

> Also, with the suggested grammar, a simple text like the following is
> parsed and put in three different nodes. How can I tell the parser to
> put all the text in one token, so that I will have one node for the whole
> sentence instead of three?
>  
> SIGNAL
> Newgame,Probe,Result,
> Endgame,Win,Lose,Score(Integer),Bump;

The following grammar treats NEWLINE as part of Text.

Best regards,
Johannes Luber

input_data  : (LeftParen Integer Comma Integer RightParen
	|	Text
	|	Integer)*;

Integer       :Decimaldigit+ ;
Text      :	(AlphaNumeric|Special|Apostrophe|NEWLINE)
		(AlphaNumeric|Special|Space|Apostrophe|NEWLINE)+ ;

fragment Apostrophe:'\'';
fragment Space           : ' ';
fragment Word            : ( AlphaNumeric | '.' )+ ;
fragment CHARACTERSTRING : '\'' ( options{greedy=false;}:
                           (~('\''|'\r'|'\n')| '\'' '\''))* '\'';
fragment Special         : '+'|'-'|'!'|'/'|'>'|';'|'<'|'='|':'|'?'|'&'|'%'|'.'|'_';
fragment AlphaNumeric    :Uppercase|National|Lowercase|Decimaldigit;
fragment Decimaldigit    :'0'..'9' ;
fragment National        :'#'|'@'|'\"'|'$'|'['|']'|'{'|'}'|'^'|'~' ;
fragment Lowercase       :'a'..'z' ;
fragment Uppercase       :'A'..'Z' ;
LeftParen	:	'(';
RightParen	:	')';
Comma	:	',';

fragment NEWLINE:'\r' ? '\n';
WS : (' ' |'\t')+ {skip();} ;

