[antlr-interest] Lexer rule for INTEGER and COMMA_INTEGER

Bernard Kaiflin bkaiflin.ruby at gmail.com
Tue Nov 6 15:17:55 PST 2012


A solution for v3.4.

I had a hard time with decisions involving multiple alternatives, and ended
up multiplying the subrules and syntactic predicates. Once you have tasted
ANTLR 4, you no longer want to bother with all these ambiguity and
backtracking problems. Give it a try!

========== grammar

grammar Q3;

/* Recognize edited numbers like 1,234,567 as a whole, but
   F(1, 2 ,3, 44,55,66) as 4 parameters. White space is skipped,
   except that `, ` and ` ,` act as separators.
   For ANTLR v3.4. */

@parser::members {
    ArrayList<String> parms;
    void storeAtom(String text) {parms.add(text);}
}

sample
@init {System.out.println("---- last update 1908");}
    : piece* EOF ;

piece
@after {System.out.println("===== processed one piece of input : <"
                           + $piece.text + ">");}
    :   comma_integer | function
    ;

comma_integer
    :   INT COMMA INT ( COMMA INT )*
            {System.out.println("CI found a comma_integer : " + $comma_integer.text);}
    ;

function
@init {parms = new ArrayList<String>();}
@after {int n = parms.size();
        System.out.println(">>>>> Function " + $function.text
                           + " has " + n + " parameters");
        for (int i = 0; i < n; i++)
            System.out.println("p" + (i + 1) + "=`" + parms.get(i) + "`");
       }

    :   ID '(' list ')'
    ;

list
    :   a=atom_comma
            {System.out.println("1a rule list chose atom_comma <" + $a.text + ">");
             storeAtom($a.text);}
        ( e1=element
            {System.out.println("1b rule list chose element <" + $e1.text + ">");
             /* p+=element doesn't work well */
             storeAtom($e1.text);}
        )
        ( seperator e2=element
            {System.out.println("1c rule list chose element <" + $e2.text + ">");
             storeAtom($e2.text);}
        )*
    |   (INT COMMA INT)=>c=comma_integer
            {System.out.println("2a rule list chose ci <" + $c.text + ">");
             storeAtom($c.text);}
        ( seperator d=element
            {System.out.println("2b rule list chose COMMA element <," + $d.text + ">");
             /* storeAtom(","); */ storeAtom($d.text);}
        )*
    |   atom
            {System.out.println("3a rule list chose atom <" + $atom.text + ">");
             storeAtom($atom.text);}
        ( COMMA f=element
            {System.out.println("3b rule list chose COMMA element <," + $f.text + ">");
             storeAtom(","); storeAtom($f.text);}
        )*
    ;

element
    :   (INT COMMA INT)=> comma_integer
            {System.out.println("rule element found a CI : <" + $element.text + ">");}
    |   (atom_comma   )=> atom_comma
            {System.out.println("rule element found an AC : <" + $element.text + ">");}
    |   atom
            {System.out.println("rule element found an atom : <" + $element.text + ">");}
    ;

atom_comma
    :   atom COMMA_SPACE
    |   atom SPACE_COMMA
    ;

seperator
    :   COMMA
    |   COMMA_SPACE
    |   SPACE_COMMA
    ;

atom
    :   ID
    |   INT
    ;

COMMA_SPACE : ', ' {System.out.println("rule COMMA_SPACE `" + $text + "`");} ;
SPACE_COMMA : ' ,' {System.out.println("rule SPACE_COMMA `" + $text + "`");} ;
COMMA       : ',' ;
ID  : ( 'a'..'z' | 'A'..'Z' | '_' )+
      {System.out.println("rule ID   `" + $text + "`");} ;
INT : DIGIT+
      {System.out.println("rule INT  `" + $text + "`");} ;
WS  : ( ' ' | '\t' | '\r' | '\n' )+ {$channel=HIDDEN;}
      {System.out.println("rule WS");} ;

fragment DIGIT : '0'..'9' ;


========== standard Test file

import org.antlr.runtime.*;

public class Test {
    public static void main(String[] args) throws Exception {
        ANTLRInputStream input = new ANTLRInputStream(System.in);
        Q3Lexer lexer = new Q3Lexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        Q3Parser parser = new Q3Parser(tokens);
        parser.sample();
    }
}

========== input

$ cat t2.comma
1,234,567 F(1, x)  G(11,   12  , 13,444)  H(99,88,77,  66,6)  P(9, 8,77,666)  X(1 , 2, 3 ,4 , 5,6     ,   7,888,999)

========== execution

$ java Test < t2.comma
...
===== processed one piece of input : <1,234,567>
...
>>>>> Function F(1, x) has 2 parameters
p1=`1, `
p2=`x`
...
>>>>> Function H(99,88,77,  66,6) has 2 parameters
p1=`99,88,77`
p2=`66,6`
...
>>>>> Function X(1 , 2, 3 ,4 , 5,6     ,   7,888,999) has 6 parameters
p1=`1 ,`
p2=`2`
p3=`3`
p4=`4`
p5=`5,6`
p6=`7,888,999`
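
========== plain-Java sketch of the separator rule

For readers who only want to check the rule itself (a comma with white
space on either side separates parameters, a bare comma between digits
stays inside the number), here is a minimal sketch in plain Java. It is not
part of the grammar above and does not use ANTLR; the class name
ArgSplitSketch and the method splitArgs are made up for illustration. It
assumes that a bare comma next to a non-digit also separates, which is how I
read the third alternative of rule list. Unlike the grammar output above
(p1=`1, `), it returns the parameters without the trailing separator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class ArgSplitSketch {

    // A comma is a separator when white space touches it, or when it is
    // not directly between two digits (assumption, see the note above).
    private static final Pattern SEPARATOR = Pattern.compile(
          "\\s+,\\s*"       // white space before the comma
        + "|,\\s+"          // white space after the comma
        + "|(?<![0-9]),"    // bare comma not preceded by a digit
        + "|,(?![0-9])");   // bare comma not followed by a digit

    // Splits the text between '(' and ')' of a call into its parameters.
    static List<String> splitArgs(String args) {
        List<String> result = new ArrayList<>();
        String trimmed = args.trim();
        if (trimmed.isEmpty()) {
            return result;              // no parameters at all
        }
        for (String part : SEPARATOR.split(trimmed)) {
            result.add(part);
        }
        return result;
    }

    public static void main(String[] unused) {
        String[] samples = {
            "1, x",
            "99,88,77,  66,6",
            "1 , 2, 3 ,4 , 5,6     ,   7,888,999"
        };
        for (String sample : samples) {
            List<String> parms = splitArgs(sample);
            System.out.println("(" + sample + ") has " + parms.size() + " parameters");
            for (int i = 0; i < parms.size(); i++) {
                System.out.println("p" + (i + 1) + "=`" + parms.get(i) + "`");
            }
        }
    }
}
```

On the three samples this gives 2, 2 and 6 parameters, the same counts as
the parser reports for F, H and X above.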


2012/11/3 Zhaohui Yang <yezonghui at gmail.com>

> Hi,
>
> I have a lexer grammar that has to recognize INTEGER like 1234 and
> COMMA_INTEGER like 1,234,567.
> The latter integer token has a comma in it, and of course the language has
> other places that use a comma, e.g. F(1, x) is valid, which contains "1,"
> that should be recognized as an INTEGER 1 followed by a comma.
> .........
> Yes. If there is white space before or after the comma, they are separate
> parameters; if there is no white space around it, it is one COMMA_INTEGER.
> --
> Regards,
>
> Yang, Zhaohui