[antlr-interest] Lexer rule for INTEGER and COMMA_INTEGER

Zhaohui Yang yezonghui at gmail.com
Fri Nov 9 20:39:09 PST 2012


The main ambiguity here is that a sequence like "1  ,2" can be recognized
either as a comma_integer (INT WS COMMA INT) or as a list
(comma_integer=INT, seperator=SPACE_COMMA, comma_integer=INT).

I guess the simplicity of the V4 version comes from some default priority /
greedy policy that favors comma_integer over seperator in list. Or does
ANTLR V4 have a unified ambiguity analysis that considers all lexer and
parser rules together?
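
To make the "default priority" guess concrete, here is a minimal plain-Java
sketch of the lexer policy as I understand it from the ANTLR 4 documentation:
at each position the rule that matches the most characters wins, and a tie
goes to the rule listed first. This is not ANTLR's implementation. The class
name MaximalMunchSketch and the helper tokenize are hypothetical,
java.util.regex stands in for the generated lexer, and the implicit '(' and
')' tokens are left out. The rule order is copied from grammar Q4 quoted
below.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MaximalMunchSketch {
    // Lexer rules in the same order as in grammar Q4 (insertion order is kept).
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("COMMA_SPACE", Pattern.compile(", "));
        RULES.put("SPACE_COMMA", Pattern.compile(" ,"));
        RULES.put("COMMA", Pattern.compile(","));
        RULES.put("ID", Pattern.compile("[a-zA-Z_]+"));
        RULES.put("INT", Pattern.compile("[0-9]+"));
        RULES.put("WS", Pattern.compile("[ \\t\\r\\n]"));
    }

    // Returns the token type names on the default channel (WS is dropped,
    // mimicking "-> channel(HIDDEN)").
    static List<String> tokenize(String input) {
        List<String> types = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestType = null;
            int bestLen = 0;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // A strictly longer match wins; on a tie the earlier rule stays.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestLen = m.end() - pos;
                    bestType = rule.getKey();
                }
            }
            if (bestType == null) {
                throw new IllegalArgumentException("no rule matches at offset " + pos);
            }
            if (!bestType.equals("WS")) {
                types.add(bestType);
            }
            pos += bestLen;
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("1,2"));   // [INT, COMMA, INT]
        System.out.println(tokenize("1  ,2")); // [INT, SPACE_COMMA, INT]
        System.out.println(tokenize("1, 2"));  // [INT, COMMA_SPACE, INT]
    }
}
```

If this is right, the lexer has already turned " ," into SPACE_COMMA before
the parser runs, so comma_integer : INT ( COMMA INT )* cannot extend across
it. What remains for the parser is a run like 99,88,77, which could be one
comma_integer or three atoms separated by COMMA; as far as I can tell the
( ... )* loop is greedy by default, which would explain why it comes out as
one parameter.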

Maybe I should buy a V4 book and find out :)


2012/11/7 Bernard Kaiflin <bkaiflin.ruby at gmail.com>

> A solution for v4.
>
> Roughly 2 hours using v4, 2 days using v3.4. As you can see by comparing
> with the v3.4 solution, ANTLR4 is much more powerful, writing a grammar is
> simpler, the trace is more user-friendly:
>
> enter   comma_integer, LT(1)=1
> consume [@59,80:80='1',<7>,1:80] rule comma_integer alt=1
> exit    comma_integer, LT(1)= ,
>
> A big quantum leap, a five-star tool, if not All*.
>
> ========== grammar
>
> grammar Q4;
>
> /* Recognize edited numbers like 1,234,567 as a whole but
>    F(1, 2 ,3, 44,55,66) as 4 parameters, white space skipped,
>    but `, ` and ` ,` are separators.
>    for ANTLR v4 */
>
> @parser::members {
>     ArrayList<String> parms;
>     void storeAtom(String text) {
>         parms.add(text);
> //        System.out.println("atom <" + text + "> has been added");
>     }
> }
>
> line
> @init {System.out.println("--- last update 1426");}
>     : piece* EOF ;
>
> piece
>     :   comma_integer  {System.out.println("===== found a COMMA_INTEGER : <" + $comma_integer.text + ">");}
>     |   function
>     ;
>
> comma_integer
>     :   INT ( COMMA INT )*
>     ;
>
> function
> @init {parms = new ArrayList<String>();}
> @after {
>     System.out.println(">>>>> Function " + $function.text + " has " +
>         parms.size() + " parameters");
>     for (int i = 0; i < parms.size(); i++)
>         System.out.println("p" + (i + 1) + "=`" + parms.get(i) + "`");
> }
>
>     :   ID '(' list ')'
>     ;
>
> list
>     :   a=atom  {storeAtom($a.text);}
>         ( seperator b=atom  {/* storeAtom($seperator.text); */ storeAtom($b.text);}
>         )*
>     ;
>
> seperator
>     :   COMMA
>     |   COMMA_SPACE
>     |   SPACE_COMMA
>     ;
>
> atom
>     :   ID
>     |   comma_integer
>     |   INT
>     ;
>
> COMMA_SPACE : ', ' ;
> SPACE_COMMA : ' ,' ;
> COMMA : ',' ;
> ID  : [a-zA-Z_]+ ;
> INT : DIGIT+ ;
> WS  : [ \t\r\n] -> channel(HIDDEN) ;
>
> fragment DIGIT : [0-9];
>
>
> ========== input
>
> $ cat t2.comma
> 1,234,567 F(1, x)  G(11,   12  , 13,444)  H(99,88,77,  66,6)  P(9, 8,77,666)  X(1 , 2, 3 ,4 , 5,6     ,   7,888,999)
>
> ========== execution
>
> $ alias
> alias antlr4='java -jar /usr/local/lib/antlr-4.0b2-complete.jar'
> $ antlr4 Q4.g4
> $ javac Q4*.java
> $ grun Q4 line -tokens t2.comma
> [@0,0:0='1',<7>,1:0]
> [@1,1:1=',',<5>,1:1]
> [@2,2:4='234',<7>,1:2]
> ...
> --- last update 1426
> ===== found a COMMA_INTEGER : <1,234,567>
> >>>>> Function F(1, x) has 2 parameters
> p1=`1`
> p2=`x`
> >>>>> Function G(11,   12  , 13,444) has 3 parameters
> p1=`11`
> p2=`12`
> p3=`13,444`
> >>>>> Function H(99,88,77,  66,6) has 2 parameters
> p1=`99,88,77`
> p2=`66,6`
> >>>>> Function P(9, 8,77,666) has 2 parameters
> p1=`9`
> p2=`8,77,666`
> >>>>> Function X(1 , 2, 3 ,4 , 5,6     ,   7,888,999) has 6 parameters
> p1=`1`
> p2=`2`
> p3=`3`
> p4=`4`
> p5=`5,6`
> p6=`7,888,999`
>
>
> 2012/11/3 Zhaohui Yang <yezonghui at gmail.com>
>
>> Hi,
>>
>> I have a lexer grammar that has to recognize INTEGER like 1234 and
>> COMMA_INTEGER like 1,234,567
>> The latter integer token has a comma in it, and of course the language has
>> other places that use a comma, e.g. F(1, x) is valid, which contains "1,"
>> that should be recognized as an INTEGER 1 followed by a comma.
>> ...........
>>
> Yes. If there is white space before or after the comma, they are separate
> parameters; if there is no white space around it, it is one COMMA_INTEGER.
>
>> --
>> Regards,
>>
>> Yang, Zhaohui
>>
>>
>
>


-- 
Regards,

Yang, Zhaohui

