[antlr-interest] Lexer rule for INTEGER and COMMA_INTEGER
Bernard Kaiflin
bkaiflin.ruby at gmail.com
Tue Nov 6 15:23:38 PST 2012
A solution for v4.
It took roughly 2 hours with v4 versus 2 days with v3.4. As you can see by comparing
with the v3.4 solution, ANTLR4 is much more powerful: writing a grammar is
simpler, and the trace is more user-friendly:
enter comma_integer, LT(1)=1
consume [@59,80:80='1',<7>,1:80] rule comma_integer alt=1
exit comma_integer, LT(1)= ,
A big quantum leap: a five-star tool, if not All*.
========== grammar
grammar Q4;
/* Recognize edited numbers like 1,234,567 as a whole but
F(1, 2 ,3, 44,55,66) as 4 parameters, white space skipped,
but `, ` and ` ,` are separators.
for ANTLR v4 */
@parser::members {
ArrayList<String> parms;
void storeAtom(String text) {
parms.add(text);
// System.out.println("atom <" + text + "> has been added");
}
}
line
@init {System.out.println("--- last update 1426");}
: piece* EOF ;
piece
: comma_integer {System.out.println("===== found a COMMA_INTEGER : <" + $comma_integer.text + ">");}
| function
;
comma_integer
: INT ( COMMA INT )*
;
function
@init {parms = new ArrayList<String>();}
@after {
    System.out.println(">>>>> Function " + $function.text + " has " + parms.size() + " parameters");
    for (int i = 0; i < parms.size(); i++) System.out.println("p" + (i + 1) + "=`" + parms.get(i) + "`");
}
: ID '(' list ')'
;
list
: a=atom
{storeAtom($a.text);}
( seperator b=atom {/* storeAtom($seperator.text); */
storeAtom($b.text);}
)*
;
seperator
: COMMA
| COMMA_SPACE
| SPACE_COMMA
;
atom
: ID
| comma_integer
| INT
;
COMMA_SPACE : ', ' ;
SPACE_COMMA : ' ,' ;
COMMA : ',' ;
ID : [a-zA-Z_]+ ;
INT : DIGIT+ ;
WS : [ \t\r\n] -> channel(HIDDEN) ;
fragment DIGIT : [0-9];
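The separators work because the ANTLR lexer prefers the longest match: on `, ` the two-character COMMA_SPACE beats the one-character COMMA, and on ` ,` SPACE_COMMA beats a one-character WS. As a rough illustration only, here is a hand-written Java approximation of these lexer rules. The class Q4LexerSketch and its token-name strings are hypothetical; the Q4Lexer that ANTLR generates works differently internally, this sketch just mimics which tokens come out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written approximation of the Q4 lexer rules (not ANTLR-generated).
public class Q4LexerSketch {
    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }   // DIGIT : [0-9]
    private static boolean isIdChar(char c) {                                  // ID : [a-zA-Z_]+
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    }
    private static boolean isWs(char c) {                                      // WS : [ \t\r\n]
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    // Returns the tokens the parser would see: WS is dropped (hidden channel),
    // INT and ID carry their text after a colon.
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (input.startsWith(", ", i)) {          // COMMA_SPACE: 2 chars beat COMMA's 1
                tokens.add("COMMA_SPACE");
                i += 2;
            } else if (input.startsWith(" ,", i)) {   // SPACE_COMMA: 2 chars beat WS's 1
                tokens.add("SPACE_COMMA");
                i += 2;
            } else if (c == ',') {                    // bare COMMA
                tokens.add("COMMA");
                i++;
            } else if (c == '(' || c == ')') {        // literal tokens used in the parser rules
                tokens.add(String.valueOf(c));
                i++;
            } else if (isWs(c)) {                     // WS -> channel(HIDDEN): skipped here
                i++;
            } else if (isDigit(c) || isIdChar(c)) {   // INT : DIGIT+  or  ID
                boolean digits = isDigit(c);
                int j = i;
                while (j < input.length()
                        && (digits ? isDigit(input.charAt(j)) : isIdChar(input.charAt(j)))) {
                    j++;
                }
                tokens.add((digits ? "INT:" : "ID:") + input.substring(i, j));
                i = j;
            } else {
                throw new IllegalArgumentException("no rule matches '" + c + "' at " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [ID:G, (, INT:11, COMMA_SPACE, INT:12, SPACE_COMMA, INT:13, COMMA, INT:444, )]
        System.out.println(tokenize("G(11, 12 , 13,444)"));
    }
}
```

Because WS is hidden, the parser sees `13,444` as INT COMMA INT and groups it with comma_integer, while `12 ,` yields a SPACE_COMMA that the seperator rule consumes.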
========== input
$ cat t2.comma
1,234,567 F(1, x) G(11, 12 , 13,444) H(99,88,77, 66,6) P(9, 8,77,666) X(1 , 2, 3 ,4 , 5,6 , 7,888,999)
========== execution
$ alias
alias antlr4='java -jar /usr/local/lib/antlr-4.0b2-complete.jar'
$ antlr4 Q4.g4
$ javac Q4*.java
$ grun Q4 line -tokens t2.comma
[@0,0:0='1',<7>,1:0]
[@1,1:1=',',<5>,1:1]
[@2,2:4='234',<7>,1:2]
...
--- last update 1426
===== found a COMMA_INTEGER : <1,234,567>
>>>>> Function F(1, x) has 2 parameters
p1=`1`
p2=`x`
>>>>> Function G(11, 12 , 13,444) has 3 parameters
p1=`11`
p2=`12`
p3=`13,444`
>>>>> Function H(99,88,77, 66,6) has 2 parameters
p1=`99,88,77`
p2=`66,6`
>>>>> Function P(9, 8,77,666) has 2 parameters
p1=`9`
p2=`8,77,666`
>>>>> Function X(1 , 2, 3 ,4 , 5,6 , 7,888,999) has 6 parameters
p1=`1`
p2=`2`
p3=`3`
p4=`4`
p5=`5,6`
p6=`7,888,999`
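To check the separator rule independently of ANTLR, here is a plain-Java sketch (hypothetical class ParamSplitSketch, not part of the grammar solution). It splits the text between the parentheses on any comma with a space directly before or after it and keeps bare commas inside a parameter. Like the grammar's `', '` and `' ,'` tokens, it only looks at a literal space. Its main method prints the X(...) call in the same format as the @after action, so it should reproduce the last block of output above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java version of the separator rule; not generated by ANTLR.
public class ParamSplitSketch {
    // Splits an argument list: ", " and " ," separate parameters,
    // a comma with no adjacent space stays inside the parameter (e.g. 7,888,999).
    public static List<String> split(String argText) {
        List<String> params = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < argText.length(); i++) {
            char c = argText.charAt(i);
            if (c == ',') {
                boolean spaceBefore = i > 0 && argText.charAt(i - 1) == ' ';
                boolean spaceAfter = i + 1 < argText.length() && argText.charAt(i + 1) == ' ';
                if (spaceBefore || spaceAfter) {
                    params.add(current.toString().trim()); // surrounding white space is skipped
                    current.setLength(0);
                    continue;
                }
            }
            current.append(c);
        }
        params.add(current.toString().trim());
        return params;
    }

    public static void main(String[] args) {
        String call = "X(1 , 2, 3 ,4 , 5,6 , 7,888,999)";
        String inner = call.substring(call.indexOf('(') + 1, call.lastIndexOf(')'));
        List<String> params = split(inner);
        System.out.println(">>>>> Function " + call + " has " + params.size() + " parameters");
        for (int i = 0; i < params.size(); i++) {
            System.out.println("p" + (i + 1) + "=`" + params.get(i) + "`");
        }
    }
}
```

Note that a comma followed by a newline is not a separator in either the grammar or this sketch, because only a literal space counts; `9,` at the end of a line followed by `8` is read as one comma integer.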
2012/11/3 Zhaohui Yang <yezonghui at gmail.com>
> Hi,
>
> I have a lexer grammar that has to recognize INTEGER like 1234 and
> COMMA_INTEGER like 1,234,567.
> The latter integer token has a comma in it, and of course the language has
> other places that use commas, e.g. F(1, x) is valid, which contains "1,"
> that should be recognized as an INTEGER 1 followed by a comma.
> ...........
>
Yes. If there is white space before or after the comma, they are separate
parameters; if there is no white space around it, it is one COMMA_INTEGER.
> --
> Regards,
>
> Yang, Zhaohui
>