[antlr-interest] Parsing a Cucumber-like language in ANTLR

Bernard Kaiflin bkaiflin.ruby at gmail.com
Tue Nov 20 01:18:43 PST 2012


Hello,

Ignoring portions of the input, or capturing text until some token appears
(or for as long as it does not appear), is not easy, especially for a
newbie. You can have a look at these threads:
http://www.antlr.org/pipermail/antlr-interest/2012-November/045764.html
http://www.antlr.org/pipermail/antlr-interest/2012-November/045750.html

Having said that, it appears that the Gherkin language, and probably yours,
is structured in such a way that any text can appear between two keywords.
So a rule starts with a keyword and loops as long as the next possible
keyword does not appear. I'm a four-week-old newbie and it took me about
one hour to write this grammar. You'll have to add more tokens to recognize
the special characters that can appear in the free text, as indicated by
messages like "token recognition error at: '('".

grammar Cucumber;

/* Recognize Cucumber-like DSL. */

file
@init {System.out.println("Cucumber last update 0947");}
    : keyword+ EOF ;

keyword
    :   description
    |   scenario
    |   given
    |   when
    |   then
    |   and
    ;

description
@after {System.out.println("===== found a description");}
    :   'Description' ':' ~'Scenario'+
    ;

scenario
@after {System.out.println("===== scenario : " + $scenario.text);}
    :   'Scenario' ':' ~'Given'*
    ;

given /* according to The Cucumber Book, we can choose between
         "Given ... Given ..." and "Given ... And ...". */
@after {System.out.println("===== given : " + $given.text);}
    :   'Given' ~( 'Given' | 'When' | 'And' )*
    ;

when
@after {System.out.println("===== when : " + $when.text);}
    :   'When' ~( 'Then' | 'And' )*
    ;

then /* according to The Cucumber Book, we can choose between
        "Then ... Then ..." and "Then ... And ...". */
@after {System.out.println("===== then : " + $then.text);}
    :   'Then' ~( 'Then' | 'And' | 'Scenario' | 'Description' )*
    ;

and
@after {System.out.println("===== and : " + $and.text);}
    :   'And' ~( 'And' | 'Given' | 'When' | 'Then' | 'Scenario' | 'Description' )*
    ;

ID  : [a-zA-Z]+ ;
INT : DIGIT+ ;
SPECIAL : '_' | '-' | '.' | '+' | '/' | ':' | '%' | '$' ;
WS  : [ \t\r\n] -> channel(HIDDEN) ;

fragment DIGIT : [0-9];
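
As a side note, the pattern shared by the rules above (start at a keyword,
then consume everything until the next keyword) can be sketched outside
ANTLR in plain Java. This is only an illustration under assumptions: the
class KeywordSegments and its split method are hypothetical names, words
are separated on whitespace instead of being lexed, and, unlike the grammar
where each rule has its own set of stopping keywords, the sketch stops at
any keyword.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper, not part of the grammar: groups words into keyword-led segments. */
final class KeywordSegments {

    // Keywords that open a new segment, mirroring the parser rules of the grammar.
    static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "Description", "Scenario", "Given", "When", "Then", "And"));

    /** Returns one string per segment; every segment starts with a keyword. */
    static List<String> split(String input) {
        List<String> segments = new ArrayList<>();
        StringBuilder current = null;
        for (String word : input.trim().split("\\s+")) {
            // Treat "Scenario:" like "Scenario" when checking for a keyword.
            String bare = word.endsWith(":")
                    ? word.substring(0, word.length() - 1)
                    : word;
            if (KEYWORDS.contains(bare)) {
                // A keyword closes the running segment and opens a new one.
                if (current != null) {
                    segments.add(current.toString());
                }
                current = new StringBuilder(word);
            } else if (current != null) {
                // Free text: keep looping while no keyword shows up.
                current.append(' ').append(word);
            }
            // Words that come before the first keyword are ignored.
        }
        if (current != null) {
            segments.add(current.toString());
        }
        return segments;
    }

    public static void main(String[] args) {
        String input = "Scenario: A\n"
                + "Given size between 10 and 20\n"
                + "   And location is spread out\n"
                + "Then add 2 confidence\n";
        for (String segment : split(input)) {
            System.out.println("===== " + segment);
        }
    }
}
```

The comparison is case-sensitive, so the lowercase "and" in "between 10 and
20" stays inside the Given segment, just as the lexer only produces the
'And' token for the capitalized word.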

========== input file t.cucumber

Description:
Multi line text goes here
to specify the description

Scenario: A
Given size between 10 and 20
   And location is spread out
Then add 2 confidence

Scenario: Attempt withdrawal using stolen card
(from The Cucumber Book)
Given I have $100 in my account
Given my card is invalid
When I request $50
Then my card should not be returned
Then I should be told to contact the bank

========== Execution

$ echo $CLASSPATH
.:/usr/local/lib/antlr-4.0b3-complete.jar
$ antlr4 Cucumber.g4
$ javac Cucumber*.java
$ grun Cucumber file -tokens -diagnostics -trace t.cucumber
line 11:0 token recognition error at: '('
line 11:23 token recognition error at: ')'
[@0,0:10='Description',<6>,1:0]
[@1,11:11=':',<3>,1:11]
[@2,12:12='\n',<11>,channel=1,1:12]
[@3,13:17='Multi',<8>,2:0]
[@4,18:18=' ',<11>,channel=1,2:5]
[@5,19:22='line',<8>,2:6]
....
enter   file, LT(1)=Description
Cucumber last update 0947
enter   keyword, LT(1)=Description
enter   description, LT(1)=Description
consume [@0,0:10='Description',<6>,1:0] rule description alt=1
consume [@1,11:11=':',<3>,1:11] rule description alt=1
consume [@3,13:17='Multi',<8>,2:0] rule description alt=1
....

$ grun Cucumber file -diagnostics t.cucumber
line 11:0 token recognition error at: '('
line 11:23 token recognition error at: ')'
Cucumber last update 0947
===== found a description
===== scenario : Scenario: A
===== given : Given size between 10 and 20
===== and : And location is spread out
===== then : Then add 2 confidence
===== scenario : Scenario: Attempt withdrawal using stolen card
from The Cucumber Book
===== given : Given I have $100 in my account
===== given : Given my card is invalid
===== when : When I request $50
===== then : Then my card should not be returned
===== then : Then I should be told to contact the bank

HTH
Bernard

2012/11/19 Wesley Ripley <wripley at wpi.edu>

> The problem I am having is in capturing multi-line blocks of text. So here
> is a short example: ...
>
> We want ANTLR to see the Description: keyword and know to capture
> everything between that and the next keyword....

