[antlr-interest] Get results of multiple tokens

Fri Sep 4 13:30:49 PDT 2009

On Wed, 2 Sep 2009 23:05:30 +0100, Hugo Picado wrote:

Hello Hugo,

thanks, this works nicely.

regards
	Andreas

> Hi,
> 
> One quick approach is divide and conquer:
> 
> line
>  : property subtokenlist DPOINT attribute
>  ;
> property
>  : TOKEN { System.out.println ("Property: " + $TOKEN.text); }
>  ;
> subtokenlist
>  : (SEMI TOKEN { System.out.println("Subtoken: " + $TOKEN.text); } )*
>  ;
> attribute
>  : TOKEN { System.out.println ("Attribute: " + $TOKEN.text); }
>  ;
> 
> This also eliminates the need for the SUBTOKEN rule and solves the
> semicolon problem.
> I haven't tested this, because I can't right now, so I don't know
> whether it actually works, but the idea is there :)
> 
> Good luck,
> Hugo.
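A minimal sketch of the same divide-and-conquer structure, written as a hand-rolled recursive-descent parser in plain Java so it runs without the ANTLR tool or runtime. It is not what ANTLR generates: the class `VCardLineSketch`, its methods and its tiny built-in tokenizer are hypothetical stand-ins for the `property`, `subtokenlist` and `attribute` rules. The two points it illustrates are that the loop body runs its action once per `SEMI TOKEN` pair, and that `;` is consumed as a token of its own, so it never becomes part of the subtoken text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written parser mirroring the rules
// line / property / subtokenlist / attribute. Not ANTLR output.
class VCardLineSketch {
    private final List<String> tokens = new ArrayList<>();
    private final List<String> out = new ArrayList<>();
    private int pos = 0;

    VCardLineSketch(String input) {
        // Lexer: a run of [a-zA-Z0-9] is one TOKEN; ';' (SEMI) and ':' (DPOINT)
        // are separate one-character tokens; whitespace is skipped (like HIDDEN).
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (isAlnum(c)) {
                int start = i;
                while (i < input.length() && isAlnum(input.charAt(i))) i++;
                tokens.add(input.substring(start, i));
            } else if (c == ';' || c == ':') {
                tokens.add(String.valueOf(c));
                i++;
            } else if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
    }

    private static boolean isAlnum(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : null;
    }

    // Consume one TOKEN (anything that is not ';' or ':') and return its text.
    private String matchToken() {
        String t = peek();
        if (t == null || t.equals(";") || t.equals(":")) {
            throw new IllegalStateException("expected TOKEN at token index " + pos);
        }
        pos++;
        return t;
    }

    // line : property subtokenlist DPOINT attribute ;
    List<String> line() {
        property();
        subtokenList();
        if (!":".equals(peek())) throw new IllegalStateException("expected ':'");
        pos++; // consume DPOINT
        attribute();
        return out;
    }

    // property : TOKEN ;
    private void property() { out.add("Property: " + matchToken()); }

    // subtokenlist : (SEMI TOKEN)* ;  the action runs once per iteration
    private void subtokenList() {
        while (";".equals(peek())) {
            pos++; // consume SEMI, so ';' never ends up in the subtoken text
            out.add("Subtoken: " + matchToken());
        }
    }

    // attribute : TOKEN ;
    private void attribute() { out.add("Attribute: " + matchToken()); }

    public static void main(String[] args) {
        for (String s : new VCardLineSketch("a;b;c;2:3a3bcde").line()) {
            System.out.println(s);
        }
    }
}
```

Running `main` prints `Property: a`, `Subtoken: b`, `Subtoken: c`, `Subtoken: 2` and `Attribute: 3a3bcde`, one per line, which is the output asked for in the original question.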
> 
> 
> On Wed, Sep 2, 2009 at 10:13 PM, Andreas Volz <lists at brachttal.net> wrote:
> 
> > Hello,
> >
> > I have this grammar file:
> >
> > grammar VCard;
> >
> > @members {
> >    public static void main(String[] args) throws Exception {
> >        VCardLexer lex = new VCardLexer(new ANTLRFileStream(args[0]));
> >        CommonTokenStream tokens = new CommonTokenStream(lex);
> >
> >        VCardParser parser = new VCardParser(tokens);
> >
> >        try {
> >            parser.line();
> >        } catch (RecognitionException e)  {
> >            e.printStackTrace();
> >        }
> >    }
> > }
> >
> > line
> >        : property=TOKEN subtoken=SUBTOKEN* DPOINT attribute=TOKEN
> >        {
> >                System.out.println ("Property: " + $property.text);
> >                System.out.println ("Attribute: " + $attribute.text);
> >                System.out.println ("Subtoken: " + $subtoken.text);
> >
> >        }
> >        ;
> >
> > TOKEN
> >        : (ALPHA | DIGIT)+
> >        ;
> >
> > SUBTOKEN
> >        : SEMI TOKEN
> >        ;
> >
> > WS
> >        : ('\n' | ' ' | '\t')+ {$channel=HIDDEN;}
> >        ;
> >
> > fragment DIGIT
> >        : '0'..'9'
> >        ;
> >
> > fragment ALPHA
> >        : 'a'..'z' | 'A'..'Z'
> >        ;
> >
> > DPOINT
> >        : ':'
> >        ;
> >
> > SEMI
> >        : ';'
> >        ;
> >
> >
> > And this input:
> >
> > a;b;c;2:3a3bcde
> >
> > This is the output:
> >
> > Property: a
> > Attribute: 3a3bcde
> > Subtoken: ;2
> >
> > What I would like to get is:
> >
> > Property: a
> > Subtoken: b
> > Subtoken: c
> > Subtoken: 2
> > Attribute: 3a3bcde
> >
> > I couldn't find in the docs how to get each of the multiple tokens
> > matched by a * or + operator.
> >
> > My second question is how to keep the ';' out of the match.
> >
> > I have been trying for some time now but haven't found a way. Could
> > someone give me a hint?
> >
> > regards
> > Andreas
> >
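The `Subtoken: ;2` in the question's output follows from how a label on a repeated element behaves: `subtoken=SUBTOKEN*` binds a single variable that is reassigned on every pass through the loop, so after the loop only the last match is left. ANTLR 3 offers list labels written with `+=` (for example `(SEMI subs+=TOKEN)*`, after which `$subs` holds every matched token) for collecting all of them. The plain-Java sketch below is only an analogy, not ANTLR's generated code; `LabelSketch`, `singleLabel` and `listLabel` are hypothetical names, and the token texts are the ones the original SUBTOKEN rule yields for the sample input.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of label behaviour; not ANTLR-generated code.
class LabelSketch {
    // Token texts the original SUBTOKEN rule produces for "a;b;c;2:3a3bcde".
    static final String[] MATCHED = {";b", ";c", ";2"};

    // Like subtoken=SUBTOKEN*: one variable, reassigned on every iteration.
    static String singleLabel(String[] matched) {
        String subtoken = null;
        for (String t : matched) {
            subtoken = t; // earlier matches are overwritten
        }
        return subtoken;
    }

    // Like a list label (subs+=SUBTOKEN): every match is appended and kept.
    static List<String> listLabel(String[] matched) {
        List<String> subtokens = new ArrayList<>();
        for (String t : matched) {
            subtokens.add(t);
        }
        return subtokens;
    }

    public static void main(String[] args) {
        System.out.println(singleLabel(MATCHED)); // ;2
        System.out.println(listLabel(MATCHED));   // [;b, ;c, ;2]
    }
}
```

Each collected element still starts with ';', because the semicolon is part of the SUBTOKEN token text; matching SEMI and TOKEN separately in a parser rule, as in the reply above, keeps it out.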