[antlr-interest] chat message parser rule?

Mon Jan 26 12:28:21 PST 2009

ian eyberg wrote:
> Oh awesome antlr apostles:
>
>   I've been struggling with a parser rule and 
> am hoping you might be able to help me.
>
>   I have a bit of data I'm trying to parse that
> looks something like this:
>
>   username: the rain in spain stays....
>
> Running this through antlrworks I am hitting my 'chatrule'
> every time but it will start chopping up the
> player's message into other tokens that I have defined.
>
> My question: why wouldn't the ALPHA token always be
> chosen over the other tokens?
>
> For example say I have a token of:
>   LAMB  : 'rack';
>
> and my player's chat message comes across as:
>   joeblow: the rain in spain
>
>
> my abbr. grammar:
>   
I thk u hv ab. 2 mch. Gmr ds nt mk sns. ;-)

Remember that the lexer rules all run first and tokenize. There is no 
influence on them by the parser. Also, there is no guarantee that one 
rule will be chosen over another in the lexer if there are ambiguities. 
Make sure that you resolve any warnings you are getting.
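
To make that concrete, here is a rough Python sketch of a tokenizer that
runs with no knowledge of any parser rule. It is a toy model, not ANTLR's
actual algorithm: the tie-break used here (longest match, then first rule
listed) is an assumption for illustration, and the rule table and the
tokenize() helper are made-up names. The LAMB literal 'rack' is taken
from the question.

```python
import re

# Toy rule table (hypothetical). Order matters only for breaking ties.
RULES = [
    ("LAMB", re.compile(r"rack")),
    ("ALPHA", re.compile(r"[A-Za-z]+")),
    ("COLON", re.compile(r":")),
    ("WS", re.compile(r"[ \t]+")),
]


def tokenize(text):
    """Split text into (type, lexeme) pairs; no parser context is consulted."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None  # (rule name, end offset) of the best match so far
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so the earlier rule wins when two rules match the same text.
            if m and m.end() > pos and (best is None or m.end() > best[1]):
                best = (name, m.end())
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}")
        tokens.append((best[0], text[pos:best[1]]))
        pos = best[1]
    return tokens
```

In this model the word "rack" inside a chat message comes back as LAMB
even though the parser was expecting ALPHA there, which is the chopping-up
effect described in the question.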

If you define a token:

LAMB : 'newzealand' ;

Then you will ALWAYS get that token back for that sequence. So, if your 
keywords can occur in the message, then you will have to cater for them 
in the parser rule:

msg: USER COLON message_stuff NL ;
message_stuff: (ALPHA | WS | LAMB)* ;
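
As a rough illustration of what that parser rule does, here is a Python
sketch that accepts keyword tokens inside the message and glues the text
back together. The token list format, the MESSAGE_TOKENS set and the
parse_msg() helper are assumptions made for this sketch; they are not
ANTLR output.

```python
# Token types that message_stuff accepts, mirroring (ALPHA | WS | LAMB)*.
MESSAGE_TOKENS = {"ALPHA", "WS", "LAMB"}


def parse_msg(tokens):
    """Mimic 'msg: USER COLON message_stuff NL' over (type, text) pairs.

    Returns (user, message_text).
    """
    if len(tokens) < 3 or tokens[0][0] != "USER" or tokens[1][0] != "COLON":
        raise SyntaxError("expected USER COLON at start of line")
    if tokens[-1][0] != "NL":
        raise SyntaxError("expected NL at end of line")
    body = tokens[2:-1]
    for kind, _ in body:
        if kind not in MESSAGE_TOKENS:
            raise SyntaxError(f"unexpected token {kind} in message")
    # Keyword tokens such as LAMB contribute their text like any other token.
    return tokens[0][1], "".join(text for _, text in body)
```

The drawback is that every keyword token has to be listed in the rule,
so the set grows with the grammar.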

However, if everything after the colon should just be eaten, then 
something similar to:

@lexer::members { boolean isMsg = false; }

MESSAGE : { isMsg }?=> ~('\n'|'\r')* { isMsg = false; } ;
COLON : ':' { isMsg = true; } ;
WS : (' '| '\t') { skip(); } ;
NL : '\r'? '\n' { skip(); } ;
ID :  (LOWER_LETTER | UPPER_LETTER)+ ;

msg : ID COLON MESSAGE ;

Will do you better.
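
If it helps to see the idea outside ANTLR, here is a hand-written Python
sketch of the same one-flag lexer. It only imitates the grammar above:
lex_chat_line() is a made-up helper, and the is_msg variable plays the
role of isMsg.

```python
def lex_chat_line(line):
    """Tokenize one 'name: text' line using a single mode flag."""
    tokens = []
    is_msg = False  # set by COLON, cleared once MESSAGE has been read
    pos = 0
    while pos < len(line):
        ch = line[pos]
        if is_msg:
            # MESSAGE: everything up to, but not including, the line end.
            end = pos
            while end < len(line) and line[end] not in "\r\n":
                end += 1
            if end > pos:
                tokens.append(("MESSAGE", line[pos:end]))
            is_msg = False
            pos = end
        elif ch == ":":
            tokens.append(("COLON", ch))
            is_msg = True
            pos += 1
        elif ch in " \t\r\n":
            pos += 1  # WS and NL are skipped, as with skip()
        elif ch.isascii() and ch.isalpha():
            end = pos
            while end < len(line) and line[end].isascii() and line[end].isalpha():
                end += 1
            tokens.append(("ID", line[pos:end]))
            pos = end
        else:
            raise ValueError(f"unexpected character {ch!r} at offset {pos}")
    return tokens
```

For "joeblow: the rack in spain" this gives ID, COLON and a single
MESSAGE token. Note that the MESSAGE text keeps the space after the
colon, because everything after COLON is eaten as-is.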

Jim
> --------------------------------------------------------
>
> chatrule  : player COLON_SPACE ~NEWLINE+;
>
> player  : (INT | ALPHA | WS)+ ;
>
> fragment LOWER_LETTER   : 'a'..'z' ;
> fragment UPPER_LETTER   : 'A'..'Z' ;
> ALPHA : (LOWER_LETTER | UPPER_LETTER)+ ;
>
> COMMA_SP  : ',' ' ' ;
> COLON_SP  : ': ';
>
> fragment DIGIT  : ('0'..'9')+ ;
> INT : DIGIT COMMA_SP?;
> COMMA_INT
> NEWLINE : '\r'? '\n' ;
>
> --------------------------------------------------------
>
> thanks,
> Ian