[antlr-interest] strange? eat not match char

Mon Aug 13 13:39:53 PDT 2007

I ran your grammar, and the lexer rules mashed all of the text in example 1 into one big token:
[hello world] [ hello world] --->[hello world] [ hello world] 

Also, to get it to work I had to change:
> LBAK    =    '{';
> RBAK    =    '}';
To
LBAK : '[';  RBAK : ']';
Because it wasn't doing anything with the brackets (I thought you said the brackets were important...?).

I would suggest not giving the lexer so much power in determining the results.  As for checking for whitespace: just keep it on a hidden channel (lumped together into single tokens), then check in the parser whether there was whitespace on that channel:

SPACE    :    (' ' | '\t' | '\f')+ {$channel=2;};
LINE    :    ('\r'? '\n')+{$channel=HIDDEN;};

//check the syntax on this...I'm just making it up
//(LT(-1) skips off-channel tokens, so this looks at the raw token buffer with get() instead)
words 
  : WORD {input.get(input.index()-1).getChannel()==2}? WORD 
  | WORD
  ;
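
To show the idea outside of ANTLR, here is a minimal stand-alone Java sketch. It is not the ANTLR runtime API: Tok, lex, precededByHiddenSpace, and strings are made-up names for illustration, and the toy lexer only knows words, commas, and whitespace. The lexer keeps whitespace as tokens on channel 2, and the "parser" skips them but peeks at the raw token list when it needs to know whether whitespace was there.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: a toy token list with channels, not ANTLR classes. */
final class HiddenChannelSketch {
    static final int DEFAULT_CHANNEL = 0;
    static final int SPACE_CHANNEL = 2;   // same number as {$channel=2;} in the SPACE rule

    static final class Tok {
        final String text;
        final int channel;
        Tok(String text, int channel) { this.text = text; this.channel = channel; }
    }

    static boolean isSpace(char c) { return c == ' ' || c == '\t' || c == '\f'; }

    /** Toy lexer: words and commas go on channel 0, whitespace runs on channel 2. */
    static List<Tok> lex(String input) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int start = i;
            if (c == ',') {
                i++;
                out.add(new Tok(",", DEFAULT_CHANNEL));
            } else if (isSpace(c)) {
                while (i < input.length() && isSpace(input.charAt(i))) i++;
                out.add(new Tok(input.substring(start, i), SPACE_CHANNEL));
            } else {
                while (i < input.length() && !isSpace(input.charAt(i))
                        && input.charAt(i) != ',') i++;
                out.add(new Tok(input.substring(start, i), DEFAULT_CHANNEL));
            }
        }
        return out;
    }

    /** True if the raw token just before index i is whitespace on the hidden channel. */
    static boolean precededByHiddenSpace(List<Tok> tokens, int i) {
        return i > 0 && tokens.get(i - 1).channel == SPACE_CHANNEL;
    }

    /** Toy "parser": joins words separated by hidden whitespace, splits on commas. */
    static List<String> strings(String input) {
        List<Tok> tokens = lex(input);
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            Tok t = tokens.get(i);
            if (t.channel != DEFAULT_CHANNEL) continue;   // hidden tokens are skipped
            if (t.text.equals(",")) {
                if (current.length() > 0) result.add(current.toString());
                current.setLength(0);
            } else {
                // Only here do we ask whether whitespace sat in front of this word.
                if (current.length() > 0 && precededByHiddenSpace(tokens, i)) {
                    current.append(' ');
                }
                current.append(t.text);
            }
        }
        if (current.length() > 0) result.add(current.toString());
        return result;
    }
}
```

With the input from test 4 (" hello word , s", without the brackets), strings(...) gives back the two strings "hello word" and "s". The lexer never has to decide what a multi-word string is; it only records where the whitespace was.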

The less power you give to the lexer, the more flexible the parser can be.  It's a whole lot easier to play with the parser (and there's a lot more visual support for the parser in ANTLRWorks, too).  I'm sure there are much better ways than the above; that's just one way.

Good luck,

Matt

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org 
> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of ???
> Sent: Monday, August 13, 2007 9:51 AM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] strange? eat not match char
> 
> Hi, I wrote a simple grammar (below) and ran the following tests in the
> ANTLRWorks interpreter. Because the messages contain whitespace, I
> bracket each test string with a [] pair. Results from interpreting
> with the "strings" rule:
> 1,[hello world] [ hello world] --->[hello world], ok
> 2,[ hello word ][ hello word ,][ hello word , ]-->NoViableAltException
> 3,[ hello word ,s]-->[ hello word ,s]
> 4,[ hello word , s]-->[s]
> Why do I get results 3 and 4? They puzzle me :)
> In 3, the comma is not a CHAR, yet it is present in the result.
> In 4, the message before the comma was eaten, and I don't understand why.
> Could someone help?
> Thanks.
> 
> 
> grammar On16;
> 
> /*
> options{
>     k=2;
>     output=AST;
> }
> */
> tokens{
> COMMA    =    ',';
> SEMI    =    ';';
> COLON    =    ':';
> LBAK    =    '{';
> RBAK    =    '}';
> SQUOTE    =    '\'';
> DQUOTE    =    '"';
> }
> @header{package on;}
> @lexer::header{package on;}
> 
> //document:    string|strings|object|objects|pairs;
> /*******************************************
> * parser rules
> ********************************************/
> strings    :    string  (COMMA string)* COMMA?;
> name    :    words|WORD;
> string    :    words|WORD|STRING;
> words    :    WORDS;
> 
> WORDS    :    WORD (WHITE WORD)+;
> //idname returns [string s] {s = " ";}:
> // t=ID { s += t.getText(); }
> //(options{greedy=true;}: ws=WS { s += ws.getText(); } t2=ID! { s += t2.getText(); } )*;
> 
> //must not be fragment
> WORD    :    CHAR+;
> STRING    :    SQUOTE (~(SQUOTE))* SQUOTE
>     |    DQUOTE (~(DQUOTE))* DQUOTE
>     ;
> 
> /*******************************************
> * lexer rules
> ********************************************/
> fragment
> WHITE    :    SPACE+ {$channel=0;};
> WS    :    (SPACE | LINE)+ {$channel=HIDDEN;};
> //META CHARACTER;
> fragment
> CHAR    :    ~(COMMA | SEMI | COLON | LBAK | RBAK | SQUOTE | 
> DQUOTE | SPACE);
> fragment
> SPACE    :    ' ' | '\t' | '\f';
> fragment
> CRLF    :    '\r' | '\n';
> LINE    :    '\r'? '\n';
> //WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+;
> 
> 
> -- 
> Regards,
> 向秦贤
>