[antlr-interest] How to match a phrase i.e. multiple words?
    Gavin Lambert 
    antlr at mirality.co.nz
       
    Sat Feb 14 23:11:37 PST 2009
    
    
  
At 04:18 15/02/2009, Swaroop C H wrote:
 >    PHRASE
 >        : '"' WORD '"' { $text = $WORD.text }
 >        ;
 >
 >    WORD
 >        : ( 'a'..'z' | 'A'..'Z' | '.' )+
 >        ;
 >
 >    WHITESPACE
 >        :   (' '|'\t'|'\n'|'\r')+ { self.skip() }
 >        ;
 >
 >The problem is that I'm unable to proceed from here. If I put
 >
 >    PHRASE
 >        : '"' w=WORD+ '"' { $text = $w.text }
 >        ;
 >
 >Then I get the following error:
 >
 >    ANTLR Parser Generator  Version 3.1.1
 >    line 1:22 mismatched character u' ' expecting u'"'
 >    line 1:28 required (...)+ loop did not match anything at character '<EOF>'
 >    line 1:23 missing PHRASE at u'todo'
 >    description = <missing PHRASE>
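Since PHRASE is a lexer rule, it is at the same "level" as the 
WHITESPACE rule: it matches raw characters, so whitespace isn't 
magically removed before PHRASE sees it (as it would be if PHRASE 
were a parser rule working on tokens instead). That is why the 
space after the first word is reported as a mismatched character.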
If you want to retain the discrete WORD tokens, then you could 
change PHRASE into a parser rule.  I suspect that's not really 
what you want, though.
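
To make the lexer/parser distinction concrete, here is a small hand-written Python sketch. It is not ANTLR-generated code, and the names TOKEN_SPEC, tokenize and parse_phrase are made up for illustration. It only mimics the idea: the tokenizer drops whitespace the way skip() does, and a parser-level phrase rule then sees nothing but QUOTE and WORD tokens.

```python
import re

# Token patterns loosely mirroring the grammar in the question
# (an illustration only, not what ANTLR generates).
TOKEN_SPEC = [
    ("QUOTE", r'"'),
    ("WORD", r"[a-zA-Z.]+"),
    ("WHITESPACE", r"[ \t\r\n]+"),
]
MASTER = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))


def tokenize(text):
    """Split text into (type, text) pairs, dropping WHITESPACE like skip()."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError("unexpected character %r at %d" % (text[pos], pos))
        if m.lastgroup != "WHITESPACE":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def parse_phrase(tokens):
    """Parser-level rule: QUOTE WORD+ QUOTE. Returns the list of words."""
    if len(tokens) < 3 or tokens[0][0] != "QUOTE" or tokens[-1][0] != "QUOTE":
        raise ValueError("expected a quoted phrase")
    words = tokens[1:-1]
    if any(kind != "WORD" for kind, _ in words):
        raise ValueError("only WORD tokens are allowed inside the quotes")
    return [text for _, text in words]
```

Calling parse_phrase(tokenize('"todo list"')) returns ['todo', 'list']: the discrete words survive, but the whitespace between them is gone by the time the parser rule runs.
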
The simplest thing to do is to define a PHRASE as anything at all 
within quotes:
PHRASE
   : '"' .* '"'
   ;
If you want to restrict the accepted characters, though, then you 
could use something like this:
PHRASE
   : '"' (~('\r' | '\n' | '"'))* '"'
   ;
or this:
PHRASE
   : '"' (WORD | ' ' | '\t')* '"'
   ;
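
For readers more used to regular expressions, the three PHRASE rules above correspond roughly to the Python patterns below. This is only an analogy, not something ANTLR produces. The names ANY_QUOTED, NO_NEWLINE, WORDS_AND_BLANKS and lex_phrase are hypothetical, and the non-greedy reading of '.*' reflects my understanding of how ANTLR 3 treats wildcard loops by default.

```python
import re

# '"' .* '"'  (assumed non-greedy, so it stops at the first closing quote)
ANY_QUOTED = re.compile(r'".*?"', re.DOTALL)

# '"' (~('\r' | '\n' | '"'))* '"'
NO_NEWLINE = re.compile(r'"[^\r\n"]*"')

# '"' (WORD | ' ' | '\t')* '"'
# WORD is one or more letters or '.', so the loop reduces to one character class.
WORDS_AND_BLANKS = re.compile(r'"[a-zA-Z. \t]*"')


def lex_phrase(pattern, text):
    """Return the phrase matched at the start of text, or None if none matches."""
    m = pattern.match(text)
    return m.group() if m else None
```

All three accept '"todo list"'. Only the first accepts a phrase containing a line break, and the third returns None for '"todo 42"' because digits are not part of WORD.
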
It's usually best, though, to let your lexer be fairly tolerant 
and raise errors about invalid content at parse or tree-walk time 
instead.
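
As a rough illustration of that "tolerant lexer, strict later stage" idea, the Python sketch below accepts any quoted text first and only afterwards checks the content, which lets it report exactly which character is wrong. It is plain Python, not ANTLR; read_phrase and ALLOWED_CHARS are hypothetical names, and the allowed character set is an assumption based on the WORD rule in the question.

```python
import re
import string

# Tolerant "lexer" step: anything except a quote or line break between quotes.
PHRASE = re.compile(r'"([^\r\n"]*)"')

# Stricter check applied afterwards (assumed set: letters, '.', space, tab).
ALLOWED_CHARS = set(string.ascii_letters + ". \t")


def read_phrase(text):
    """Return the phrase body, raising ValueError with a precise message."""
    m = PHRASE.match(text)
    if m is None:
        raise ValueError("expected a quoted phrase")
    body = m.group(1)
    for offset, ch in enumerate(body):
        if ch not in ALLOWED_CHARS:
            # offset + 1 accounts for the opening quote at column 0.
            raise ValueError(
                "invalid character %r at column %d" % (ch, offset + 1)
            )
    return body
```

With this split, read_phrase('"todo list"') returns 'todo list', while read_phrase('"todo 42"') raises a ValueError naming the offending '4' at column 6 rather than a generic mismatch at the lexer level.
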
    
    