[antlr-interest] How to match a phrase i.e. multiple words?
Gavin Lambert
antlr at mirality.co.nz
Sat Feb 14 23:11:37 PST 2009
At 04:18 15/02/2009, Swaroop C H wrote:
> PHRASE
> : '"' WORD '"' { $text = $WORD.text }
> ;
>
> WORD
> : ( 'a'..'z' | 'A'..'Z' | '.' )+
> ;
>
> WHITESPACE
> : (' '|'\t'|'\n'|'\r')+ { self.skip() }
> ;
>
>The problem is that I'm unable to proceed from here. If I put
>
> PHRASE
> : '"' w=WORD+ '"' { $text = $w.text }
> ;
>
>Then I get the following error:
>
> ANTLR Parser Generator Version 3.1.1
> line 1:22 mismatched character u' ' expecting u'"'
> line 1:28 required (...)+ loop did not match anything at
>character '<EOF>'
> line 1:23 missing PHRASE at u'todo'
> description = <missing PHRASE>
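Since PHRASE is a lexer rule, it is at the same "level" as the
WHITESPACE rule and matches raw characters. Whitespace is only
skipped between tokens, so the spaces inside the quotes are still in
the character stream while PHRASE is being matched. WORD+ stops at
the first space, which is the "mismatched character u' '" error
above. A parser rule, by contrast, would only see the tokens left
over after WHITESPACE has been skipped.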
If you want to retain the discrete WORD tokens, then you could
change PHRASE into a parser rule. I suspect that's not really
what you want, though.
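
As a rough sketch of the parser-rule option (the rule name phrase and
the QUOTE token are illustrative names, not part of the original
grammar), the quote becomes a token of its own and the parser matches
the WORD tokens between two of them:

phrase
    : QUOTE WORD+ QUOTE
    ;

QUOTE
    : '"'
    ;

WORD
    : ( 'a'..'z' | 'A'..'Z' | '.' )+
    ;

WHITESPACE
    : (' '|'\t'|'\n'|'\r')+ { self.skip() }
    ;

Here the lexer has already skipped the whitespace between the words
by the time the parser sees the token stream, so WORD+ can match
several words in a row. The cost is that a phrase is no longer a
single token.
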
The simplest fix is to define a PHRASE as anything at all within
quotes:
PHRASE
    : '"' .* '"'
    ;
If you want to restrict the accepted characters, though, then you
could use something like this:
PHRASE
    : '"' (~('\r' | '\n' | '"'))* '"'
    ;
or this:
PHRASE
    : '"' (WORD | ' ' | '\t')* '"'
    ;
It's usually best, though, to let your lexer be fairly tolerant and
to raise errors about invalid content at parse or tree-walk time
instead.