[antlr-interest] Stripping Tokens, Skipping leading text
Gavin Lambert
antlr at mirality.co.nz
Fri May 8 16:48:41 PDT 2009
At 11:33 9/05/2009, Christian Schladetsch wrote:
>My attempts so far have failed:
>
> CODE_BLOCK: '[[' (options{greedy=false;}:.)* ']]' ;
>
>This correctly parses the entire token, but the token value in
>the lexer contains the enclosing delimiters '[[' and ']]'
CODE_BLOCK: '[[' .* ']]'
    { setText($text.substring(2, $text.length()-2)); };
(Minor variation needed to make it C#, but that should give you
the general idea.)
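
For reference, the string arithmetic in that action can be checked outside ANTLR. The following is only a minimal Java sketch; the class and method names (DelimiterStrip, stripDelimiters) are made up for illustration and are not part of ANTLR. Java's substring takes a begin index and an end index, so the end is length - 2. C#'s Substring takes a start index and a length, which is why the C# variant would use Length - 4 instead.

```java
public class DelimiterStrip {
    // Hypothetical helper (not an ANTLR API): drops a two-character
    // opening delimiter and a two-character closing delimiter.
    static String stripDelimiters(String tokenText) {
        // Java's substring(beginIndex, endIndex) excludes endIndex,
        // so the closing "]]" is removed by stopping at length - 2.
        return tokenText.substring(2, tokenText.length() - 2);
    }

    public static void main(String[] args) {
        // The text matched by CODE_BLOCK still includes "[[" and "]]".
        System.out.println(stripDelimiters("[[int x = 1;]]"));
    }
}
```

This assumes the matched text always starts with '[[' and ends with ']]', which the lexer rule guarantees.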
>While I'm here, I have a similar problem. I'd like to skip all
>input until a starting token is found:
>
> any text here that is not parsed lah di dah /** text here is
> parsed **/ no text parsing here
You might want to look into filter lexers, or island
grammars. But anyway:
START
    :   ( ~'/'
        | '/' ~'*'
        | '/*' ~'*'
        )*
        '/**'
    ;
This sort of thing is dangerous, though; there's a very good
probability that it will mess up the contents of what you're
trying to parse as well.
A better solution is to match the whole /** (anything) **/
sequence as a single lexer token, and then run another
lexer/parser over the result -- ie. an island grammar.
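
The two-pass idea can be sketched in plain Java without any ANTLR classes. This is only an illustration under assumptions: the class and method names (IslandExtractor, extractIslands) are hypothetical, and a regular expression stands in for the outer lexer rule that matches each /** ... **/ block as one unit. In a real setup, each extracted string would then be fed to the second (inner) lexer/parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IslandExtractor {
    // Stand-in for the outer "island" token: "/**", then as little
    // text as possible (including newlines), then "**/".
    private static final Pattern ISLAND =
            Pattern.compile("/\\*\\*(.*?)\\*\\*/", Pattern.DOTALL);

    // Hypothetical helper: returns the text inside each island,
    // with the surrounding delimiters removed.
    static List<String> extractIslands(String input) {
        List<String> islands = new ArrayList<>();
        Matcher m = ISLAND.matcher(input);
        while (m.find()) {
            islands.add(m.group(1));
        }
        return islands;
    }

    public static void main(String[] args) {
        String input = "any text here that is not parsed lah di dah "
                + "/** text here is parsed **/ no text parsing here";
        for (String island : extractIslands(input)) {
            // A second lexer/parser would run over each island here.
            System.out.println("[" + island + "]");
        }
    }
}
```

Because the outer pass only looks for the delimiters, nothing outside an island can interfere with how the island's contents are tokenized.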