[antlr-interest] Stripping Tokens, Skipping leading text
Christian Schladetsch
christian.schladetsch at gmail.com
Fri May 8 17:31:31 PDT 2009
Thanks Gavin,
Although the first solution is a hack (using out-of-band target-specific
code), I greatly appreciate it and will use it. However, the underlying
issue remains a curiosity for me.
I'll follow your advice and will research both lexical filters and island
grammars.
Thanks again,
Christian.
On Sat, May 9, 2009 at 11:48 AM, Gavin Lambert <antlr at mirality.co.nz> wrote:
> At 11:33 9/05/2009, Christian Schladetsch wrote:
>
>> My attempts so far have failed:
>>
>> CODE_BLOCK: '[[' (options{greedy=false;}:.)* ']]' ;
>>
>> This correctly parses the entire token, but the token value in the lexer
>> contains the enclosing delimiters '[[' and ']]'
>>
>
> CODE_BLOCK: '[[' .* ']]' { setText($text.substring(2, $text.length()-2)); };
>
> (Minor variation needed to make it C#, but that should give you the general
> idea.)
>
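The index arithmetic in that action can be checked outside ANTLR. The sketch below is plain Java, not generated lexer code; the class and method names (DelimiterStrip, stripDelimiters) are hypothetical and only illustrate dropping two characters from each end of the matched text.

```java
public class DelimiterStrip {
    // Hypothetical helper: removes a fixed-width opening and closing
    // delimiter from the matched token text.
    public static String stripDelimiters(String text, int open, int close) {
        if (text.length() < open + close) {
            throw new IllegalArgumentException("text is shorter than its delimiters");
        }
        // Java's substring(begin, end) takes an end index, not a length.
        return text.substring(open, text.length() - close);
    }

    public static void main(String[] args) {
        // '[[' and ']]' are both two characters wide.
        System.out.println(stripDelimiters("[[int x = 1;]]", 2, 2));
    }
}
```

For the C# target, String.Substring takes a start index and a length rather than an end index, so the equivalent call would be text.Substring(2, text.Length - 4).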
>> While I'm here, I have a similar problem. I'd like to skip all input until
>> a starting token is found:
>>
>> any text here that is not parsed lah di dah /** text here is parsed **/
>> no text parsing here
>>
>
> You might want to look into filter lexers, or island grammars. But anyway:
>
> START
> : ( ~'/'
> | '/' ~'*'
> | '/*' ~'*'
> )*
> '/**'
> ;
>
> This sort of thing is dangerous, though; there's a very good probability
> that it will mess up the contents of what you're trying to parse as well.
>
> A better solution is to match the whole /** (anything) **/ sequence as a
> single lexer token, and then run another lexer/parser over the result -- ie.
> an island grammar.
>
>
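The two-pass idea Gavin describes can be sketched without ANTLR at all. In the sketch below, plain string scanning stands in for the outer lexer: it collects the text between each /** and the next **/ and ignores everything else. The names (IslandExtractor, extractIslands) are hypothetical, and in a real setup each extracted island would be handed to a second lexer/parser rather than printed.

```java
import java.util.ArrayList;
import java.util.List;

public class IslandExtractor {
    private static final String OPEN = "/**";
    private static final String CLOSE = "**/";

    // Returns the text between each "/**" and the next "**/".
    // Everything outside those delimiters is skipped.
    public static List<String> extractIslands(String input) {
        List<String> islands = new ArrayList<>();
        int pos = 0;
        while (true) {
            int start = input.indexOf(OPEN, pos);
            if (start < 0) {
                break; // no further islands
            }
            int bodyStart = start + OPEN.length();
            int end = input.indexOf(CLOSE, bodyStart);
            if (end < 0) {
                break; // unterminated island: ignored in this sketch
            }
            islands.add(input.substring(bodyStart, end));
            pos = end + CLOSE.length();
        }
        return islands;
    }

    public static void main(String[] args) {
        String input = "any text here that is not parsed lah di dah "
                + "/** text here is parsed **/ no text parsing here";
        for (String island : extractIslands(input)) {
            // A second lexer/parser would consume each island here.
            System.out.println("[" + island + "]");
        }
    }
}
```

Because the outer pass only looks for the delimiters, it cannot mis-tokenise the island contents, which is the risk with the START rule above.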