[antlr-interest] Stripping Tokens, Skipping leading text

Fri May 8 17:31:31 PDT 2009

Thanks Gavin,

Although the first solution is a hack (using out-of-band target-specific
code), I greatly appreciate it and will use it. However, the underlying
issue remains a curiosity for me.

I'll follow your advice and will research both lexical filters and island
grammars.

Thanks again,
Christian.

On Sat, May 9, 2009 at 11:48 AM, Gavin Lambert <antlr at mirality.co.nz> wrote:

> At 11:33 9/05/2009, Christian Schladetsch wrote:
>
>> My attempts so far have failed:
>>
>>    CODE_BLOCK: '[[' (options{greedy=false;}:.)* ']]' ;
>>
>> This correctly parses the entire token, but the token value in the lexer
>> contains the enclosing delimiters '[[' and ']]'
>>
>
> CODE_BLOCK: '[[' .* ']]' { setText($text.substring(2, $text.length()-2));
> };
>
> (Minor variation needed to make it C#, but that should give you the general
> idea.)
>
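The index arithmetic in that action can be checked outside ANTLR. Below is a minimal sketch in plain Java, assuming fixed-width delimiters; the class and method names (DelimiterStripper, strip) are hypothetical and not part of ANTLR. Java's substring(begin, end) takes an exclusive end index, so the end is length minus the closing delimiter's width.

```java
// Hypothetical helper for illustration only; not part of the ANTLR runtime.
final class DelimiterStripper {

    // Returns the text between a leading 'open' and a trailing 'close' delimiter.
    static String strip(String text, String open, String close) {
        boolean tooShort = text.length() < open.length() + close.length();
        if (tooShort || !text.startsWith(open) || !text.endsWith(close)) {
            throw new IllegalArgumentException("text is not wrapped in the expected delimiters");
        }
        // substring(begin, end): 'end' is an exclusive index, not a length.
        return text.substring(open.length(), text.length() - close.length());
    }
}
```

For the CODE_BLOCK token, DelimiterStripper.strip("[[x = 1]]", "[[", "]]") yields the inner text without the brackets. In C#, Substring takes a start and a length instead, so the second argument there would be the total length minus 4.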
>> While I'm here, I have a similar problem. I'd like to skip all input until
>> a starting token is found:
>>
>>    any text here that is not parsed lah di dah /** text here is parsed **/
>> no text parsing here
>>
>
> You might want to look into filter lexers, or island grammars.  But anyway:
>
> START
>  : ( ~'/'
>    | '/' ~'*'
>    | '/*' ~'*'
>    )*
>    '/**'
>  ;
>
> This sort of thing is dangerous, though; there's a very good probability
> that it will mess up the contents of what you're trying to parse as well.
>
> A better solution is to match the whole /** (anything) **/ sequence as a
> single lexer token, and then run another lexer/parser over the result -- ie.
> an island grammar.
>
>
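The "match the whole /** ... **/ sequence, then process the result separately" idea can be illustrated without a lexer. This is only a sketch in plain Java using String.indexOf as a stand-in for the outer token match; the names IslandExtractor and extract are hypothetical, and a real island grammar would hand the extracted text to a second lexer/parser rather than return it.

```java
// Hypothetical stand-in for the outer "island" match; not ANTLR-generated code.
final class IslandExtractor {

    // Returns the text between the first "/**" and the next "**/",
    // or null if no complete island is present in the input.
    static String extract(String input) {
        int start = input.indexOf("/**");
        if (start < 0) {
            return null; // no opening delimiter: everything is skipped
        }
        int bodyStart = start + "/**".length();
        int end = input.indexOf("**/", bodyStart);
        if (end < 0) {
            return null; // unterminated island
        }
        return input.substring(bodyStart, end);
    }
}
```

Applied to the sample input from the question, the text before "/**" and after "**/" is ignored and only the enclosed text (including its surrounding spaces) is returned, ready to be fed to an inner parser.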