[antlr-interest] Help with discarding lexer tokens....
Stanislav Sokorac
sokorac at gmail.com
Wed Jul 22 17:34:14 PDT 2009
The way I'd do it is to first make suffix a lexer rule (SUFFIX), and then
move it down below the DISCARD rule so that DISCARD can still match:
grammar Test;
cmd : (CMD SUFFIX? )* ;
CMD : 'a'..'z'+ ;
DISCARD : '[remove]' { $channel=HIDDEN; } ;
SUFFIX : '[' CMD ']' ;
WS : (' '|'\t'|'\r'|'\n')+ { $channel=HIDDEN; } ;
This works for your example.
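
To illustrate why the order of DISCARD and SUFFIX matters, here is a rough Java sketch. It is not ANTLR and does not reproduce how the generated lexer predicts rules internally; it only mimics the intent with java.util.regex by trying the rules in grammar order at each position and taking the first one that matches. The class name RuleOrderSketch, the method visibleTokens, and the regular expressions are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: a first-match-wins tokenizer, not ANTLR's lexer.
final class RuleOrderSketch {
    // Rules listed in the same order as in the grammar above.
    private static final String[] NAMES = { "CMD", "DISCARD", "SUFFIX", "WS" };
    private static final Pattern[] RULES = {
        Pattern.compile("[a-z]+"),         // CMD
        Pattern.compile("\\[remove\\]"),   // DISCARD, tried before SUFFIX
        Pattern.compile("\\[[a-z]+\\]"),   // SUFFIX : '[' CMD ']'
        Pattern.compile("[ \\t\\r\\n]+")   // WS
    };

    /** Returns the tokens a parser would see; DISCARD and WS are treated as hidden. */
    static List<String> visibleTokens(String text) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            boolean matched = false;
            for (int i = 0; i < RULES.length && !matched; i++) {
                Matcher m = RULES[i].matcher(text);
                m.region(pos, text.length());
                if (m.lookingAt()) {       // match must start exactly at pos
                    boolean hidden = NAMES[i].equals("DISCARD") || NAMES[i].equals("WS");
                    if (!hidden) {
                        out.add(NAMES[i] + ":" + m.group());
                    }
                    pos = m.end();
                    matched = true;
                }
            }
            if (!matched) {
                throw new IllegalArgumentException("no rule matches at offset " + pos);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(visibleTokens("alfa[x] beta[remove] gamma[ret]"));
    }
}
```

On the sample input this prints [CMD:alfa, SUFFIX:[x], CMD:beta, CMD:gamma, SUFFIX:[ret]]. If the SUFFIX pattern were listed before DISCARD in this sketch, [remove] would come out as a SUFFIX token instead of being hidden.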
Stan
On Wed, Jul 22, 2009 at 6:06 AM, Fredrik Ohrstrom <oehrstroem at gmail.com> wrote:
> I would like to ignore certain tokens found at the lexer level.
> For example: my example source code is sprinkled with tokens
> like [remove] and I want to prevent these from being seen by the parser.
>
> This could easily be done with sed, but this is not an option since
> I use TokenRewriteStream to reconstruct the full source with
> modifications and the [remove] tokens must be there.
>
> Unfortunately there is a lookahead conflict with other valid
> sequences like [ret] and the following grammar does not work:
>
> grammar Test;
> cmd : (CMD suffix? )* ;
> suffix : '[' CMD ']' ;
> CMD : 'a'..'z'+ ;
> DISCARD : '[remove]' { $channel=HIDDEN; } ;
> WS : (' '|'\t'|'\r'|'\n')+ { $channel=HIDDEN; } ;
>
> Fire it up in ANTLRWorks or use antlr-3.1.3 standalone with the Java
> program supplied at the end of the email. Try to parse:
>
> alfa[x] beta[remove] gamma[ret]
>
> [x] is properly parsed. [remove] is properly lexed and discarded.
> But [ret] causes the lexer to enter DISCARD and fail. I have tried
> k=4, syntactic predicates, semantic predicates to no avail.
>
> Also, the failed lex messes up the token rewrite stream
> so that it does not reconstruct the original stream:
> [re is simply dropped, which results in broken source.
>
> I did finally stumble upon a solution, but it is ugly.
>
> grammar Test;
> cmd : (CMD suffix? )* ;
> suffix : LB CMD RB ;
> CMD : 'a'..'z'+ ;
> LB
> : '[' { if (input.LA(1)=='r' &&
> input.LA(2)=='e' &&
> input.LA(3)=='m' &&
> input.LA(4)=='o' &&
> input.LA(5)=='v' &&
> input.LA(6)=='e' &&
> input.LA(7)==']') {
> match("remove]");
> $channel=HIDDEN;
> }
> }
> ;
> RB : ']' ;
> WS : (' '|'\t'|'\r'|'\n')+ { $channel=HIDDEN; } ;
>
> Is there a correct way to do it?
>
> Thanks!
>
> Fredrik Öhrström
>
> ----------------------------------------------
>
> import java.io.*;
> import org.antlr.runtime.*;
>
> public class Test
> {
>     public static void main(String... args)
>     {
>         try {
>             // Read the whole input file into one string.
>             StringBuffer b = new StringBuffer();
>             String line = null;
>             BufferedReader input = new BufferedReader(new FileReader(args[0]));
>             for (;;) {
>                 line = input.readLine();
>                 if (line==null) break;
>                 // readLine() strips the line terminator, so put it back.
>                 b.append(line).append('\n');
>             }
>             input.close();
>             String d = b.toString();
>             System.out.println("Parsing\n>"+d+"<\n");
>             CharStream cs = new ANTLRStringStream(d);
>             TestLexer lexer = new TestLexer(cs);
>             TokenRewriteStream rew = new TokenRewriteStream(lexer);
>             TestParser parser = new TestParser(rew);
>             parser.cmd();
>
>             // Print the source as reconstructed from the token stream.
>             System.out.println("Done parsing\n>"+rew.toString()+"<\n");
>         } catch (Exception e) {
>             e.printStackTrace();
>         }
>     }
> }