[antlr-interest] Help with discarding lexer tokens....
Fredrik Ohrstrom
oehrstroem at gmail.com
Wed Jul 22 03:06:31 PDT 2009
I would like to ignore certain tokens found at the lexer level.
For example, my source code is sprinkled with tokens
like [remove], and I want to prevent these from being seen by the parser.
This could easily be done with sed, but that is not an option since
I use TokenRewriteStream to reconstruct the full source with
modifications, and the [remove] tokens must still be there.
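To make the requirement concrete, here is a toy model in plain Java,
independent of ANTLR. It assumes a token is just a text plus a channel
number; the names Tok, visible and reconstruct are made up for this
illustration and are not ANTLR API. The parser only looks at the default
channel, while reconstruction concatenates every token, hidden ones included.

```java
import java.util.ArrayList;
import java.util.List;

public class HiddenChannelSketch {
    static final int DEFAULT = 0;
    static final int HIDDEN = 1;

    // Hypothetical minimal token: its text and the channel it travels on.
    static final class Tok {
        final String text;
        final int channel;
        Tok(String text, int channel) { this.text = text; this.channel = channel; }
    }

    // What the parser would see: default-channel tokens only.
    static List<String> visible(List<Tok> toks) {
        List<String> out = new ArrayList<String>();
        for (Tok t : toks) {
            if (t.channel == DEFAULT) out.add(t.text);
        }
        return out;
    }

    // Reconstruction: concatenate every token, hidden ones included.
    static String reconstruct(List<Tok> toks) {
        StringBuilder sb = new StringBuilder();
        for (Tok t : toks) sb.append(t.text);
        return sb.toString();
    }

    // Hand-built token list for the input "beta[remove] gamma".
    static List<Tok> sample() {
        List<Tok> toks = new ArrayList<Tok>();
        toks.add(new Tok("beta", DEFAULT));
        toks.add(new Tok("[remove]", HIDDEN));
        toks.add(new Tok(" ", HIDDEN));
        toks.add(new Tok("gamma", DEFAULT));
        return toks;
    }

    public static void main(String[] args) {
        List<Tok> toks = sample();
        System.out.println(visible(toks));
        System.out.println(reconstruct(toks));
    }
}
```

This is why stripping [remove] before lexing (the sed approach) does not
work here: the text has to survive as a token, just on a channel the parser
does not read.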
Unfortunately there is a lookahead conflict with other valid
sequences like [ret] and the following grammar does not work:
grammar Test;
cmd : (CMD suffix? )* ;
suffix : '[' CMD ']' ;
CMD : 'a'..'z'+ ;
DISCARD : '[remove]' { $channel=HIDDEN; } ;
WS : (' '|'\t'|'\r'|'\n')+ { $channel=HIDDEN; } ;
Fire it up in ANTLRWorks, or use antlr-3.1.3 standalone with the Java
program supplied at the end of the email. Try to parse:
alfa[x] beta[remove] gamma[ret]
[x] is properly parsed. [remove] is properly lexed and discarded.
But [ret] causes the lexer to enter DISCARD and fail. I have tried
k=4, syntactic predicates and semantic predicates, all to no avail.
Also, the failed lex messes up the token rewrite stream
so that it does not reconstruct the original stream:
[re is simply dropped, which results in broken source.
I did finally stumble upon a solution, but it is ugly.
grammar Test;
cmd : (CMD suffix? )* ;
suffix : LB CMD RB ;
CMD : 'a'..'z'+ ;
LB
    : '[' { if (input.LA(1)=='r' &&
                input.LA(2)=='e' &&
                input.LA(3)=='m' &&
                input.LA(4)=='o' &&
                input.LA(5)=='v' &&
                input.LA(6)=='e' &&
                input.LA(7)==']') {
                match("remove]");
                $channel=HIDDEN;
            }
          }
    ;
RB : ']' ;
WS : (' '|'\t'|'\r'|'\n')+ { $channel=HIDDEN; } ;
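The seven chained LA comparisons amount to one test: does the whole literal
follow at this position? Below is a minimal sketch of that test as a loop.
To keep it runnable without ANTLR it works on a plain String and an index
instead of the lexer's input stream; that substitution and the name
lookingAt are assumptions made for the illustration.

```java
public class LookaheadSketch {
    // True if 'literal' occurs in 'text' starting exactly at index 'pos'.
    // Nothing is consumed, so a failed check leaves the input untouched.
    static boolean lookingAt(String text, int pos, String literal) {
        if (pos < 0 || pos + literal.length() > text.length()) {
            return false;
        }
        for (int i = 0; i < literal.length(); i++) {
            if (text.charAt(pos + i) != literal.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String src = "beta[remove] gamma[ret]";
        // Index 5 is the character right after the first '['.
        System.out.println(lookingAt(src, 5, "remove]"));
        // Index 19 is the character right after the second '['.
        System.out.println(lookingAt(src, 19, "remove]"));
    }
}
```

Inside the lexer action the same loop would compare input.LA(i+1) against
literal.charAt(i), which keeps the action the same size however long the
literal is. On a plain String, text.startsWith(literal, pos) does the same job.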
Is there a correct way to do it?
Thanks!
Fredrik Öhrström
----------------------------------------------
import java.io.*;
import org.antlr.runtime.*;

public class Test
{
    public static void main(String... args)
    {
        try {
            StringBuilder b = new StringBuilder();
            String line = null;
            BufferedReader input = new BufferedReader(new FileReader(args[0]));
            for (;;) {
                line = input.readLine();
                if (line==null) break;
                // readLine() strips the line terminator; put it back so that
                // tokens on adjacent lines are not glued together.
                b.append(line).append('\n');
            }
            input.close();
            String d = b.toString();
            System.out.println("Parsing\n>"+d+"<\n");
            CharStream cs = new ANTLRStringStream(d);
            TestLexer lexer = new TestLexer(cs);
            // The rewrite stream keeps every token, including hidden ones.
            TokenRewriteStream rew = new TokenRewriteStream(lexer);
            TestParser parser = new TestParser(rew);
            parser.cmd();
            System.out.println("Done parsing\n>"+rew.toString()+"<\n");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}