[antlr-interest] No viable for alternative with ISO-LATIN-1 non-breaking space character
Darach Ennis
darach at gmail.com
Mon Feb 18 08:58:41 PST 2008
Hi guys,
I'm not sure if this is a case of user error or a bug. I have replicated the
issue in a testcase as follows:
grammar Test;

@parser::header {
import java.io.FileInputStream;
}

@parser::members {
    public static void main(String args[]) throws Throwable {
        final ANTLRInputStream cs =
            new ANTLRInputStream(new FileInputStream("/tmp/nbsp.txt"));
        final TestLexer sl = new TestLexer(cs);
        final CommonTokenStream cts = new CommonTokenStream(sl);
        final TestParser sp = new TestParser(cts);
        sp.rules();
    }
}

rules: anything+;
anything: Other | Directive ;

Other: '-' ( ('directive') => ('directive') { $type = Directive; } | /* empty */ );
WS : (' ' | '\t' | '\f' | '\r' | '\n' | '\u00a0') { $channel=HIDDEN; };
Although the non-breaking space (ISO-Latin-1, \u00a0) is included in the
hidden whitespace lexer rule 'WS', test input containing this character
fails to parse as expected. Here is some test input:
-directive †-directive †-directive †-directive - - -
Here is some example output:
line 1:11 no viable alternative at character '†'
line 1:24 no viable alternative at character '†'
line 1:37 no viable alternative at character '†'
Given the above grammar I would have expected the non-breaking space
(\u00a0) to be ignored.
Is this a bug or user error? If user error, can anyone suggest a grammar
fix?
Regards,
Darach.
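
One thing that may be worth checking, offered as an assumption rather than a
confirmed diagnosis: as far as I can tell, the one-argument ANTLRInputStream
constructor decodes the file with the platform default charset. If that
default is not ISO-8859-1, the byte 0xA0 may reach the lexer as some other
character. In Mac OS Roman, for example, 0xA0 is the dagger, which would match
the '†' in the error output. The sketch below uses only the JDK (no ANTLR) to
show how a single byte decodes under different charsets. The class name
NbspDecodingSketch and the helper decodeByte are hypothetical names made up
for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class NbspDecodingSketch {
    // Hypothetical helper: decode one byte with the given charset and
    // return the code point of the first resulting character.
    public static int decodeByte(int b, Charset cs) {
        String s = new String(new byte[] { (byte) b }, cs);
        return s.codePointAt(0);
    }

    public static void main(String[] args) {
        // The charset the JVM falls back to when none is named explicitly.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // Under ISO-8859-1, byte 0xA0 is the non-breaking space U+00A0.
        int latin1 = decodeByte(0xA0, StandardCharsets.ISO_8859_1);
        System.out.printf("ISO-8859-1: U+%04X%n", latin1);

        // x-MacRoman is an optional charset, so check before using it.
        if (Charset.isSupported("x-MacRoman")) {
            int mac = decodeByte(0xA0, Charset.forName("x-MacRoman"));
            System.out.printf("x-MacRoman: U+%04X%n", mac);
        }
    }
}
```

If the default charset does turn out to be the cause, passing the encoding
explicitly, e.g. new ANTLRInputStream(new FileInputStream("/tmp/nbsp.txt"),
"ISO-8859-1"), should let the '\u00a0' alternative in WS match. This assumes
the two-argument (InputStream, String) constructor is available in the ANTLR 3
runtime in use.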