[antlr-interest] No viable for alternative with ISO-LATIN-1 non-breaking space character

Darach Ennis darach at gmail.com
Mon Feb 18 08:58:41 PST 2008


Hi guys,

I'm not sure if this is a case of user error or a bug. I have replicated the
issue in a testcase as follows:

grammar Test;

@parser::header {
  import java.io.FileInputStream;
}

@parser::members {
  public static void main(String args[]) throws Throwable {
    final ANTLRInputStream cs =
        new ANTLRInputStream(new FileInputStream("/tmp/nbsp.txt"));
    final TestLexer sl = new TestLexer(cs);
    final CommonTokenStream cts = new CommonTokenStream(sl);
    final TestParser sp = new TestParser(cts);
    sp.rules();
  }
}

rules:    anything+;
anything: Other | Directive ;
Other:   '-' ( ('directive') => ('directive') { $type = Directive; } | /* empty */ );
WS    :    (' ' | '\t' | '\f' | '\r' | '\n' | '\u00a0') { $channel=HIDDEN; };

Although the whitespace-hiding lexer rule 'WS' includes the ISO-Latin-1
non-breaking space ('\u00a0'), test input containing this character fails
to parse as expected. Here is some test input:

-directive †-directive †-directive †-directive - - -

Here is some example output:

line 1:11 no viable alternative at character '†'
line 1:24 no viable alternative at character '†'
line 1:37 no viable alternative at character '†'


Given the above grammar, I would have expected the non-breaking space
(\u00a0) to be ignored.

Is this a bug or user error? If user error, can anyone suggest a grammar
fix?
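
One lead that may be worth checking, and it is only an assumption on my
part: the errors report '†' rather than a non-breaking space, which would be
consistent with the byte 0xA0 being decoded using the platform default
charset (MacRoman, for example, maps 0xA0 to '†') instead of ISO-8859-1.
Here is a minimal plain-Java sketch that compares the decodings. The class
name NbspDecodeCheck and the helper decode are hypothetical, made up for
illustration only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class NbspDecodeCheck {
    // Decode raw bytes with an explicit charset instead of the platform default.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // 0xA0 is the single-byte ISO-8859-1 encoding of U+00A0 (non-breaking space).
        byte[] nbsp = { (byte) 0xA0 };

        String latin1 = decode(nbsp, StandardCharsets.ISO_8859_1);
        System.out.printf("ISO-8859-1: U+%04X%n", (int) latin1.charAt(0));

        // x-MacRoman is an optional charset, so guard in case this JDK lacks it.
        if (Charset.isSupported("x-MacRoman")) {
            String mac = decode(nbsp, Charset.forName("x-MacRoman"));
            System.out.printf("x-MacRoman: U+%04X%n", (int) mac.charAt(0));
        }

        // Assumption: a stream opened without an explicit encoding uses this charset.
        System.out.println("platform default: " + Charset.defaultCharset());
    }
}
```

If the platform default turns out not to be ISO-8859-1, then passing the
encoding explicitly when opening the stream might help. I believe the
ANTLR 3 runtime has a two-argument ANTLRInputStream constructor that takes
an encoding name, but that should be verified against the runtime version
in use.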

Regards,

Darach.