[antlr-interest] No viable for alternative with ISO-LATIN-1 non-breaking space character
Darach Ennis
darach at gmail.com
Mon Feb 18 14:38:21 PST 2008
Hi Jim,

Bingo! Thank you! You were very close:

new ANTLRFileStream("/tmp/nbsp.txt", "ISO-8859-1")

The non-breaking space is encoding-specific, and my input stream is
ISO-8859-1, so the encoding should be ISO-8859-1 in my case. What is the
default encoding in ANTLRInputStream? Is it UTF-8 or the system encoding?
The javadoc could mention what the default is.
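On the default-encoding question: as far as I can tell from the ANTLR 3 Java runtime, the ANTLRInputStream and ANTLRFileStream constructors without an encoding argument wrap the bytes in a plain InputStreamReader, so the JVM's platform default charset (Charset.defaultCharset()) decides how byte 0xA0 is read. Treat that as an assumption to check against the runtime version in use. The sketch below (DecodeNbsp and decode are hypothetical names) uses only the JDK to show what a lone 0xA0 byte becomes under ISO-8859-1, UTF-8 and the platform default. If the default was Mac Roman, as was typical for Mac OS X JVMs of the time (again an assumption), 0xA0 decodes to the dagger '†', which would match the error messages in the original report.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows how one byte decodes under different charsets.
class DecodeNbsp {
    // A single 0xA0 byte: the non-breaking space as stored in an ISO-8859-1 file.
    static final byte[] NBSP_LATIN1 = { (byte) 0xA0 };

    // Decodes bytes through an InputStreamReader, the kind of reader a char stream wraps.
    // Passing null uses the one-argument constructor, i.e. the platform default charset.
    static String decode(byte[] bytes, Charset cs) {
        ByteArrayInputStream in = new ByteArrayInputStream(bytes);
        try (Reader r = (cs == null) ? new InputStreamReader(in) : new InputStreamReader(in, cs)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("platform default: " + Charset.defaultCharset());
        Charset[] charsets = { StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8, null };
        for (Charset cs : charsets) {
            String s = decode(NBSP_LATIN1, cs);
            String label = (cs == null) ? "default" : cs.name();
            // ISO-8859-1 yields U+00A0; UTF-8 treats a lone 0xA0 as malformed and yields U+FFFD.
            String shown = s.isEmpty() ? "(nothing)" : String.format("U+%04X", (int) s.charAt(0));
            System.out.printf("%-10s -> %s%n", label, shown);
        }
    }
}
```

On JDK 18 and later the platform default is UTF-8 unless overridden (JEP 400), so relying on the default behaves differently across JVMs; passing the encoding explicitly, as in new ANTLRFileStream("/tmp/nbsp.txt", "ISO-8859-1"), avoids that.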
Regards,
Darach.
PS: I generally use the od utility (od -H file.txt on Unix/Linux; the
portable POSIX.1 form is od -t x1 file.txt) to verify the characters in
the input encoding.
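Jim's suggestion to print the hex value, and the od tip above, can also be done from Java when od is not at hand. A minimal sketch (HexDump and hexDump are hypothetical names) that prints each byte as two hex digits after a hexadecimal offset; the layout only approximates od -A x -t x1. A non-breaking space stored as ISO-8859-1 shows up as a single a0, whereas a UTF-8 file would contain the pair c2 a0.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical od-like helper for checking which bytes a file really contains.
class HexDump {
    // Formats bytes as: 7-digit hex offset, then up to 16 bytes per line as two hex digits each.
    static String hexDump(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += 16) {
            sb.append(String.format("%07x", off));
            for (int i = off; i < Math.min(off + 16, data.length); i++) {
                // Mask with 0xff so bytes >= 0x80 (such as 0xA0) print as unsigned values.
                sb.append(String.format(" %02x", data[i] & 0xff));
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the test file from this thread; pass another path as the first argument.
        String path = args.length > 0 ? args[0] : "/tmp/nbsp.txt";
        System.out.print(hexDump(Files.readAllBytes(Paths.get(path))));
    }
}
```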
On Feb 18, 2008 8:53 PM, Jim Idle <jimi at temporal-wave.com> wrote:
> Are you sure that that is actually character 0xa0? Print the hex value
> of it.
>
> However, I think that perhaps you need to add the "UTF8" encoding option
> to your input stream?
>
> new ANTLRFileStream("/tmp/nbsp.txt", "UTF8")
>
> Jim
>
> *From:* Darach Ennis [mailto:darach at gmail.com]
> *Sent:* Monday, February 18, 2008 8:59 AM
> *To:* antlr-interest at antlr.org
> *Subject:* [antlr-interest] No viable for alternative with ISO-LATIN-1
> non-breaking space character
>
> Hi guys,
>
> I'm not sure whether this is a case of user error or a bug. I have
> replicated the issue in a test case as follows:
>
> grammar Test;
>
> @parser::header {
> import java.io.FileInputStream;
> }
>
> @parser::members {
>   public static void main(String args[]) throws Throwable {
>     final ANTLRInputStream cs = new ANTLRInputStream(new FileInputStream("/tmp/nbsp.txt"));
>     final TestLexer sl = new TestLexer(cs);
>     final CommonTokenStream cts = new CommonTokenStream(sl);
>     final TestParser sp = new TestParser(cts);
>     sp.rules();
>   }
> }
>
> rules: anything+;
> anything: Other | Directive ;
> Other: '-' ( ('directive') => ('directive') { $type = Directive; } | /* empty */ );
> WS : (' ' | '\t' | '\f' | '\r' | '\n' | '\u00a0') { $channel=HIDDEN; };
>
> Despite defining a non-breaking space (ISO-Latin-1) in the
> whitespace-hiding lexer rule 'WS', test input containing this character
> does not parse as expected. Here is some test input:
>
> -directive †-directive †-directive †-directive - - -
>
> Here is some example output:
>
> line 1:11 no viable alternative at character '†'
> line 1:24 no viable alternative at character '†'
> line 1:37 no viable alternative at character '†'
>
> Given the above grammar, I would have expected the non-breaking space
> (\u00a0) to be ignored.
>
> Is this a bug or user error? If it is user error, can anyone suggest a
> grammar fix?
>
> Regards,
>
> Darach.
>
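To see why no grammar change turned out to be needed, here is a rough, hypothetical model (WsCheck, isWs, isOtherChar and rejected are illustrative names, not ANTLR API) that checks each decoded character against the WS alternatives and the characters the Other rule can consume. It ignores tokenization details such as the syntactic predicate, so it only approximates where the generated lexer would report "no viable alternative".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical approximation of the Test grammar's lexer, for illustration only.
class WsCheck {
    // Same alternatives as the WS rule: space, tab, form feed, CR, LF and U+00A0.
    static boolean isWs(char c) {
        return c == ' ' || c == '\t' || c == '\f' || c == '\r' || c == '\n' || c == '\u00a0';
    }

    // Characters the Other rule can consume: '-' and the letters of "directive".
    static boolean isOtherChar(char c) {
        return c == '-' || "directive".indexOf(c) >= 0;
    }

    // Indexes of characters that neither rule accepts.
    static List<Integer> rejected(String text) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!isWs(c) && !isOtherChar(c)) {
                bad.add(i);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // File contents as written in ISO-8859-1: a 0xA0 byte between the two words.
        byte[] file = "-directive\u00a0-directive".getBytes(StandardCharsets.ISO_8859_1);
        for (Charset cs : new Charset[] { StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8 }) {
            System.out.println(cs.name() + " -> rejected at " + rejected(new String(file, cs)));
        }
    }
}
```

Run as is, it prints `ISO-8859-1 -> rejected at []` and `UTF-8 -> rejected at [10]`: the WS rule already accepts U+00A0, so the fix belongs in how the bytes are decoded rather than in the grammar.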