[antlr-interest] No viable for alternative with ISO-LATIN-1 non-breaking space character

Mon Feb 18 14:38:21 PST 2008

Hi Jim.

Bingo! Thank you! You were very close:

new ANTLRFileStream("/tmp/nbsp.txt", "ISO-8859-1")
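The second argument to ANTLRFileStream names the charset used to turn the file's bytes into the characters the lexer matches against. Below is a minimal plain-JDK sketch of that decoding step, so it can be checked without ANTLR on the classpath. The class and method names (ReadWithCharset, readAll) are hypothetical and not part of ANTLR. The temporary test file stands in for /tmp/nbsp.txt.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReadWithCharset {
    // Decode the whole file with an explicit charset instead of the JVM default.
    public static String readAll(File file, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new FileInputStream(file), charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical test file: "-directive", the single byte 0xA0, "-directive"
        File f = File.createTempFile("nbsp", ".txt");
        f.deleteOnExit();
        try (OutputStream out = new FileOutputStream(f)) {
            out.write("-directive".getBytes(StandardCharsets.US_ASCII));
            out.write(0xA0);
            out.write("-directive".getBytes(StandardCharsets.US_ASCII));
        }
        String text = readAll(f, StandardCharsets.ISO_8859_1);
        // "-directive" occupies indices 0..9, so the 0xA0 byte lands at index 10
        System.out.printf("char 10 = U+%04X%n", (int) text.charAt(10));
    }
}
```

Decoded as ISO-8859-1, index 10 comes back as U+00A0, which is the character the grammar's WS rule lists.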

The byte representation of the non-breaking space is encoding-specific (the
single byte 0xA0 in ISO-8859-1, the two bytes 0xC2 0xA0 in UTF-8), and my
input is ISO-8859-1, so that is the encoding I need to pass. What is the
default encoding in ANTLRInputStream? Is it UTF-8 or the system encoding?
The javadoc could mention what the default is.
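The sketch below shows the two byte sequences and what a reader falls back to when no charset is given. As far as I can tell, the ANTLR 3 runtime builds a plain InputStreamReader without a charset when no encoding is passed, which would mean the JVM platform default is used. Treat that as an assumption and check the runtime source for your version. Also note that since JDK 18 (JEP 400) the default charset is UTF-8, while older JVMs derive it from the OS locale. NbspBytes and hex are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class NbspBytes {
    // Hex-format a byte array, e.g. {0xC2, 0xA0} -> "C2 A0"
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String nbsp = "\u00a0";
        // Same character, different byte sequences depending on the encoding
        System.out.println("ISO-8859-1: " + hex(nbsp.getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println("UTF-8:      " + hex(nbsp.getBytes(StandardCharsets.UTF_8)));

        // A reader constructed without a charset uses the JVM default charset;
        // getEncoding() reports its historical name (e.g. "UTF8" rather than "UTF-8")
        try (InputStreamReader r = new InputStreamReader(new ByteArrayInputStream(new byte[0]))) {
            System.out.println("JVM default charset: " + Charset.defaultCharset());
            System.out.println("Reader encoding:     " + r.getEncoding());
        }
    }
}
```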

Regards,

Darach.

PS: I generally use the POSIX.1 od utility (od -H file.txt on Unix/Linux)
to verify the characters in the input encoding.
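When od is not at hand, a small Java hex dump does the same job of showing the raw bytes before any decoding happens. This is a minimal sketch. HexDump and dump are hypothetical names, and the layout only approximates `od -A x -t x1`; it is not a byte-for-byte copy of od's output.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class HexDump {
    // Hex offset followed by up to 16 bytes per line in two-digit lowercase hex
    public static String dump(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i += 16) {
            sb.append(String.format("%06x", i));
            for (int j = i; j < Math.min(i + 16, data.length); j++) {
                sb.append(String.format(" %02x", data[j] & 0xFF));
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the file from this thread; pass another path as the first argument
        Path path = Paths.get(args.length > 0 ? args[0] : "/tmp/nbsp.txt");
        System.out.print(dump(Files.readAllBytes(path)));
    }
}
```

For an ISO-8859-1 file the non-breaking space shows up as a lone `a0`. A UTF-8 file would show `c2 a0` instead.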

On Feb 18, 2008 8:53 PM, Jim Idle <jimi at temporal-wave.com> wrote:

> Are you sure that that is actually character 0xa0? Print the hex value
> of it.
>
> However, I think that perhaps you need to add the "UTF8" encoding option
> to your input stream?
>
> new ANTLRFileStream("/tmp/nbsp.txt", "UTF8")
>
> Jim
>
> *From:* Darach Ennis [mailto:darach at gmail.com]
> *Sent:* Monday, February 18, 2008 8:59 AM
> *To:* antlr-interest at antlr.org
> *Subject:* [antlr-interest] No viable for alternative with ISO-LATIN-1
> non-breaking space character
>
> Hi guys,
>
> I'm not sure if this is a case of user error or a bug. I have replicated
> the issue in a test case as follows:
>
> grammar Test;
>
> @parser::header {
>   import java.io.FileInputStream;
> }
>
> @parser::members {
>   public static void main(String args[]) throws Throwable {
>     final ANTLRInputStream cs = new ANTLRInputStream(new FileInputStream("/tmp/nbsp.txt"));
>     final TestLexer sl = new TestLexer(cs);
>     final CommonTokenStream cts = new CommonTokenStream(sl);
>     final TestParser sp = new TestParser(cts);
>     sp.rules();
>   }
> }
>
> rules:    anything+;
> anything: Other | Directive ;
> Other:   '-' ( ('directive') => ('directive') { $type = Directive; } | /* empty */ );
> WS    :    (' ' | '\t' | '\f' | '\r' | '\n' | '\u00a0') { $channel=HIDDEN; };
>
> Despite defining a non-breaking space (ISO-Latin-1) in the whitespace-hiding
> lexer rule 'WS', test input containing this character does not parse as
> expected. Here is some test input:
>
> -directive †-directive †-directive †-directive - - -
>
> Here is some example output:
>
> line 1:11 no viable alternative at character '†'
> line 1:24 no viable alternative at character '†'
> line 1:37 no viable alternative at character '†'
>
>
> Given the above grammar, I would have expected the non-breaking space
> (\u00a0) to be ignored.
>
> Is this a bug or user error? If user error, can anyone suggest a grammar
> fix?
>
> Regards,
>
> Darach.
>
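One plausible reading of the error output quoted above is that the lexer never received U+00A0 at all. This is an inference and is not confirmed in the thread. For example, Mac OS Roman maps the byte 0xA0 to '†' (U+2020), which is the character printed in the error messages, and that would fit a JVM whose default charset was MacRoman. The sketch decodes 0xA0 with a few candidate charsets and checks each result against a hypothetical isWs helper that mirrors the WS rule's character set. DecodeNbsp, isWs, decodeByte and report are illustrative names only. x-MacRoman is an optional JDK charset, so it is only tried when available.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeNbsp {
    // Mirrors the WS rule: ' ' | '\t' | '\f' | '\r' | '\n' | '\u00a0'
    public static boolean isWs(char c) {
        return c == ' ' || c == '\t' || c == '\f' || c == '\r' || c == '\n' || c == '\u00a0';
    }

    // Decode one byte; malformed input (e.g. a lone 0xA0 in UTF-8) becomes U+FFFD
    public static char decodeByte(int b, Charset cs) {
        return new String(new byte[] {(byte) b}, cs).charAt(0);
    }

    static void report(Charset cs) {
        char c = decodeByte(0xA0, cs);
        System.out.printf("%-12s 0xA0 -> U+%04X  matched by WS: %b%n", cs.name(), (int) c, isWs(c));
    }

    public static void main(String[] args) {
        report(StandardCharsets.ISO_8859_1);
        report(StandardCharsets.UTF_8);
        // Optional charset: skip it on JVMs that do not ship the extended charsets
        if (Charset.isSupported("x-MacRoman")) {
            report(Charset.forName("x-MacRoman"));
        }
    }
}
```

Only the ISO-8859-1 decoding yields a character that the WS rule matches. This is consistent with the fix at the top of the thread, and it also suggests why "UTF8" was close but not quite right for this file.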