[antlr-interest] Unicode input
Alex Lujan
alex at apption.com
Wed Feb 9 06:34:30 PST 2011
I just realized, I'm also getting this error:
line 1:0 mismatched input '0' expecting BYTE_VALUE
Where the following rule exists within my grammar:
BYTE_VALUE : '\u0000'..'\uFFFE';
Which, as I understand it, should match any Unicode character that can be
represented in the UTF-8 encoding.
The question is: why does the character 0 (digit zero) not match the
BYTE_VALUE rule?
I have verified that the first character of the input is 0 ('\u0030').
Any clues?
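One thing that may be worth ruling out is the byte-to-character decoding step that runs before the lexer sees anything. The sketch below is only an illustration under assumptions: the class name, helper methods and sample bytes are hypothetical, and it uses only the JDK, not ANTLR. It shows that arbitrary binary bytes decoded as UTF-8 can turn into U+FFFD replacement characters, whereas ISO-8859-1 maps every byte to exactly one character in the range 0..255.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the grammar or of ANTLR.
class ByteDecodingDemo {

    // Decode raw bytes into a String using the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // Count U+FFFD replacement characters, which the decoder
    // substitutes for byte sequences that are invalid in the charset.
    static int countReplacementChars(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '\uFFFD') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up sample: '0', three bytes that are not valid UTF-8, then 'x'.
        byte[] messageBytes = {0x30, (byte) 0x80, (byte) 0xFF, (byte) 0xFE, 0x78};

        String asUtf8 = decode(messageBytes, StandardCharsets.UTF_8);
        String asLatin1 = decode(messageBytes, StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 replacement chars: " + countReplacementChars(asUtf8));
        System.out.println("ISO-8859-1 replacement chars: " + countReplacementChars(asLatin1));
        System.out.println("ISO-8859-1 length: " + asLatin1.length());
    }
}
```

If decoding does turn out to be the issue, and assuming the ANTLR 3 runtime in use provides the two-argument ANTLRInputStream(InputStream, String encoding) constructor, passing "ISO-8859-1" there would be the analogous change. This would not by itself explain the mismatched '0', which is plain ASCII in either charset.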
On Tue, Feb 8, 2011 at 5:18 PM, Alex Lujan <alex at apption.com> wrote:
> I'm having an issue parsing input that contains Unicode characters.
>
> This is the code I'm using to test the parser (messageBytes is an array
> created by reading bytes from a binary file):
>
> private static void parseMessage(byte[] messageBytes) throws IOException {
>     ByteArrayInputStream input = new ByteArrayInputStream(messageBytes);
>     ANTLRInputStream in = new ANTLRInputStream(input);
>     UnitedToteLexer lexer = new UnitedToteLexer(in);
>     CommonTokenStream tokens = new CommonTokenStream(lexer);
>     UnitedToteParser parser = new UnitedToteParser(tokens);
>
>     try {
>         parser.message();
>         printHexArray(messageBytes);
>     } catch (Exception e) {
>         // TODO handle unrecognized message formats
>         System.out.println("Unrecognized message format");
>     }
> }
>
> The main problem I have at the moment is that I get a number of these guys:
>
> line 1:1 no viable alternative at character ' '
> line 1:2 no viable alternative at character '�'
> line 1:3 no viable alternative at character '�'
> line 1:4 no viable alternative at character 'x'
> line 1:5 no viable alternative at character '?'
> ...
>
> Essentially, one for each character that is not explicitly defined as a
> token in my grammar. Nonetheless, I do have the following rule:
>
> BYTE_VALUE : '\u0000'..'\uFFFE';
>
> Which should, if I understand correctly, include all Unicode characters.
>
> Now, I understand there was a charVocabulary option in previous versions of
> ANTLR to aid with this problem, but it seems it was removed in ANTLR 3.
>
> Was this problem solved in a different way?
>
> [btw my grammar is rather large, I'm not sure I should post 400 lines in
> this message.]
>
>
--
Alejandro Lujan
Apption Software
(613) 725 62 68 x625