[antlr-interest] Problem parsing unit symbols

Jim Idle jimi at temporal-wave.com
Thu Nov 5 09:41:34 PST 2009


When you run the debugger, look along the list of tabs at the bottom and you will find the 'output' tab. Select this and you will see that your lexer is saying:

line 1:0 no viable alternative at character '?'
line 2:0 no viable alternative at character 'å'

So either the lexer specs are incorrect, or the characters you pasted here are not in an encoding that matches what Java is expecting. Send them in UTF-8; for instance, the UTF-8 encoding of the Ohm sign (U+2126) is 0xE2 0x84 0xA6. What encoding are you sending in? When you come to read input files, you will need to tell the file stream what the file's encoding is.
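A minimal Java sketch (standard library only, Java 9+ for Set.of) that checks the points above. It prints the UTF-8 bytes of the Ohm sign and tests whether a character is one of the four code points that the grammar's OHM and ALP fragments list. It also shows one way a '?' can appear: text passing through a charset such as ISO-8859-1 that cannot represent the character. The class and method names are hypothetical, and the Latin-1 step is only an assumed example of an encoding mismatch, not a diagnosis of what happened to this particular input.

```java
import java.nio.charset.StandardCharsets;
import java.util.Set;

public class UnitSymbolEncoding {
    // Code points listed in the grammar's OHM ('\u2126', '\u03A9') and ALP ('\u0251', '\u03B1') fragments.
    static final Set<Integer> US_CODE_POINTS = Set.of(0x2126, 0x03A9, 0x0251, 0x03B1);

    /** Hex dump of the UTF-8 bytes of s, separated by spaces. */
    public static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** True if every code point in s is one that the US rule lists. */
    public static boolean matchedByUs(String s) {
        return s.codePoints().allMatch(cp -> US_CODE_POINTS.contains(cp));
    }

    /** Round-trips s through ISO-8859-1; unmappable characters are replaced by '?'. */
    public static String viaLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String ohm = "\u2126";    // OHM SIGN
        String ringA = "\u00E5";  // LATIN SMALL LETTER A WITH RING ABOVE, as in the sample input
        System.out.println(utf8Hex(ohm));               // E2 84 A6
        System.out.println(matchedByUs(ohm));           // true
        System.out.println(matchedByUs(ringA));         // false: not listed in OHM or ALP
        System.out.println(viaLatin1(ohm));             // ?
        System.out.println(matchedByUs(viaLatin1(ohm))); // false: '?' is not a unit symbol
    }
}
```

Note that the 'å' in the sample input (U+00E5) is not among the listed code points: ALP lists alpha (U+0251, U+03B1), so that line would fail even if it were correctly encoded. For the input files themselves, the ANTLR 3 Java runtime accepts the encoding when the stream is built, e.g. `new ANTLRFileStream("input.txt", "UTF-8")`, where the file name is only a placeholder.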

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Mark van Assem
> Sent: Thursday, November 05, 2009 9:30 AM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] Problem parsing unit symbols
> 
> Hello Antlers,
> 
> I'm designing a lexer/parser for units of measure (e.g. meters,
> seconds). In that process I'm trying to match symbols like Ω (Ohm) and
> å (angstrom).
> 
> Below is the relevant part of the grammar: the part that handles
> symbols. The grammar checks out OK in ANTLRWorks, but I get an
> EarlyExitException when I run it on a file containing two lines, the
> Ohm sign on the first and the angstrom sign on the second. The
> behaviour is different in the interpreter: there the first line is
> parsed OK, but the second line gives a NoViableAltException.
> 
> If I understand correctly, an EarlyExitException means that a (...)+
> loop failed because there wasn't anything to match. The rules "file" and
> "expr" thus seem the only suspects. However, they both look right to me,
> and fiddling with them produces other errors.
> 
> Any ideas anyone?
> 
> Thanks,
> Mark van Assem.
> 
> -------------------------------------------------------------------------
> grammar unitsymbols;
> 
> file	:	(expr NEWLINE)+ ;
> 
> expr 	:	symbol+;
> 
> symbol	:	US;
> 
> /* LEXER */
> 
> WS	:	' ' {$channel=HIDDEN;} ;
> NEWLINE:'\r'? '\n'  ;
> 
> // unit symbols like Ohm
> US
> 	: OHM | ALP ;
> 
> fragment OHM	:	'\u2126' | '\u03A9';	// Ohm symbol
> fragment ALP	:	'\u0251' | '\u03B1';	// alpha
> -------------------------------------------------------------------------
> 
> input:
> 
> Ω
> å
> 