[antlr-interest] Problem parsing unit symbols

Mark van Assem mark at cs.vu.nl
Thu Nov 5 09:30:15 PST 2009


Hello Antlers,

I'm designing a lexer/parser for units of measure (e.g. meters, 
seconds). As part of that I'm trying to match symbols like Ω (Ohm) and å 
(angstrom).

Below is the relevant part of the grammar, i.e. the part that handles 
symbols. The grammar checks out OK in ANTLRWorks, but I get an 
EarlyExitException when I run it on a file containing two lines: the 
Ohm sign on the first and the angstrom sign on the second. The 
behaviour is different in the interpreter: there the first line is 
parsed OK, but the second line gives a NoViableAltException.

If I understand correctly, an EarlyExitException means that a (..)+ 
subrule failed because there was nothing to match on its first 
iteration. The rules "file" and "expr" therefore seem to be the only 
suspects. However, both look right to me, and fiddling with them 
produces other errors.
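To make the (..)+ semantics concrete, here is a minimal Python sketch of a one-or-more loop. It is not ANTLR's generated code: `one_or_more`, `is_us` and `EarlyExitError` are hypothetical names, and the token stream is assumed to be a plain list of token-type strings.

```python
class EarlyExitError(Exception):
    """Hypothetical stand-in for ANTLR's EarlyExitException."""


def one_or_more(tokens, pos, match_one):
    """Match `match_one` one or more times starting at tokens[pos].

    Returns (matches, new_pos). Raises EarlyExitError when the very
    first iteration fails, mirroring the semantics of a (...)+ subrule.
    """
    matches = []
    while pos < len(tokens):
        result = match_one(tokens[pos])
        if result is None:          # lookahead does not start another iteration
            break
        matches.append(result)
        pos += 1
    if not matches:                 # zero iterations: the "early exit" case
        raise EarlyExitError(f"(...)+ loop matched nothing at position {pos}")
    return matches, pos


def is_us(tok):
    """Hypothetical matcher for the rule `symbol : US ;`."""
    return tok if tok == "US" else None


# symbol+ succeeds on two US tokens and stops at NEWLINE
print(one_or_more(["US", "US", "NEWLINE"], 0, is_us))   # (['US', 'US'], 2)

# ...but fails immediately when the next token is already NEWLINE
try:
    one_or_more(["NEWLINE"], 0, is_us)
except EarlyExitError as e:
    print("EarlyExit:", e)
```

The loop only fails when its first iteration finds nothing; once at least one element has matched, a non-matching token simply ends it.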

Any ideas anyone?

Thanks,
Mark van Assem.

-------------------------------------------------------------------------
grammar unitsymbols;

file	:	(expr NEWLINE)+ ;

expr 	:	symbol+;

symbol	:	US;

/* LEXER */

WS	:	' ' {$channel=HIDDEN;} ;
NEWLINE:'\r'? '\n'  ;

// unit symbols like Ohm
US
	: OHM | ALP ;	

fragment OHM	:	'\u2126' | '\u03A9';	// Ohm symbol
fragment ALP	:	'\u0251' | '\u03B1';	// alpha
-------------------------------------------------------------------------

input:

Ω
å
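A minimal Python sketch of the lexer rules above can show which input characters get a token. It assumes each rule is just a code-point membership test and that a character no rule matches is reported and skipped (only a rough approximation of ANTLR's recovery). `tokenize` is a hypothetical helper, and the example assumes the second input line holds U+00E5, as it appears in this message.

```python
import unicodedata

# Code points accepted by the US rule, copied from fragments OHM and ALP.
OHM = {"\u2126", "\u03a9"}   # OHM SIGN, GREEK CAPITAL LETTER OMEGA
ALP = {"\u0251", "\u03b1"}   # LATIN SMALL LETTER ALPHA, GREEK SMALL LETTER ALPHA
US_CHARS = OHM | ALP


def tokenize(text):
    """Approximate the WS, NEWLINE and US lexer rules.

    Assumption: a character that no rule matches is recorded as an
    error and skipped. Returns (token_names, error_messages).
    """
    tokens, errors, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == " ":                      # WS: hidden channel, no token for the parser
            i += 1
        elif text.startswith("\r\n", i):   # NEWLINE with the optional '\r'
            tokens.append("NEWLINE")
            i += 2
        elif ch == "\n":                   # NEWLINE without '\r'
            tokens.append("NEWLINE")
            i += 1
        elif ch in US_CHARS:               # US : OHM | ALP
            tokens.append("US")
            i += 1
        else:                              # no rule matches this code point
            name = unicodedata.name(ch, "<unnamed>")
            errors.append(f"no rule matches U+{ord(ch):04X} {name} at index {i}")
            i += 1
    return tokens, errors


# The input from this message, assuming line 2 is U+00E5.
tokens, errors = tokenize("\u03a9\n\u00e5\n")
print(tokens)   # ['US', 'NEWLINE', 'NEWLINE']
print(errors)   # ['no rule matches U+00E5 LATIN SMALL LETTER A WITH RING ABOVE at index 2']
```

Under these assumptions the first line lexes to US NEWLINE, while the character on the second line matches none of the four code points listed in OHM and ALP.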

