[antlr-interest] Lexical token with fix length

Johannes Luber jaluber at gmx.de
Fri Jun 29 06:14:20 PDT 2007


Stefan Wohlgemuth wrote:
> Hi
> I'm trying to define a grammar which has tokens with a fixed length. What
> is the best way to do this?
> 
> I've tried it with this:
> 
> test  :     N1 N2;
> 
> N1  :  NDigits[1] ;
> 
> N2  :  NDigits[2] ;
> 
> fragment
> NDigits[int n]
>   :
>   {$n==1}?=> Digit
>   |
>   {$n==2}?=> Digit Digit
>   ;
> 
> Digit :    '0'..'9';
> 
> 
> But I get a compile error in the public void mTokens() method of my
> Lexer class because the variable n is not known there.

It is generally a bad idea to use rule parameters in semantic
predicates, because ANTLR can hoist the predicates into the calling
rules, where the parameter is not in scope. That is why n is unknown in
mTokens(). Use scopes instead, like this:

// A Unicode character of the class Cf (possibly encoded)
fragment FORMATTING_CHARACTER
scope UnicodeClassScope;
	:	UNICODE_CLASS_Cf
	|	{ $UnicodeClassScope::allowedClass = UnicodeCategory.Format; }
		UNICODE_ESCAPE_SEQUENCE
	;

// Restricts the unicode escape sequence to certain unicode character classes
fragment UNICODE_ESCAPE_SEQUENCE
scope UnicodeClassScope;
	:	'\\u'
		{ Char.GetUnicodeCategory((char) ConvertHexCharArrayIntoInt32(new char[]{
			(char) input.LT(1), (char) input.LT(2),
			(char) input.LT(3), (char) input.LT(4)}))
			== $UnicodeClassScope::allowedClass }?
		HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
	|	'\\U'
		{ Char.GetUnicodeCategory(TransformUtf32ToUtf16(ConvertHexCharArrayIntoInt32(new char[]{
			(char) input.LT(1), (char) input.LT(2),
			(char) input.LT(3), (char) input.LT(4),
			(char) input.LT(5), (char) input.LT(6),
			(char) input.LT(7), (char) input.LT(8)})), 0)
			== $UnicodeClassScope::allowedClass }?
		HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
		HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
	;

Sorry for the formatting; the snippet is probably easier to read when
pasted into ANTLRWorks than in an email. Don't forget to declare the
scope itself, too!
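
A minimal sketch of such a declaration, assuming allowedClass is of type
UnicodeCategory (inferred from the assignment in FORMATTING_CHARACTER, not
taken from the original grammar). In ANTLR 3 a global scope is declared
at grammar level, before the rules that refer to it:

// Assumption: the field type is inferred from the
// "allowedClass = UnicodeCategory.Format" assignment above.
scope UnicodeClassScope {
	UnicodeCategory allowedClass;
}

Each rule that carries "scope UnicodeClassScope;" can then read and write
the field as $UnicodeClassScope::allowedClass.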

Best regards,
Johannes Luber

