[antlr-interest] Lexing an interesting syntax

Jim Idle jimi at temporal-wave.com
Wed Jan 2 08:42:45 PST 2008


> -----Original Message-----
> From: Ola Bini [mailto:ola.bini at gmail.com]
> Sent: Wednesday, January 02, 2008 8:13 AM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] Lexing an interesting syntax
> 
> Hi,
> 
> Just started work on a lexer for an Io-based language. I want the lexing
> to handle the same constructs as Io, and mostly it's really easy. I hit
> one little snag though. I have a solution, but it's incredibly ugly. So
> I'm wondering how this can be done in the Antlr way.
> 
> To make it easy, the lexing is only on identifiers, where any
> combination of the letter "s" and ":" is valid, and "=", ":=" and "::="
> are valid. That's all.
> With these constraints, I need:
> 
> * "s:" to lex into "s:"
> * "s:=" to lex into "s" and ":="
> * "s::=" to lex into "s:" and ":="
> * "s::::=" to lex into "s:::" and ":="

The answer is: don't try to do so much of this in the lexer. Make ':' a
separate token, COLON, and either make ":=" a single operator token in the
lexer or perhaps even assemble it in the parser. Guessing at a few more of
your requirements ;-), for the input:

s:
s:=f
s::=f
s::::=f
s:=h==i

The grammar below should do it:

grammar t;

code
	: line*
	;

line
	: id ((COLEQ | OPASS) expr?)?
	;
	
expr
	: e1 (OPEQ e1)*
	;

e1
	: id
	;

id
	: ID COLON*
	;
	
COLEQ	:	':='	;
OPEQ	:	'=='	;
OPASS	:	'='		;
COLON	: 	':'		;

ID	:	 'a'..'z'+ 	;

WS 	:	 ('\r' | ' ' | '\n' | '\t')+
		{
			$channel = HIDDEN;
		}
	;

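To check which token streams the grammar above should produce for the sample input, here is a small Python emulation (not ANTLR itself). It assumes maximal-munch matching among the four operator literals, which for these fixed strings should make the same choice ANTLR's lookahead makes between ':=' and ':', and it then folds each ID COLON* run into one identifier, as the parser rule id does. The function names tokenize and group_ids are illustrative only.

```python
# A minimal Python emulation (not generated ANTLR code) of grammar t.
# Assumption: maximal munch among the literal operator rules.
OPERATORS = [  # longest literals first, so ':=' wins over ':' and '==' over '='
    (":=", "COLEQ"),
    ("==", "OPEQ"),
    ("=", "OPASS"),
    (":", "COLON"),
]


def tokenize(text):
    """Return (type, text) pairs; WS is dropped, like a HIDDEN-channel token."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in " \t\r\n":  # WS rule: skipped here
            i += 1
            continue
        if "a" <= ch <= "z":  # ID : 'a'..'z'+
            j = i
            while j < len(text) and "a" <= text[j] <= "z":
                j += 1
            tokens.append(("ID", text[i:j]))
            i = j
            continue
        for literal, ttype in OPERATORS:
            if text.startswith(literal, i):
                tokens.append((ttype, literal))
                i += len(literal)
                break
        else:
            raise ValueError(f"no token rule matches {ch!r} at offset {i}")
    return tokens


def group_ids(tokens):
    """Fold each ID COLON* run into one identifier (parser rule 'id');
    all other tokens pass through unchanged."""
    out = []
    k = 0
    while k < len(tokens):
        ttype, txt = tokens[k]
        if ttype == "ID":
            k += 1
            while k < len(tokens) and tokens[k][0] == "COLON":
                txt += ":"
                k += 1
            out.append(("id", txt))
        else:
            out.append((ttype, txt))
            k += 1
    return out


if __name__ == "__main__":
    for line in ["s:", "s:=f", "s::=f", "s::::=f", "s:=h==i"]:
        print(line, "->", [t for _, t in group_ids(tokenize(line))])
```

Run as a script, this prints "s::=f -> ['s:', ':=', 'f']" and "s::::=f -> ['s:::', ':=', 'f']", which is the split Ola asked for. The lexer only ever emits single COLON tokens plus ":=", and the identifier's trailing colons are gathered afterwards, which is why the lexer stays simple.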

Note that the lexer may not do what you expect if ':' is also the first
character of other two-character operators. In that case, use a single
COLON lexer rule that matches ':' and then uses predicates to choose
between nothing, '=' and 'x' (where 'x' is your other character), setting
$type accordingly. It sounds like you won't need that, though.
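The idea of a single COLON rule that picks its own token type can be sketched in Python (again an emulation, not ANTLR code): one function consumes ':' and peeks at the next character to decide what to report, which is the effect of the predicates plus the $type assignment. The COLX token type, the ':x' operator and the CHAR stand-in type are hypothetical placeholders, not part of grammar t.

```python
# Hypothetical token types: COLX stands for a second ':'-prefixed operator
# ':x'. These names are placeholders, not part of grammar t.
COLON, COLEQ, COLX = "COLON", "COLEQ", "COLX"


def match_colon(text, i, other="x"):
    """Emulate one COLON lexer rule: consume ':' at text[i], then peek at
    the next character to choose the token type.
    Returns (token_type, number_of_chars_consumed)."""
    if text[i] != ":":
        raise ValueError("match_colon must start on ':'")
    nxt = text[i + 1] if i + 1 < len(text) else ""
    if nxt == "=":    # ':' then '='         -> report COLEQ
        return COLEQ, 2
    if nxt == other:  # ':' then other char  -> report COLX
        return COLX, 2
    return COLON, 1   # empty alternative: plain ':'


def colon_tokens(text, other="x"):
    """Lex a string, using match_colon for every ':'; any other character
    becomes a one-char CHAR token (a stand-in for the rest of the lexer)."""
    out, i = [], 0
    while i < len(text):
        if text[i] == ":":
            ttype, n = match_colon(text, i, other)
        else:
            ttype, n = "CHAR", 1
        out.append((ttype, text[i:i + n]))
        i += n
    return out


if __name__ == "__main__":
    print(colon_tokens("::x:="))
```

This prints [('COLON', ':'), ('COLX', ':x'), ('COLEQ', ':=')]. Because the decision lives in one place, adding another ':'-prefixed operator means adding one branch rather than juggling several overlapping lexer rules.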

Jim
