[antlr-interest] on parsers look and feel

Fri Nov 28 06:18:30 PST 2003

On Wed, Nov 26, 2003 at 10:42:14AM +0200, Cristian Amitroaie wrote:
>    o sometimes I kind of forget what name I gave to the "=" token from the 
> Lexer (EQ/EQUAL/EQUALS/ASSIGN) when I want to add a new rule to a parser.
>    o sometimes I get bored to write LCURLEY instead of "{" or '{'
>    o sometimes it's hard for me to follow rules full of SEMI, LCURL(E)?Y, 
> LBRACK, LPARENS and so on
> 
> For example, I would like to see my parser rules look like:
> 
> assign:
>         ID "="^ ID ";"!
>     ;
> I browsed through the documentation/big examples, yet I couldn't find any 
> similar approach as a guideline or something.
>
> Are there any disadvantages/risks related to this approach?

As long as you keep things well synchronized you'll be OK. The easiest way (in
my experience) is to build your tokens 'incrementally' with
exportVocab/importVocab. I personally start doing this 'trick' in the lexer
already, so that in the parser and treewalkers I can use the real token text
instead of some enumerated name (e.g. "=" instead of ASSIGN).

Basically, in the tokens section of the lexer I repeat the strings that the
lexer rules match.

class Example_Lexer extends Lexer;
options {
	k = 2;
	charVocabulary= '\u0000' .. '\u00FF';
	// Settings for literal matching
	caseSensitiveLiterals = false;	// case is ignored, so write literals in lowercase!
	testLiterals = false;
	defaultErrorHandler = true;
	exportVocab = Lexer_;
}
tokens { /*{{{*/
	MCONST="const";   // matched via the IDENTIFIER rule
	MEXTERN="extern"; // the M<id> prefix prevents clashes with some common
<.snip.>             // and often used defines (Tcl/Tk, to name one)
	BOR="|";          // duplicated from the rules so I can use "|" in the parser
	NOT_OP="!";
<.snip.>
	GE_OP=">=";
<.snip.>
	AT="@";
	HASH="#";
	INT;
	FLOAT;
	STRING;
	IDENTIFIER;
	/*}}}*/
}

protected EXPONENT_PART:	( 'e' | 'E' ) ( '+' | '-' )? ('0'..'9')+ ;
protected FLOAT_SUFFIX: ('F'|'f'|'L'|'l') ;

DOT_OR_DOTDOT:
	".."	{ $setType(DOTDOT); }
|	'.'	{ $setType(DOT); }
;

NUMERIC:
	('0'..'9')+	{ $setType(INT); }
	(
	|	{ LA(2) >= '0' && LA(2) <= '9' }? '.' ('0'..'9')+ (EXPONENT_PART)? (FLOAT_SUFFIX)? { $setType(FLOAT); }
	|	EXPONENT_PART (FLOAT_SUFFIX)?  { $setType(FLOAT); }
	|	FLOAT_SUFFIX  { $setType(FLOAT); }
	)
;

IDENTIFIER options { testLiterals = true; }:
	( 'a' .. 'z' | 'A' .. 'Z' | '_' )
	( 'a' .. 'z' | 'A' .. 'Z' | '0'..'9' | '_' | '#' )*
;

NOT_OP:		"!";
QUESTION:	"?";
AND_OP:		"&&";
OR_OP:		"||";
EQ_OP:		"==";
NE_OP:		"!=";
LT_OP:		"<";
GT_OP:		">";
LE_OP:		"<=";
GE_OP:		">=";
PLUS:			"+";
MINUS:		"-";
MULT:			"*";
AMPERSAND:	"&";
BOR:			"|";
EOR:			"^";
MOD:			"%";

SHIFTR_OR_ASGN:	">>" { $setType(SHIFTR); } ( "=" { $setType(SR_ASSIGN); } )?;
SHIFTL_OR_ASGN:	"<<" { $setType(SHIFTL); } ( "=" { $setType(SL_ASSIGN); } )?;

COMMA:			",";
ASSIGN:			"=";
PLUS_ASSIGN:	"+=";
MINUS_ASSIGN:	"-=";
MULT_ASSIGN:	"*=";
DIV_ASSIGN:		"/=";
MOD_ASSIGN:		"%=";
BAND_ASSIGN:	"&=";
BXOR_ASSIGN:	"^=";
BOR_ASSIGN:		"|=";
ASSIGN_START:	"{=";
ASSIGN_END:		"=}";
LBRACE:			"(";
RBRACE:			")";
LCURL:			"{";
RCURL:			"}";
LBRACKET:		"[";
RBRACKET:		"]";
DCOLON:			"::";
COLON:			":";
SEMICOLON:		";";
AT:				"@";
HASH:				"#";
------snip----
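
For illustration, a parser importing this vocabulary might look like the
sketch below. Example_Parser and its rules are made up for this example and
are not part of the original grammar. Only literals that appear in the tokens
section shown above ("|", "!", ">=") are used. A literal that is missing from
the imported vocabulary would presumably get a fresh token type in the parser
that the lexer never produces, which is the synchronization problem mentioned
earlier.

```antlr
// Hypothetical parser, for illustration only.
class Example_Parser extends Parser;
options {
	importVocab = Lexer_;	// token types and literals exported by Example_Lexer
	buildAST = true;
}

// "|" and ">=" resolve to BOR and GE_OP through the imported vocabulary
expr:
	unary ( ( "|"^ | ">="^ ) unary )*
;

// "!" resolves to NOT_OP
unary:
	"!"^ unary
|	IDENTIFIER
|	INT
;
```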

Another approach is to write your xxxTokenTypes.txt and .hpp/.java files
yourself and import them into all lexers/parsers/treeparsers.

> And the walkers import the lexers vocabulary (see the attached files).

I always import from the lexer/parser/treewalker one stage below in the
hierarchy. So: lexer exports to parser exports to treewalker exports to
treewalker exports to treewalker etc. That way you'll always be sure to
import tokens that were introduced in the stage below.
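
A minimal sketch of such a chain, showing only the vocabulary options. Apart
from Example_Lexer and Lexer_, every class and vocabulary name here is
hypothetical, and the rules of each grammar are left out.

```antlr
class Example_Lexer extends Lexer;
options { exportVocab = Lexer_; }
// lexer rules go here

class Example_Parser extends Parser;
options {
	importVocab = Lexer_;		// everything the lexer defined
	exportVocab = Parser_;		// plus tokens introduced by the parser
}
// parser rules go here

class First_Walker extends TreeParser;
options {
	importVocab = Parser_;
	exportVocab = Walker1_;		// plus tokens introduced by this walker
}
// tree rules go here

class Second_Walker extends TreeParser;
options { importVocab = Walker1_; }
// tree rules go here
```

When the stages live in separate grammar files, each importVocab is read from
the TokenTypes.txt file generated for the previous stage, so the files have to
be run through ANTLR in this order.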

> Or it's just a matter of taste?

It might well be :)

Cheers,

Ric
-- 
-----+++++*****************************************************+++++++++-------
    ---- Ric Klaren ----- j.klaren at utwente.nl ----- +31 53 4893722  ----
-----+++++*****************************************************+++++++++-------
  Chaos often breeds life, when order breeds habit.
  --- Henry B. Adams, The Education of Henry Adams
