[antlr-interest] on parsers look and feel + #["="]

Cristian Amitroaie cristian at amiq.ro
Fri Nov 28 07:57:39 PST 2003


Hi Ric,

On Friday 28 November 2003 16:18, Ric Klaren wrote:
> On Wed, Nov 26, 2003 at 10:42:14AM +0200, Cristian Amitroaie wrote:
> >    o sometimes I kind of forget what name I gave to the "=" token from
> >      the Lexer (EQ/EQUAL/EQUALS/ASSIGN) when I want to add a new rule to a
> >      parser.
> >    o sometimes I get bored writing LCURLEY instead of "{" or '{'
> >    o sometimes it's hard for me to follow rules full of SEMI, LCURL(E)?Y,
> >      LBRACK, LPARENS and so on
> >
> > For example, I would like to see my parser rules look like:
> >
> > assign:
> >         ID "="^ ID ";"!
> >     ;
> > I browsed through the documentation and the bigger examples, yet I couldn't
> > find any similar approach offered as a guideline.
> >
> > Are there any disadvantages/risks related to this approach?
>
> As long as you keep things well synchronized you'll be OK. Easiest (in my
> experience) is to 'incrementally' build your tokens with
> exportVocab/importVocab. I personally already start doing this 'trick' in
> the lexer, so after that I can use the real token in parsers and tree
> walkers instead of some enumerated value (e.g. "=" instead of ASSIGN).
>
> Basically, I repeat in the lexer's tokens section the strings that the
> lexer rules match.

Yes it works fine, thanks for the suggestion. 

I am also a fan of importing vocabs the lexer -> parser -> walker way, yet I
had a lot of trouble with a language containing many keywords: I almost always
forgot to add the new literal to the lexer's token table, so now I am
considering importing the parser's vocab into the lexer.

Even so, your solution still applies, thanks.
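One cheap way to catch a forgotten keyword is a build-time check that compares the literals the rules use against the strings declared in the lexer's tokens section. The Java sketch below assumes you have already collected both sets yourself (for example with a small script over the .g files); LiteralSync and missingLiterals are hypothetical names, not part of ANTLR.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical build-time check (not an ANTLR API): every literal used in the
// grammar rules should also be declared in the lexer's tokens { ... } section.
final class LiteralSync {
    /**
     * @param tokensSection label -> literal, as declared in tokens { ... }
     * @param usedLiterals  literals that appear in lexer/parser rules
     * @return literals used in rules but missing from the tokens section, sorted
     */
    static Set<String> missingLiterals(Map<String, String> tokensSection,
                                       Set<String> usedLiterals) {
        Set<String> missing = new TreeSet<>(usedLiterals);
        missing.removeAll(tokensSection.values());
        return missing;
    }

    public static void main(String[] args) {
        Map<String, String> tokens = Map.of(
                "MCONST", "const", "BOR", "|", "NOT_OP", "!");
        Set<String> used = Set.of("const", "extern", "|", "!");
        // Prints [extern]: the keyword is used but was never added to tokens { }.
        System.out.println(missingLiterals(tokens, used));
    }
}
```

Failing the build when the returned set is non-empty turns "I forgot to add the literal" into an immediate error instead of a silent IDENTIFIER match.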

We still have an issue with #[] constructs when building ASTs: they are not
straightforward. You need to write #[EQ, "="]. Why not just #["="]? After all,
ANTLR computes a token table with enum_type/string/number associations...

Maybe we should ask Ter for an enhancement?
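What #["="] would have to do can be sketched as: look the literal up in the vocabulary's string -> type table, then build the node exactly as #[TYPE, "text"] does. The Java below is only a model of that idea; LiteralNodes, Node and the type numbers are hypothetical stand-ins, not ANTLR classes or real generated token types.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of #["="]: derive the token type from the literal text,
// then create the node as #[TYPE, text] would. Not an ANTLR API.
final class LiteralNodes {
    record Node(int type, String text) {}

    private final Map<String, Integer> typeByLiteral = new HashMap<>();

    // A literal may map to only one token type; reject conflicting entries.
    void define(String literal, int type) {
        Integer previous = typeByLiteral.putIfAbsent(literal, type);
        if (previous != null && previous != type) {
            throw new IllegalStateException(
                    "literal " + literal + " already has type " + previous);
        }
    }

    // Equivalent of #["="]: fail loudly rather than guess for unknown literals.
    Node create(String literal) {
        Integer type = typeByLiteral.get(literal);
        if (type == null) {
            throw new IllegalArgumentException("no token type for literal " + literal);
        }
        return new Node(type, literal);
    }

    public static void main(String[] args) {
        LiteralNodes vocab = new LiteralNodes();
        vocab.define("=", 30);   // hypothetical number for ASSIGN="="
        vocab.define(";", 31);   // hypothetical number for SEMICOLON=";"
        System.out.println(vocab.create("="));
    }
}
```

The lookup only works because each literal has a single token type in the vocabulary, which is exactly what the tokens-section trick guarantees.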

>
> class Example_Lexer extends Lexer;
> options {
> 	k = 2;
> 	charVocabulary= '\u0000' .. '\u00FF';
> 	// Settings for literal matching
> 	caseSensitiveLiterals = false;	// literals are matched case-insensitively
> 	testLiterals = false;
> 	defaultErrorHandler = true;
> 	exportVocab = Lexer_;
> }
> tokens { /*{{{*/
> 	MCONST="const";   // matched via IDENTIFIER rule
> 	MEXTERN="extern"; // use M<id> here to prevent clashes with some
> 	                  // often used defines (Tcl/Tk to name one)
> <.snip.>
> 	BOR="|";          // duplicated from rules so I can use "|" in parser
> 	NOT_OP="!";
> <.snip.>
> 	GE_OP=">=";
> <.snip.>
> 	AT="@";
> 	HASH="#";
> 	INT;
> 	FLOAT;
> 	STRING;
> 	IDENTIFIER;
> 	/*}}}*/
> }
>
> protected EXPONENT_PART:	( 'e' | 'E' ) ( '+' | '-' )? ('0'..'9')+ ;
> protected FLOAT_SUFFIX: ('F'|'f'|'L'|'l') ;
>
> DOT_OR_DOTDOT:
> 	".."	{ $setType(DOTDOT); }
>
> |	'.'	{ $setType(DOT); }
>
> ;
>
> NUMERIC:
> 	('0'..'9')+	{ $setType(INT); }
> 	(
>
> 	|	{ LA(2) >= '0' && LA(2) <= '9' }? '.' ('0'..'9')+ (EXPONENT_PART)?
> 		(FLOAT_SUFFIX)? { $setType(FLOAT); }
> 	|	EXPONENT_PART (FLOAT_SUFFIX)? { $setType(FLOAT); }
> 	|	FLOAT_SUFFIX  { $setType(FLOAT); }
>
> 	)
> ;
>
> IDENTIFIER options { testLiterals = true; }:
> 	( 'a' .. 'z' | 'A' .. 'Z' | '_' )
> 	( 'a' .. 'z' | 'A' .. 'Z' | '0'..'9' | '_' | '#' )*
> ;
>
> NOT_OP:		"!";
> QUESTION:	"?";
> AND_OP:		"&&";
> OR_OP:		"||";
> EQ_OP:		"==";
> NE_OP:		"!=";
> LT_OP:		"<";
> GT_OP:		">";
> LE_OP:		"<=";
> GE_OP:		">=";
> PLUS:			"+";
> MINUS:		"-";
> MULT:			"*";
> AMPERSAND:	"&";
> BOR:			"|";
> EOR:			"^";
> MOD:			"%";
>
> SHIFTR_OR_ASGN:	">>" { $setType(SHIFTR); } ( "=" { $setType(SR_ASSIGN); } )?;
> SHIFTL_OR_ASGN:	"<<" { $setType(SHIFTL); } ( "=" { $setType(SL_ASSIGN); } )?;
>
> COMMA:			",";
> ASSIGN:			"=";
> PLUS_ASSIGN:	"+=";
> MINUS_ASSIGN:	"-=";
> MULT_ASSIGN:	"*=";
> DIV_ASSIGN:		"/=";
> MOD_ASSIGN:		"%=";
> BAND_ASSIGN:	"&=";
> BXOR_ASSIGN:	"^=";
> BOR_ASSIGN:		"|=";
> ASSIGN_START:	"{=";
> ASSIGN_END:		"=}";
> LBRACE:			"(";
> RBRACE:			")";
> LCURL:			"{";
> RCURL:			"}";
> LBRACKET:		"[";
> RBRACKET:		"]";
> DCOLON:			"::";
> COLON:			":";
> SEMICOLON:		";";
> AT:				"@";
> HASH:				"#";
> ------snip----
>
> Another approach is to write your xxxTokenTypes.txt and .hpp/.java files
> yourself and import that one vocabulary into all lexers/parsers/tree parsers.
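A hand-maintained vocabulary is easiest to keep consistent if the Java (or C++) side is generated from one ordered list of token names. The sketch below assumes the interface shape ANTLR 2 generates (EOF = 1, NULL_TREE_LOOKAHEAD = 3, user tokens numbered from 4); compare it with a generated xxxTokenTypes.java before relying on it. TokenTypesWriter and CommonTokenTypes are hypothetical names, and the matching .txt file is deliberately not generated here.

```java
import java.util.List;

// Hypothetical generator for a hand-maintained token-types interface.
final class TokenTypesWriter {
    // Assumption: ANTLR 2 reserves types below 4 (EOF is 1,
    // NULL_TREE_LOOKAHEAD is 3), so user tokens start at 4.
    static final int FIRST_USER_TYPE = 4;

    static String javaInterface(String name, List<String> tokenNames) {
        StringBuilder sb = new StringBuilder();
        sb.append("public interface ").append(name).append(" {\n");
        sb.append("\tint EOF = 1;\n");
        sb.append("\tint NULL_TREE_LOOKAHEAD = 3;\n");
        int type = FIRST_USER_TYPE;
        for (String token : tokenNames) {
            // Numbers follow list order, so append new tokens at the end
            // to keep existing numbers stable.
            sb.append("\tint ").append(token).append(" = ").append(type++).append(";\n");
        }
        sb.append("}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(javaInterface("CommonTokenTypes",
                List.of("MCONST", "MEXTERN", "ASSIGN")));
    }
}
```

Keeping the list append-only means code compiled against an older version of the interface still sees the same numbers.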
>
> > And the walkers import the lexer's vocabulary (see the attached files).
>
> I always import from the lexer/parser/tree walker one stage below in the
> hierarchy: the lexer exports to the parser, the parser exports to the tree
> walker, that tree walker exports to the next tree walker, and so on. That
> way you can always be sure to import the tokens introduced in the stage below.
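The numbering consequence of that chain can be modeled in a few lines: each stage starts from a copy of the stage below, known tokens keep their numbers, and new tokens get the next free number. This Java sketch (VocabChain, nextStage and the token names are hypothetical) illustrates the idea only; it is not how ANTLR implements importVocab/exportVocab.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the lexer -> parser -> tree walker vocabulary chain.
final class VocabChain {
    private final Map<String, Integer> types = new LinkedHashMap<>();
    private int next;

    private VocabChain(int firstFree) { next = firstFree; }

    // First stage (the lexer); user token numbers start at 4 in this model.
    static VocabChain root() { return new VocabChain(4); }

    // Like importing the stage below: copy its table and its next free number.
    VocabChain nextStage() {
        VocabChain child = new VocabChain(next);
        child.types.putAll(types);
        return child;
    }

    // Known tokens keep their number; new ones get the next free number.
    int add(String name) {
        Integer existing = types.get(name);
        if (existing != null) return existing;
        types.put(name, next);
        return next++;
    }

    Integer typeOf(String name) { return types.get(name); }

    public static void main(String[] args) {
        VocabChain lexer = VocabChain.root();
        lexer.add("IDENTIFIER");
        lexer.add("ASSIGN");
        VocabChain parser = lexer.nextStage();
        parser.add("BLOCK");                   // imaginary token added by the parser
        VocabChain walker = parser.nextStage();
        System.out.println(walker.typeOf("BLOCK") + " " + lexer.typeOf("BLOCK"));
    }
}
```

The walker sees BLOCK while the lexer does not, which is why importing from the stage directly below matters.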
>
> > Or it's just a matter of taste?
>
> It might well be :)
>
> Cheers,
>
> Ric

