[antlr-interest] on parsers look and feel + #["="]

Cristian Amitroaie cristian at amiq.ro
Fri Nov 28 08:22:14 PST 2003


Oops,

On Friday 28 November 2003 17:57, Cristian Amitroaie wrote:
> Hi Ric,
>
> On Friday 28 November 2003 16:18, Ric Klaren wrote:
> > On Wed, Nov 26, 2003 at 10:42:14AM +0200, Cristian Amitroaie wrote:
> > >    o sometimes I kind of forget what name I gave to the "=" token in
> > >      the lexer (EQ/EQUAL/EQUALS/ASSIGN) when I want to add a new rule
> > >      to a parser
> > >    o sometimes I get tired of writing LCURLEY instead of "{" or '{'
> > >    o sometimes it's hard for me to follow rules full of SEMI,
> > >      LCURL(E)?Y, LBRACK, LPARENS and so on
> > >
> > > For example, I would like to see my parser rules look like:
> > >
> > > assign:
> > >         ID "="^ ID ";"!
> > >     ;
> > > I browsed through the documentation and the big examples, yet I couldn't
> > > find this approach presented as a guideline or anything similar.
> > >
> > > Are there any disadvantages/risks related to this approach?
> >
> > As long as you keep things well synchronized you'll be ok. Easiest (in my
> > experience) is to 'incrementally' build your tokens with
> > export/importvocabs. I personally already start doing this 'trick' in the
> > lexer, so that in the parser and treewalkers I can use the real token text
> > instead of some enumerated name (e.g. "=" instead of ASSIGN).
> >
> > Basically, in the tokens section of the lexer I repeat the strings that
> > are matched in the rules.
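For the archive, here is a minimal sketch of how I read that trick (the
class and token names below are made up, and ID is assumed to be defined
in the lexer):

class MyLexer extends Lexer;
options { exportVocab = MyLexer; }

tokens {
	ASSIGN="=";	// repeat the literal text here ...
	SEMI=";";	// ... so the parser can write "=" and ";" directly
}

ASSIGN : "=" ;		// the actual matching rules
SEMI   : ";" ;

class MyParser extends Parser;
options { importVocab = MyLexer; buildAST = true; }

assign
	:	ID "="^ ID ";"!	// no need to remember ASSIGN/SEMI here
	;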
>
> Yes it works fine, thanks for the suggestion.
>
> I am also a fan of importing vocabs the lexer -> parser -> walker way, yet I
> had a lot of trouble with a language containing many keywords: I almost
> always forgot to add the new literal to the lexer's token table, so now I am
> considering importing the parser's vocab into the lexer instead.
>
> Even so, your solution still applies, thanks.

With these warnings (already mentioned somewhere in this thread):

LookLexer.g:14:8: warning:Redefinition of token in tokens {...}: EQ
LookLexer.g:15:10: warning:Redefinition of token in tokens {...}: SEMI
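For reference, the reversed setup I mean looks roughly like this (grammar
and token names are made up, and I am assuming the parser grammar gets
translated first so its vocabulary file exists; the duplicated entries in
the lexer's tokens section are, I suppose, what produces the redefinition
warnings above):

class LookParser extends Parser;
options { exportVocab = Look; buildAST = true; }

assign
	:	ID "="^ ID ";"!	// new literals can show up here first
	;

class LookLexer extends Lexer;
options { importVocab = Look; }	// pull the parser's vocabulary into the lexer

tokens {
	EQ="=";		// already known from the imported vocab ...
	SEMI=";";	// ... hence the 'Redefinition of token' warnings
}

EQ   : "=" ;
SEMI : ";" ;
ID   : ('a'..'z' | 'A'..'Z' | '_')+ ;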

>
> We still have one issue, though: the #[] constructs used when building ASTs.
> They are not as convenient: you need to write #[EQ, "="]. Why not #["="]?
> After all, ANTLR already computes a token table with the token type / string /
> number associations...
>
> Maybe we should ask Ter for an enhancement?
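Just to make the inconvenience concrete, a hand-built node in an action
currently looks something like this (ANTLR 2 action syntax; the rule, the
labels and the EQ token name are only an example):

assign!
	:	lhs:ID "=" rhs:ID ";"
		// both the token type and its text have to be spelled out:
		{ ## = #( #[EQ, "="], #lhs, #rhs ); }
	;

The wish is that #["="] could look the type up from the literal, since the
string/type pairs are already in the generated vocabulary.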
>
> > class Example_Lexer extends Lexer;
> > options {
> > 	k = 2;
> > 	charVocabulary= '\u0000' .. '\u00FF';
> > 	// Settings for literal matching
> > 	caseSensitiveLiterals = false;	// literals are matched case-insensitively
> > 	testLiterals = false;
> > 	defaultErrorHandler = true;
> > 	exportVocab = Lexer_;
> > }
> > tokens { /*{{{*/
> > 	MCONST="const";   // matched via IDENTIFIER rule
> > 	MEXTERN="extern"; // use M<id> here to prevent clashes with some
> > <.snip.>             // and often used defines (Tcl/Tk to name one)
> >    BOR="|";          // duplicated from rules so I can use "|" in parser
> > 	NOT_OP="!";
> > <.snip.>
> >    GE_OP=">=";
> > <.snip.>
> > 	AT="@";
> > 	HASH="#";
> > 	INT;
> > 	FLOAT;
> > 	STRING;
> > 	IDENTIFIER;
> > 	/*}}}*/
> > }
> >
> > protected EXPONENT_PART:	( 'e' | 'E' ) ( '+' | '-' )? ('0'..'9')+ ;
> > protected FLOAT_SUFFIX: ('F'|'f'|'L'|'l') ;
> >
> > DOT_OR_DOTDOT:
> > 	".."	{ $setType(DOTDOT); }
> > |	'.'	{ $setType(DOT); }
> > ;
> >
> > NUMERIC:
> > 	('0'..'9')+	{ $setType(INT); }
> > 	(	// nothing more: plain integer
> > 	|	{ LA(2) >= '0' && LA(2) <= '9' }?
> > 		'.' ('0'..'9')+ (EXPONENT_PART)? (FLOAT_SUFFIX)?	{ $setType(FLOAT); }
> > 	|	EXPONENT_PART (FLOAT_SUFFIX)?	{ $setType(FLOAT); }
> > 	|	FLOAT_SUFFIX	{ $setType(FLOAT); }
> > 	)
> > ;
> >
> > IDENTIFIER options { testLiterals = true; }:
> > 	( 'a' .. 'z' | 'A' .. 'Z' | '_' )
> > 	( 'a' .. 'z' | 'A' .. 'Z' | '0'..'9' | '_' | '#' )*
> > ;
> >
> > NOT_OP:		"!";
> > QUESTION:	"?";
> > AND_OP:		"&&";
> > OR_OP:		"||";
> > EQ_OP:		"==";
> > NE_OP:		"!=";
> > LT_OP:		"<";
> > GT_OP:		">";
> > LE_OP:		"<=";
> > GE_OP:		">=";
> > PLUS:			"+";
> > MINUS:		"-";
> > MULT:			"*";
> > AMPERSAND:	"&";
> > BOR:			"|";
> > EOR:			"^";
> > MOD:			"%";
> >
> > SHIFTR_OR_ASGN:	">>" { $setType(SHIFTR); } ( "=" { $setType(SR_ASSIGN); } )?;
> > SHIFTL_OR_ASGN:	"<<" { $setType(SHIFTL); } ( "=" { $setType(SL_ASSIGN); } )?;
> >
> > COMMA:			",";
> > ASSIGN:			"=";
> > PLUS_ASSIGN:	"+=";
> > MINUS_ASSIGN:	"-=";
> > MULT_ASSIGN:	"*=";
> > DIV_ASSIGN:		"/=";
> > MOD_ASSIGN:		"%=";
> > BAND_ASSIGN:	"&=";
> > BXOR_ASSIGN:	"^=";
> > BOR_ASSIGN:		"|=";
> > ASSIGN_START:	"{=";
> > ASSIGN_END:		"=}";
> > LBRACE:			"(";
> > RBRACE:			")";
> > LCURL:			"{";
> > RCURL:			"}";
> > LBRACKET:		"[";
> > RBRACKET:		"]";
> > DCOLON:			"::";
> > COLON:			":";
> > SEMICOLON:		";";
> > AT:				"@";
> > HASH:				"#";
> > ------snip----
> >
> > Another approach is to make your xxxTokenTypes.txt and .hpp/.java yourself
> > and import that one into all lexers/parsers/treeparsers.
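If anyone wants to try that, the easiest is probably to copy a generated
xxxTokenTypes.txt and maintain it by hand; from memory the entries look
roughly like this (the vocab name 'My', the tokens and the numbers are only
an example):

My	// output token vocab name
ASSIGN="="=4
SEMI=";"=5
LCURL="{"=6
RCURL="}"=7
ID=8

Every grammar would then say importVocab = My; in its options section, so
lexer, parser and treewalkers all share the same token numbering.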
> >
> > > And the walkers import the lexer's vocabulary (see the attached files).
> >
> > I always import the vocabulary from the lexer/parser/treewalker one stage
> > earlier in the hierarchy. So: the lexer exports to the parser, the parser
> > exports to the first treewalker, that one exports to the next treewalker,
> > etc. That way you can always be sure to pick up the tokens that were
> > introduced in the stage before.
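In other words (a sketch with made-up grammar names, rules omitted), the
chain of options would look like:

class L  extends Lexer;      options { exportVocab = L; }
class P  extends Parser;     options { importVocab = L;  exportVocab = P;  }
class W1 extends TreeParser; options { importVocab = P;  exportVocab = W1; }
class W2 extends TreeParser; options { importVocab = W1; exportVocab = W2; }

so a token introduced in one stage (an imaginary AST node type added in the
parser, for instance) is automatically visible in the next one.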
> >
> > > Or it's just a matter of taste?
> >
> > It might well be :)
> >
> > Cheers,
> >
> > Ric
>