[antlr-interest] on parsers look and feel + #["="]
Cristian Amitroaie
cristian at amiq.ro
Fri Nov 28 07:57:39 PST 2003
Hi Ric,
On Friday 28 November 2003 16:18, Ric Klaren wrote:
> On Wed, Nov 26, 2003 at 10:42:14AM +0200, Cristian Amitroaie wrote:
> > o sometimes I kind of forget what name I gave to the "=" token from
> >   the Lexer (EQ/EQUAL/EQUALS/ASSIGN) when I want to add a new rule to a
> >   parser.
> > o sometimes I get bored of writing LCURLEY instead of "{" or '{'
> > o sometimes it's hard for me to follow rules full of SEMI, LCURL(E)?Y,
> >   LBRACK, LPARENS and so on
> >
> > For example, I would like to see my parser rules look like:
> >
> > assign:
> > ID "="^ ID ";"!
> > ;
> > I browsed through the documentation/big examples, yet I couldn't find any
> > similar approach offered as a guideline.
> >
> > Are there any disadvantages/risks related to this approach?
>
> As long as you keep things well synchronized you'll be OK. The easiest way
> (in my experience) is to build your tokens 'incrementally' with
> export/importvocabs. I personally start doing this 'trick' in the lexer
> already, so that afterwards I can use the real token in the parser and
> treewalkers instead of some enumerated value (e.g. "=" instead of ASSIGN).
>
> Basically I repeat in the tokens section of the lexer the strings used when
> matching in the rules.
Yes, it works fine, thanks for the suggestion.
I am also a fan of importing vocabs the lexer -> parser -> walker way, yet I
had a lot of trouble with a language containing many keywords: I almost always
forgot to add the new literal to the lexer's token table. Hence I am now
considering importing the parser's vocab into the lexer.
Even so, your solution still applies, thanks.
We still have an issue: the #[] constructs used when building ASTs. They are
not straightforward. You need to write #[EQ, "="]. Why not #["="]? After all,
ANTLR computes a token table with enum_type/string/number associations...
Maybe we should ask Ter for an enhancement?
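
To make the asymmetry concrete, here is a minimal sketch of a parser that imports the Lexer_ vocabulary from the lexer quoted below. It assumes that the snipped part of the tokens section also contains ASSIGN="=" and SEMICOLON=";" (they are not visible in the quoted excerpt), and the names Example_Parser, assign and assign_manual are made up for illustration. It uses ASSIGN, the name from the quoted lexer, where I wrote EQ above.

```
class Example_Parser extends Parser;
options {
    importVocab = Lexer_;   // vocabulary exported by the quoted lexer
    buildAST = true;
}

// Automatic tree construction: the literal "=" is enough here, provided
// the imported vocabulary ties "=" to a token type (assumed: ASSIGN="=").
assign
    : IDENTIFIER "="^ IDENTIFIER ";"!
    ;

// Manual tree construction: the token type still has to be spelled out
// next to its text in the #[...] node constructor.
assign_manual!
    : lhs:IDENTIFIER "=" rhs:IDENTIFIER ";"
      { #assign_manual = #( #[ASSIGN, "="], #lhs, #rhs ); }
    ;
```

In the first rule the name ASSIGN never has to be remembered; in the second it does, which is exactly the part a #["="] form would remove.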
>
> class Example_Lexer extends Lexer;
> options {
> k = 2;
> charVocabulary= '\u0000' .. '\u00FF';
> // Settings for literal matching
> caseSensitiveLiterals = false; // literals are matched case-insensitively
> testLiterals = false;
> defaultErrorHandler = true;
> exportVocab = Lexer_;
> }
> tokens { /*{{{*/
> MCONST="const"; // matched via IDENTIFIER rule
> MEXTERN="extern"; // use M<id> here to prevent clashes with some
> <.snip.> // and often used defines (Tcl/Tk to name one)
> BOR="|"; // duplicated from rules so I can use "|" in parser
> NOT_OP="!";
> <.snip.>
> GE_OP=">=";
> <.snip.>
> AT="@";
> HASH="#";
> INT;
> FLOAT;
> STRING;
> IDENTIFIER;
> /*}}}*/
> }
>
> protected EXPONENT_PART: ( 'e' | 'E' ) ( '+' | '-' )? ('0'..'9')+ ;
> protected FLOAT_SUFFIX: ('F'|'f'|'L'|'l') ;
>
> DOT_OR_DOTDOT:
> ".." { $setType(DOTDOT); }
>
> | '.' { $setType(DOT); }
>
> ;
>
> NUMERIC:
> ('0'..'9')+ { $setType(INT); }
> (
>
> | { LA(2) >= '0' && LA(2) <= '9' }? '.' ('0'..'9')+ (EXPONENT_PART)?
>   (FLOAT_SUFFIX)? { $setType(FLOAT); }
> | EXPONENT_PART (FLOAT_SUFFIX)? { $setType(FLOAT); }
> | FLOAT_SUFFIX { $setType(FLOAT); }
>
> )
> ;
>
> IDENTIFIER options { testLiterals = true; }:
> ( 'a' .. 'z' | 'A' .. 'Z' | '_' )
> ( 'a' .. 'z' | 'A' .. 'Z' | '0'..'9' | '_' | '#' )*
> ;
>
> NOT_OP: "!";
> QUESTION: "?";
> AND_OP: "&&";
> OR_OP: "||";
> EQ_OP: "==";
> NE_OP: "!=";
> LT_OP: "<";
> GT_OP: ">";
> LE_OP: "<=";
> GE_OP: ">=";
> PLUS: "+";
> MINUS: "-";
> MULT: "*";
> AMPERSAND: "&";
> BOR: "|";
> EOR: "^";
> MOD: "%";
>
> SHIFTR_OR_ASGN: ">>" { $setType(SHIFTR); } ( "=" { $setType(SR_ASSIGN); } )?;
> SHIFTL_OR_ASGN: "<<" { $setType(SHIFTL); } ( "=" { $setType(SL_ASSIGN); } )?;
>
> COMMA: ",";
> ASSIGN: "=";
> PLUS_ASSIGN: "+=";
> MINUS_ASSIGN: "-=";
> MULT_ASSIGN: "*=";
> DIV_ASSIGN: "/=";
> MOD_ASSIGN: "%=";
> BAND_ASSIGN: "&=";
> BXOR_ASSIGN: "^=";
> BOR_ASSIGN: "|=";
> ASSIGN_START: "{=";
> ASSIGN_END: "=}";
> LBRACE: "(";
> RBRACE: ")";
> LCURL: "{";
> RCURL: "}";
> LBRACKET: "[";
> RBRACKET: "]";
> DCOLON: "::";
> COLON: ":";
> SEMICOLON: ";";
> AT: "@";
> HASH: "#";
> ------snip----
>
> Another approach is to write your xxxTokenTypes.txt and .hpp/.java files
> yourself and import them into all lexers/parsers/treeparsers.
>
> > And the walkers import the lexer's vocabulary (see the attached files).
>
> I always import from the lexer/parser/treewalker one stage below in the
> hierarchy. So: lexer exports to parser exports to treewalker exports to
> treewalker exports to treewalker etc. That way you'll always be sure to
> import tokens that were introduced in the stage below.
>
> > Or it's just a matter of taste?
>
> It might well be :)
>
> Cheers,
>
> Ric
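
For reference, the stage-by-stage chain Ric describes could be sketched roughly as follows. This is only an outline under assumptions: the three grammars are taken to live in separate files processed in order, and the file names, the Parser_ vocabulary name, the Example_Parser/Example_Walker class names and the cut-down rules are invented for illustration; only Example_Lexer and Lexer_ come from the quoted grammar.

```
// Stage 1 (hypothetical lexer.g): the lexer defines and exports the
// base vocabulary, duplicating the literals in the tokens section.
class Example_Lexer extends Lexer;
options { exportVocab = Lexer_; }
tokens { ASSIGN = "="; SEMICOLON = ";"; }
IDENTIFIER : ( 'a'..'z' | 'A'..'Z' | '_' )+ ;
ASSIGN     : "=" ;
SEMICOLON  : ";" ;

// Stage 2 (hypothetical parser.g): imports from the lexer and
// re-exports under its own vocabulary name for the next stage.
class Example_Parser extends Parser;
options { importVocab = Lexer_; exportVocab = Parser_; buildAST = true; }
assign : IDENTIFIER "="^ IDENTIFIER ";"! ;

// Stage 3 (hypothetical walker.g): imports from the stage directly
// below it, so it sees every token introduced so far.
class Example_Walker extends TreeParser;
options { importVocab = Parser_; }
assign : #( "=" IDENTIFIER IDENTIFIER ) ;
```

Each stage only ever names the vocabulary of the stage before it, so a token added in the parser reaches the walker without touching the lexer.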