[antlr-interest] on parsers look and feel + #["="]
Cristian Amitroaie
cristian at amiq.ro
Fri Nov 28 08:22:14 PST 2003
Oops,
On Friday 28 November 2003 17:57, Cristian Amitroaie wrote:
> Hi Ric,
>
> On Friday 28 November 2003 16:18, Ric Klaren wrote:
> > On Wed, Nov 26, 2003 at 10:42:14AM +0200, Cristian Amitroaie wrote:
> > > o sometimes I kind of forget what name I gave to the "=" token in
> > >   the lexer (EQ/EQUAL/EQUALS/ASSIGN) when I want to add a new rule
> > >   to a parser
> > > o sometimes I get tired of writing LCURLEY instead of "{" or '{'
> > > o sometimes it's hard for me to follow rules full of SEMI, LCURL(E)?Y,
> > >   LBRACK, LPARENS and so on
> > >
> > > For example, I would like to see my parser rules look like:
> > >
> > > assign:
> > > ID "="^ ID ";"!
> > > ;
> > > I browsed through the documentation and the big examples, yet I
> > > couldn't find this approach recommended as a guideline anywhere.
> > >
> > > Are there any disadvantages/risks related to this approach?
> >
> > As long as you keep things well synchronized you'll be OK. Easiest (in my
> > experience) is to 'incrementally' build your tokens with
> > export/importVocab. I personally already start doing this 'trick' in the
> > lexer, so after that I can use the real token in the parser and
> > treewalkers instead of some enumerated value (e.g. "=" instead of ASSIGN).
> >
> > Basically I repeat in the tokens section of the lexer the strings used
> > when matching in the rules.
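
As an aside, a minimal sketch of that trick (grammar, rule, and token names here are hypothetical, not taken from the thread) could look like:

```antlr
// Lexer: export a vocabulary and repeat the literal strings in tokens{},
// so downstream grammars can write "=" instead of ASSIGN.
class SketchLexer extends Lexer;
options { exportVocab = Sketch; }
tokens {
    ASSIGN = "=";   // duplicated from the rules below
    SEMI   = ";";
}
ASSIGN : "=" ;
SEMI   : ";" ;
ID     : ('a'..'z')+ ;

// Parser: import the same vocabulary and use the literals directly.
class SketchParser extends Parser;
options { importVocab = Sketch; buildAST = true; }
assign : ID "="^ ID ";"! ;
```

The duplication in tokens{} is what registers the string with the exported vocabulary; the rules alone would only produce the enumerated names.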
>
> Yes it works fine, thanks for the suggestion.
>
> I am also a fan of importing vocabs the lexer -> parser -> walker way, yet
> I had a lot of trouble with a language containing many keywords: I almost
> always forgot to add the new literal to the lexer's token table, so now I
> am considering importing the parser's vocab into the lexer.
>
> Even doing so, your solution still applies, thanks.
With these (already mentioned somewhere in this thread) warnings:
LookLexer.g:14:8: warning:Redefinition of token in tokens {...}: EQ
LookLexer.g:15:10: warning:Redefinition of token in tokens {...}: SEMI
>
> We still have an issue, that is, the #[] construct when building ASTs.
> It's not straightforward: you need to write #[EQ, "="]. Why not #["="]?
> After all, ANTLR computes a token table with type/string/number
> associations...
>
> Maybe we should ask Ter for an enhancement?
>
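
For context, the two-argument #[] form being discussed looks like this in a parser action (the rule and the ASSIGN token name are illustrative placeholders):

```antlr
// With buildAST on, #[TYPE, "text"] asks the AST factory for a new node.
// Both arguments are currently required, hence the wish for #["="].
assign! : l:ID r:ID ";"
          { ## = #( #[ASSIGN, "="], #l, #r ); }
        ;
```

Since the generated token table already maps "=" to its numeric type, the single-argument shorthand the poster asks for would in principle be resolvable at code-generation time.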
> > class Example_Lexer extends Lexer;
> > options {
> > k = 2;
> > charVocabulary= '\u0000' .. '\u00FF';
> > // Settings for literal matching
> > caseSensitiveLiterals = false; // literals match case-insensitively
> > testLiterals = false;
> > defaultErrorHandler = true;
> > exportVocab = Lexer_;
> > }
> > tokens { /*{{{*/
> > MCONST="const"; // matched via IDENTIFIER rule
> > MEXTERN="extern"; // use M<id> here to prevent clashes with some
> > <.snip.> // and often used defines (Tcl/Tk to name one)
> > BOR="|"; // duplicated from rules so I can use "|" in parser
> > NOT_OP="!";
> > <.snip.>
> > GE_OP=">=";
> > <.snip.>
> > AT="@";
> > HASH="#";
> > INT;
> > FLOAT;
> > STRING;
> > IDENTIFIER;
> > /*}}}*/
> > }
> >
> > protected EXPONENT_PART: ( 'e' | 'E' ) ( '+' | '-' )? ('0'..'9')+ ;
> > protected FLOAT_SUFFIX: ('F'|'f'|'L'|'l') ;
> >
> > DOT_OR_DOTDOT:
> >   ".." { $setType(DOTDOT); }
> > | '.' { $setType(DOT); }
> > ;
> >
> > NUMERIC:
> >   ('0'..'9')+ { $setType(INT); }
> >   ( // empty alternative: plain integer
> >   | { LA(2) >= '0' && LA(2) <= '9' }?
> >     '.' ('0'..'9')+ (EXPONENT_PART)? (FLOAT_SUFFIX)? { $setType(FLOAT); }
> >   | EXPONENT_PART (FLOAT_SUFFIX)? { $setType(FLOAT); }
> >   | FLOAT_SUFFIX { $setType(FLOAT); }
> >   )
> >   ;
> >
> > IDENTIFIER options { testLiterals = true; }:
> > ( 'a' .. 'z' | 'A' .. 'Z' | '_' )
> > ( 'a' .. 'z' | 'A' .. 'Z' | '0'..'9' | '_' | '#' )*
> > ;
> >
> > NOT_OP: "!";
> > QUESTION: "?";
> > AND_OP: "&&";
> > OR_OP: "||";
> > EQ_OP: "==";
> > NE_OP: "!=";
> > LT_OP: "<";
> > GT_OP: ">";
> > LE_OP: "<=";
> > GE_OP: ">=";
> > PLUS: "+";
> > MINUS: "-";
> > MULT: "*";
> > AMPERSAND: "&";
> > BOR: "|";
> > EOR: "^";
> > MOD: "%";
> >
> > SHIFTR_OR_ASGN: ">>" { $setType(SHIFTR); } ( "=" { $setType(SR_ASSIGN); } )?;
> > SHIFTL_OR_ASGN: "<<" { $setType(SHIFTL); } ( "=" { $setType(SL_ASSIGN); } )?;
> >
> > COMMA: ",";
> > ASSIGN: "=";
> > PLUS_ASSIGN: "+=";
> > MINUS_ASSIGN: "-=";
> > MULT_ASSIGN: "*=";
> > DIV_ASSIGN: "/=";
> > MOD_ASSIGN: "%=";
> > BAND_ASSIGN: "&=";
> > BXOR_ASSIGN: "^=";
> > BOR_ASSIGN: "|=";
> > ASSIGN_START: "{=";
> > ASSIGN_END: "=}";
> > LBRACE: "(";
> > RBRACE: ")";
> > LCURL: "{";
> > RCURL: "}";
> > LBRACKET: "[";
> > RBRACKET: "]";
> > DCOLON: "::";
> > COLON: ":";
> > SEMICOLON: ";";
> > AT: "@";
> > HASH: "#";
> > ------snip----
> >
> > Another approach is to make your xxxTokenTypes.txt and .hpp/.java
> > yourself and import that one into all lexers/parsers/treeparsers.
> >
> > > And the walkers import the lexers vocabulary (see the attached files).
> >
> > I always import from the lexer/parser/treewalker one stage below in the
> > hierarchy. So: lexer exports to parser exports to treewalker exports to
> > treewalker exports to treewalker etc. That way you'll always be sure to
> > import tokens that were introduced in the stage below.
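
The chained vocabulary handoff described above can be sketched with option headers like these (grammar and vocab names are placeholders):

```antlr
// Each stage imports the vocabulary exported by the stage before it,
// so tokens introduced anywhere upstream are always visible.
class MyLexer extends Lexer;
options { exportVocab = L; }

class MyParser extends Parser;
options { importVocab = L; exportVocab = P; }

class MyWalker extends TreeParser;
options { importVocab = P; exportVocab = W; }
```

Importing only from the immediately preceding stage, rather than always from the lexer, is what guarantees the walker also sees imaginary tokens added in the parser's tokens{} section.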
> >
> > > Or it's just a matter of taste?
> >
> > It might well be :)
> >
> > Cheers,
> >
> > Ric
>
Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/