[antlr-interest] on parsers look and feel
Ric Klaren
klaren at cs.utwente.nl
Fri Nov 28 06:18:30 PST 2003
On Wed, Nov 26, 2003 at 10:42:14AM +0200, Cristian Amitroaie wrote:
> o sometimes I kind of forget what name I gave to the "=" token from the
> Lexer (EQ/EQUAL/EQUALS/ASSIGN) when I want to add a new rule to a parser.
> o sometimes I get bored writing LCURLEY instead of "{" or '{'
> o sometimes it's hard for me to follow rules full of SEMI, LCURL(E)?Y,
> LBRACK, LPARENS and so on
>
> For example, I would like to see my parser rules look like:
>
> assign:
> ID "="^ ID ";"!
> ;
> I browsed through the documentation and the big examples, yet I couldn't
> find any similar approach suggested as a guideline.
>
> Are there any disadvantages/risks related to this approach?
As long as you keep the token vocabularies well synchronized you'll be OK.
Easiest (in my experience) is to build up your tokens 'incrementally' with
exportVocab/importVocab. I already start doing this 'trick' in the lexer, so
from there on I can use the real token text in the parser and treewalkers
instead of some enumerated name (e.g. "=" instead of ASSIGN).
Basically, I repeat in the lexer's tokens section the strings that the lexer
rules match:
class Example_Lexer extends Lexer;
options {
k = 2;
charVocabulary= '\u0000' .. '\u00FF';
// Settings for literal matching
caseSensitiveLiterals = false; // literals match case-insensitively (mind this!)
testLiterals = false;
defaultErrorHandler = true;
exportVocab = Lexer_;
}
tokens { /*{{{*/
MCONST="const"; // matched via IDENTIFIER rule
MEXTERN="extern"; // use M<id> here to prevent clashes with some
<.snip.> // often-used defines (Tcl/Tk, to name one)
BOR="|"; // duplicated from rules so I can use "|" in parser
NOT_OP="!";
<.snip.>
GE_OP=">=";
<.snip.>
AT="@";
HASH="#";
INT;
FLOAT;
STRING;
IDENTIFIER;
/*}}}*/
}
protected EXPONENT_PART: ( 'e' | 'E' ) ( '+' | '-' )? ('0'..'9')+ ;
protected FLOAT_SUFFIX: ('F'|'f'|'L'|'l') ;
DOT_OR_DOTDOT:
".." { $setType(DOTDOT); }
| '.' { $setType(DOT); }
;
NUMERIC:
('0'..'9')+ { $setType(INT); }
(
| { LA(2) >= '0' && LA(2) <= '9' }? '.' ('0'..'9')+ (EXPONENT_PART)? (FLOAT_SUFFIX)? { $setType(FLOAT); }
| EXPONENT_PART (FLOAT_SUFFIX)? { $setType(FLOAT); }
| FLOAT_SUFFIX { $setType(FLOAT); }
)
;
IDENTIFIER options { testLiterals = true; }:
( 'a' .. 'z' | 'A' .. 'Z' | '_' )
( 'a' .. 'z' | 'A' .. 'Z' | '0'..'9' | '_' | '#' )*
;
NOT_OP: "!";
QUESTION: "?";
AND_OP: "&&";
OR_OP: "||";
EQ_OP: "==";
NE_OP: "!=";
LT_OP: "<";
GT_OP: ">";
LE_OP: "<=";
GE_OP: ">=";
PLUS: "+";
MINUS: "-";
MULT: "*";
AMPERSAND: "&";
BOR: "|";
EOR: "^";
MOD: "%";
SHIFTR_OR_ASGN: ">>" { $setType(SHIFTR); } ( "=" { $setType(SR_ASSIGN); } )?;
SHIFTL_OR_ASGN: "<<" { $setType(SHIFTL); } ( "=" { $setType(SL_ASSIGN); } )?;
COMMA: ",";
ASSIGN: "=";
PLUS_ASSIGN: "+=";
MINUS_ASSIGN: "-=";
MULT_ASSIGN: "*=";
DIV_ASSIGN: "/=";
MOD_ASSIGN: "%=";
BAND_ASSIGN: "&=";
BXOR_ASSIGN: "^=";
BOR_ASSIGN: "|=";
ASSIGN_START: "{=";
ASSIGN_END: "=}";
LBRACE: "(";
RBRACE: ")";
LCURL: "{";
RCURL: "}";
LBRACKET: "[";
RBRACKET: "]";
DCOLON: "::";
COLON: ":";
SEMICOLON: ";";
AT: "@";
HASH: "#";
------snip----
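For illustration, here is how the rule from the question might look in a parser that imports the Lexer_ vocabulary exported by the lexer above. This is a sketch, not part of Ric's grammar: the class name Example_Parser is hypothetical, and it assumes the elided part of the tokens section also lists ASSIGN="=" and SEMICOLON=";" (the listing only shows the ASSIGN and SEMICOLON lexer rules).

```antlr
// Hypothetical parser; assumes Lexer_TokenTypes.txt was generated from the
// lexer above and contains ASSIGN="=" and SEMICOLON=";".
class Example_Parser extends Parser;
options {
    importVocab = Lexer_;  // pick up the lexer's token types and literals
    buildAST = true;       // the ^ and ! suffixes only matter when building ASTs
}

assign
    : IDENTIFIER "="^ IDENTIFIER ";"!  // "=" becomes the subtree root, ";" is dropped
    ;
```

If a literal used in the parser is missing from the imported vocabulary, ANTLR 2 will, as far as I know, define a fresh token type for it that the lexer never produces, so the generated parser ends up expecting a token it can never receive. That is the synchronization risk in practice.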
Another approach is to write your xxxTokenTypes.txt and .hpp/.java yourself
and import that one into all lexers/parsers/treeparsers.
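A minimal sketch of that approach, assuming a hand-written vocabulary with the hypothetical name Common, i.e. a file CommonTokenTypes.txt sitting next to the grammars. Every grammar imports the same vocabulary instead of the one exported by the previous stage; the class names are illustrative and the rules are elided:

```antlr
// Hypothetical shared vocabulary: every grammar reads CommonTokenTypes.txt.
class Example_Lexer extends Lexer;
options { importVocab = Common; }
// ... lexer rules ...

class Example_Parser extends Parser;
options { importVocab = Common; }
// ... parser rules ...

class Example_Walker extends TreeParser;
options { importVocab = Common; }
// ... tree rules ...
```

The trade-off is that every new token, including imaginary AST node types introduced for the tree walkers, has to be added to the shared file by hand.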
> And the walkers import the lexer's vocabulary (see the attached files).
I always import from the lexer/parser/treewalker one stage below in the
hierarchy: the lexer exports to the parser, the parser exports to the first
treewalker, that treewalker exports to the next one, and so on. That way you
can always be sure to import the tokens that were introduced in the stage below.
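Sketched as grammar headers, such a chain could look like the following. Lexer_ comes from the lexer above; the other class and vocabulary names (Example_Parser, Parser_, Example_Walker, Walker_) and the imaginary token ASSIGN_STMT are hypothetical, and the rules are elided:

```antlr
class Example_Lexer extends Lexer;
options { exportVocab = Lexer_; }  // writes Lexer_TokenTypes.txt
// ... lexer rules ...

class Example_Parser extends Parser;
options {
    importVocab = Lexer_;          // token types and literals from the lexer
    exportVocab = Parser_;         // Lexer_ tokens plus those defined below
}
tokens { ASSIGN_STMT; }            // imaginary AST node introduced at this stage
// ... parser rules ...

class Example_Walker extends TreeParser;
options {
    importVocab = Parser_;         // sees ASSIGN_STMT as well as the Lexer_ tokens
    exportVocab = Walker_;         // a second tree walker would import this one
}
// ... tree rules ...
```

Each stage only has to know the vocabulary of the stage directly below it, so a token added in the parser automatically reaches every walker further down the chain.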
> Or it's just a matter of taste?
It might well be :)
Cheers,
Ric
--
-----+++++*****************************************************+++++++++-------
---- Ric Klaren ----- j.klaren at utwente.nl ----- +31 53 4893722 ----
-----+++++*****************************************************+++++++++-------
Chaos often breeds life, when order breeds habit.
--- Henry B. Adams, The Education of Henry Adams