[antlr-interest] antlrworks confused by imaginary tokens?
Stefan Mätje
Stefan.Maetje at esd-electronics.com
Fri Mar 16 05:22:02 PDT 2012
Hi,
some hints on defining imaginary and real tokens in an ANTLR grammar.
The ANTLR grammar for *.g files prescribes a fixed order of sections, so your
grammar file must follow that order. See this short excerpt from the ANTLR
grammar for ANTLR grammar files:
grammarDef
    :   DOC_COMMENT?
        (   'lexer'  {gtype=LEXER_GRAMMAR;}    // pure lexer
        |   'parser' {gtype=PARSER_GRAMMAR;}   // pure parser
        |   'tree'   {gtype=TREE_GRAMMAR;}     // a tree parser
        |            {gtype=COMBINED_GRAMMAR;} // merged parser/lexer
        )
        g='grammar' id ';' optionsSpec? tokensSpec? attrScope* action*
        rule+
        EOF
        ->  ^( {adaptor.create(gtype,$g)}
              id DOC_COMMENT? optionsSpec? tokensSpec? attrScope* action*
              rule+
            )
    ;
As you can see, tokensSpec must come after optionsSpec and before any
attrScope, action and rule sections.
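To make the ordering rule concrete, here is a minimal Java sketch that checks a
sequence of top-level sections against the pattern
"optionsSpec? tokensSpec? attrScope* action* rule+" from the grammarDef excerpt.
It is not part of ANTLR: the class SectionOrder, the enum Section and the
method isValidOrder are hypothetical names for illustration, and it assumes
Java 9 or later for List.of.

```java
import java.util.List;

public class SectionOrder {

    // Hypothetical model of the sections that may follow "grammar Name;",
    // listed in the order that grammarDef requires.
    enum Section { OPTIONS, TOKENS, SCOPE, ACTION, RULE }

    // True if the sections match: optionsSpec? tokensSpec? attrScope* action* rule+
    static boolean isValidOrder(List<Section> sections) {
        int i = 0;
        int n = sections.size();
        if (i < n && sections.get(i) == Section.OPTIONS) i++;   // optionsSpec?
        if (i < n && sections.get(i) == Section.TOKENS) i++;    // tokensSpec?
        while (i < n && sections.get(i) == Section.SCOPE) i++;  // attrScope*
        while (i < n && sections.get(i) == Section.ACTION) i++; // action*
        int rules = 0;
        while (i < n && sections.get(i) == Section.RULE) {      // rule+
            i++;
            rules++;
        }
        // At least one rule, and no section left over after the rules.
        return rules > 0 && i == n;
    }

    public static void main(String[] args) {
        // tokens directly after options: accepted
        System.out.println(isValidOrder(
            List.of(Section.OPTIONS, Section.TOKENS, Section.RULE)));
        // tokens placed after a rule: rejected
        System.out.println(isValidOrder(
            List.of(Section.OPTIONS, Section.RULE, Section.TOKENS)));
    }
}
```

Running main prints "true" and then "false": a tokens section that appears
after the first rule is not accepted.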
The tokens section should look like the following example:
tokens {
    VIRTUAL_TOKEN1;
    VIRTUAL_TOKEN2;
    REAL_TEXT = 'TEXT'; // Only a single char / string literal allowed!
    REAL_INFO = 'INFO';
}
Please note that the token name and the token text are separated by an equals
sign ("="), not by the colon (":") that a lexer rule uses to define a token.
Please also note that the tokens section allows only a single string or char
literal per token. If you need something like a keyword that has an
abbreviated form, you must use a lexer rule like this:
KW_IDENT: ('IDENT' | 'IDENTICAL');
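The two rules for the tokens section can be sketched as a small Java checker
for a single entry. This is a simplification under stated assumptions, not
ANTLR's own parser: it accepts only "NAME;" (imaginary token) and
"NAME = 'literal';" (exactly one quoted literal), assumes token names start
with an uppercase letter, and rejects everything else, such as lexer-rule
syntax with ":", alternatives, or actions. The class TokensEntryCheck and
method isValidEntry are hypothetical names; List.of needs Java 9 or later.

```java
import java.util.List;
import java.util.regex.Pattern;

public class TokensEntryCheck {

    // Hypothetical checker for one entry of an ANTLR 3 tokens { ... } section.
    // Accepted forms (simplified):
    //   NAME;            imaginary token
    //   NAME = 'text';   token defined by exactly one quoted literal
    private static final Pattern ENTRY = Pattern.compile(
        "\\s*[A-Z][A-Za-z0-9_]*\\s*(=\\s*'(?:[^'\\\\]|\\\\.)+'\\s*)?;\\s*");

    static boolean isValidEntry(String entry) {
        return ENTRY.matcher(entry).matches();
    }

    public static void main(String[] args) {
        List<String> entries = List.of(
            "VIRTUAL_TOKEN1;",
            "REAL_TEXT = 'TEXT';",
            "CLOSE_PAREN : ')';",                             // lexer-rule syntax
            "KW_IDENT = 'IDENT' | 'IDENTICAL';",              // more than one literal
            "WS : (' '|'\\t'|'\\f'|'\\n'|'\\r')+{ skip(); };" // subrule plus action
        );
        for (String e : entries) {
            System.out.println((isValidEntry(e) ? "ok       " : "rejected ") + e);
        }
    }
}
```

Running main reports "ok" for the first two entries and "rejected" for the
other three; entries of the last three kinds belong in lexer rules instead.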
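If you compare your posted tokens section against these rules, you will see
that it violates them: your "real" tokens use ":" instead of "=", and WS
contains a subrule and an action, which only a lexer rule may contain.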
I hope that helps,
Stefan
PS: You can find the ANTLR grammar for grammar files in the source
distribution, in the file
./antlr-3.4/tool/src/main/antlr3/org/antlr/grammar/v3/ANTLRv3.g
On 16.03.2012 05:03:44, Michael Roberts wrote:
> I've been happily hacking on my little grammar using antlrworks.
> Everything was going swimmingly until I introduced a section of imaginary
> tokens for use in rewrite rules. For some reason, antlr/antlrworks really
> wanted this section of imaginary tokens at the start of the file, directly
> behind the options section. It didn't seem to like it in other places, and
> would not recognize the imaginary tokens otherwise.
>
> However, oddly, it didn't like it if I defined my regular tokens inside the
> tokens sections and refused to recognize them, flagging mismatched token
> exceptions all over the place. So, accepting defeat, I moved these
> non-imaginary tokens back to the end of the file, where they'd previously
> been living. No missing tokens, everything generates fine now.
>
> However, when I attempt to debug my parser, the generated test code
> references the first non-imaginary token it finds as the top level
> construct, in my case CLOSE_PAREN, and not my top-level compilationUnit
> production (which is ahead of it in the file). Thus:
>
> public class __Test__ {
>
>     public static void main(String args[]) throws Exception {
>         JLG2Lexer lex = new JLG2Lexer(new ANTLRFileStream("C:\\src\\Core\\src\\org\\veve\\reflect\\interpreter\\output\\__Test___input.txt", "UTF8"));
>         CommonTokenStream tokens = new CommonTokenStream(lex);
>
>         JLG2Parser g = new JLG2Parser(tokens, 49100, null);
>         try {
>             g.CLOSE_PAREN(); // <-- BAD, was expecting to see compilationUnit here ...
>         } catch (RecognitionException e) {
>             e.printStackTrace();
>         }
>     }
> }
>
> So, my main question is .. why doesn't this form of token definition
> (below) work:
>
>
> tokens
> {
>
> // Imaginary tokens for AST rewrite ops
> IDENTIFIER_PATH;
> INVOCATION;
> STATEMENT_BLOCK;
> AMPERSAND_INVOCATION;
> INVOCATION_STAT;
> OBJECT;
> ARRAY;
> ELEMENT_STAT;
> MEMBERS;
> PAIR;
> PAIR_LIST;
> METHOD_INVOCATION;
> NEW_COMMAND;
> STRING;
> NUMBER;
> ARRAY;
> BOOLEAN;
> NULL;
> PATH;
>
> // Real, defined tokens
> CLOSE_PAREN : ')';
> AMPERSAND : '@';
> WS : (' '|'\t'|'\f'|'\n'|'\r')+{ skip(); };
> COLON : ':';
> EQUALS : '=';
> INJECT : '<-';
> COMMA : ',';
> SLASH : '/';
> OPEN_PAREN : '(' ;
> OPEN_BRACE : '{';
> CLOSE_BRACE
> : '}';
> DOT
> : '.';
> SEMI_COLON
> : ';';
> BLOCK : '|' ;
> }
>
> is the token section just for imaginary tokens, then, and, if not how do I
> define regular tokens in it .. and, in essence, what could I possibly be
> doing to so confuse the test jig generator code so that it's generating
> something silly?
>
> MR