[antlr-interest] New faq entries - import/export

Mon Oct 29 09:57:54 PST 2001

Hi Ter,

I have some comments about the 3rd entry: Why can't I get the
exportVocab/importVocab directives to work with string literals across
multiple files?

I will argue that:
1. The real problem with that grammar is not the import/export relation, but
a bug in ANTLR.
2. The approach suggested for import/export, while correct (when problem 1
does not get in the way), is not the best.

1. String literals in the parser are not always converted to lexer tokens.
For instance, an occurrence of the string "if" in the parser grammar is
translated to LITERAL_if, but the string "if0" (with a digit at the end) is
not converted to anything (just a commented line in XxxTokenTypes.txt). The
same happens in the grammar that originated this thread, where the string was
".accept" (starting with a dot) and was not converted properly. This is a bug
in ANTLR.

2. If the parser imports the lexer vocabulary, as suggested, the lexer will
not know about the (implicit) tokens referred to as strings in the parser.
Those tokens will have to be written down in the lexer itself, a tedious and
error-prone activity.
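
To illustrate the bookkeeping this implies, here is a minimal sketch, assuming
the literals are declared by hand in the lexer's tokens section and that the
two grammars live in separate files. The literal "if" comes from the example
above; "while", the ID rule and the stat rule are hypothetical.

```antlr
// Lexer file (built first): exports its own vocabulary.
class TestingLexer extends Lexer;
options {
  exportVocab=TestingLexer;
}
tokens {
  // Every string literal the parser uses has to be repeated here by hand.
  "if";
  "while";   // hypothetical second keyword
}
ID : ( 'a'..'z' )+ ;

// Parser file (built second): imports the lexer's vocabulary.
class TestingParser extends Parser;
options {
  importVocab=TestingLexer;
}
stat : "if" ID
     | "while" ID
     ;
```

Each new literal in the parser means another entry in the lexer's list, which
is the duplication the scheme below avoids.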

A better solution is to make the lexer import the parser vocabulary:

class TestingParser extends Parser;
options {
  exportVocab=TestingParser;
...

class TestingLexer extends Lexer;
options {
  importVocab=TestingParser;
...

If the parser and lexer are in different files, the parser file has to be
compiled first.

Regards,
Bogdan

----- Original Message -----
From: "Terence Parr" <parrt at jguru.com>
Subject: [antlr-interest] New faq entries

> Folks, added 3 new FAQ entries (should be the top 3 in the list). :)
>
> http://www.jguru.com/faq/ANTLR
>
> Ter

Christopher writes:
The literal ".accept" matches using this set of import/export rules:
class TestingParser extends Parser;
options {
  exportVocab=TestingParser;
}

command : ".accept"
          {cout << "matched accept" << endl; }
        ;

class TestingLexer extends Lexer;
options {
  charVocabulary='\3'..'\377';
  exportVocab=TestingParser;
  caseSensitive=false;
}

ID : ( 'a' .. 'z' | '0' .. '9' | '.' )+ ;
WS : ( ' ' | '\t' | '\n' { newline(); }
       | '\r' | '\b' )+ { $setType(Token::SKIP); } ;

but ".accept" does not match using this set of import/export rules
(regardless of whether or not the parser and lexer definitions are in the
same file):

class TestingParser extends Parser;
options {
  importVocab=TestingLexer;
  exportVocab=TestingParser;
}

...

class TestingLexer extends Lexer;
options {
  ...
  exportVocab=TestingLexer;
  ...
}

Ric answers:
As a rule of thumb, use this scheme when using importVocab/exportVocab with
grammars in different files (the scheme for a parser and lexer in one file is
subtly different):

First, in the lexer, export the current vocabulary (say L), e.g.
exportVocab = L.

Then, in the parser, get the definitions from the lexer, e.g.
importVocab = L. In the parser you can extend the vocabulary with, for
example, imaginary tokens. You might need these in a treeparser, so in the
parser you also do an exportVocab = P (don't use the same name as for L!!!).

Last but not least, in the treeparser you do an importVocab = P. If you
transform the tree, add new node types and want to use those in subsequent
treewalkers, then you need to extend the chain of imports/exports in the same
way as is done for the lexer and parser.

Also make sure you have your build dependencies set up right. It can be very
frustrating to debug a problem caused by the parser, lexer and walker
disagreeing about token sets because of an incomplete build (e.g. build in
the order lexer, parser, treewalker).

You can check for errors in this stuff by looking at the xxxTokenTypes files
and looking for discrepancies.
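
The lexer/parser/treeparser chain described above can be sketched roughly as
follows, assuming three separate grammar files. The vocabulary names L and P
follow the text; the class names, the rules and the imaginary token BLOCK are
hypothetical and only show where each option goes.

```antlr
// Lexer file (build first): exports vocabulary L.
class MyLexer extends Lexer;
options {
  exportVocab=L;
}
ID : ( 'a'..'z' )+ ;

// Parser file (build second): imports L, adds an imaginary token, exports P.
class MyParser extends Parser;
options {
  importVocab=L;
  exportVocab=P;   // must differ from the lexer's vocabulary name
  buildAST=true;
}
tokens {
  BLOCK;           // hypothetical imaginary token, never seen by the lexer
}
block : ( ID )+
        { #block = #([BLOCK, "BLOCK"], #block); }
      ;

// Treeparser file (build last): imports P, so it knows both ID and BLOCK.
class MyWalker extends TreeParser;
options {
  importVocab=P;
}
block : #( BLOCK ( ID )+ ) ;
```

A further treewalker working on a transformed tree would continue the pattern:
MyWalker would export a new vocabulary name, and the next walker would import
it.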

To come back to your question: your last scheme is the right one. So my
guess is that you run your parser through antlr before you run the lexer
through it. The best approach is probably to remove all antlr-generated files
and then build the lexer and the parser, in that order; then it should work.