[antlr-interest] about the paraphrase option...

antlrlist antlrlist at yahoo.com
Tue Apr 1 09:08:26 PST 2003


SHORT VERSION OF THIS POST
   - Shouldn't the "paraphrase" option be available for tokens
defined in the "tokens" section?
   - Why are the options for a token in the tokens section enclosed
with "<" and ">" instead of "options {" and "}", as everywhere else?


LONG VERSION OF THIS POST

Hi to everyone!

I'm a Spanish student of Computer Science at Sevilla University.
My end-of-studies project is about ANTLR. So far I've written about
160 pages of documentation, and I'm planning to reach 200. Of
course I'll post it here when it's finished, but keep in mind that
it's in Spanish...

There's a chapter about error recovery. ANTLR's default error
recovery strategy is difficult to improve, so I've only pointed out some
alternatives (like using FIRST-and-FOLLOW instead of only FOLLOW, and
synchronizing symbols). Then I concentrated on message printing. That's
where I talk about the "paraphrase" option.

This option lets you define a "token name" to be used when an error is
thrown. For example, in identifiers:

------------------------------------------------------------
IDENT
options { paraphrase="an identifier"; }
   : (LETTER|'_') (LETTER|'_'|DIGIT)*
   ;
------------------------------------------------------------

It allows generating error messages like "an identifier is missing",
instead of "IDENT is missing".
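
To make the idea concrete, here is a minimal Java sketch of the
substitution I have in mind. It is only an illustration: the class
ParaphraseTable and its methods are hypothetical names of mine, not part
of ANTLR, and the message format is simply the one quoted above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, NOT part of ANTLR: maps token names to paraphrases.
final class ParaphraseTable {
    private final Map<String, String> paraphrases = new HashMap<>();

    // Register a human-readable description for a token name.
    void put(String tokenName, String paraphrase) {
        paraphrases.put(tokenName, paraphrase);
    }

    // Return the paraphrase if one was registered, otherwise the raw token name.
    String describe(String tokenName) {
        return paraphrases.getOrDefault(tokenName, tokenName);
    }

    // Build a message in the "<token> is missing" style quoted above.
    String missing(String tokenName) {
        return describe(tokenName) + " is missing";
    }
}
```

A token with a registered paraphrase is reported by its description,
and any other token falls back to its plain name.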

I'm trying to use paraphrase whenever possible, and I'm having
problems. The grammar I'm using recognizes both real (LIT_REAL) and
integer (LIT_INT) literals. Because of lookahead problems, I'm forced to
recognize both in the same rule, LIT_NUMBER, like this:

------------------------------------------------------------
class myLexer extends Lexer ;
[...]
tokens {
[...]
   LIT_INT; LIT_REAL;
}
[...]
LIT_NUMBER
   : ( (DIGIT)+ '.' ) =>
      (DIGIT)+ '.' (DIGIT)+
      { $setType(LIT_REAL); }
   | (DIGIT)+
     { $setType(LIT_INT); }
   ;
------------------------------------------------------------
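
For readers less used to syntactic predicates, the decision that
LIT_NUMBER makes can be sketched in plain Java. This is a hand-written
illustration, not code generated by ANTLR: the class NumberClassifier is
a hypothetical name, it assumes DIGIT means '0'..'9', and it classifies
a complete string instead of reading from a character stream.

```java
// Hypothetical illustration, NOT generated by ANTLR.
final class NumberClassifier {
    static final String LIT_INT = "LIT_INT";
    static final String LIT_REAL = "LIT_REAL";

    // Returns the token type name for the whole text, or null if the text
    // is not a number according to the LIT_NUMBER rule.
    static String classify(String text) {
        int i = skipDigits(text, 0);
        if (i == 0) {
            return null;                 // (DIGIT)+ needs at least one digit
        }
        if (i == text.length()) {
            return LIT_INT;              // second alternative: digits only
        }
        if (text.charAt(i) != '.') {
            return null;                 // something other than '.' follows
        }
        // The predicate (DIGIT)+ '.' matched, so the first alternative is
        // chosen and it requires at least one digit after the dot.
        int j = skipDigits(text, i + 1);
        if (j == i + 1 || j != text.length()) {
            return null;
        }
        return LIT_REAL;
    }

    // Advance past consecutive ASCII digits (assumed definition of DIGIT).
    private static int skipDigits(String s, int from) {
        int i = from;
        while (i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '9') {
            i++;
        }
        return i;
    }
}
```

Both kinds of literal start with the same digits, so the type can only
be chosen after looking past them for the dot. That is the reason for
the single rule with $setType.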

My question is simple: How do I add a paraphrase to LIT_REAL and
LIT_INT?

I feel that the best and simplest solution would be allowing the use
of paraphrase in the tokens section, this way:

------------------------------------------------------------
tokens{
   LIT_INT  <paraphrase="an integer"; >;
   LIT_REAL <paraphrase="a real"; >;
}
------------------------------------------------------------

That doesn't work; paraphrase is not a valid option in the tokens
section. It should be, shouldn't it? It appears that only tokens defined
with a rule can benefit from it.

(BTW, why is the way options are specified in the tokens section
different from the rest? That is, why use "<" and ">" when we have
"options {" and "}" everywhere else?)

In case somebody else has this problem: I'm using this crude hack, which
consists of defining rules that are never matched:

------------------------------------------------------------
class myLexer extends Lexer ;
[...]
tokens {...} // it does not contain LIT_INT or LIT_REAL any more
[...]
LIT_NUMBER : ... ; // doesn't change

LIT_REAL
options { paraphrase="a real"; }
   : '%'          // Unused char
     { false }?   // Use this or throw a CharScannerException
   ;

LIT_INT
options { paraphrase="an integer"; }
   : '@'          // Unused char
     { false }?   // Use this or throw a CharScannerException
   ;
------------------------------------------------------------

So the final question is: how do I add paraphrasing to my numbers?
Should I modify ANTLR to accept paraphrasing in the tokens section?
If so, how? Or should I modify my grammar somehow?

Any thoughts are welcome. Thanks!

Enrique.


 
