[antlr-interest] Using long literal definitions in ANTLR
Micheal J
open.zone at virgin.net
Fri Nov 24 07:27:06 PST 2006
Hi,
Any reason why you're still using ANTLR 2.7.2? I'd recommend V3 for what
you're doing (or at least ANTLR 2.7.6+ if you must use ANTLR 2.7.x).
You could try something like this (check for typos/syntax):
class TestLexer extends...
options {...}
tokens {
BEGIN,
END,
....
FRAMESTYLE,
STYLE
}
FORMS_LITERALS
: "BEGIN" { $setType(BEGIN); }
| "END" { $setType(END); }
.......
| "FRAMESTYLE" { $setType(FRAMESTYLE); }
| "STYLE" { $setType(STYLE); }
;
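To illustrate the idea behind a single FORMS_LITERALS rule, here is a minimal sketch in Python (not ANTLR). One routine recognises every literal, prefers the longest match, and only then assigns the token type, which is roughly the role $setType plays in the grammar above. The table LITERALS and the helper match_literal are hypothetical names used for illustration; the entries are taken from the literals in the original question below, not from the elided parts of the grammar above.

```python
# Hypothetical literal table: literal text -> token type name.
# Entries are borrowed from the grammar in the original question.
LITERALS = {
    "BEGIN": "BEGIN",
    "END": "END",
    "STYLE": "STYLE",
    "FRAMESTYLE": "FRAMESTYLE",
    "DOUBLE": "STYLETYPE",
    "DOUBLEUNDER": "STYLETYPE",
    "DOUBLE_THIN": "FRAMESTYLETYPE",
    "DOUBLE_THICK": "FRAMESTYLETYPE",
}


def match_literal(text, pos=0):
    """Return (token_type, end_pos) for the longest literal starting at pos."""
    # Try longer literals first so "DOUBLEUNDER" wins over its prefix "DOUBLE".
    for literal in sorted(LITERALS, key=len, reverse=True):
        if text.startswith(literal, pos):
            return LITERALS[literal], pos + len(literal)
    raise ValueError(f"no literal matches at position {pos}")


if __name__ == "__main__":
    for sample in ("DOUBLE_THIN", "DOUBLEUNDER", "DOUBLE", "STYLE"):
        print(sample, "->", match_literal(sample)[0])
```

Because the token type is decided only after the whole literal has been matched, similar prefixes such as DOUBLE and DOUBLE_THIN no longer have to be separated by lookahead between two competing rules.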
Micheal
-----------------------
The best way to contact me is via the list/forum. My time is very limited.
-----Original Message-----
From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Andrew Monaghan
Sent: 24 November 2006 00:35
To: antlr-interest at antlr.org
Subject: [antlr-interest] Using long literal definitions in ANTLR
I'm using an older version of ANTLR (2.7.2) to build a relatively simple forms
parser, but I've come across an issue that has stumped me for a couple of
days now. I'm no expert on parsers so I'm probably missing something very
obvious, and any help would be gratefully received. The issue stems from a
set of long, similar literals in the form definition.
The grammar has a number of definitions for 'form' style including
"DOUBLE_THIN", "DOUBLE_THICK", "DOUBLE" and "DOUBLEUNDER". To let ANTLR
distinguish between these long literals I've upped k to 11, but I'm still
getting nondeterminism between rules FRAMESTYLETYPE and STYLETYPE. And when
the parser is run I get an exception in the generated nextToken() operation
in the lexer.
For example, the parser may be parsing a STYLE line and expects one of the
STYLETYPEs to follow, but due to the order of the rules in the grammar
nextToken() first matches the DOUBLE (from DOUBLE_THIN) in FRAMESTYLETYPE
and throws an exception. Changing the rule order isn't an option
because it will throw an exception on the FRAMESTYLE line instead.
I'm sure this problem is related to my having increased k, but I'm at a loss
for alternative strategies.
Cheers,
Andy
---------------------------------------------------------
Below is a simplified version of the grammar...
options
{
language = "CSharp";
}
class TestParser extends Parser;
form: (formbody)+ ;
formbody: BEGIN formcontent END ;
formcontent: (formentry)+ ;
formentry : styleline
| framestyleline;
styleline: STYLE style1:STYLETYPE;
framestyleline: FRAMESTYLE style:FRAMESTYLETYPE;
class TestLexer extends Lexer;
options
{
k = 11;
charVocabulary = '\3'..'\377';
charVocabulary = '\u0000'..'\uFFFE';
}
BEGIN: "BEGIN" ;
END: "END" ;
STYLE: "STYLE" ;
FRAMESTYLE: "FRAMESTYLE" ;
FRAMESTYLETYPE: "SINGLE_THIN" | "DOUBLE_THIN" | "SINGLE_THICK" |
"DOUBLE_THICK" | "DOTTED" ;
STYLETYPE: "NORMAL" | "BOLD" | "ITALIC" | "UNDER" |
"DOUBLEUNDER" | "DOUBLE" | "TRIPLE" | "QUADRUPLE" |
"STRIKETHROUGH" | "ROTATE90" | "ROTATE270" |
"UPSIDEDOWN" | "PROPORTIONAL" | "DOUBLEHIGH" |
"TRIPLEHIGH" | "QUADRUPLEHIGH" | "CONDENSED" |
"SUPERSCRIPT" | "OVERSCORE" | "LETTERQUALITY" |
"NEARLETTERQUALITY" | "DOUBLESTRIKE" | "OPAQUE" ;
WS : (' ' | '\t')+ { $setType(Token.SKIP); }
;
NEWLINE
: '\r' '\n' { newline(); $setType(Token.SKIP);}
| '\n' { newline(); $setType(Token.SKIP);}
| '\r' { newline(); $setType(Token.SKIP);}
;
------------------------------------------------
public new Token nextToken() //throws TokenStreamException
{
    ...
    default:
        if ((LA(1)=='D'||LA(1)=='S') && (LA(2)=='I'||LA(2)=='O') &&
            (LA(3)=='N'||LA(3)=='T'||LA(3)=='U') &&
            (LA(4)=='B'||LA(4)=='G'||LA(4)=='T') &&
            (LA(5)=='E'||LA(5)=='L') && (LA(6)=='D'||LA(6)=='E') &&
            (true) && (true) && (true) && (true) && (true))
        {
            mFRAMESTYLETYPE(true);
            theRetToken = returnToken_;
        }
        else if ((tokenSet_0_.member(LA(1))) && (tokenSet_1_.member(LA(2))) &&
            (tokenSet_2_.member(LA(3))) && (tokenSet_3_.member(LA(4))) &&
            (true) && (true) && (true) && (true) && (true) && (true) && (true)) {
            mSTYLETYPE(true);
            theRetToken = returnToken_;
        }
        else if ((LA(1)=='B') && (LA(2)=='E') && (LA(3)=='G')) {
            mBEGIN(true);
            theRetToken = returnToken_;
        }
        else if ((LA(1)=='S') && (LA(2)=='T') && (LA(3)=='Y')) {
            mSTYLE(true);
            theRetToken = returnToken_;
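
The first test in the generated nextToken() checks each lookahead position against its own set of characters rather than against whole literals, so input that belongs to STYLETYPE can still be routed to mFRAMESTYLETYPE. The following Python sketch (an illustration, not generated ANTLR output) copies the six character sets from that test. The function names, and the use of string indexing in place of LA(i), are assumptions made for the example.

```python
# Character sets copied from the first `if` in the generated nextToken():
# position 1 is 'D' or 'S', position 2 is 'I' or 'O', and so on.
FRAMESTYLETYPE_SETS = ["DS", "IO", "NTU", "BGT", "EL", "DE"]

# Literals of the FRAMESTYLETYPE rule in the grammar.
FRAMESTYLETYPE_LITERALS = (
    "SINGLE_THIN", "DOUBLE_THIN", "SINGLE_THICK", "DOUBLE_THICK", "DOTTED",
)


def predicts_framestyletype(text):
    """Mimic the per-position lookahead test (positions 7..11 are `(true)`)."""
    if len(text) < len(FRAMESTYLETYPE_SETS):
        return False
    return all(text[i] in allowed for i, allowed in enumerate(FRAMESTYLETYPE_SETS))


def matches_framestyletype(text):
    """True if the text really starts with one of the FRAMESTYLETYPE literals."""
    return any(text.startswith(literal) for literal in FRAMESTYLETYPE_LITERALS)


if __name__ == "__main__":
    for word in ("DOUBLE_THIN", "DOUBLEUNDER", "DOUBLE", "DOTTED", "NORMAL"):
        print(word, predicts_framestyletype(word), matches_framestyletype(word))
```

In this sketch "DOUBLEUNDER" and "DOUBLE" pass the prediction test although neither is a FRAMESTYLETYPE literal, which is consistent with the exception described above when a STYLE line is parsed.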