[antlr-interest] Rookie attempt at ANTLR 3 (Using ANTLRWORKS second correction attempt)

Jim Idle jimi at intersystems.com
Thu Oct 26 14:35:37 PDT 2006


First, ALPHANUMSTRING can end up matching nothing, because it uses * rather than + and so does not force any character to be there. I think that is probably the cause of your start rule issue.
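To see why a * closure can "match nothing", here is a small Java sketch that uses java.util.regex as an analogy. It assumes ALPHANUMSTRING was written along the lines of (ALPHA | DIGIT)*, since the original rule isn't quoted in this message. A regex engine is not an ANTLR lexer, but * (zero or more) and + (one or more) mean the same thing in both.

```java
import java.util.regex.Pattern;

class EmptyMatchCheck {
    // Returns true if the pattern accepts the empty string, i.e. it can
    // succeed without consuming a single character.
    static boolean matchesEmpty(String regex) {
        return Pattern.compile(regex).matcher("").matches();
    }
}
```

With the assumed rule shape, matchesEmpty("[a-z0-9]*") is true (the rule can match nothing), while matchesEmpty("[a-z0-9]+") is false (at least one character is required).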

The next issue is that all your rules are the same thing: your lexer only recognizes ALPHANUMSTRING, so every rule is just str=ALPHANUMSTRING.

Next, it is difficult to see exactly what your start rule is trying to achieve, but I guess you are trying to get it to follow multiple lines and stop when it sees end. I think you can throw away the newline tokens unless they turn out to be significant as you expand the grammar to cover the whole language, which is certainly possible. But you need to formulate this so that there is a rule that matches one valid construct, then use a higher-level rule to say how that construct repeats. Try describing it in words (there you go Anthony ;-): for instance, a line of code is one statement followed by any number of additional statements separated by semicolons, then a NEWLINE; a statement block is any number of statements, including zero, surrounded by {}; and so on. Once you can describe it to yourself in English, writing the rules becomes much easier.
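As an illustration of turning that English description into rules, here is a hand-written recursive-descent recognizer in Java for "a line is one statement, then any number of further statements separated by semicolons, then a NEWLINE". It is a sketch, not ANTLR output: the token list is given directly as strings, and treating a statement as a single lowercase word is a hypothetical simplification.

```java
class LineRecognizer {
    private final String[] tokens;
    private int pos = 0;

    private LineRecognizer(String[] tokens) { this.tokens = tokens; }

    private String peek() { return pos < tokens.length ? tokens[pos] : "<EOF>"; }

    // Consume the next token if it equals the expected text.
    private boolean accept(String expected) {
        if (peek().equals(expected)) { pos++; return true; }
        return false;
    }

    // statement : WORD ;   (hypothetical: a statement is one word here)
    private boolean statement() {
        if (peek().matches("[a-z0-9]+")) { pos++; return true; }
        return false;
    }

    // line : statement (';' statement)* NEWLINE ;
    private boolean line() {
        if (!statement()) return false;
        while (accept(";")) {
            if (!statement()) return false;
        }
        return accept("\n");
    }

    // True if the tokens form exactly one line with nothing left over.
    static boolean recognize(String... tokens) {
        LineRecognizer r = new LineRecognizer(tokens);
        return r.line() && r.pos == tokens.length;
    }
}
```

Each method corresponds to one sentence of the description; the while loop is the "any number of additional statements" part, which in ANTLR would be written as a (...)* subrule.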

However, I am afraid to say that I don't think this approach is correct at all: basically, you are telling the lexer to tokenize everything that isn't whitespace into one kind of token, then trying to do all the tokenizing in the parser, and not actually doing any parsing. You would be better off, dare I say it, hand-crafting such a beast ;-).

All is not lost, however: I believe ANTLR3 can handle your language (but then, I believe it can be made to handle anything).

I think what you should do is lex the keywords and provide a lexer rule, say IDORSTRING, that matches anything that isn't a keyword. Then, in the parser, at the points where you know you can have an undelimited string, match any token that can be a string (with suitably predicated rules to avoid ambiguities where necessary) and interpret it as an undelimited string. Difficulties arise when an undelimited string is optional and you have to look ahead and use predicates and so on, but that's what ANTLR is good at.
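The split of work described above can be sketched in plain Java: the "lexer" side classifies each word as a keyword token or IDORSTRING, and the "parser" side, at a position where an undelimited string is allowed, accepts IDORSTRING or any keyword token and keeps its text. The keyword set and the method names are hypothetical, chosen only for illustration; this is not ANTLR-generated code, and it ignores the lookahead and predicate issues mentioned above.

```java
import java.util.Set;

class UndelimitedStrings {
    // Hypothetical keyword list for illustration; a real language has its own.
    static final Set<String> KEYWORDS = Set.of("print", "end", "jan", "feb");

    // Lexer side: keywords get their own token type, any other word is
    // IDORSTRING, and ';' is punctuation that can never be a string.
    static String tokenType(String text) {
        if (text.equals(";")) return "SEMI";
        if (!text.matches("[a-z0-9]+")) throw new IllegalArgumentException("bad token: " + text);
        return KEYWORDS.contains(text) ? text.toUpperCase() : "IDORSTRING";
    }

    // Parser side, in the spirit of:  string : IDORSTRING | keyword ;
    // Keyword tokens are accepted here and reinterpreted as plain text.
    static String string(String text) {
        String type = tokenType(text);
        if (type.equals("IDORSTRING") || KEYWORDS.contains(text)) return text;
        throw new IllegalStateException("expected a string but found " + type);
    }

    // statement : PRINT string ;   so "print end" prints the word "end"
    static String printArgument(String line) {
        String[] words = line.trim().split("\\s+");
        if (words.length != 2 || !tokenType(words[0]).equals("PRINT")) {
            throw new IllegalArgumentException("expected: print <string>");
        }
        return string(words[1]);
    }
}
```

The keyword "end" is still lexed as END, but in the string position the parser treats it as ordinary text, whereas ";" is rejected because no string alternative accepts SEMI.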

Next, if your keywords can be abbreviated (P, PR, PRI, PRIN, PRINT), then code the keyword accordingly and distinguish it as a string back in the parser:

PRINT: 'P' ( 'R' ( 'I' ( 'N' ( 'T')? )? )? )? ;

Be careful about ambiguities here. Basically, ANTLR will match the first sequence listed (but you may end up with warnings and so on, so you will need to experiment).
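To make the abbreviation rule and the "first sequence listed wins" behaviour concrete, here is a simplified Java model that classifies whole words by trying rules in grammar order. It is only an approximation of that rule of thumb: ANTLR's generated lexer works character by character with lookahead and may report ambiguity warnings, which this sketch does not model. The two rules mirror PRINT and IDORSTRING from the grammar below (lowercase, as there), where PRINT is listed first.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

class FirstListedWins {
    // Rules kept in the order they appear in the grammar.
    static final Map<String, Predicate<String>> RULES = new LinkedHashMap<>();
    static {
        // PRINT : 'p' ( 'r' ( 'i' ( 'n' ( 't' )? )? )? )? ;
        // The nested optional blocks accept exactly the non-empty prefixes of "print".
        RULES.put("PRINT", w -> !w.isEmpty() && "print".startsWith(w));
        // IDORSTRING : (ALPHA | DIGIT)+ ;
        RULES.put("IDORSTRING", w -> w.matches("[a-z0-9]+"));
    }

    // Return the first listed rule that accepts the whole word.
    static String classify(String word) {
        for (Map.Entry<String, Predicate<String>> rule : RULES.entrySet()) {
            if (rule.getValue().test(word)) return rule.getKey();
        }
        throw new IllegalArgumentException("no rule matches: " + word);
    }
}
```

Under this model, "p" and "prin" come out as PRINT even though IDORSTRING would also accept them, while "prim" and "printer" fall through to IDORSTRING. That overlap is exactly why the parser has to be prepared to accept a PRINT token where it expects a string.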

To give you an example of all this, I took the liberty of writing something close to your sample that produces a tree, which you might try to expand (tested with ANTLRWorks 1.0b5). A tree is what you want here: get your grammar/parser to produce an unambiguous and correct tree, then write your action code in the tree parser to do whatever it is you want to do with it:

grammar TestMe;

options
{
	output=AST;
}
	
tokens
{
	STRING;
	CODEBLOCK;
	CODELINE;
	MONTH;
}

codeBlock
	: (c+= codelines)+
	  END
	  
	  -> ^(CODEBLOCK $c+)
	;
	 
codelines
	: m=month		-> ^(CODELINE ^(MONTH $m))
	| PRINT s=string	-> ^(CODELINE ^(PRINT $s))
	;

string
	: i=IDORSTRING			     	-> ^(STRING[$i.text] )
	| (keyword_strings)=> k=keyword_strings -> ^(STRING[$k.text] )
	;
	
keyword_strings
	: month
	| PRINT
	| END
	;

month	: JAN | FEB | MAR | APR | MAY | JUN | JUL | AUG | SEP | OCT | NOV | DEC ;

JAN	:	'jan' ;
FEB	:	'feb' ;
MAR	:	'mar' ;
APR	:	'apr' ;
MAY	:	'may' ;
JUN	:	'jun' ;
JUL	:	'jul' ;
AUG	:	'aug' ;
SEP	:	'sep' ;
OCT	:	'oct' ;
NOV	:	'nov' ;
DEC	:	'dec' ;

END	:	'e' 'n' 'd'
	;
	
PRINT	:	'p' ( 'r' ( 'i' ( 'n' ( 't' )? )? )? )? ;

IDORSTRING
	: (ALPHA | DIGIT)+
	;

fragment DIGIT 
	:	('0'..'9')
	;

fragment ALPHA
	:	('a'..'z')
	;

WS	: (' ' | '\t')+ {channel=99;}
	;

NEWLINE	: ('\r' '\n'? | '\n') { channel=99;}
	;
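The point of building the tree is that the "real" work happens afterwards, in a tree parser. As a rough picture of that second stage, here is a hand-written Java walk over trees shaped like the ones the rewrite rules above describe, ^(CODEBLOCK ^(CODELINE ^(PRINT ^(STRING ...))) ^(CODELINE ^(MONTH ...))). The Node class is a hypothetical stand-in, not ANTLR's tree API, and in ANTLR you would normally write a tree grammar rather than code like this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TreeWalkSketch {
    // Hypothetical stand-in for an AST node; ANTLR's own tree classes differ.
    static final class Node {
        final String type;
        final String text;
        final List<Node> children;
        Node(String type, String text, Node... children) {
            this.type = type;
            this.text = text;
            this.children = Arrays.asList(children);
        }
    }

    // "Action code" over the tree: each CODELINE holds one PRINT or MONTH
    // node whose single child carries the text. Returns results in order.
    static List<String> run(Node codeBlock) {
        List<String> out = new ArrayList<>();
        for (Node line : codeBlock.children) {   // each CODELINE
            Node stmt = line.children.get(0);
            Node arg = stmt.children.get(0);
            switch (stmt.type) {
                case "PRINT": out.add("print " + arg.text); break;
                case "MONTH": out.add("month " + arg.text); break;
                default: throw new IllegalStateException("unexpected " + stmt.type);
            }
        }
        return out;
    }
}
```

Because the parser has already resolved the keyword-versus-string question, the walker never needs to know that the STRING node's text might originally have been a keyword such as end.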









-----Original Message-----
From: Foolish Ewe [mailto:foolishewe at hotmail.com] 
Sent: Thursday, October 26, 2006 11:43 AM
To: Jim Idle; antlr-interest at antlr.org
Subject: Rookie attempt at ANTLR 3 (Using ANTLRWORKS second correction attempt)

Hello All:

I had a catastrophe during the edit of my previous attempt at a correction,
so now I'm really groveling, please forgive me if you get a redundant reply.
I'm using ANTLR3 using ANTLRworks (which seems very nice so far) under
Windows XP in case you are wondering. There should be a MIME attached
ANTLR3 grammar to this message.

When I try to compile TestGrammar.g (a MIME attached file), I get the
following errors in the console tab in the bottom subwindow. Although the
prior posting omitted the grammar (just as well, since I got to correct the
java code in the @members section), there really was some code generating
that message.
[14:40:33] grammar TestGrammar: no start rule (no rule can obviously be followed by EOF)
[14:40:33] [Long path omitted]TestGrammar.g:44:3: The following alternatives are unreachable: 3

Note that I'm trying this approach because I've got a strange language that
I'm trying to scan which has "undelimited" strings (for historical reasons,
this wasn't my doing), so I sometimes would like to suppress keyword
recognition. If I could scan in the language properly, I think the parsing
itself might not be too bad.

If I comment out the first and second alternatives (so that startRule->end
NEWLINE), then ANTLR will generate source, but instead I get (what seems to
be) a Java code generation error.

[13:06:08] [Long Path Snipped]\TestGrammar.java:78: illegal start of expression
[13:06:08]         void endtoken = null;
[13:06:08]         ^
[13:06:08] 1 error

Once again, sorry about cluttering up the mailing list with the prior
malformed message; I hope this one is well formed.

Thanks:

Bill M.

>From: "Jim Idle" <jimi at intersystems.com>
>To: "Foolish Ewe" <foolishewe at hotmail.com>,<antlr-interest at antlr.org>
>Subject: Re: [antlr-interest] Rookie attempt at ANTLR 3 (using the
>current ANTLRWorks under Window XP)
>Date: Wed, 25 Oct 2006 18:24:46 -0400
>
>Bill,
>
>Unless you have missed some of the grammar out from this post, it looks
>to me like you don't actually have any rules in the grammar, only some
>member functions? I would think that you really do have some rules
>but just have not posted them? ;-)
>
>If I take out the java code from your post, we are left with:
>
>// Test hoisting and use of predicates to allow us to use "undelimited strings"
>grammar TestGrammar;
>
>// I'm not using tokens in this language yet.
>//tokens = { }
>
>
>If this is really your grammar, then I would think it is pretty obvious
>;-), that there is no rule for ANTLR to look for EOF in.
>
>Jim
>
>-----Original Message-----
>From: antlr-interest-bounces at antlr.org
>[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Foolish Ewe
>Sent: Wednesday, October 25, 2006 1:30 PM
>To: antlr-interest at antlr.org
>Subject: [antlr-interest] Rookie attempt at ANTLR 3 (using the
>current ANTLRWorks under Window XP)
>
>Hi Folks:
>
>I'm trying ANTLR 3 today, using ANTLRworks (so far it seems like Bovet
>and Parr have some really neat stuff in there).
>
>I'm trying to compile the attached grammar in the tool and am getting a
>message:
>
>Cannot generate the grammar because grammar TestGrammar : no start rule
>(no rule can obviously be followed by EOF).
>
>This will probably out me to my coauthors and students, but I'm not a
>big fan of the words obviously/easily or their variants :-).
>
>What does this message mean, and how can I better convey to ANTLR that
>startRule is the start rule?
>
>Thanks:
>
>Bill M.
>