[antlr-interest] A newbie question and is this mailing list a black hole for

Mon Oct 23 19:47:02 PDT 2006

Bill--

Congratulations!  You have discovered the lack of
semantic predicate hoisting in ANTLR 2!  Not many do
that: apart from those of us who sorely missed this
feature in going from PCCTS (ANTLR 1) to ANTLR 2,
yours is the first post on the subject in the past six
years.  One of the pluses of ANTLR 3 is that it is
bringing back predicate hoisting.

What happens in your grammar is that the predicate in
getString (and other such rules) is not part of the
lookahead decision in the calling rule.  startRule
sees getString and looks for any ALPHANUMSTRING; the
predicate is only triggered within getString.  If you
change
    getstring:getString
to
    {kwPrefixMatch(LT(1).getText(), "getstring", 4)}?
    getstring:ALPHANUMSTRING
(that is, don't bury the predicate in a subrule), the error
reported for startRule will disappear.  Alternatively,
you can manually hoist the predicate and do
    {kwPrefixMatch(LT(1).getText(), "getstring", 4)}?
    getstring:getString

with the same result.
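
To make the difference concrete, here is a rough Java sketch (not
ANTLR-generated code) of how the alternative decision in startRule
behaves with and without the predicates taking part in it.  The class
HoistDemo and the two choose... methods are hypothetical names made up
for illustration, and kwPrefixMatch is written on the assumption that
it is meant to accept any abbreviation of the keyword that is at least
minLength characters long.

```java
import java.util.Locale;

// Hypothetical stand-alone demo; none of these names come from ANTLR itself.
class HoistDemo {

    // True when str is a case-insensitive prefix of pattern and is at
    // least minLength characters long. Assumes pattern is lowercase.
    static boolean kwPrefixMatch(String str, String pattern, int minLength) {
        if (str.length() < minLength || str.length() > pattern.length()) {
            return false;
        }
        return pattern.startsWith(str.toLowerCase(Locale.ROOT));
    }

    // Decision WITHOUT hoisting: every alternative begins with an
    // ALPHANUMSTRING token, so the token type alone cannot tell them
    // apart and the first alternative is always taken.
    static String chooseByTokenType(String tokenText) {
        return "getString";
    }

    // Decision WITH the predicates hoisted into the calling rule: the
    // token text is tested before an alternative is committed to.
    static String chooseWithHoistedPredicates(String tokenText) {
        if (kwPrefixMatch(tokenText, "getstring", 4)) {
            return "getString";
        }
        if (kwPrefixMatch(tokenText, "jan", 3) || kwPrefixMatch(tokenText, "feb", 3)) {
            return "month";
        }
        if (kwPrefixMatch(tokenText, "toggle", 3)) {
            return "toggle";
        }
        if (kwPrefixMatch(tokenText, "end", 3)) {
            return "end";
        }
        return "no viable alternative";
    }

    public static void main(String[] args) {
        for (String text : new String[] {"gets", "jan", "tog", "end", "dun"}) {
            System.out.println(text + " -> " + chooseByTokenType(text)
                    + " / " + chooseWithHoistedPredicates(text));
        }
    }
}
```

Without the predicates in the decision, every input is routed to the
first alternative, which is what the nondeterminism warning is about;
with them, each keyword abbreviation selects its own alternative.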

--Loring

--- Foolish Ewe <foolishewe at hotmail.com> wrote:

> Hi Dave and ANTLR list members:
> 
> Some early experiences show that I may either be
> executing this method wrong,
> or that there may be some limitations in the
> approach.
> 
> Attached is a small sample attempt at doing the sort
> of stuff Dave seems to be
> hinting at.  I've noticed that I'm getting
> nondeterminism messages in the parser for both
> startRule and Month, probably due to the fact all
> keywords are scanned in as ALPHANUMSTRING
> tokens, which doesn't give much distinguishing
> structure at the leaf nodes of the parse tree.
> Is my solution prone to this?
> 
> The grammar also accepts language constructs which I
> don't think it should accept,
> but I haven't tried too hard to shake out bugs from
> it at this point.
> What should the parser be doing if the keyword does
> NOT match the expected string
> (e.g. do we make it throw an exception, and if so, what
> exception is a good choice?)?
> 
> Thanks for the help, I'm just trying to do this the
> smart way.
> A revised ANTLR file and Java file are below.
> 
> Regards:
> 
> Bill M.
> 
> *****************Begin ANTLR Source*********************************
> //My play area for diagnosing strange ANTLR symptoms
> //Version History: 1.0 WAM created
> 
> 
> // WAM - Need to add some boilerplate to let Antlr generated files know that
> // they are part of the ZTestParser package
> header{
> 	package testing;
> }
> 
> class P extends Parser;
> 
> // Parser options
> options{
> 	k = 2; // Token stream lookahead, remember ANTLR uses LL(k) parsing
> }
> {
> 	private boolean recognizeKeyWords = true;
> 
> 	// checks to see if str is a prefix of pattern that is minlength or
> 	// more characters long (i.e. an abbreviation of the keyword)
> 	// note, references the private recognizeKeyWords member
> 	private boolean kwPrefixMatch(	String str,
> 									String pattern,
> 									int minlength)
> 	{
> 		boolean result;
> 		if (!recognizeKeyWords){
> 			result = false; // don't bother to do additional tests at this point
> 		} else if (str.length() > pattern.length()){
> 			result = false; // the string is longer than the pattern, so it cannot match
> 		} else if (str.length() < minlength){
> 			result = false; // the string is too short to match the minimum pattern length
> 		} else {
> 			String strval = str.toLowerCase(); // For case sensitivity reasons
> 			result = pattern.startsWith(strval); // str must be a prefix of the pattern
> 		}
> 		return result;
> 	}
> 
> }
> 
> // Antlr requires Terminals have names beginning with uppercase letters,
> // Nonterminals should use lowercase I guess
> startRule
> 	:
> 		// the actual prefix tokens are different in practice
> 		getstring:getString
> 		// I would like to do something like the following actions where lexer is
> 		// a type L object used in lexing
> 		// This is not the right syntax for this, but it shows the general idea
> 		// {this.lexer.recognizeKeyWord = false;}
> 		strval:ALPHANUMSTRING
> 		// {this.lexer.recognizeKeyWord = true;}
> 		nl1:NEWLINE sr1:startRule // breaks if the user types in "dun\n" where \n is newline
> 	|
> 		monthval:month nl2:NEWLINE sr2:startRule
> 	|
> 		// added for testing, but won't work for my requirements.
> 		toggleval:toggle nl3:NEWLINE sr3:startRule
> 	|
> 		endval:end nl4:NEWLINE
> 	;
> 
> month
> 	:
> 		(jan | feb) // | mar | apr | may | jun | jul | aug | sep | oct | nov | dec)
> 	;
> 
> jan
> 	:
> 		{kwPrefixMatch(LT(1).getText(), "jan", 3)}?
> 		ALPHANUMSTRING
> 	;
> 
> feb
> 	:
> 		{kwPrefixMatch(LT(1).getText(), "feb", 3)}?
> 		ALPHANUMSTRING
> 	;
> 
> 
> getString
> 	:
> 		{kwPrefixMatch(LT(1).getText(), "getstring", 4)}?
> 		ALPHANUMSTRING
> 	;
> 
> toggle
> 	:
> 		{kwPrefixMatch(LT(1).getText(), "toggle", 3)}?
> 		ALPHANUMSTRING
> 	;
> 
> end
> 	:
> 		{kwPrefixMatch(LT(1).getText(), "end", 3)}?
> 		ALPHANUMSTRING
> 	;
> 
> class L extends Lexer;
> 
> // Lexer options
> options{
> 	k=3; // lookahead (need 2 for new line, 3 should be enough for months)
> 	charVocabulary='\u0000'..'\u007F'; // Only support printable ASCII
> 	                                   // characters, don't try fancy unicode stuff
> 	// case sensitivity turned off
> 	caseSensitiveLiterals=false;
> 	caseSensitive=false;
> }
> 
> 
> NEWLINE
>     :   '\r' '\n'    {newline();}        // DOS
>     |   '\r'         {newline();}        // Macintosh
>     |   '\n'         {newline();}        // UNIX
>     ;
> 
> 
> WHITESPACE :   ' '  {$setType(Token.SKIP);} // space character
>              | '\t' {System.out.println("Found a tab"); tab(); $setType(Token.SKIP);};
> 
> protected ALPHANUMERIC: ('a'..'z') | ('0'..'9');
> 
> ALPHANUMSTRING: (ALPHANUMERIC)+;
> ************************Begin Java Source*************************************
> package testing;
> import java.io.*;
> 
> public class Main {
> 
> 
=== message truncated ===
