[antlr-interest] A newbie question and is this mailing list a black hole for me?

Mon Oct 23 14:21:59 PDT 2006

Hi Dave and ANTLR list members:

Some early experiments suggest that either I am applying this method
incorrectly or the approach has some limitations.

Attached is a small sample attempt at the sort of thing Dave seems to be
hinting at.  I'm getting nondeterminism messages in the parser for both
startRule and month, probably because all keywords are scanned in as
ALPHANUMSTRING tokens, which gives little distinguishing structure at the
leaf nodes of the parse tree.  Is my solution prone to this?

The grammar also accepts language constructs that I don't think it should
accept, but I haven't tried too hard to shake the bugs out of it at this point.
What should the parser do if the keyword does NOT match the expected string
(e.g. should it throw an exception, and if so, which exception is a good
choice)?

Thanks for the help, I'm just trying to do this the smart way.
A revised ANTLR file and Java file are below.

Regards:

Bill M.

*****************Begin ANTLR Source*********************************
//My play area for diagnosing strange ANTLR symptoms
//Version History: 1.0 WAM created

// WAM - Need to add some boilerplate to let ANTLR-generated files know which package they belong to
header{
	package testing;
}

class P extends Parser;

// Parser options
options{
	k = 2; // Token stream lookahead, remember ANTLR uses LL(k) parsing
}
{
	private boolean recognizeKeyWords = true;

	// Checks whether str is a prefix of pattern that is at least minlength
	// characters long (str is lowercased first; pattern is assumed lowercase).
	// Note: references the private recognizeKeyWords member.
	private boolean kwPrefixMatch(	String str,
									String pattern,
									int minlength)
	{
		boolean result;
		if (!recognizeKeyWords){
			result = false; // don't bother to do additional tests at this point
		} else if (str.length() > pattern.length()){
			result = false; // the string is longer than the pattern, so it cannot match
		} else if (str.length() < minlength){
			result = false; // the string is too short to match the minimum pattern length
		} else {
			String strval = str.toLowerCase(); // for case-insensitivity
			result = pattern.startsWith(strval); // str must be a prefix of the keyword
		}
		return result;
	}

}

// ANTLR requires terminal (token) names to begin with an uppercase letter; nonterminal (rule) names begin with a lowercase letter
startRule
	:
		// the actual prefix tokens are different in practice
		getstring:getString
		// I would like to do something like the following actions, where lexer is a type L object used in lexing
		// This is not the right syntax for this, but it shows the general idea
		// {this.lexer.recognizeKeyWord = false;}
		strval:ALPHANUMSTRING
		// {this.lexer.recognizeKeyWord = true;}
		nl1:NEWLINE sr1:startRule // breaks if the user types in "dun\n" where \n is newline
	|
		monthval:month nl2:NEWLINE sr2:startRule
	|
		// added for testing, but won't work for my requirements.
		toggleval:toggle nl3:NEWLINE sr3:startRule
	|
		endval:end nl4:NEWLINE
	;

month
	:
		(jan | feb)// | mar | apr | may | jun | jul | aug | sep | oct | nov | dec)
	;

jan
	:
		{kwPrefixMatch(LT(1).getText(), "jan", 3)}?
		ALPHANUMSTRING
	;

feb
	:
		{kwPrefixMatch(LT(1).getText(), "feb", 3)}?
		ALPHANUMSTRING
	;

getString
	:
		{kwPrefixMatch(LT(1).getText(), "getstring", 4)}?
		ALPHANUMSTRING
	;

toggle
	:
		{kwPrefixMatch(LT(1).getText(), "toggle", 3)}?
		ALPHANUMSTRING
	;

end
	:
		{kwPrefixMatch(LT(1).getText(), "end", 3)}?
		ALPHANUMSTRING
	;

class L extends Lexer;

// Lexer options
options{
	k=3; // lookahead (need 2 for new line, 3 should be enough for months)
	charVocabulary='\u0000'..'\u007F'; // 7-bit ASCII only; don't try fancy Unicode stuff
	// case sensitivity turned off
	caseSensitiveLiterals=false;
	caseSensitive=false;
}

NEWLINE
    :   '\r' '\n'    {newline();}        // DOS
    |   '\r'         {newline();}        // Macintosh
    |   '\n'         {newline();}        // UNIX
    ;

WHITESPACE :   ' '  {$setType(Token.SKIP);} // space character
             | '\t' {System.out.println("Found a tab"); tab(); $setType(Token.SKIP);};

protected ALPHANUMERIC: ('a'..'z') | ('0'..'9');

ALPHANUMSTRING: (ALPHANUMERIC)+;
************************Begin Java Source*************************************
package testing;
import java.io.*;

public class Main {

	/**
	 * @param args
	 */
	public static void main(String[] args) {
		try{
			System.out.println("Enter a string for the test parser (note this is for simple ANTLR test cases)");

			L lexer = new L(new DataInputStream(System.in));

			System.out.println("After lexer instantiated before Parser instantiation");
			P parser = new P(lexer);
			System.out.println("After Parser instantiation before StartRule");
			parser.startRule();
			System.out.println("After startRule: Done?");
		} catch(Exception e) {
			System.err.println("exception: "+e);
		}
	}

}
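
The prefix test that kwPrefixMatch in the grammar is meant to perform can also
be tried outside ANTLR.  What follows is only a minimal sketch, assuming the
intent is "the token text is a case-insensitive prefix of the keyword and is
at least minlength characters long".  The class name KeywordPrefix and the
method name matches are made up for illustration and are not part of the
grammar or of ANTLR.

```java
import java.util.Locale;

// Hypothetical standalone helper (not part of the grammar or of ANTLR).
final class KeywordPrefix {
    private KeywordPrefix() {}

    /**
     * Returns true if text is a case-insensitive prefix of keyword
     * and is at least minLength characters long.
     */
    static boolean matches(String text, String keyword, int minLength) {
        // Too short to be an accepted abbreviation, or longer than the keyword.
        if (text.length() < minLength || text.length() > keyword.length()) {
            return false;
        }
        // The token text must be a prefix of the keyword, not the reverse.
        return keyword.toLowerCase(Locale.ROOT)
                      .startsWith(text.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(matches("names", "namespace", 5));      // accepted abbreviation
        System.out.println(matches("name", "namespace", 5));       // below the minimum length
        System.out.println(matches("namespaces", "namespace", 5)); // longer than the keyword
        System.out.println(matches("dun", "end", 3));              // not a prefix at all
    }
}
```

The point of keeping this separate from the parser is that the matching rule
can be checked on its own before wiring it into a semantic predicate.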

>From: "Foolish Ewe" <foolishewe at hotmail.com>
>To: dave at badgers-in-foil.co.uk, antlr-interest at antlr.org
>Subject: Re: [antlr-interest] A newbie question and is this mailing list a black hole for me?
>Date: Mon, 23 Oct 2006 17:24:06 +0000
>
>David:
>
>Thanks for the reply, I appreciate it!
>
>Regarding the keyword recognition, the language specifies what I call
>"keyword completion", so that if, say, "namespace" were a keyword and I
>wanted to recognize "names", "namesp", ..., "namespace", then in the lexer
>to recognize the token I do:
>
>NAMESPACE: "names" ('p' ('a' ('c' ('e')?)?)?)?;
>
>I suspect I may need to roll a comparison method to allow for completion
>matching.
>
>I hadn't really considered your approach; I guess I could push the keyword
>recognition back onto the parser (although I wonder about the performance
>hit and how to generate meaningful error messages).  Off the top of my
>head, I can't see a showstopper in this approach, but I want to think a
>bit before I try this transformation.
>
>Thanks Again:
>
>Bill M.
>
>
>
>>From: David Holroyd <dave at badgers-in-foil.co.uk>
>>To: antlr-interest at antlr.org
>>Subject: Re: [antlr-interest] A newbie question and is this mailing list a black hole for me?
>>Date: Mon, 23 Oct 2006 16:06:27 +0000
>>
>>On Mon, Oct 23, 2006 at 03:46:19PM +0000, Foolish Ewe wrote:
>> > For my job, I am writing a tool to parse a language, that for
>> > historical reasons has what I'll call "undelimited strings", which are
>> > positional string parameters with white space delimiters.  The
>> > problem becomes that if the undelimited string has a prefix that
>> > matches a keyword, then the scanner will call it a keyword and not a
>> > string (which is understandable but not the behavior I want).
>>
>>I dunno if this helps you, but in the cases where I had the 'is it a
>>keyword or an IDENT?' problem, I just dropped the keyword def from the
>>lexer, and then had a parser rule with a predicate testing the IDENT
>>value.
>>
>>e.g. 'namespace' is sometimes a keyword, and sometimes an identifier,
>>depending on context, so I drop the NAMESPACE definition in the lexer,
>>and then replace all references to NAMESPACE in the grammar to a
>>namespaceKeyword rule, defined like this:
>>
>>namespaceKeyword
>>	:	{input.LT(1).getText().equals("namespace")}? IDENT
>>	;
>>
>>(You could also change the type of the token with a rewrite, if that
>>were useful for your app.)
>>
>>
>>Any good?
>>dave
>>
>>--
>>http://david.holroyd.me.uk/
>