[antlr-interest] recursive semantic scanning (recursive lookahead?)

thereisnofreeid chantal.ackermann at web.de
Tue Apr 1 04:19:36 PST 2003


hello all,

I want to do the following:

1. Split a sentence into words.
2. Check whether a word equals a certain term, or whether it is
the beginning of a certain term (if the term is a phrase). In the
latter case, the check should be repeated recursively to detect when
several words together match one term.
3. Count all words (a detected phrase counts as one word).

I have no problems with points 1 (done in the lexer) and 3 (done in the
parser). For 2, I am stuck with the following code (in the lexer):

TERM1
	:	{ searchedTerms.equalsFirstTerm($getText()) }? PART_TERM1
	|	{ searchedTerms.equalsFirstTerm($getText()) }? WORD
	// reset token type for terms that start like but are not equal to term1
	|	PART_TERM1 { $setType(Token.WORD); }
	;

protected PART_TERM1
	:	( pt:PART_TERM1 WS w:WORD
		{ searchedTerms.firstTermStartsWith(pt.getText() + " " + w.getText()) }? )
			=> ( PART_TERM1 )
	|	( { searchedTerms.firstTermStartsWith($getText()) }? WORD ) => ( PART_TERM1 )
	//|	( PART_TERM1 WS )* WORD
	;

(ANTLR reports infinite recursion for this code.)

searchedTerms is an instance of a custom Java class that performs
specific string operations. It stores several terms that can match
the so-called "first term" (which is really a set of terms with the
same meaning). searchedTerms is provided at runtime, so I cannot
hard-code any terms into the lexer/parser.
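
To make the two predicates used in the grammar concrete, here is a
minimal sketch of what such a helper class could look like. Only the
method names equalsFirstTerm and firstTermStartsWith come from the
grammar above; the constructor, the internal list, the exact
(case-sensitive) comparison and the word-boundary rule are assumptions
made for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: holds the synonymous terms that make up the "first term".
class SearchedTerms {
    private final List<String> firstTerms;

    // Assumption: terms are passed in at runtime as plain strings,
    // with words separated by a single space.
    SearchedTerms(String... firstTerms) {
        this.firstTerms = Arrays.asList(firstTerms);
    }

    // True if the text is exactly one of the stored terms.
    boolean equalsFirstTerm(String text) {
        return firstTerms.contains(text);
    }

    // True if the text is a whole term, or the leading word(s) of a longer term.
    // Appending " " makes sure a prefix only matches at a word boundary.
    boolean firstTermStartsWith(String text) {
        for (String term : firstTerms) {
            if (term.equals(text) || term.startsWith(text + " ")) {
                return true;
            }
        }
        return false;
    }
}
```

With the terms "new york city" and "nyc", firstTermStartsWith("new york")
would be true while equalsFirstTerm("new york") would be false.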

My problems are:

- Where shall I do the checking: in the lexer, while recognizing the
words, or in the parser, after splitting into words?
- How can I tell the lexer (or parser) that if a word is the
beginning of the term (rule PART_TERM1), it shall try to match the
term (rule TERM1), or else concatenate the next word and try matching
again (first with PART_TERM1, then with TERM1), and so on, until
PART_TERM1 _and_ TERM1 both "return true" (well, match)? If they do not
match, all words shall get the token type WORD and be sent to the
parser for counting.

Or should I rather write one (protected) rule that does all the
checking and returns some value to indicate the result? (That idea has
only just come to my mind.)
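
As a rough illustration of that "one routine does all the checking"
idea, the sketch below works on the already split words in plain Java,
outside ANTLR. It is not a lexer rule and not a tested answer to the
grammar problem; the class name PhraseCounter, the method names and the
use of a Set of terms are all assumptions. Starting at each word, it
keeps appending the next word while the text is still the beginning of
some term, and remembers the longest complete match.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch: counts words, treating a matched multi-word term as one.
class PhraseCounter {

    static int countUnits(List<String> words, Set<String> terms) {
        int count = 0;
        int i = 0;
        while (i < words.size()) {
            int matchedEnd = -1; // index just past the longest full match from i
            StringBuilder candidate = new StringBuilder();
            for (int j = i; j < words.size(); j++) {
                if (j > i) {
                    candidate.append(' ');
                }
                candidate.append(words.get(j));
                String text = candidate.toString();
                if (terms.contains(text)) {
                    matchedEnd = j + 1; // complete term found, but keep extending
                }
                if (!isStartOfLongerTerm(text, terms)) {
                    break; // no term can continue from here
                }
            }
            // A matched phrase is consumed as a whole; otherwise consume one word.
            i = (matchedEnd != -1) ? matchedEnd : i + 1;
            count++;
        }
        return count;
    }

    // True if some term continues with another word after the given text.
    static boolean isStartOfLongerTerm(String text, Set<String> terms) {
        for (String term : terms) {
            if (term.startsWith(text + " ")) {
                return true;
            }
        }
        return false;
    }
}
```

If a run of words starts like a term but never completes it, only the
first word is consumed and the following words are examined again on
their own, which corresponds to resetting them to plain words.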

Maybe there is already lots of sample code around but I don't know the
terms that would describe it and that I could use to search for it.

Thanks for your help
Chantal


 
