[antlr-interest] Unexpected Token dearly expected...

Chantal Ackermann chantal.ackermann at web.de
Wed Dec 13 07:54:26 PST 2006


Hi to all!

I am using ANTLR to parse some "natural" German language input. The parser is meant to be the part of a language-processing engine that does the semantic parsing of that input.

Since the input determines both which domain is meant and which filter/action to apply to that domain's data, I went for several small parsers:
1. one "DomainParser" which extracts the domain information from the input.
    This works quite well. It's straightforward and ignores everything except the keywords that specify the domain.
2. several "Filter*Parser" which extract any additional information applying to their specific domain.
    So there is, e.g., a FilterCalendarParser, which should understand input of the kind "today's appointments" (well, at least in German).

I do have some problems, though, and so I'm asking you for advice. Thanks a lot for any help or input on that matter!

1. Problem:
Input like "meine heutigen termine" ("my appointments today") works, but I get a NoViableAltException for "termine die nächsten 2 stunden" ("appointments in the next 2 hours"):

2006-12-13 16:22:01,742 DEBUG [VuiController] Normalized: termine die naechsten 2 stunden
2006-12-13 16:22:01,743 DEBUG [DomainParser] Extracting domain...
Domain: calendarMenu
2006-12-13 16:22:01,745 DEBUG [SpeechParser] Extracting calendar fields...
line 1:32: unexpected token: null
	at de.infoman.voice.parser.FilterCalendarParser.extractFields(FilterCalendarParser.java:94)

After experimenting with inputs like "termine 2", where the filter information would consist of just a DIGIT token, I am at a loss. A lexer token

DIGIT: ('0'..'9');

works well, and is accepted by the parser. But a more complex token that contains DIGIT and some other string:

STUNDEN_ANGABE
	:	DIGIT " Stunden"
	;

isn't recognized at all. In that case, I get the unexpected token exception (NoViableAltException) shown above. At the end of this mail, I'll paste the FilterCalendarParser.g file. I'd greatly appreciate it if one of you experts could have a look at it and point me to the mistake.
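
A quick sanity check that can be done outside ANTLR is whether the literal could match the normalized input at all. The sketch below is plain Java, not ANTLR API. It assumes the lexer compares string literals case-sensitively (caseSensitive=false is commented out in the grammar), and the helper names matchesStundenAngabe and containsStundenAngabe are made up for illustration. It only mimics DIGIT followed by the literal " Stunden":

```java
class CaseCheck {
    // Hypothetical helper (not ANTLR API): true if a single digit followed by
    // the exact, case-sensitive literal " Stunden" starts at the given offset.
    static boolean matchesStundenAngabe(String input, int offset) {
        if (offset >= input.length()) {
            return false;
        }
        char c = input.charAt(offset);
        if (c < '0' || c > '9') {
            return false;
        }
        return input.startsWith(" Stunden", offset + 1);
    }

    // Roughly what a filtering lexer does: try every position, skip what does not match.
    static boolean containsStundenAngabe(String input) {
        for (int i = 0; i < input.length(); i++) {
            if (matchesStundenAngabe(input, i)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Normalized input as it appears in the log: lowercase "stunden".
        System.out.println(containsStundenAngabe("termine die naechsten 2 stunden")); // false
        // Same input with a capital S, as written in the grammar literal.
        System.out.println(containsStundenAngabe("termine die naechsten 2 Stunden")); // true
    }
}
```

If the real lexer behaves like this sketch, nothing in the lowercase input would ever become a STUNDEN_ANGABE token, and the parser would see only the end of input.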

2. Question:
How do I handle the case when there really is no input of the expected kind? Say there actually wasn't any domain information in the input. What is the best way to go about that? Do I just catch the exception, or do I specify a custom error handler?
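
To make the "just catch the exception" option concrete, here is a minimal sketch of what I have in mind. FieldExtractor and extractOrEmpty are hypothetical names, and FieldExtractor is only a stand-in for the generated parser call so that the sketch runs without ANTLR. In real code the catch clause would presumably name the exception types the generated rule method declares (with defaultErrorHandler = false, these should be RecognitionException and TokenStreamException):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SafeExtract {
    // Hypothetical stand-in for a call such as parser.extractFields(container).
    interface FieldExtractor {
        Map<String, String> extract(Map<String, String> container) throws Exception;
    }

    // Runs the extractor; if it throws, treat that as "no filter information found"
    // and return an empty map instead of propagating the exception.
    static Map<String, String> extractOrEmpty(FieldExtractor extractor) {
        try {
            return extractor.extract(new HashMap<>());
        } catch (Exception e) {
            // Assumption: any parse failure simply means "nothing recognized".
            return Collections.emptyMap();
        }
    }

    public static void main(String[] args) {
        // Simulated successful parse.
        Map<String, String> ok = extractOrEmpty(m -> {
            m.put("selection", "today");
            return m;
        });
        // Simulated failed parse (stands in for a NoViableAltException).
        Map<String, String> none = extractOrEmpty(m -> {
            throw new Exception("unexpected token: null");
        });
        System.out.println(ok.size());      // 1
        System.out.println(none.isEmpty()); // true
    }
}
```

The caller can then distinguish "nothing recognized" from a real result by checking for an empty map. Whether that is preferable to a custom error handler is exactly what I am unsure about.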

Kind regards,
Chantal


************************************************
header
{
	package de.infoman.voice.parser;
	
	import java.util.Map;
	import java.util.HashMap;
	import de.infoman.voice.parser.FieldNameConstants;
	import de.infoman.voice.parser.ControlConstants;
	import de.infoman.ThemeConstants;
	import org.apache.log4j.Logger;
}

/******************** PARSER **************************/

class FilterCalendarParser extends Parser;

options
{
	defaultErrorHandler = false;
}

{
	private static final Logger log= Logger.getLogger(FilterCalendarParser.class);
}
	
extractFields [Map container]
returns [Map map]
{
	log.debug("Extracting calendar fields...");
	map = container;
}
	:	HEUTE
			{
				map.put(FieldNameConstants.SELECTION, ControlConstants.TODAY);
			}
	|	TAG
			{
				map.put(FieldNameConstants.SELECTION, ControlConstants.TODAY_DAY);
			}
	|	STUNDEN_ANGABE
			{
//				System.out.println("Stunden angegeben: " + h);
				map.put(FieldNameConstants.SELECTION, ControlConstants.HOURS_FUTURE);
//				map.put(FieldNameConstants.NUMBER, h);
			}
	;
			

/******************** LEXER **************************/

class FilterCalendarLexer extends Lexer;

options
{
	//charVocabulary = '\3'..'\377';
	//defaultErrorHandler = false;
	k=3;
	filter=true;
	//caseSensitive=false;
}

STUNDEN_ANGABE
	:	DIGIT " Stunden"
	;
	

TAG
	:	"tagsueber"
	;

HEUTE
	:	"heut" ('e' | "ige")
	;

protected DIGIT
	:	('0'..'9')
	;