[antlr-interest] Insert Node Into AST

Chantal Ackermann chantal.ackermann at web.de
Wed Mar 23 08:15:32 PST 2005


Hello all,

I'm parsing natural language (German), but with a very limited grammar.
Here are some examples translated into English.

"Navigation to Foo Street 2 in Bla-Town"
(keywords: Navigation (menu), Foo Street (street), 2 (street number),
Bla-Town (city))
"I want to go to Foo Street"
(keywords: Foo Street (street))

Note that the input need not be grammatically correct German (or
English, in this case), as only some keywords are of interest. There are
some semantic rules, though, as to which keywords can appear together in
one input, and these rules should show up in the AST.

It is possible to have these combinations in any order:
(navigation)? (street (streetnumber)? )? (city)?
"?" means optional. All parts are optional, though at least one should
show up for the input to make sense.

It is not possible to have the keywords
(telephone) (street)
in one input, as (telephone) directs to the menu "telephone" while
(street) implies the menu "navigation".
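
The compatibility rule above can be sketched in plain Java, independent
of ANTLR. This is only an illustration under assumptions: the class
MenuInference, the enum Keyword and the method impliedMenu are
hypothetical names, and the mapping of keyword kinds to menus is taken
from the examples in this post, not from any real grammar.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: derives the implied menu from the keyword kinds
// found in one input and rejects combinations that imply two menus.
final class MenuInference {

    // Keyword kinds named after the examples in this post (assumption).
    enum Keyword { NAVIGATION, STREET, STREET_NUMBER, CITY, TELEPHONE }

    // Assumption: every address-related keyword implies "navigation".
    private static final Set<Keyword> NAVI = EnumSet.of(
            Keyword.NAVIGATION, Keyword.STREET,
            Keyword.STREET_NUMBER, Keyword.CITY);

    // Assumption: only the telephone keyword implies "telephone".
    private static final Set<Keyword> PHONE = EnumSet.of(Keyword.TELEPHONE);

    static String impliedMenu(List<Keyword> keywords) {
        boolean navi = false;
        boolean phone = false;
        for (Keyword k : keywords) {
            if (NAVI.contains(k)) navi = true;
            if (PHONE.contains(k)) phone = true;
        }
        if (navi && phone) {
            // e.g. (telephone) (street) in one input
            throw new IllegalArgumentException("keywords imply two menus");
        }
        if (navi) return "navigation";
        if (phone) return "telephone";
        throw new IllegalArgumentException("no keyword implies a menu");
    }

    public static void main(String[] args) {
        // "I want to go to Foo Street": only a street keyword is present.
        System.out.println(impliedMenu(Arrays.asList(Keyword.STREET)));
    }
}
```

A check like this could run after parsing, on the extracted keywords,
instead of being encoded in the grammar itself.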

And here comes my problem:
I managed to parse these input sentences, and the parser (both with and
without AST construction) correctly extracts the keywords. However, I'd
like to insert the implied menu information while parsing, so that the
AST always reflects the menu structure.

So, input of the kind: "I want to go to Foo Street"
should not just produce: "Foo Street (street)" as it does now, but 
"Navigation (menu), Foo Street (street)"

Please don't mind the form of the output here. I just chose this
notation to show the name-value pairs of the keywords. The AST
toStringList method would rather output something like:
( Navigation ( Foo Street ) ) -- hopefully.
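
To make the desired result concrete, here is a minimal sketch of the
transformation in plain Java. It does not use ANTLR's AST classes: Node
is a hypothetical stand-in, and withNavigationRoot is an assumed helper
name. It only shows the idea of wrapping a parsed subtree in an implied
"Navigation" root when that root is missing.

```java
import java.util.ArrayList;
import java.util.List;

final class ImpliedRoot {

    // Minimal stand-in for an AST node (not ANTLR's AST interface).
    static final class Node {
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String text, Node... kids) {
            this.text = text;
            for (Node k : kids) {
                children.add(k);
            }
        }

        // Prints leaves as plain text and inner nodes as "( root child ... )".
        String toStringTree() {
            if (children.isEmpty()) {
                return text;
            }
            StringBuilder sb = new StringBuilder("( ").append(text);
            for (Node c : children) {
                sb.append(' ').append(c.toStringTree());
            }
            return sb.append(" )").toString();
        }
    }

    // Returns the tree unchanged if it is already rooted at "Navigation";
    // otherwise makes it the only child of a new "Navigation" root.
    static Node withNavigationRoot(Node parsed) {
        if (parsed.text.equals("Navigation")) {
            return parsed;
        }
        return new Node("Navigation", parsed);
    }

    public static void main(String[] args) {
        // "I want to go to Foo Street 2": street with a street number.
        Node street = new Node("Foo Street", new Node("2"));
        System.out.println(withNavigationRoot(street).toStringTree());
    }
}
```

With this stand-in, a street node carrying a house number prints as
( Navigation ( Foo Street 2 ) ). The exact bracket layout of ANTLR's own
tree printing may differ.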

This is the body of my Parser class with buildAST=true (I'm working with
Java, by the way). It does not compile (i.e. run through ANTLR) because
of the rule "naviInput". It does compile after removing "naviInput" and
the call to it in "topLevelInput", and uncommenting the commented-out
alternatives there.

topLevelInput
//	:	( NAVIGATION^ ) => NAVIGATION^ ( adresse )?
//	|	adresse // doesn't work: { #NAVIGATION = #( NAVIGATION ); }
	:	naviInput
	|	( TELEFON^ ) => TELEFON^ ( NUMMER! ) NUMBER ( telefonaktion )?
	|	( NUMMER! ) NUMBER telefonaktion
	;

naviInput!
	:	( NAVIGATION ) => NAVIGATION ( adresse )?
	|	a:adresse
	        { #naviInput = #([naviInput], a ); }
	;

anschrift
	:	STREET^ (( HAUSNUMMER! | NUMMER! )? n:NUMBER )?
	;

adresse
	:	( CITY ) => CITY ( anschrift )?
	|	anschrift ( CITY )?
	;

telefonaktion
	:	ANRUFEN
	|	SPEICHERN
	;

The TreeWalker looks like this:

class ProtoTreeWalker extends TreeParser;

topLevelInput
	:	#( naviInput CITY STREET )
	|	#( TELEFON NUMBER ( ANRUFEN | SPEICHERN )? )
	;

Well, this doesn't work at all. It does work once I revert the above
Parser to its functional state (see the comment on the parser above) and
change "naviInput" to "Navigation" here.

In short, I'd like to insert the node "Navigation" even if it is not in
the input, whenever the input contains a street, a city or anything
similar. Maybe I'm on the wrong track here. I admit I don't quite
understand what to put into the Parser and what into the TreeWalker.
Why do I need the TreeWalker anyway?

Has any of you gurus run into this? Thanks a lot for reading through!
I'd appreciate advice of any kind (concerning any aspect of my code).

Thanks!
Chantal

P.S.: Just for the record: in the documentation there is an error in
the following example:
decl!:
     modifiers type ID SEMI;
         { #decl = #([DECL], ID, ([TYPE] type),
                     ([MOD] modifiers) ); }
     ;
(trees.html -> "A few examples")
There is one superfluous semicolon after "SEMI". What I am missing
about this example is: what happens to the node declaration when there
is more than one alternative? Does it apply only to the alternative it
comes right after, or does it not work at all in that case? It's not
working for me at the moment. :-/


