[antlr-interest] Insert Node Into AST
Chantal Ackermann
chantal.ackermann at web.de
Wed Mar 23 08:15:32 PST 2005
Hello all,
I'm parsing natural language (German) but with a very limited grammar.
Here are some examples translated into English.
"Navigation to Foo Street 2 in Bla-Town"
(keywords: Navigation (menu), Foo Street (street), 2 (street number),
Bla-Town (city))
"I want to go to Foo Street"
(keywords: Foo Street (street))
Note that the input need not be grammatically correct German (or English,
in this case), as only some keywords are of interest. There are some
semantic rules, though, as to which keywords can appear together in one
input, and these should show up in the AST.
It is possible to have these combinations in any order:
(navigation)? (street (streetnumber)? )? (city)?
? - means the element is optional, though at least one should show up
for the input to make sense
it's not possible to have the keywords
(telephone) (street)
in one input as (telephone) directs to the menu "telephone" while
(street) implies the menu "navigation".
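
The combination rule above can be sketched in plain Java, independent of ANTLR. This is only an illustration under assumptions: the class MenuCheck, the enum Kind and the mapping in menuOf are hypothetical names made up for this sketch, not part of the grammar in this post. Every keyword kind maps to the menu it implies, and an input is accepted only if all of its keywords agree on a single menu.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

class MenuCheck {
    // Hypothetical keyword kinds, loosely named after the keywords in this post.
    enum Kind { NAVIGATION, STREET, STREETNUMBER, CITY, TELEPHONE }

    // Assumed mapping: TELEPHONE implies the telephone menu,
    // every other kind implies the navigation menu.
    static String menuOf(Kind kind) {
        return kind == Kind.TELEPHONE ? "telephone" : "navigation";
    }

    // Returns the single menu implied by all keywords, or empty if the
    // input has no keywords or its keywords imply different menus.
    static Optional<String> impliedMenu(List<Kind> kinds) {
        Set<String> menus = new TreeSet<>();
        for (Kind kind : kinds) {
            menus.add(menuOf(kind));
        }
        return menus.size() == 1
                ? Optional.of(menus.iterator().next())
                : Optional.empty();
    }

    public static void main(String[] args) {
        // Street plus city: both imply the navigation menu.
        System.out.println(impliedMenu(Arrays.asList(Kind.STREET, Kind.CITY)));
        // Telephone plus street: conflicting menus, so no menu is returned.
        System.out.println(impliedMenu(Arrays.asList(Kind.TELEPHONE, Kind.STREET)));
    }
}
```

In the grammar itself the same restriction falls out of the rule structure, since the telephone and navigation alternatives are separate branches of the top-level rule.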
And here comes my problem:
I managed to parse these input sentences and the parser (without and
with AST) correctly extracted the keywords, but I'd like to insert the
implied menu information while parsing to have the AST always reflect
the menu structure.
So, input of the kind: "I want to go to Foo Street"
should not just produce: "Foo Street (street)" as it does now, but
"Navigation (menu), Foo Street (street)"
Please, don't bother with the form of the output here. I just chose
this notation to show the name-value pairs of the keywords. The AST
toStringList method would rather output something like:
( Navigation ( Foo Street ) ) -- hopefully.
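
To make the desired result concrete, here is a minimal sketch in plain Java of wrapping an already-built subtree in an implied menu node. It deliberately does not use ANTLR's AST classes: Node, render and withMenuRoot are hypothetical names invented for this illustration, and the output notation only approximates what toStringList would produce.

```java
import java.util.ArrayList;
import java.util.List;

class ImpliedMenu {
    // Minimal tree node; a stand-in for an AST node, not ANTLR's API.
    static class Node {
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String text) { this.text = text; }

        Node add(Node child) {
            children.add(child);
            return this;
        }

        // LISP-like rendering: a leaf prints its text, an inner node
        // prints "( text child1 child2 ... )".
        String render() {
            if (children.isEmpty()) {
                return text;
            }
            StringBuilder sb = new StringBuilder("( ").append(text);
            for (Node child : children) {
                sb.append(' ').append(child.render());
            }
            return sb.append(" )").toString();
        }
    }

    // If the parsed tree is not already rooted at the menu node,
    // create that node and hang the parsed tree underneath it.
    static Node withMenuRoot(Node parsed, String menu) {
        if (parsed.text.equals(menu)) {
            return parsed;
        }
        return new Node(menu).add(parsed);
    }

    public static void main(String[] args) {
        Node street = new Node("Foo Street");
        // prints "( Navigation Foo Street )"
        System.out.println(withMenuRoot(street, "Navigation").render());
    }
}
```

A tree that already has the menu node as its root is returned unchanged, so inputs with and without the explicit keyword end up with the same shape.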
This is the body of my Parser class with buildAST=true (I'm working with
Java, by the way):
It does not compile (="run through antlr") because of the rule
"naviInput". (it compiles after removing "naviInput" and the comments
in "topLevelInput" and the call to "naviInput" in there, though.)
topLevelInput
// : ( NAVIGATION^ ) => NAVIGATION^ ( adresse )?
// | adresse // doesn't work: { #NAVIGATION = #( NAVIGATION ); }
: naviInput
| ( TELEFON^ ) => TELEFON^ ( NUMMER! ) NUMBER ( telefonaktion )?
| ( NUMMER! ) NUMBER telefonaktion
;
naviInput!
: ( NAVIGATION ) => NAVIGATION ( adresse )?
| a:adresse
{ #naviInput = #([naviInput], a ); }
;
anschrift
: STREET^ (( HAUSNUMMER! | NUMMER! )? n:NUMBER )?
;
adresse
: ( CITY ) => CITY ( anschrift )?
| anschrift ( CITY )?
;
telefonaktion
: ANRUFEN
| SPEICHERN
;
The TreeWalker looks like this:
class ProtoTreeWalker extends TreeParser;
topLevelInput
: #( naviInput CITY STREET )
| #( TELEFON NUMBER ( ANRUFEN | SPEICHERN )? )
;
Well, this doesn't work at all. Changing "naviInput" to "Navigation"
after reverting the above Parser to functional state (see above comment
on the parser), works.
In short, I'd like to insert the node "Navigation" even if it is not
in the input, whenever the input contains a street, a city, or anything similar.
Maybe I'm thinking along the wrong tracks here -- I admit, I don't
quite understand what to put into the Parser and what into the
TreeWalker. Why do I need the TreeWalker, anyway?
Did any of you gurus reach this point? Thanks a lot for reading through!
I'll gladly appreciate advice of any kind (concerning any aspect of my
code).
Thanks!
Chantal
P.S.: Just for the record -- in the documentation there is an error in
the following example:
decl!:
modifiers type ID SEMI;
{ #decl = #([DECL], ID, ([TYPE] type),
([MOD] modifiers) ); }
;
trees.html -> "A few examples"
There is one superfluous semicolon after "SEMI". What I am missing
about this example is: what happens to the node declaration when there
is more than one alternative? Does it only apply to the alternative it
comes right after, or does it work at all? It's not working for me at the
moment. :-/