[antlr-interest] Newbie question on ANTLR

Gerald Rosenberg gerald at certiv.net
Tue Feb 1 13:50:23 PST 2011


Welcome.

Take a look at the SQL grammars to get an idea of what you are getting 
into.  Look for the ANTLR Wiki page on handling keywords that overlap 
other strings.  If your language is going to grow much beyond what you 
show below, a real NLP tool is likely to be required.  Take a look at 
OpenNLP on SourceForge.

That said, it is very difficult to quickly pinpoint the problem with your 
current grammar.  Beyond being a matter of good style, you need to move 
all keywords/constant strings into the lexer.  That is the only hope of 
quickly seeing how the stream will tokenize.  Lexer rules are evaluated 
top down and nominally with a fixed look-ahead of 1.  So, for example, 
the lexer rules

THERE: 'there';
THE: 'the';

will fail on the inputs 'the' and 'then' -- the lexer expects an 'r' and 
gets either nothing or an 'n'.
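
One common way around this kind of overlap is to left-factor the shared 
prefix into a single lexer rule and set the token type in an action.  
The fragment below is only a sketch, assuming ANTLR 3 with the Java 
target; declaring THERE in a tokens section is my assumption, not 
something taken from your grammar.

```antlr
// Sketch only: assumes ANTLR 3, Java target.
// THERE is declared as a token type with no lexer rule of its own.
tokens { THERE; }

// The shared prefix 'the' is matched once.  If 're' follows, the
// action switches the emitted token type from THE to THERE.
THE : 'the' ( 're' { $type = THERE; } )? ;
```

With only this rule in play, 'the' and 'there' each come out as a single 
token, and 'then' gives THE followed by an 'n' that some other rule has 
to pick up.  Other word-like rules in the same lexer (STRNG, for 
instance) can still compete for the same input, so check the resulting 
token stream before trusting it.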

Also, spaces are typically only token delimiters, for human 
convenience.  Theycontainnootherinformation.  Best to hide them from the 
parser.
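
A minimal sketch of hiding whitespace, assuming ANTLR 3 with the Java 
target.  The rule name WS is my choice; it would take the place of your 
SPACE rule.

```antlr
// Sketch only: assumes ANTLR 3, Java target.
// Whitespace still separates tokens, but it is sent to the hidden
// channel, so parser rules never see it.
WS : ( ' ' | '\t' | '\r' | '\n' )+ { $channel = HIDDEN; } ;
```

Your QUOTE rule refers to SPACE, so it would need its own way of 
matching a blank, for example by turning SPACE into a fragment rule.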

BTW, if I had to guess, the problem you are seeing is that the grammar 
is expecting no space between 'a' and 'person'.

Best,
Gerald



------ Original Message (Tuesday, February 01, 2011 11:32:55 AM) From: John Ibbotson ------
Subject: [antlr-interest] Newbie question on ANTLR
> Hi,
> I'm a newcomer to ANTLR and am trying to write a grammar to parse
> controlled natural language. The idea is to parse sentences then convert
> to RDF using Jena. A colleague has already written a version in Prolog so
> I'm looking to do a Java version. My starting point is to write the
> following grammar:
>
> rule: cesentence;
>
> cesentence:             sentence FSTOP;
> sentence:               declarative;
> declarative:            simpleds name?;
> simpleds:               ('there is' existentialnp) |
>                          (nounp verbp) |
>                          ('it is' ('true' | 'false' | 'unknown') 'that' generalproposition);
> existentialnp:          ('a' | 'an') description;
> description:            noun namedecl? relativeclause?;
> namedecl:               ('named' name) |
>                          variable |
>                          ('known as' name);
> nounp:                  existentialnp |
>                          referentialnp;
> verbp:                  simplevp ('and' simplevp)*;
> simplevp:               (('has' | 'does not have') simplenp 'as' functionalnoun);
> verbcomp:               simplenp;
> simplenp:               ('(' simplenp ')') |
>                          existentialnp |
>                          referentialnp;
> referentialnp:          ('the' noun (name | variable)) |
>                          variable |
>                          ('the type' noun) |
>                          ('the' noun 'known as' name);
> generalproposition:     simpleds |
>                          QUOTE;
> relativeclause:         ('that' verbp) |
>                          ('described as' QUOTE);
>
> // CE Lexical categories
> name:                   STRNG;
> noun:                   'person';
> functionalnoun: 'brother';
> variable:               UCSTR;
>
> // Terminators
> FSTOP:          '.';
> UCSTR:                  ('A'..'Z' | '0'..'9')+;                 // String with upper and digits
> STRNG:          (UCSTR | 'a'..'z')+;                    // String with upper, lower and digits
> SPACE:                  ' ';
> QUOTE:          '"' (STRNG | SPACE)* '"';
>
> which was pretty straightforward. I then tested it using the interpreter
> (Eclipse IDE plugin) on the following sentence:
>
> there is a person named Fred.
>
> but the parser falls over on the existentialnp rule. Any help appreciated.
>
> Regards,
> John
>
> John Ibbotson PhD CEng FIET
> Master Inventor
> ITA  Project, Emerging Technology Services
> Hursley Park, MP137, Winchester, Hants. SO21 2JN, UK
>
> Tel:         +44 1962 815188
> Email:     john_ibbotson at uk.ibm.com
>
> ITA:                   http://www.usukita.org
>
> Technical Solutions to business problems that require innovation across
> IBM knowledge portfolio.
>
> "A doctor can bury his mistakes but an architect can only advise his
> clients to plant vines." Frank Lloyd Wright

