[antlr-interest] Newbie question on ANTLR
Gerald Rosenberg
gerald at certiv.net
Tue Feb 1 13:50:23 PST 2011
Welcome.
Take a look at the SQL grammars to get an idea of what you are getting
into. Look for the ANTLR Wiki page on handling keywords that overlap
other strings. If your language is going to grow much beyond what you
show below, a real NLP tool is likely to be required. Take a look at
OpenNLP on SourceForge.
That said, it is very difficult to quickly pinpoint the problem with your
current grammar. Beyond being a matter of good style, you need to move
all keywords/constant strings into the lexer. That is the only hope of
quickly seeing how the stream will tokenize. Lexer rules are evaluated top
down and nominally with a fixed look-ahead of 1. So, for example, the
lexer rules
THERE: 'there';
THE: 'the';
will fail on the inputs 'the' and 'then': the lexer commits to THERE,
expects an 'r', and gets either nothing or an 'n'.
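To make that failure mode concrete, here is a rough Python sketch of a matcher that commits to a rule after looking at a single character. It is only a simplified model of the behaviour described above, not the code ANTLR generates; the function name match_committed and the RULES table are made up for illustration.

```python
# Simplified model of a lexer that commits to a rule after one character of
# look-ahead. This is NOT ANTLR's generated code; match_committed and RULES
# are hypothetical names, and the rule names mirror the example above.
RULES = [("THERE", "there"), ("THE", "the")]  # evaluated top-down


def match_committed(text):
    """Choose the first rule whose first character matches, then insist on the rest."""
    for name, literal in RULES:
        if text[:1] != literal[:1]:
            continue  # the first character rules this alternative out
        # Committed: from here on there is no fallback to a later rule.
        for i, expected in enumerate(literal):
            got = text[i] if i < len(text) else "<EOF>"
            if got != expected:
                raise ValueError(f"{name}: expecting {expected!r}, got {got!r}")
        return name
    raise ValueError(f"no rule matches {text[:1]!r}")
```

With this model, match_committed("there") returns "THERE", while "the" and "then" both raise a ValueError reporting that an 'r' was expected. THE is never tried, because THERE was chosen on the strength of the leading 't'.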
Also, spaces are typically only token delimiters, for human
convenience. Theycontainnootherinformation. Best to hide them from the
parser.
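As a rough illustration of what the parser should end up seeing, here is a small Python sketch that tokenizes the test sentence with the keywords recognized in the tokenizer and whitespace thrown away. It is a stand-in for a lexer, not ANTLR; the keyword set and the upper-cased token names (THERE, IS, A, ...) are assumptions for this example, while FSTOP, UCSTR and STRNG are borrowed from the grammar below.

```python
import re

# Hypothetical keyword set for the test sentence; a real grammar would list
# every literal that currently appears inside the parser rules.
KEYWORDS = {"there", "is", "a", "an", "person", "named"}

# One alternative per token class: words, the full stop, and whitespace.
TOKEN_RE = re.compile(r"(?P<WORD>[A-Za-z0-9]+)|(?P<FSTOP>\.)|(?P<WS>\s+)")


def tokenize(text):
    """Return (kind, value) pairs, with whitespace dropped before the parser sees it."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind, value = m.lastgroup, m.group()
        pos = m.end()
        if kind == "WS":
            continue  # spaces only separate tokens; they carry nothing else
        if kind == "WORD":
            if value in KEYWORDS:
                kind = value.upper()      # keyword token, e.g. THERE, IS, A
            elif re.fullmatch(r"[A-Z0-9]+", value):
                kind = "UCSTR"            # upper case and digits only
            else:
                kind = "STRNG"            # any other mix of letters and digits
        tokens.append((kind, value))
    return tokens


for kind, value in tokenize("there is a person named Fred."):
    print(kind, repr(value))
```

For the test sentence this yields the kinds THERE, IS, A, PERSON, NAMED, STRNG, FSTOP, with no whitespace tokens in between, so parser rules can be written purely in terms of those token names.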
BTW, if I had to guess, the problem you are seeing is that the grammar
is expecting no space between 'a' and 'person'.
Best,
Gerald
------ Original Message (Tuesday, February 01, 2011 11:32:55 AM) From: John Ibbotson ------
Subject: [antlr-interest] Newbie question on ANTLR
> Hi,
> I'm a newcomer to ANTLR and am trying to write a grammar to parse
> controlled natural language. The idea is to parse sentences then convert
> to RDF using Jena. A colleague has already written a version in Prolog so
> I'm looking to do a Java version. My starting point is to write the
> following grammar:
>
> rule: cesentence;
>
> cesentence: sentence FSTOP;
> sentence: declarative;
> declarative: simpleds name?;
> simpleds: ('there is' existentialnp) |
> (nounp verbp) |
> ('it is' ('true' | 'false' | 'unknown') 'that'
> generalproposition);
> existentialnp: ('a' | 'an') description;
> description: noun namedecl? relativeclause?;
> namedecl: ('named' name) |
> variable |
> ('known as' name);
> nounp: existentialnp |
> referentialnp;
> verbp: simplevp ('and' simplevp)*;
> simplevp: (('has' | 'does not have') simplenp 'as'
> functionalnoun);
> verbcomp: simplenp;
> simplenp: ('(' simplenp ')') |
> existentialnp |
> referentialnp;
> referentialnp: ('the' noun (name | variable)) |
> variable |
> ('the type' noun) |
> ('the' noun 'known as' name);
> generalproposition: simpleds |
> QUOTE;
> relativeclause: ('that' verbp) |
> ('described as' QUOTE);
>
> // CE Lexical categories
> name: STRNG;
> noun: 'person';
> functionalnoun: 'brother';
> variable: UCSTR;
>
> // Terminators
> FSTOP: '.';
> UCSTR: ('A'..'Z' | '0'..'9')+; // String with upper and digits
> STRNG: (UCSTR | 'a'..'z')+; // String with upper, lower and digits
> SPACE: ' ';
> QUOTE: '"' (STRNG | SPACE)* '"';
>
> which was pretty straightforward. I then tested it using the interpreter
> (Eclipse IDE plugin) on the following sentence:
>
> there is a person named Fred.
>
> but the parser falls over on the existentialnp rule. Any help appreciated.
>
> Regards,
> John
>
> John Ibbotson PhD CEng FIET
> Master Inventor
> ITA Project, Emerging Technology Services
> Hursley Park, MP137, Winchester, Hants. SO21 2JN, UK
>
> Tel: +44 1962 815188
> Email: john_ibbotson at uk.ibm.com
>
> ITA: http://www.usukita.org