[antlr-interest] simple query language EBNF

Tue Jan 1 09:45:18 PST 2008

Did you succeed?
I see at least one problem with your grammar: WS is put on the hidden
channel, so the parser never sees it ...

WS  
    :    (' '|'\t'|'\r'? '\n')+ {$channel=HIDDEN;} ;

... but you use it in your rules, e.g.

fromSpec returns [IDateRange result]
    : FROM WS SPECTEXT

The rule should instead be

fromSpec returns [IDateRange result]
    : FROM SPECTEXT
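To make the hidden-channel effect concrete, here is a small Python stand-in for the lexer and parser. It is not ANTLR, just a hypothetical sketch: whitespace tokens go to a "HIDDEN" channel and are filtered out before the parser looks at the stream. A rule expecting FROM WS SPECTEXT therefore never matches, while FROM SPECTEXT does.

```python
import re

# Sketch only: a hand-written stand-in for the ANTLR lexer, not ANTLR itself.
KEYWORDS = {"from", "with", "filter", "set", "display"}
TOKEN_RE = re.compile(
    r"(?P<WS>(?:[ \t]|\r?\n)+)"    # like WS : (' '|'\t'|'\r'? '\n')+
    r"|(?P<SPECTEXT>'[^']+')"      # like SPECTEXT : quoted text
    r"|(?P<WORD>[A-Za-z]+)"        # only keywords are accepted in this sketch
)

def tokenize(text):
    """Return (kind, text, channel) triples; WS goes to the hidden channel."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind, value = m.lastgroup, m.group()
        if kind == "WORD":
            if value not in KEYWORDS:
                raise SyntaxError(f"unknown word {value!r}")
            kind = value.upper()
        channel = "HIDDEN" if kind == "WS" else "DEFAULT"
        tokens.append((kind, value, channel))
        pos = m.end()
    return tokens

def parser_view(tokens):
    """The parser only ever sees default-channel tokens."""
    return [(kind, value) for kind, value, channel in tokens if channel == "DEFAULT"]

def match(tokens, expected):
    """Match token kinds against an expected sequence, as a parser rule would."""
    kinds = [kind for kind, _ in tokens]
    if kinds != expected:
        raise SyntaxError(f"mismatched tokens: expected {expected}, got {kinds}")
    return [value for _, value in tokens]
```

With the input from 'Today', match(parser_view(tokenize(...)), ["FROM", "SPECTEXT"]) succeeds, while the same call with ["FROM", "WS", "SPECTEXT"] raises, which is roughly the mismatch you see in ANTLRWorks.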

For the rest, I would say that you do NOT want "everything after the
keyword" - at least, that would be very bad language design (have you
designed a few languages already??).
A good language lets the human reader see where the boundaries between
"parsed text" and "non-parsed text" are - so you would design the language
so that, e.g., the "raw text" is embedded in delimiters:

from    <LastMonth MultipliedBy 3>
filter  <WeekDays>
filter  <Not Holidays>
set     <EachDay 8-hours>
with    <Expectations>
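A minimal Python sketch of this delimited approach, assuming the <...> syntax above and only the four keywords used in the example (the regex and the name parse_delimited are made up for illustration). Whatever sits between the brackets comes back untouched, ready for your own parsing:

```python
import re

# Hypothetical clause syntax from the example: keyword, then raw text in <...>.
CLAUSE_RE = re.compile(r"\s*(from|filter|set|with)\s*<([^<>]*)>")

def parse_delimited(query):
    """Return (keyword, raw_text) pairs; the raw text is not interpreted."""
    clauses, pos = [], 0
    query = query.rstrip()  # ignore trailing whitespace/newlines
    while pos < len(query):
        m = CLAUSE_RE.match(query, pos)
        if m is None:
            raise SyntaxError(f"expected 'keyword <...>' at position {pos}")
        clauses.append((m.group(1), m.group(2)))
        pos = m.end()
    return clauses
```

Because the brackets mark the boundaries, the same query parses identically whether it is written on five lines or on one.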

But no! - you'll exclaim at this ... my users can readily find out the
boundaries by ... what? Maybe it's the newlines? - is the following ok??

from    LastMonth MultipliedBy 3 filter WeekDays filter Not Holidays set EachDay 8-hours with Expectations

If it is not, then the newline serves as an "end delimiter", and you can
define a symbol

    REST_OF_TEXT : (~NL)* NL ;

where NL is your definition of a newline character.
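With the newline as end delimiter, the structure can be sketched in a few lines of Python. This is a hypothetical helper, assuming one clause per line and whitespace between the keyword and its text:

```python
# Hypothetical keyword list, mirroring the tokens{} block of the grammar.
KEYWORDS = ("from", "filter", "set", "with", "display")

def parse_by_lines(query):
    """Split a query into (keyword, rest_of_line) pairs, one clause per line."""
    clauses = []
    for lineno, line in enumerate(query.splitlines(), start=1):
        parts = line.split(None, 1)  # keyword, then the raw rest of the line
        if not parts:
            continue  # skip blank lines
        keyword = parts[0]
        if keyword not in KEYWORDS:
            raise SyntaxError(f"line {lineno}: expected a keyword, got {keyword!r}")
        rest = parts[1].strip() if len(parts) > 1 else ""
        clauses.append((keyword, rest))
    return clauses
```

Note what happens to the one-liner: it is not rejected, but everything after the first keyword silently ends up in the from text.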

If the above one-liner IS ok (i.e., clauses need not be separated by
newlines), then you should at least decree that the tokenization of those
"tails" is unambiguous - so that you do NOT allow, e.g.,

set     EachDay with 'u'
with   Expectations 

(even though it looks nice: days with 'u' are tUesday, ThUrsday, satUrday
and sUnday ;-) ).
In that case, you define a list of tokens allowed in those tails - e.g.,
identifiers (which in your case include dashes), numbers, and whatever else
you need. The specText rule then becomes

   specText : ( ID | NUMBER | ...)*
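A Python sketch of the token-list approach. The token patterns are assumptions, not taken from your grammar: tails may contain only numbers and identifiers made of letters, digits and dashes (so 8-hours is one identifier). A keyword ends a tail, and any other character is an error:

```python
import re

# Assumed token set for the "tails": numbers, and identifiers made of
# letters, digits and dashes (so "8-hours" is a single identifier).
KEYWORDS = {"from", "filter", "set", "with", "display"}
KEYWORD_KINDS = {k.upper() for k in KEYWORDS}
TOKEN_RE = re.compile(
    r"(?P<NUMBER>\d+(?![A-Za-z0-9-]))"    # a number not glued to letters/dashes
    r"|(?P<ID>[A-Za-z0-9][A-Za-z0-9-]*)"  # identifier, dashes allowed
    r"|(?P<WS>\s+)"                       # whitespace, dropped below
)

def tokenize(query):
    """Return (kind, text) pairs; anything that is not an allowed token is an error."""
    tokens, pos = [], 0
    while pos < len(query):
        m = TOKEN_RE.match(query, pos)
        if m is None:
            raise SyntaxError(f"{query[pos]!r} is not allowed at position {pos}")
        pos = m.end()
        if m.lastgroup == "WS":
            continue  # whitespace only separates tokens
        text = m.group()
        kind = text.upper() if text in KEYWORDS else m.lastgroup
        tokens.append((kind, text))
    return tokens

def parse_clauses(tokens):
    """clause : KEYWORD specText ;  specText : ( ID | NUMBER )* ;"""
    clauses, i = [], 0
    while i < len(tokens):
        kind, text = tokens[i]
        if kind not in KEYWORD_KINDS:
            raise SyntaxError(f"expected a keyword, got {text!r}")
        i += 1
        tail = []
        while i < len(tokens) and tokens[i][0] in ("ID", "NUMBER"):
            tail.append(tokens[i][1])
            i += 1
        clauses.append((text, tail))
    return clauses
```

With this tokenization the one-liner parses unambiguously, and set EachDay with 'u' is rejected because 'u' is not an allowed token.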

To sum up:

* Either you define delimiters around the "open language", between which
"everything goes" (even there, you may want to track nested parentheses
etc.)
* Or you do not delimit the open segments - then you should define the
tokens allowed in them.
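For the delimited variant, tracking nesting just means counting open and close delimiters. A hypothetical Python helper (its name and signature are illustrative only) that returns the raw contents of one delimited segment:

```python
def read_delimited(text, start, open_ch="<", close_ch=">"):
    """Return (raw_text, end) for the segment opening at text[start].

    Nested open/close pairs are counted, so '<a <b> c>' yields 'a <b> c'.
    """
    if start >= len(text) or text[start] != open_ch:
        raise SyntaxError(f"expected {open_ch!r} at position {start}")
    depth = 0
    for i in range(start, len(text)):
        if text[i] == open_ch:
            depth += 1
        elif text[i] == close_ch:
            depth -= 1
            if depth == 0:
                return text[start + 1:i], i + 1  # contents, index after the close
    raise SyntaxError(f"unterminated {open_ch!r} opened at position {start}")
```

The same function works for parentheses by passing "(" and ")", e.g. read_delimited("with (Not (Holidays))", 5, "(", ")").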

Anything else is not so good, and usually comes under the heading "badly
designed language" ... ... ... ... IMVHO.

Regards
Harald

  _____  

From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Pieter Breed
Sent: Friday, December 14, 2007 7:19 AM
To: antlr-interest at antlr.org
Subject: [antlr-interest] simple query language EBNF

Hi,

I am trying to get a small special purpose query language working with
ANTLR, and I am having some trouble sorting out the right way to do some
things.

The basic domain problem is this: 

You have some keywords: 'from', 'with', 'display', 'filter', 'set'.
An example of a valid "query" is this:

from    LastMonth MultipliedBy 3
filter  WeekDays
filter  Not Holidays
set     EachDay 8-hours
with    Expectations

The idea is that ANTLR only takes care of the big structure of the query
(sorting out which string value goes with from, which goes with filter,
etc.), and then I will use these strings and do custom parsing on them
(using reflection: e.g., LastMonth is a method on a specific object; it has
a method MultipliedBy which takes a parameter 3, and so on).
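The reflection-based evaluation described above might look roughly like this in Python. DateRange, Root and their methods are hypothetical stand-ins (a month is crudely taken as 30 days), and the number of arguments each word consumes is read from the method's signature:

```python
import inspect

# Hypothetical domain objects standing in for the real ones; a "month" is
# crudely taken to be 30 days just so there is something to compute.
class DateRange:
    def __init__(self, days):
        self.days = days

    def MultipliedBy(self, factor):
        return DateRange(self.days * int(factor))

class Root:
    def LastMonth(self):
        return DateRange(30)

def evaluate(spec, root):
    """Interpret 'Method arg... Method arg...' by looking methods up with getattr."""
    words = spec.split()
    current, i = root, 0
    while i < len(words):
        method = getattr(current, words[i], None)
        if not callable(method):
            raise ValueError(f"{type(current).__name__} has no method {words[i]!r}")
        # A bound method's signature excludes self, so this is the argument count.
        arity = len(inspect.signature(method).parameters)
        args = words[i + 1:i + 1 + arity]
        if len(args) < arity:
            raise ValueError(f"{words[i]} expects {arity} argument(s)")
        current = method(*args)
        i += 1 + arity
    return current
```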

My ANTLR problem is that I want the raw text "LastMonth MultipliedBy 3" as
output from ANTLR, but I don't know how to specify that rule. I don't know
how to say "everything but one of the command words". Below I tried to use
string quoting to delimit the text I am interested in, but that doesn't work
either.

This is what I have at the moment (I am troubleshooting, so I commented out
parts of the queryLine rule to narrow the problem down):

grammar WorkLogQL; 

tokens {
    FROM = 'from';
    WITH = 'with';
    FILTER = 'filter';
    SET = 'set';
    DISPLAY = 'display';
}

queryLine
    :    fromSpec 
        //(WS filterSpec)* 
        //WS actionSpec 
        //WS withSpec
    ;

fromSpec returns [IDateRange result]
    : FROM WS SPECTEXT
        {
            $result = ParseDateRangeSpecification($SPECTEXT.text); 
        }
    ;

withSpec
    :    WITH WS SPECTEXT
    ;

actionSpec
    : DISPLAY
    |    SET WS SPECTEXT
    ;

filterSpec
    :    FILTER WS SPECTEXT
    ;

SPECTEXT 
    :    '\'' .+ '\''
    ;

WS  
    :    (' '|'\t'|'\r'? '\n')+ {$channel=HIDDEN;} ;

As is (i.e., with the comments), and with this input:
from 'Today'

The parser falls over in SPECTEXT. When I am running in ANTLRWorks, in the
Interpreter mode, I get a tree that looks something like this: 
<grammar worklogql>
<queryLine> 
<fromSpec>
<from> - <MismatchedTokenException> 

How can I get this working? Any ideas?

Regards,
Pieter
-- 

Tempus est mensura motus rerum mobilium. 
Time is the measure of movement.

   -- Auctoritates Aristotelis 

+27 82 567 6207
http://pieterbreed.blogspot.com/ 
