[antlr-interest] simple query language EBNF

Pieter Breed antlr.org at pb.co.za
Wed Jan 2 00:49:13 PST 2008


Hi Harald,

Thank you very much for your post. To answer your question: no, I have
never designed a language before. I have written a few recursive-descent
compilers from given EBNF definitions, but that's it. I find this very
hard going for a for-interest-only project, so I am thinking of dropping it
for now and looking at it again in a few weeks.

I looked at your suggestions, and came up with the following:

queryLine
    :    fromSpec ;

fromSpec
    : FROM SPECTEXT
    ;

SPECTEXT
    :    (~NL)+ NL
    ;

NL
    : '\r'? '\n'
    ;

WS
    :    (' '|'\t')+ {$channel=HIDDEN;}
    ;

(I left out a few dead ends, so the line numbers in the messages below won't
match the grammar above...)

This seems to hang ANTLRWorks (1.1.5): the "Interpreting..." dialog never
goes away. There are a few errors on the console, though. This is what they
say:

[10:48:10] error(100): WorkLogQL.g:0:0: syntax error: buildnfa: <AST>:40:4:
unexpected AST node: ?
[10:48:10] error(100): WorkLogQL.g:0:0: syntax error: buildnfa: <AST>:40:10:
expecting EOA, found ''\n''
[10:48:10] warning(200): WorkLogQL.g:36:8: Decision can match input such as
"'\r'" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
[10:48:10] warning(200): WorkLogQL.g:36:8: Decision can match input such as
"'\n'" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
[10:48:10] warning(201): WorkLogQL.g:36:8: The following alternatives are
unreachable: 2

[10:48:10] warning(208): WorkLogQL.g:39:1: The following token definitions
are unreachable: NL
[10:48:10] Interpreting...
[10:48:10] error(100): WorkLogQL.g:0:0: syntax error: buildnfa: <AST>:40:4:
unexpected AST node: ?
[10:48:10] error(100): WorkLogQL.g:0:0: syntax error: buildnfa: <AST>:40:10:
expecting EOA, found ''\n''
[10:48:10] warning(200): WorkLogQL.g:36:8: Decision can match input such as
"'\r'" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
[10:48:10] warning(200): WorkLogQL.g:36:8: Decision can match input such as
"'\n'" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
[10:48:10] warning(201): WorkLogQL.g:36:8: The following alternatives are
unreachable: 2

[10:48:10] warning(208): WorkLogQL.g:39:1: The following token definitions
are unreachable: NL

So the error messages are pretty clear, but I still don't get them... I
thought I was smarter than this ;)
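
My best guess so far, which I have not verified: '~' can only be applied to
a single character or a set of characters, not to a rule such as NL that
contains a '?', and that would fit the "unexpected AST node: ?" message.
Below is a minimal, untested sketch under that assumption. It uses the <...>
delimiters from your first option; the choice of '<' and '>' and the NL? in
queryLine are my own assumptions, not something I have working yet:

```antlr
grammar WorkLogQL;

tokens {
    FROM = 'from';
}

queryLine
    :    fromSpec NL?
    ;

fromSpec
    :    FROM SPECTEXT
    ;

// Raw text between '<' and '>'. The '~' is applied to a set of
// characters here, not to the NL rule.
SPECTEXT
    :    '<' ~('>'|'\r'|'\n')* '>'
    ;

NL
    :    '\r'? '\n'
    ;

WS
    :    (' '|'\t')+ {$channel=HIDDEN;}
    ;
```

With this, the input "from <LastMonth MultipliedBy 3>" should lex as FROM
followed by a single SPECTEXT token. The token text would still include the
angle brackets, so they would have to be stripped before any custom parsing.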

Regards,
Pieter


On Jan 1, 2008 7:45 PM, Harald M. Müller <harald_m_mueller at gmx.de> wrote:

>  Did you succeed?
> I see at least the following problem with your grammar: WS is to be hidden
> from the parser ...
>
> WS
>     :    (' '|'\t'|'\r'? '\n')+ {$channel=HIDDEN;} ;
> ... but you use it in your rules, e.g.
>
> fromSpec returns [IDateRange result]
>     : FROM WS SPECTEXT
>
> The rule should instead be
>
> fromSpec returns [IDateRange result]
>     : FROM SPECTEXT
>
> For the rest, I would say that you do NOT want "everything behind the
> keyword" - at least that would be a very bad language design (have you done
> language design for a few languages already??).
> A good language should allow the human reader to understand where the
> boundaries between "parsed text" and "non-parsed text" are - therefore you
> would design the language e.g. so that the "raw text" is embedded in some
> delimiters:
>
> from    <LastMonth MultipliedBy 3>
> filter  <WeekDays>
> filter  <Not Holidays>
> set     <EachDay 8-hours>
> with    <Expectations>
> But no! - you'll exclaim at this ... my users can readily find out the
> boundaries by ... what? Maybe it's the newlines? - is the following ok??
>
>  from    LastMonth MultipliedBy 3 filter WeekDays filter Not Holidays
> set EachDay 8-hours with Expectations
>
> If it is not, then you have at least an "end delimiter", and you can
> define a symbol
>     REST_OF_TEXT : ~NL NL ;
> where NL is your definition of an NL character.
>
> If the above one-liner IS ok (i.e. there need not be new-line separations
> between clauses), then you should decree that at least the tokenization of
> those "tails" is clear - so that you do NOT allow e.g.
>
> set     EachDay with 'u'
> with   Expectations
> (even though it looks nice: days with 'u' are tUesday, ThUrsday, satUrday
> and sUnday ;-) ).
> In that case, you define a list of tokens for those tails - e.g.,
> identifiers (which in your case include dashes), numbers, and whatever. And
> the specText then becomes
>
>    specText : ( ID | NUMBER | ...)*
>
> To sum up:
>
> * Either you define delimiters around the "open language", between which
> "everything goes" (even there, you may want to track nested parentheses
> etc.)
> * Or you do not delimit the open segments - then you should define the
> tokens allowed in them.
>
> Everything else is not so good, and usually comes under the heading "badly
> designed language" ... ... ... ... IMVHO.
>
> Regards
> Harald
>
>  ------------------------------
> *From:* antlr-interest-bounces at antlr.org [mailto:
> antlr-interest-bounces at antlr.org] *On Behalf Of *Pieter Breed
> *Sent:* Friday, December 14, 2007 7:19 AM
> *To:* antlr-interest at antlr.org
> *Subject:* [antlr-interest] simple query language EBNF
>
>  Hi,
>
> I am trying to get a small special purpose query language working with
> ANTLR, and I am having some trouble sorting out the right way to do some
> things.
>
> The basic domain problem is this:
>
> you have some keywords: 'from', 'with', 'display', 'filter', 'set'
> an example of a valid "query" is this:
>
> from    LastMonth MultipliedBy 3
> filter  WeekDays
> filter  Not Holidays
> set     EachDay 8-hours
> with    Expectations
>
> The idea is that ANTLR only takes care of the big structure of the query
> (sorting out what string value goes with from, what string value goes with
> filter etc) and then I will use these strings and do custom parsing on them.
> (Using reflection: e.g., LastMonth is a method on a specific object, which
> in turn has a method MultipliedBy that takes the parameter 3, and so on.)
>
> My ANTLR problem is that I want the raw text "LastMonth MultipliedBy 3" as
> output from ANTLR, but I don't know how to specify that rule. I don't know
> how to say "everything but one of the command words". Below I tried to use
> string quoting to delimit the text I am interested in, but that also doesn't
> work.
>
> This is what I have at the moment (I am still troubleshooting, so I
> commented out parts of the queryLine rule to help with this):
>
> grammar WorkLogQL;
>
> tokens {
>     FROM = 'from';
>     WITH = 'with';
>     FILTER = 'filter';
>     SET = 'set';
>     DISPLAY = 'display';
> }
>
> queryLine
>     :    fromSpec
>         //(WS filterSpec)*
>         //WS actionSpec
>         //WS withSpec
>     ;
>
> fromSpec returns [IDateRange result]
>     : FROM WS SPECTEXT
>         {
>             result = ParseDateRangeSpecification($SPECTEXT.value);
>         }
>     ;
>
> withSpec
>     :    WITH WS SPECTEXT
>     ;
>
> actionSpec
>     : DISPLAY
>     |    SET WS SPECTEXT
>     ;
>
> filterSpec
>     :    FILTER WS SPECTEXT
>     ;
>
> SPECTEXT
>     :    '\'' .+ '\''
>     ;
>
> WS
>     :    (' '|'\t'|'\r'? '\n')+ {$channel=HIDDEN;} ;
>
> As is (i.e., with the comments), given this input:
> from 'Today'
>
> The parser falls over in SPECTEXT. When I am running in ANTLRWorks, in the
> Interpreter mode, I get a tree that looks something like this:
> <grammar worklogql>
> <queryLine>
> <fromSpec>
> <from> - <MismatchedTokenException>
>
> How can I get this working? Any ideas?
>
> Regards,
> Pieter
> --
>
> Tempus est mensura motus rerum mobilium.
> Time is the measure of movement.
>
>    -- Auctoritates Aristotelis
>
> +27 82 567 6207
> http://pieterbreed.blogspot.com/
>
>
>
>


-- 
"Things which matter most, should never be at the mercy of things which
matter least." - Goethe.

+27 82 567 6207