[antlr-interest] Struggling with recursion error

Jim Idle jimi at temporal-wave.com
Sun Dec 13 10:01:29 PST 2009




> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> 
> Well, now I just have to tell a little story of how semicolons can
> cause a lot of pain, it happened to me recently. It begins with the
> horror that is T-SQL, which is one of my day-job languages. The other
> day I started working with Service Broker, a queue technology built
> into SQL Server. It's great stuff, but patching it into T-SQL
> apparently wasn't entirely painless. To send messages to the queue you
> use SEND, e.g.:
> 
> SEND ON CONVERSATION ...
> 
> Now, I had written a simple procedure to send a message, and I wanted
> to try it. But it didn't compile, complaining about some error at a
> line that completely didn't make sense.
> 
> I spent god knows how much time on this, and then accidentally bumped
> into a blog post where I noticed something strange. The writer always
> used a semicolon before his SEND statements, as in ";SEND ON
> CONVERSATION". And then I noticed a comment saying that the ; was
> there because of some issues with the T-SQL parser. I guess that in
> T-SQL semicolons are usually optional, but in the case of SEND,
> whatever comes before it must apparently end with a semicolon.
> Wonderful.

Your T-SQL issue comes from the language's 'non-design' ;-) and from the fact that it uses a hand-crafted parser trying to deal with ambiguity. You could try my parser at www.temporal-wave.com and let me know whether it can parse that (if it cannot, I would be grateful for an example). Basically, this isn't an example of SEMI causing problems. Rather, because T-SQL does not enforce SEMI, it has to break its own rules and enforce it 'sometimes'!! If SEMI were universally required, parsing SQL/T-SQL would be much easier and, as a consequence, the error messages you receive would be much more accurate. In other words, this example completely reinforces my point ;-)

> 
> Anyway, I care about syntax, otherwise I would have chosen

Of course.

> S-Expressions as the "syntax" for my language, and therefore I don't
> want to pollute the syntax with unnecessary parens and commas and
> semi-colons. 

Unnecessary ones, of course, but for your problem, the parens are necessary. 

> It may not be a big deal in practice, I guess it's one of
> those areas where programmers have very different views, but it's
> certainly useful in learning Antlr because it seems to give me a few
> extra challenges that require digging a little deeper.

Yes, and it will also show you where such things are not really superfluous after all. Even humans cannot scan text without backtracking if there is no punctuation at all.

> 
> But back to my problem. With a little guidance from you and John it
> looks like I've nailed it. 

Cool.

> I just had to get over the idea of having
> completely arbitrary expressions as arguments, and instead only
> allowing atom expressions like literals or IDs as arguments, anything
> else must be wrapped in parens. 

That will work, but beware that you are reducing your ability to report errors semantically rather than syntactically. You then get errors like your SQL one, which says "SEND unexpected here" and gives you no clue, whereas "Arguments for functions cannot be complicated..." does.
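To make the distinction concrete, here is a minimal sketch in Python (not ANTLR, and not the grammar from this thread; the toy language, its `not` keyword and every name below are hypothetical). In strict mode only atoms are accepted as arguments, so `f not x` fails with a bare "unexpected 'not'". In lenient mode the parser deliberately accepts a bare `not ...` argument and leaves it to a separate check() pass, which can then say what is actually wrong.

```python
import re

class ParseError(Exception):
    pass

def tokenize(src):
    # Words (identifiers or numbers) and single punctuation characters.
    return re.findall(r"\w+|\S", src)

def starts_atom(tok):
    # Identifiers, numbers and '(' start an atom; the keyword 'not' does not.
    return tok is not None and tok != "not" and (
        tok == "(" or tok[0].isalnum() or tok[0] == "_")

class Parser:
    """Hypothetical toy grammar:
        unary : 'not' unary | app
        app   : atom atom*
        atom  : ID | NUMBER | '(' unary ')'
    With lenient=True a bare 'not ...' is also accepted as an argument,
    so that check() can reject it later with a helpful message."""

    def __init__(self, src, lenient):
        self.toks = tokenize(src)
        self.pos = 0
        self.lenient = lenient

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def advance(self):
        tok = self.peek()
        if tok is None:
            raise ParseError("unexpected end of input")
        self.pos += 1
        return tok

    def parse(self):
        tree = self.unary()
        if self.peek() is not None:
            raise ParseError(f"unexpected {self.peek()!r}")
        return tree

    def unary(self):
        if self.peek() == "not":
            self.advance()
            return ("not", self.unary())
        return self.app()

    def app(self):
        head = self.atom()
        args = []
        while True:
            if starts_atom(self.peek()):
                args.append(self.atom())
            elif self.lenient and self.peek() == "not":
                args.append(self.unary())  # too permissive on purpose
            else:
                break
        return ("call", head, *args) if args else head

    def atom(self):
        tok = self.advance()
        if tok == "(":
            inner = self.unary()
            if self.advance() != ")":
                raise ParseError("expected ')'")
            return ("paren", inner)  # remember the parentheses for check()
        if not starts_atom(tok):
            raise ParseError(f"unexpected {tok!r}")
        return tok

def check(tree):
    # Semantic pass: every argument must be a plain atom or parenthesized.
    if isinstance(tree, str):
        return
    tag, *kids = tree
    if tag == "call":
        for arg in kids[1:]:
            if not isinstance(arg, str) and arg[0] != "paren":
                raise ParseError("Arguments for functions cannot be complicated; "
                                 "wrap this one in parentheses")
    for kid in kids:
        check(kid)

def error_message(src, lenient):
    """Return the front end's error for src, or None if src is accepted."""
    try:
        check(Parser(src, lenient).parse())
        return None
    except ParseError as err:
        return str(err)
```

With this sketch, error_message("f not x", lenient=False) returns "unexpected 'not'", while lenient=True returns the "Arguments for functions cannot be complicated..." message; "f (not x)" is accepted in both modes. Accepting `not` in argument position is only safe here because it cannot start anything else at that point; a token that is also a binary operator would bring the ambiguity back.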

> Makes more sense in practice anyway.
> Also, as you noticed I didn't quite understand the precedence, leading
> me to pull the function application all the way up to the root
> expression, I've now moved it down so it has the highest precedence.

That is the usual mistake. It only takes learning once, though.
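A minimal Python sketch of both points, assuming a hypothetical toy grammar with only + and * as binary operators (this is not the grammar posted in the thread). Function application is parsed at the innermost level, below the binary-operator rules, so it binds tightest; its arguments are atoms only, and anything else has to be parenthesized.

```python
import re

class ParseError(Exception):
    pass

def tokenize(src):
    # Words (identifiers or numbers) and single punctuation characters.
    return re.findall(r"\w+|\S", src)

def starts_atom(tok):
    # An atom is an identifier, a number, or a parenthesized expression.
    return tok is not None and (tok == "(" or tok[0].isalnum() or tok[0] == "_")

class Parser:
    """Hypothetical toy grammar, from lowest to highest precedence:
        expr : term ('+' term)*
        term : app ('*' app)*
        app  : atom atom*        -- juxtaposition is function application
        atom : NUMBER | ID | '(' expr ')'
    app is called from term, the innermost binary level, so application
    binds tighter than '*' and '+'."""

    def __init__(self, src):
        self.toks = tokenize(src)
        self.pos = 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def advance(self):
        tok = self.peek()
        if tok is None:
            raise ParseError("unexpected end of input")
        self.pos += 1
        return tok

    def parse(self):
        tree = self.expr()
        if self.peek() is not None:
            raise ParseError(f"unexpected {self.peek()!r}")
        return tree

    def expr(self):
        left = self.term()
        while self.peek() == "+":
            self.advance()
            left = ("+", left, self.term())
        return left

    def term(self):
        left = self.app()
        while self.peek() == "*":
            self.advance()
            left = ("*", left, self.app())
        return left

    def app(self):
        head = self.atom()
        args = []
        while starts_atom(self.peek()):  # stop at anything that is not an atom
            args.append(self.atom())
        return ("call", head, *args) if args else head

    def atom(self):
        tok = self.advance()
        if tok == "(":
            inner = self.expr()  # full expressions are allowed only inside parens
            if self.advance() != ")":
                raise ParseError("expected ')'")
            return inner
        if not starts_atom(tok):
            raise ParseError(f"unexpected {tok!r}")
        return tok

def show(tree):
    # Render the tree as an S-expression string.
    if isinstance(tree, str):
        return tree
    return "(" + " ".join(show(part) for part in tree) + ")"
```

For example, show(Parser("f x + 1").parse()) gives "(+ (call f x) 1)", whereas "f (x + 1)" gives "(call f (+ x 1))" because the parentheses turn the sum into a single atom.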

> The final grammar is included below. The only thing I'm not perfectly
> happy with is the semantic predicate in the funcAppl rule. It is
> necessary to stop Antlr from emitting a warning, but it does seem like
> Antlr does exactly what I want without the predicate. 

You can ignore the warning if it does what you want. In the next release we will have warning suppression based on a cool idea from Ter.

> I understand the
> warning, as Antlr can match both the ID at the start of funcAppl and
> at the start of atomExpr. But since it all works just fine without the
> predicate it's tempting to lose it and presumably save some
> resources, but I'd like to know for sure.

Left-factor the IDs; then you only need a single-token predicate, as in the grammar I sent you in my earlier reply. Avoid putting entire rules in syntactic predicates wherever you can.
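To illustrate what left-factoring buys, here is a small Python sketch with two hand-written recursive-descent parsers for the same hypothetical rules (funcAppl and atomExpr echo the rule names in this thread, but their bodies here are invented for the example, and this is not ANTLR-generated code). Speculative tries the whole of funcAppl first and rewinds on failure, which is roughly what a syntactic predicate over an entire rule asks the parser to do. LeftFactored matches the shared ID once and lets a single token of lookahead decide whether arguments follow. The consumed counter tallies every token read, including re-reads after a rewind.

```python
import re

class ParseError(Exception):
    pass

def tokenize(src):
    # Words (identifiers or numbers) and single punctuation characters.
    return re.findall(r"\w+|\S", src)

def is_word(tok):
    return tok is not None and (tok[0].isalnum() or tok[0] == "_")

def is_id(tok):
    # Identifiers are words that do not start with a digit.
    return is_word(tok) and not tok[0].isdigit()

def starts_atom(tok):
    return tok == "(" or is_word(tok)

class Base:
    """Shared plumbing for a hypothetical grammar:
        funcAppl : ID atomExpr+
        atomExpr : ID | NUMBER | '(' primary ')'
    """

    def __init__(self, src):
        self.toks = tokenize(src)
        self.pos = 0
        self.consumed = 0  # every token read, including re-reads after a rewind

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def advance(self):
        tok = self.peek()
        if tok is None:
            raise ParseError("unexpected end of input")
        self.pos += 1
        self.consumed += 1
        return tok

    def parse(self):
        tree = self.primary()
        if self.peek() is not None:
            raise ParseError(f"unexpected {self.peek()!r}")
        return tree

    def atom_expr(self):
        tok = self.advance()
        if tok == "(":
            inner = self.primary()
            if self.advance() != ")":
                raise ParseError("expected ')'")
            return inner
        if not is_word(tok):
            raise ParseError(f"unexpected {tok!r}")
        return tok

class Speculative(Base):
    # primary : funcAppl | atomExpr, chosen by attempting all of funcAppl
    # first and rewinding if it fails (a whole-rule predicate, in effect).
    def primary(self):
        start = self.pos
        try:
            return self.func_appl()
        except ParseError:
            self.pos = start  # rewind; the same tokens will be read again
            return self.atom_expr()

    def func_appl(self):
        name = self.advance()
        if not is_id(name):
            raise ParseError("expected an identifier")
        args = [self.atom_expr()]  # funcAppl needs at least one argument
        while starts_atom(self.peek()):
            args.append(self.atom_expr())
        return ("call", name, *args)

class LeftFactored(Base):
    # primary : ID atomExpr* | atomExpr, with the shared ID matched once;
    # one token of lookahead decides whether arguments follow.
    def primary(self):
        if not is_id(self.peek()):
            return self.atom_expr()
        name = self.advance()
        args = []
        while starts_atom(self.peek()):
            args.append(self.atom_expr())
        return ("call", name, *args) if args else name
```

Both return the same trees for valid input such as "f x (g y) 3", but on a lone "x" the speculative parser reads the token twice (consumed is 2 versus 1). It can also lose the real error: on "f (g" it rewinds past the actual problem and reports "unexpected '('", while the left-factored parser reports "unexpected end of input".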

Jim