[antlr-interest] Ambiguity question

Thu Apr 8 20:38:41 PDT 2004

I am unsure how best to use ANTLR to parse a language in which the
following input is legal:

a b c.
d - e f - g.
- h.
i j k -
*
l -- m --

Assume the lexer skips all whitespace (so the whole input could just as
well have been on one long line).  The task is to parse it into
sentences, one per line as laid out above.
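
To make the token stream concrete, here is a minimal Python sketch of
such a lexer. The token names WORD, DASH, PERIOD and JUNK are taken from
the grammar in this post; the token shapes are assumptions, since the
post does not define them: WORD is taken to be a run of letters, "--" is
taken to be two DASH tokens, and any other non-space character is JUNK.

```python
import re

# ASSUMED token shapes (not stated in the post): WORD is a run of letters,
# "--" lexes as two DASH tokens, any other non-space character is JUNK.
TOKEN_RE = re.compile(
    r"(?P<WORD>[A-Za-z]+)|(?P<PERIOD>\.)|(?P<DASH>-)|(?P<JUNK>\S)"
)

def tokenize(text):
    """Return (kind, text) pairs; whitespace between tokens is skipped."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]

# The sample input, deliberately written on one line.
SAMPLE = "a b c. d - e f - g. - h. i j k - * l -- m --"
kinds = [kind for kind, _ in tokenize(SAMPLE)]
```

Because the line breaks are gone by the time the parser sees the tokens,
nothing in `kinds` marks where one sentence stops and the next begins.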

My starting point is the following:

start: (sentence)+ EOF
    ;

sentence: contents terminator
    | JUNK
    ;

contents: (content)+
    ;

content: WORD
    | DASH {
       // want to leave as DASH for tree parser to use
    }
    ;

terminator: PERIOD
    | DASH  {
        // want to turn into an INTERRUPTION token for tree parser
        // to use
    }
    ;

This of course isn't enough: a DASH matches both content and terminator,
and context is needed to determine which of the two it is.
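
One possible reading of "context" can be sketched outside ANTLR with a
single token of lookahead. The rule used here is an assumption inferred
from the sample input, not something the post states: a DASH terminates
a sentence only when the next token is JUNK or end of input. The
function name and the list-of-kinds representation are hypothetical.

```python
def split_sentences(kinds):
    """Group token kinds into sentences in one left-to-right pass.

    ASSUMPTION (inferred from the sample, not stated in the post):
    a DASH ends a sentence only if the next token is JUNK or end of input.
    """
    sentences, current = [], []
    for i, kind in enumerate(kinds):
        nxt = kinds[i + 1] if i + 1 < len(kinds) else "EOF"
        ends_here = kind == "PERIOD" or (
            kind == "DASH" and nxt in ("JUNK", "EOF")
        )
        if ends_here and current:
            # Retag a terminating DASH so a tree walker can tell it apart.
            current.append("PERIOD" if kind == "PERIOD" else "INTERRUPTION")
            sentences.append(current)
            current = []
        elif kind == "JUNK" and not current:
            sentences.append(["JUNK"])
        elif kind in ("WORD", "DASH"):
            current.append(kind)
        else:
            raise ValueError(f"unexpected {kind} at token {i}")
    if current:
        raise ValueError("input ended inside a sentence")
    return sentences

# Token kinds of the sample input, one source line per string.
SAMPLE_KINDS = (
    "WORD WORD WORD PERIOD "
    "WORD DASH WORD WORD DASH WORD PERIOD "
    "DASH WORD PERIOD "
    "WORD WORD WORD DASH "
    "JUNK "
    "WORD DASH DASH WORD DASH DASH"
).split()

result = split_sentences(SAMPLE_KINDS)
```

Under that assumption the sample splits into the six sentences shown at
the top, with only the final DASH of "i j k -" and of "l -- m --"
becoming INTERRUPTION. If the real rule differs, only the `ends_here`
test would need to change.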

Is there a way to do this in one pass?  Or is it better to weaken the
grammar so that it parses into a looser tree, which an intermediate tree
parser then transforms?
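
The weakened-grammar alternative can be sketched in Python as two small
passes, purely as an illustration of the idea and not as ANTLR code. The
first pass treats every DASH as content and splits only at PERIOD and
JUNK; the second retags the trailing DASH of any chunk that was cut off
without a PERIOD. The function names are hypothetical, and the retagging
rule rests on an assumption drawn from the sample input: an interrupted
sentence is always followed by JUNK or end of input.

```python
def weak_parse(kinds):
    """Pass 1: treat every DASH as content; split only at PERIOD and JUNK."""
    chunks, current = [], []
    for kind in kinds:
        if kind == "JUNK":
            if current:            # a sentence was cut off by the JUNK
                chunks.append(current)
                current = []
            chunks.append(["JUNK"])
        else:
            current.append(kind)
            if kind == "PERIOD":
                chunks.append(current)
                current = []
    if current:                    # a sentence was cut off by end of input
        chunks.append(current)
    return chunks

def retag(chunk):
    """Pass 2: a chunk cut off without a PERIOD must end in a DASH."""
    if chunk == ["JUNK"] or chunk[-1] == "PERIOD":
        return chunk
    if len(chunk) < 2 or chunk[-1] != "DASH":
        raise ValueError(f"unterminated sentence: {chunk}")
    return chunk[:-1] + ["INTERRUPTION"]

# Token kinds of the sample input, one source line per string.
SAMPLE_KINDS = (
    "WORD WORD WORD PERIOD "
    "WORD DASH WORD WORD DASH WORD PERIOD "
    "DASH WORD PERIOD "
    "WORD WORD WORD DASH "
    "JUNK "
    "WORD DASH DASH WORD DASH DASH"
).split()

chunks = weak_parse(SAMPLE_KINDS)
sentences = [retag(chunk) for chunk in chunks]
```

The first pass needs no lookahead at all, which is what makes the
weakened grammar unambiguous; the cost is that the DASH-versus-
INTERRUPTION decision moves into the second pass.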

-- 
Franklin
