[antlr-interest] How do I accept input ending with a newline or EOF?

Thu Feb 3 16:00:14 PST 2011

Kirby, thanks! That helped a ton, and thanks for the + vs * tip. A real
lifesaver.

I have another problem and I'm hoping you can point me in the right
direction. I'm trying to choose between two approaches for building a
pre-processor. The first (1) approach is to have the pre-processor pass
tokens to the compiler. The second (2) approach is to have the pre-processor
pass strings (those that have not been #ifdef'd out) to the compiler. The
former seems more natural but complicates the lexer because the lexing
is context sensitive (see below). The latter simplifies both pre-processor
and compiler but feels ugly because it requires the input to be lexed
twice.
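
A minimal sketch of approach (2), written in Python rather than ANTLR purely for illustration. It assumes the only directives are #define, #ifdef and #endif; the function name preprocess and its defined parameter are hypothetical, not part of any real tool. It returns the surviving lines as strings, which a separate compiler lexer would then tokenize from scratch (the "lexed twice" cost mentioned above).

```python
def preprocess(lines, defined=()):
    """Return the lines that are not excluded by #ifdef/#endif.

    Assumed directive set (illustrative only): #define, #ifdef, #endif.
    """
    defined = set(defined)
    active = [True]  # stack: is the enclosing region enabled?
    out = []
    for line in lines:
        words = line.split()
        head = words[0] if words else ""
        if head == "#define" and len(words) > 1:
            if active[-1]:
                defined.add(words[1])
        elif head == "#ifdef" and len(words) > 1:
            # A nested region is enabled only if its parent is enabled too.
            active.append(active[-1] and words[1] in defined)
        elif head == "#endif":
            if len(active) > 1:
                active.pop()
        elif active[-1]:
            out.append(line)  # ordinary line in an enabled region
    return out
```

The pre-processor here never needs to know what a language identifier looks like, which is why this approach keeps both lexers simple.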

As I said, the problem I encountered with the first approach is that the
lexer is context sensitive. For example, consider the following toy grammar
where pre-processor identifiers can be upper or lower case but language
identifiers can only be lower case. The input "#define HELLO" parses fine
but "#define hello" fails because (I assume) "hello" could be matched by two
lexer productions -- ID and PP_ID. I tried inserting a predicate in ID
(e.g. ID : {false}?=> 'a'..'z';) to provide context, but if I do then
ANTLRWorks spins when I try to interpret any input. I've also tried fiddling
with the order of ID and PP_ID but each ordering has its own problems (e.g.
only one of the following inputs parses for a given order: { "hello",
"#define hello" }).

start
        : input*
        ;
input
        : ID+ (NEW_LINE | EOF)
        | pp_input
        ;

pp_input
        : '#' 'define' PP_ID+ (NEW_LINE | EOF)
        ;

NEW_LINE
        : '\r' '\n'
        ;
ID
        : 'a'..'z';

PP_ID
        : 'a'..'z'
        | 'A'..'Z';
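
To make the context sensitivity concrete, here is a small hand-written lexer sketch in Python (not ANTLR). It assumes that "context" simply means "between a '#' and the end of that line"; the names tokenize, in_pp, HASH and DEFINE are hypothetical, and '\r' is treated as skippable whitespace for simplicity, which differs from the NEW_LINE rule above.

```python
import re

PP_WORD = re.compile(r"[A-Za-z]+")  # pre-processor identifiers: either case
WORD = re.compile(r"[a-z]+")        # language identifiers: lower case only


def tokenize(text):
    """Return (kind, text) pairs, choosing ID vs PP_ID from the lexer state."""
    tokens = []
    in_pp = False  # True between '#' and the end of the line
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\n":
            tokens.append(("NEW_LINE", ch))
            in_pp = False  # a newline ends pre-processor context
            i += 1
        elif ch in " \t\r":
            i += 1  # skip whitespace
        elif ch == "#":
            tokens.append(("HASH", ch))
            in_pp = True
            i += 1
        else:
            m = (PP_WORD if in_pp else WORD).match(text, i)
            if m is None:
                raise ValueError(f"unexpected character {ch!r} at offset {i}")
            word = m.group()
            if in_pp and tokens[-1][0] == "HASH" and word == "define":
                kind = "DEFINE"
            else:
                kind = "PP_ID" if in_pp else "ID"
            tokens.append((kind, word))
            i = m.end()
    return tokens
```

With this state flag, "hello" is only ever a candidate for one token kind at a time, so the ambiguity between ID and PP_ID does not arise. A flag like in_pp is the kind of context a predicate on ID and PP_ID would need to test, though whether that works cleanly in ANTLRWorks is untested here.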

This seems like a standard 101 type problem space so hopefully you've
explored it and can direct me! :)

Thanks,
Chris

On Mon, Jan 31, 2011 at 4:03 PM, Kirby Bohling <kirby.bohling at gmail.com> wrote:

> No idea if it is related to the problem, but you likely really want to
> have ID use a '+' not a '*' after ('a'..'z'); otherwise ID can match
> nothing and cause an infinite loop while lexing at some points.
> Generally speaking, any time you have rules like
>
> bar: (foo)*;
>
> foo: (baz)*;
>
> you are just asking for problems, whether foo and baz are lexer or
> parser rules.  Every time I do that it is a mistake (or a failure of
> imagination).  Generally speaking, you want low-level items to force
> the consumption of something, and make them optional at a higher level
> (at least that has been true in my limited experience).
>
> I believe the EOF problem is precisely because of the use of a *
> rather than a + there: rather than consume the EOF, the lexer can spin
> consuming nothing forever.  But I didn't actually crack out ANTLR and check.
>
> Also, unless you really know what you are doing, you might want to
> skip using constants in your parser rules.  While many of the examples do
> so, from what I've read, it can have complex interactions (it generates
> a token for it internally that can't be seen).  I'd try making a
> NEWLINE token and seeing if that helps make the error message any
> clearer.
>
> Kirby
>
>
> On Mon, Jan 31, 2011 at 5:49 PM, chris king <kingces95 at gmail.com> wrote:
> > Hello! I'm trying to write a grammar that will accept lines of zero or more
> > IDs and I'd like to allow the last line to end in a new line *or* EOF. I
> > came up with this grammar:
> >
> > grammar test;
> >
> > start
> >  : input*
> >  ;
> >
> > input
> >  : ID* ('\n' | EOF)
> >  ;
> >
> > ID
> >  : ('a'..'z')*
> >  ;
> >
> > WHITESPACE
> >  : ' '+ {skip();}
> >  ;
> >
> > But got this error from ANTLRWorks saying start has un-reachable
> > alternatives:
> >
> > [15:38:33] error(201): test2.g:9:5: The following alternatives can never be
> > matched: 2
> >
> > If I remove the reference to EOF then everything works, but I have to end the
> > last line in a new line and I don't want to have to do that. Any
> > suggestions?
> >
> > Thanks,
> > Chris
> >
> > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> >
>
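
The + vs * tip in the quoted reply can be illustrated outside ANTLR with a small Python sketch. It assumes a simplified lexer loop that repeatedly matches one token pattern at the current offset; lex_words is a hypothetical helper, and the guard that raises stands in for the point where a real lexer would stop advancing.

```python
import re


def lex_words(pattern, text):
    """Match `pattern` repeatedly at the current offset, as a lexer loop would."""
    rx = re.compile(pattern)
    tokens, pos = [], 0
    while pos < len(text):
        m = rx.match(text, pos)
        if m is None:
            raise ValueError(f"no token matches at offset {pos}")
        if m.end() == pos:
            # The rule succeeded without consuming input; without this
            # guard the loop would repeat at the same offset forever.
            raise RuntimeError(f"empty token at offset {pos}")
        tokens.append(m.group())
        pos = m.end()
    return tokens
```

With "[a-z]+" an unexpected character is reported as a failed match, whereas "[a-z]*" "succeeds" on the empty string at that character and never makes progress.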
