[antlr-interest] Reading all text to end-of-line in a rule

Daniels, Troy (US SSA) troy.daniels at baesystems.com
Mon Nov 29 08:29:12 PST 2010


 

> 
> The basic issue seems to be that I want this basic form:
> 
>     <command> [-timeout <NN>] [-notify <email_address>]
> 
> examples of which are:
> 
>     cleanlogs -timeout 20 -notify email1@biz.com
>     cleanup -timeout 10 -notify "email1@biz.com email2@biz.com"
>     deploy -notify me@me.com -list "compA compB compC"
> 
> etc., along with the less-structured shell command types:
> 
>     // with timeout
>     shell -timeout 20 find /x/web -name '*.logs.bak' | xargs rm -f
> 
>     // without timeout
>     shell find /x/web -name '@*' | xargs mv /tmp/
> 

I think this is the main problem you need to resolve.  The basic form is a simple, highly structured language that a small grammar can handle easily.  The shell command is a complex language whose text could match valid tokens in your simple language.  (It's generally not illegal to have a shell command called "-notify", just a bad idea.  But some user will do it anyway.)

I think what you want to look at is island grammars.  These are typically used when a single input contains two languages with very different structure.  (A common example is parsing javadoc comments within a Java file.)  Your case also has clean entry and exit points for the island grammar: the lexer normally tokenizes the basic form; when it encounters "shell", it switches to the island grammar for the remainder of the line, then switches back to the basic form for the next line.  This lets you have a grammar that consumes the rest of the line regardless of content, without having to avoid conflicts with the basic form.
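
To make the mode switch concrete, here is a rough sketch in plain Python rather than ANTLR.  It is not an ANTLR grammar and not necessarily how you would structure the real lexer; the function name parse_line and the result layout are made up for illustration.  I am also assuming, from your examples, that an optional -timeout may come right after "shell" and that everything after it is opaque shell text.

```python
import shlex


def parse_line(line):
    """Parse one command line into a dict (hypothetical layout).

    Structured commands are tokenized and read as -name value pairs.
    For "shell", only an optional leading -timeout is interpreted;
    the remainder of the line is kept verbatim as the island text.
    """
    parts = line.strip().split(None, 1)
    if not parts:
        raise ValueError("empty command line")
    command = parts[0]
    rest = parts[1] if len(parts) > 1 else ""

    if command == "shell":
        # Island mode: stop tokenizing and keep the raw text.
        fields = rest.split(None, 2)
        options = {}
        if fields and fields[0] == "-timeout":
            if len(fields) < 3:
                raise ValueError("shell needs a timeout value and a command")
            options["timeout"] = fields[1]
            rest = fields[2]
        return {"command": command, "options": options, "raw": rest}

    # Basic form: <command> [-option value]...
    tokens = shlex.split(rest)  # honors "double quoted values"
    if len(tokens) % 2 != 0:
        raise ValueError("option without a value: %r" % line)
    options = {}
    for name, value in zip(tokens[0::2], tokens[1::2]):
        if not name.startswith("-"):
            raise ValueError("expected an option, got %r" % name)
        options[name[1:]] = value
    return {"command": command, "options": options}


if __name__ == "__main__":
    print(parse_line("shell find /x/web -name '@*' | xargs mv /tmp/"))
    print(parse_line('deploy -notify me@me.com -list "compA compB compC"'))
```

The point is that the "-name" inside the shell text is never offered to the structured-option code, so it cannot collide with -notify or -timeout.  In a real lexer the same decision would be made once, at the "shell" token.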

I think either ANTLR 3.3 or 4 will have better support for this.

> The fact that I want an unquoted email address to be parsed
> (i.e., foo@bar.com and not 'foo@bar.com') seems to be causing
> the problem.
> 
> I'm going to try to redo things a bit more cleanly, try to 
> boil down the problem further, and repost if I still have problems.
> 

If you try to keep everything in one grammar, I suspect problems like this will keep arising.  Fixing the unquoted email might uncover another problem, or your next change might introduce a similar one.

Troy

> Thanks for the help.
> 
> 
> Bill
> 
> 

