[antlr-interest] Reading all text to end-of-line in a rule

Michael Matera mike.matera at xilinx.com
Mon Nov 29 10:51:49 PST 2010


Hi,

One way to get island-grammar behavior out of the existing ANTLR is to use 
delimiters.  Your language would be much easier to parse if it took 
input like this:

<command> [-timeout <NN>] [-notify <addr>] "shell_command"

The constructs of your outside (shell) language are now safely tucked 
away in quotes, so your lexer doesn't need to look at them in any 
meaningful way.  When you encounter a quoted string you can produce a 
single STRING token and let higher-level code deal with what it really means.
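
A minimal sketch of the delimiter approach, written in Python rather than as an ANTLR grammar so that it runs on its own. The token names (STRING, OPTION, NUMBER, WORD) and the tokenize() function are hypothetical, chosen only for illustration. The point is that the quoted shell command comes out as one STRING token whose contents are never examined, so a "-name" inside it can't be mistaken for an option. In an ANTLR 3 lexer the corresponding rule would be something along the lines of  STRING : '"' (~'"')* '"' ;

```python
import re
from typing import List, Tuple

# Hypothetical token kinds for this sketch; at each position the first
# alternative that matches wins.
TOKEN_RE = re.compile(
    r'(?P<STRING>"[^"]*")'      # the delimited island: taken verbatim, never split
    r"|(?P<OPTION>-[A-Za-z]+)"  # -timeout, -notify, ...
    r"|(?P<NUMBER>\d+)"
    r'|(?P<WORD>[^\s"]+)'       # command names, unquoted addresses, ...
    r"|(?P<WS>\s+)"
)


def tokenize(line: str) -> List[Tuple[str, str]]:
    """Split one command line into (kind, text) pairs, dropping whitespace."""
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(line):
        m = TOKEN_RE.match(line, pos)
        if m is None:  # e.g. an unterminated quote
            raise SyntaxError(f"unexpected character {line[pos]!r} at offset {pos}")
        kind, text = m.lastgroup, m.group()
        if kind == "STRING":
            tokens.append((kind, text[1:-1]))  # strip the delimiters only
        elif kind != "WS":
            tokens.append((kind, text))
        pos = m.end()
    return tokens
```

Fed the line  shell -timeout 20 "find /x/web -name '*.logs.bak' | xargs rm -f"  this sketch yields four tokens: WORD "shell", OPTION "-timeout", NUMBER "20", and a single STRING holding the whole find/xargs command.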

Cheers
./m

Daniels, Troy (US SSA) wrote:
>  
> 
>> The basic issue seems to be that I want this basic form:
>>
>>     <command> [-timeout <NN>] [-notify <email_address>]
>>
>> examples of which are:
>>
>>     cleanlogs -timeout 20 -notify email1@biz.com
>>     cleanup -timeout 10 -notify "email1@biz.com email2@biz.com"
>>     deploy -notify me@me.com -list "compA compB compC"
>>
>> etc., along with the less-structured shell command types:
>>
>>     // with timeout
>>     shell -timeout 20 find /x/web -name '*.logs.bak' | xargs rm -f
>>
>>     // without timeout
>>     shell find /x/web -name '@*' | xargs mv /tmp/
>>
> 
> I think this is the main problem that you need to resolve.  The basic form is a highly structured, simple language that can easily be handled with a small grammar.  The shell command is a complex language whose text could potentially match valid tokens in your simple language.  (It's generally not illegal to have a shell command called "-notify", just a bad idea.  But some user will do it anyway.)
> 
> I think what you want to do is look at island grammars.  These are typically used when you have two different languages with very different structure in the same input.  (A common example is parsing Javadoc comments within a Java file.)  In your case you also have a clean entry and exit point for the island grammar.  The lexer normally handles the basic form.  When the lexer encounters "shell", it switches to the island grammar to parse the remainder of the line, then switches back to the basic form for the next line.  This allows you to have a rule that consumes the rest of the line regardless of its content, without having to avoid conflicts with the basic form.
> 
> I think either ANTLR 3.3 or ANTLR 4 will have better support for this.
> 
>> The fact that I want an unquoted email address to be parsed 
>> (i.e., foo@bar.com and not 'foo@bar.com') seems to be causing 
>> the problem.
>>
>> I'm going to try to redo things a bit more cleanly, try to 
>> boil down the problem further, and repost if I still have problems.
>>
> 
> If you try to keep everything in one grammar, I suspect you will continually have problems like this arise.  If you fix the unquoted email, you might uncover another problem, or your next change might introduce a similar one.
> 
> Troy
> 
>> Thanks for the help.
>>
>>
>> Bill
>>
>>
>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>> Unsubscribe: 
>> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>>
> 
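
The mode switch described in the quoted reply can be sketched the same way, again in Python rather than ANTLR, so it only illustrates the idea and is not ANTLR's API. Assumptions: the input is line-oriented, a line whose first word is "shell" is the only entry point to the island, and (going by Bill's examples) "-timeout <NN>" is the only option allowed between "shell" and the command. The token names, including SHELL_TEXT, and tokenize() are hypothetical.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, text); the kinds are hypothetical names

# Default mode: the structured "basic form".
DEFAULT_RE = re.compile(
    r'(?P<STRING>"[^"\n]*")'
    r"|(?P<OPTION>-[A-Za-z]+)"
    r"|(?P<NUMBER>\d+)"
    r'|(?P<WORD>[^\s"]+)'
    r"|(?P<NEWLINE>\n)"
    r"|(?P<WS>[ \t\r]+)"
)
# Assumption: "-timeout <NN>" is the only option allowed between "shell" and the command.
SHELL_TIMEOUT_RE = re.compile(r"[ \t]+-timeout[ \t]+(\d+)(?=\s|$)")
# Island mode: everything up to, but not including, the end of the line.
ISLAND_RE = re.compile(r"[^\n]*")


def tokenize(text: str) -> List[Token]:
    tokens: List[Token] = []
    pos = 0
    at_line_start = True
    while pos < len(text):
        m = DEFAULT_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind, value = m.lastgroup, m.group()
        pos = m.end()
        if kind == "WS":
            continue  # leading blanks keep at_line_start unchanged
        if kind == "STRING":
            value = value[1:-1]
        tokens.append((kind, value))
        if kind == "NEWLINE":
            at_line_start = True
            continue
        if at_line_start and kind == "WORD" and value == "shell":
            # Still in default mode: accept an optional "-timeout <NN>".
            timeout = SHELL_TIMEOUT_RE.match(text, pos)
            if timeout:
                tokens.append(("OPTION", "-timeout"))
                tokens.append(("NUMBER", timeout.group(1)))
                pos = timeout.end()
            # Enter island mode: take the rest of the line as one opaque token.
            island = ISLAND_RE.match(text, pos)  # always matches, possibly empty
            tokens.append(("SHELL_TEXT", island.group().strip()))
            # Leave island mode: the "\n" (if any) is lexed next in default mode.
            pos = island.end()
        at_line_start = False
    return tokens
```

Because only -timeout is recognized before the island starts, a shell command that happens to be named "-notify" ends up inside SHELL_TEXT instead of being lexed as an option, which is the conflict the island is meant to avoid. Unquoted addresses such as email1@biz.com on the structured lines simply come out as WORD tokens.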
