[antlr-interest] section identifiers

Tue Apr 1 10:10:36 PDT 2008

You can either use a filtering lexer (options { filter=true; }) or give your lexer some state, for example a field declared with @lexer::members { int state; }. If all you want to do is capture the text between the begin and end lines and return a token for it, a filtering lexer will probably do. If it really is that simple, a single rule that matches the begin and end lines is enough, with some embedded logic that recognizes the end only when the following word matches the start word (which you capture after 'begin' and check for after 'end').
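A minimal sketch of the "embedded state" idea: remember the word after 'begin' and accept an 'end' line only when its next word matches. It is written as plain Java rather than an ANTLR grammar so it runs on its own; `SectionScanner` and `Section` are hypothetical names, the scan is line by line, and an unterminated section at end of input is simply dropped.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: plain Java, not ANTLR-generated code. Mirrors the
// "give the lexer an int state" idea at line granularity.
class SectionScanner {

    // One captured section: the identifier after "begin" and the raw lines inside.
    static final class Section {
        final String name;
        final List<String> body;

        Section(String name, List<String> body) {
            this.name = name;
            this.body = body;
        }
    }

    private static final int OUTSIDE = 0; // looking for "begin <id>"
    private static final int INSIDE = 1;  // looking for "end <same id>"

    static List<Section> scan(List<String> lines) {
        List<Section> sections = new ArrayList<>();
        int state = OUTSIDE;
        String name = null;
        List<String> body = new ArrayList<>();

        for (String line : lines) {
            // No trim(): leading whitespace yields an empty first word,
            // so "begin"/"end" only count at the very start of a line.
            String[] words = line.split("\\s+");
            if (state == OUTSIDE) {
                if (words.length >= 2 && words[0].equals("begin")) {
                    name = words[1];           // remember the section identifier
                    body = new ArrayList<>();
                    state = INSIDE;
                }
            } else if (words.length >= 2 && words[0].equals("end") && words[1].equals(name)) {
                sections.add(new Section(name, body)); // matching end closes the section
                state = OUTSIDE;
            } else {
                body.add(line); // includes non-matching lines such as "end but ..."
            }
        }
        return sections; // a section still open at end of input is dropped
    }

    public static void main(String[] args) {
        List<String> input = List.of(
            "begin foo",
            "some stuff",
            "some other stuff",
            "end but don't end the section!",
            "other stuff",
            "end foo");
        for (Section s : scan(input)) {
            System.out.println(s.name + ": " + s.body.size() + " lines");
        }
    }
}
```

Run on David's sample, `main` prints `foo: 4 lines`: the "end but don't end the section!" line stays in the body because the word after "end" is not "foo".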

However, if this is all you need out of the file, you might find that awk does the job just as well. If it is part of a larger picture, then a single lexer rule for 'begin' plus a bit of custom code that uses input.LA() and input.consume() to find the matching end will work fine.
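The LA()/consume() route might look roughly like this. ANTLR 3 input streams do provide LA(int) and consume(), but `MiniCharStream` below is a hypothetical stand-in that imitates only those calls (plus `index()` and a Java-style `substring` helper) so the sketch runs without the ANTLR runtime; `SectionReader` and its methods are likewise illustrative names. In a real grammar, the body of `readSection` would sit in the action of the 'begin' rule.

```java
// Sketch only: MiniCharStream is a hypothetical stand-in exposing LA(i)
// and consume(), so the example runs without the ANTLR runtime.
class MiniCharStream {
    static final int EOF = -1;
    private final String text;
    private int p = 0;

    MiniCharStream(String text) { this.text = text; }

    // LA(1) is the next unconsumed character, LA(2) the one after, etc.
    int LA(int i) {
        int idx = p + i - 1;
        return idx < text.length() ? text.charAt(idx) : EOF;
    }

    void consume() { if (p < text.length()) p++; }

    int index() { return p; }

    // Java-style substring: stop index is exclusive.
    String substring(int start, int stop) { return text.substring(start, stop); }
}

class SectionReader {
    // Intended to run right after "begin" has been matched. Returns the text
    // between the begin line and the matching "end <id>" line, or null if the
    // identifier is missing or no matching end exists.
    static String readSection(MiniCharStream in) {
        while (in.LA(1) == ' ' || in.LA(1) == '\t') in.consume();
        StringBuilder id = new StringBuilder();
        while (in.LA(1) != MiniCharStream.EOF && !Character.isWhitespace(in.LA(1))) {
            id.append((char) in.LA(1));
            in.consume();
        }
        if (id.length() == 0) return null;
        skipLine(in); // rest of the "begin <id>" line
        int start = in.index();
        while (in.LA(1) != MiniCharStream.EOF) {
            // Each iteration starts at the beginning of a line.
            if (endMarkerAhead(in, id.toString())) {
                String body = in.substring(start, in.index());
                skipLine(in); // consume "end <id>" and the rest of its line
                return body;
            }
            skipLine(in);
        }
        return null;
    }

    // Consumes up to and including the next '\n' (or up to EOF).
    static void skipLine(MiniCharStream in) {
        while (in.LA(1) != MiniCharStream.EOF && in.LA(1) != '\n') in.consume();
        in.consume();
    }

    // Pure lookahead: true if the input reads "end", blanks, id, then
    // whitespace or EOF. Nothing is consumed.
    static boolean endMarkerAhead(MiniCharStream in, String id) {
        int k = 1;
        for (char c : "end".toCharArray()) if (in.LA(k++) != c) return false;
        if (in.LA(k) != ' ' && in.LA(k) != '\t') return false;
        while (in.LA(k) == ' ' || in.LA(k) == '\t') k++;
        for (char c : id.toCharArray()) if (in.LA(k++) != c) return false;
        int next = in.LA(k);
        return next == MiniCharStream.EOF || Character.isWhitespace(next);
    }

    public static void main(String[] args) {
        String text = "begin foo\nsome stuff\nend but don't end the section!\nother stuff\nend foo\n";
        MiniCharStream in = new MiniCharStream(text);
        for (int i = 0; i < "begin".length(); i++) in.consume(); // pretend the lexer matched 'begin'
        System.out.print(readSection(in));
    }
}
```

Because `endMarkerAhead` only peeks with LA(k), a non-matching "end ..." line (or "end foobar" when the identifier is "foo") is consumed as ordinary body text; only the line whose identifier matches closes the section.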

Jim

From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of David Brunton
Sent: Tuesday, April 01, 2008 8:52 AM
To: antlr-interest at antlr.org
Subject: [antlr-interest] section identifiers

I am parsing some files that have sections demarcated by "begin" and "end" at the start of a line, each followed by a (matching) arbitrary identifier:

<snip>
begin foo
some stuff
some other stuff
end but don't end the section!
other stuff
end foo
</snip>

I don't need to parse anything inside of the section.

I imagine this is a fairly common problem; any pointers?

Best,
David.
