[antlr-interest] Re: special c/c++ parsing

Wed May 14 13:29:14 PDT 2003

What do you do if it succeeds?  Treat all those tokens as consumed and
resume at the token that follows?

A method might contain expressions that don't parse, causing the method
rule to fail; however, it's fairly easy to recognize a method by its
signature and curly braces while ignoring what's inside.  So do you make
your algorithm recursive, noting the top-level entry points, so that when
method looks for statements you apply the same algorithm to see whether you
have a statement, and it's OK if you don't?
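A minimal sketch of recognizing a method "by the signature and curlies" in Python, assuming the tokens are already plain strings and assuming a deliberately crude signature of identifier, identifier, a parenthesized list, then a brace-balanced body. The names skip_balanced, match_method and find_methods are hypothetical. It answers the first question one way: on success the matched tokens are treated as consumed and the scan resumes after them.

```python
from typing import List, Optional, Tuple


def is_ident(tok: str) -> bool:
    """Crude identifier test (assumption: C-like identifiers only)."""
    return tok.isidentifier()


def skip_balanced(tokens: List[str], i: int, open_tok: str, close_tok: str) -> Optional[int]:
    """If tokens[i] is open_tok, return the index just past its matching close_tok.

    Everything in between is ignored, so a body full of unparseable
    expressions still matches as long as the delimiters balance.
    """
    if i >= len(tokens) or tokens[i] != open_tok:
        return None
    depth = 0
    for j in range(i, len(tokens)):
        if tokens[j] == open_tok:
            depth += 1
        elif tokens[j] == close_tok:
            depth -= 1
            if depth == 0:
                return j + 1
    return None  # unbalanced: ran off the end of the input


def match_method(tokens: List[str], i: int) -> Optional[int]:
    """Hypothetical recognizer for: TYPE NAME ( ... ) { ... }.

    Returns the index just past the closing '}', or None on failure.
    """
    if i + 1 >= len(tokens) or not (is_ident(tokens[i]) and is_ident(tokens[i + 1])):
        return None
    after_params = skip_balanced(tokens, i + 2, "(", ")")
    if after_params is None:
        return None
    return skip_balanced(tokens, after_params, "{", "}")


def find_methods(tokens: List[str]) -> List[Tuple[int, int]]:
    """Scan left to right; on a match, treat the tokens as consumed and resume after them."""
    found = []
    i = 0
    while i < len(tokens):
        end = match_method(tokens, i)
        if end is not None:
            found.append((i, end))
            i = end  # consumed: restart at the following token
        else:
            i += 1  # no method here: slide forward one token
    return found


if __name__ == "__main__":
    toks = ["int", "f", "(", "int", "x", ")", "{", "@", "#", "{", "}", "}",
            "junk", ";", "void", "g", "(", ")", "{", "}"]
    for start, end in find_methods(toks):
        print(toks[start:end])
```

Because the body is skipped by brace counting, garbage such as "@ #" inside it does not make the match fail. The recursive variant asked about here would instead rerun the same scan over the tokens between the braces and accept the method whether or not any statements are found there.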

I prefer the LAPIS (http://graphics.lcs.mit.edu/lapis/) approach of
tokenizing and then searching with "structured (nested) regular
expressions."  Parsing seems precise to me; searching is not.  If you only
write half a parser, you're left searching for where to apply the rules and
where not to....

Monty

-----Original Message-----
From: Terence Parr [mailto:parrt at jguru.com]
Sent: Wednesday, May 14, 2003 11:51 AM
To: antlr-interest at yahoogroups.com
Subject: Re: [antlr-interest] Re: special c/c++ parsing

On Wednesday, May 14, 2003, at 11:41 AM, lgcraymer wrote:

> I'll echo Monty's comment.  Function calls can appear in enough places
> (including complex expressions and argument lists to functions) that
> it would be difficult to identify a subset grammar.  It is much easier
> to prune a full grammar, even when you are dealing with a language as
> cumbersome as C++.

I've often wondered whether something like the following (insanely slow)
approach would work:

1. You provide a set of top-level rules you are interested in matching,
such as expr and method.

2. You provide a lexer that knows how to ignore comments and how to
identify every token that could appear in the input (not just the ones
you are interested in).

3. Walk the input token by token, attempting to match one of the
top-level rules starting at token i.  If an attempt fails, try another
top-level rule.  If they all fail, move to the next token and try again.
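Steps 1 to 3 can be sketched roughly as follows. This is a minimal Python illustration, not ANTLR code: the tokenizer is a toy C-like lexer built on the standard re module, the two rules call and decl are hypothetical hand-written stand-ins for generated rules such as expr and method, and resuming after a successful match is an assumption, since step 3 only says what to do on failure.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Step 2 (toy lexer, assumption): recognize every token, drop comments and whitespace.
TOKEN_RE = re.compile(
    r"""
    (?P<skip>\s+|//[^\n]*|/\*.*?\*/)      # whitespace and comments are ignored
  | (?P<tok>[A-Za-z_]\w*                  # identifiers / keywords
      |\d+                                # integer literals
      |"(?:\\.|[^"\\])*"                  # string literals
      |->|\+\+|--|==|!=|<=|>=|&&|\|\|     # a few two-char operators
      |\S)                                # any other single character
    """,
    re.VERBOSE | re.DOTALL,
)

Rule = Callable[[List[str], int], Optional[int]]  # returns end index or None


def tokenize(src: str) -> List[str]:
    return [m.group("tok") for m in TOKEN_RE.finditer(src) if m.group("tok") is not None]


def is_ident(t: str) -> bool:
    return re.fullmatch(r"[A-Za-z_]\w*", t) is not None


def call(toks: List[str], i: int) -> Optional[int]:
    """Hypothetical rule: NAME ( [ARG {, ARG}] ) where ARG is a name or integer."""
    if i + 1 >= len(toks) or not is_ident(toks[i]) or toks[i + 1] != "(":
        return None
    j = i + 2
    if j < len(toks) and toks[j] != ")":
        while True:
            if j >= len(toks) or not (is_ident(toks[j]) or toks[j].isdigit()):
                return None
            j += 1
            if j < len(toks) and toks[j] == ",":
                j += 1
                continue
            break
    return j + 1 if j < len(toks) and toks[j] == ")" else None


def decl(toks: List[str], i: int) -> Optional[int]:
    """Hypothetical rule: TYPE NAME ;"""
    if i + 2 < len(toks) and is_ident(toks[i]) and is_ident(toks[i + 1]) and toks[i + 2] == ";":
        return i + 3
    return None


# Step 1: the top-level rules we care about, tried in this order.
RULES: Dict[str, Rule] = {"call": call, "decl": decl}


def scan(toks: List[str], rules: Dict[str, Rule]) -> List[Tuple[str, int, int]]:
    """Step 3: the naive search, trying every rule at every token position."""
    matches = []
    i = 0
    while i < len(toks):
        for name, rule in rules.items():
            end = rule(toks, i)
            if end is not None:
                matches.append((name, i, end))
                i = end  # assumption: on success, skip past the matched tokens
                break
        else:
            i += 1  # every rule failed here: move to the next token
    return matches


SAMPLE = """
/* header comment: foo(bar) is ignored */
int count;        // a declaration
x = @@ weird ++ stuff;
printf(count, 42);
"""

if __name__ == "__main__":
    toks = tokenize(SAMPLE)
    for name, start, end in scan(toks, RULES):
        print(name, toks[start:end])
```

Tokens that no rule can start, such as "x = @@ weird", are simply stepped over one at a time. The slowness is visible in the loop structure: every position is tried against every rule, and a failed attempt may itself walk far into the input before giving up, so in the worst case the work grows roughly quadratically with the number of tokens.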

This mirrors the naive string-search algorithm taught to freshman CS
students, but it might actually work.  If you cared only about ease of
building the translator, not speed, I wonder whether this would be good
enough.  It actually sounds like a very simple TokenStream object :)

Anybody wanna comment on the cases where this would fail?

Ter
--
Co-founder, http://www.jguru.com
Creator, ANTLR Parser Generator: http://www.antlr.org
Co-founder, http://www.peerscope.com link sharing, pure-n-simple
Lecturer in Comp. Sci., University of San Francisco
