[antlr-interest] greedy vs nongreedy lexer rules

Cliff Hudson cliff.s.hudson at gmail.com
Sun Apr 18 16:52:43 PDT 2010


On the subject of multi-threaded implementation, let me vote +1 for
providing a mechanism for target implementors to take advantage of this. :)
 Multi-core systems are the norm now.  In my job, we spend a LOT of time
determining how best to extract maximum work in minimum time, and parallel
programming is a big part of that.

On Sun, Apr 18, 2010 at 4:28 PM, Terence Parr <parrt at cs.usfca.edu> wrote:

>
> On Apr 18, 2010, at 4:18 PM, Cliff Hudson wrote:
>
> > Hmm, it seems to me that there should be a way to record the set of
> > recursive actions and appropriate pointers into the lexed string such
> that
> > you can replicate the logical state of the DFA when you execute the
> actions
> > once an alternative is definitely selected.  Is there more to the state
> of
> > the system at any given possible action point than a pointer to the start
> of
> > the current substring, its length, and maybe a pointer to the already
> > matched token stream?  I am possibly out of my depth here on
> understanding
> > how the lexing system really works.  Or I have not adequately explained
> my
> > idea :)
>
> hiya. :)  Well, imagine that you are modifying "global" state as you match
> characters in identifier; this is something done in actions that ANTLR can
> analyze. There is no way to "roll this back".  ANTLR can only provide the
> information you specified about where it is and what character it was
> parsing--it can't deal with other member variables that you have updated.
>  Plus tracking all that information for input character could get expensive
> even if it was limited to that.
>
> I'm leaning towards a trivial NFA implementation (Thompson's algorithm from
> the 60s as described by Russ Cox)
>
> http://swtch.com/~rsc/regexp/regexp2.html
>
> because it allows us to save partial matches like id=ID very easily, unlike
> a DFA. Unfortunately, we really need identifiers and keywords to be
> efficient because that's what most input streams consist of. I'm not sure I
> could get the NFA to go that fast compared to a DFA. On the other hand, the
> new NFA-based mechanism would likely be faster than the current v3 mechanism
> which is guaranteed to recognize the first few characters of any token
> twice. so, in the end I should go for the smallest implementation and the
> simplest that gives us the capabilities. That would spell Thompson's
> NFA-based algorithm. There's even a chance that I could make it go faster
> using two or three threads instead of just one to do the NFA simulation.
>
> choices choices
>
> Ter
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe:
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>


More information about the antlr-interest mailing list