[antlr-interest] greedy vs nongreedy lexer rules
Cliff Hudson
cliff.s.hudson at gmail.com
Sun Apr 18 16:52:43 PDT 2010
On the subject of multi-threaded implementation, let me vote +1 for
providing a mechanism for target implementors to take advantage of this. :)
Multi-core systems are the norm now. In my job, we spend a LOT of time
determining how best to extract maximum work in minimum time, and parallel
programming is a big part of that.
On Sun, Apr 18, 2010 at 4:28 PM, Terence Parr <parrt at cs.usfca.edu> wrote:
>
> On Apr 18, 2010, at 4:18 PM, Cliff Hudson wrote:
>
> > Hmm, it seems to me that there should be a way to record the set of
> > recursive actions and appropriate pointers into the lexed string such
> that
> > you can replicate the logical state of the DFA when you execute the
> actions
> > once an alternative is definitely selected. Is there more to the state
> of
> > the system at any given possible action point than a pointer to the start
> of
> > the current substring, its length, and maybe a pointer to the already
> > matched token stream? I am possibly out of my depth here on
> understanding
> > how the lexing system really works. Or I have not adequately explained
> my
> > idea :)
>
> hiya. :) Well, imagine that you are modifying "global" state as you match
> characters in identifier; this is something done in actions that ANTLR can
> analyze. There is no way to "roll this back". ANTLR can only provide the
> information you specified about where it is and what character it was
> parsing--it can't deal with other member variables that you have updated.
> Plus tracking all that information for input character could get expensive
> even if it was limited to that.
>
> I'm leaning towards a trivial NFA implementation (Thompson's algorithm from
> the 60s as described by Russ Cox)
>
> http://swtch.com/~rsc/regexp/regexp2.html
>
> because it allows us to save partial matches like id=ID very easily, unlike
> a DFA. Unfortunately, we really need identifiers and keywords to be
> efficient because that's what most input streams consist of. I'm not sure I
> could get the NFA to go that fast compared to a DFA. On the other hand, the
> new NFA-based mechanism would likely be faster than the current v3 mechanism
> which is guaranteed to recognize the first few characters of any token
> twice. so, in the end I should go for the smallest implementation and the
> simplest that gives us the capabilities. That would spell Thompson's
> NFA-based algorithm. There's even a chance that I could make it go faster
> using two or three threads instead of just one to do the NFA simulation.
>
> choices choices
>
> Ter
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe:
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>
More information about the antlr-interest
mailing list