[antlr-interest] Interest in CUDA Target?

Greg Diamos gdiamos at nvidia.com
Thu Apr 19 12:37:16 PDT 2012

Thank you for the response Márton.

I don't have a tuned implementation yet that would be suitable for benchmarking, just a general outline of an algorithm.  But, based on its characteristics I think that
it could have the potential to get within a factor of 3-4 of peak on a GPU for simple grammars.  

The basic idea algorithm is conservative and bottom-up; it resembles a parallel reduction in structure.  It starts by processing each input character in parallel, and feeding them as the input
to a DFA that transitions to an initial state.  Each subsequent stage merges two (or more) discrete states together, such that the combination of states yields a new state and
possibly generates a token.  Tokens are produced sparsely and eventually gathered together using parallel stream compaction.  I'm mainly considering doing this for lexing at this point, 
but it could be directly extended to parsing by lexical tokens the input rather than characters.

I've attached an example of the state transitions rules for a very simple grammar.  The circles represent states, and the squares represent rules for merging

The worst case runtime of the algorithm on an infinitely parallel machine should be O(Log N) where N is the number of characters
in the longest token.  Real implementations should be able to do much of the state merging in passes over a block of characters
stored in on-chip memory, with additional passes to handle state machines that have not terminated on the block boundaries.  As long
as the majority of tokens can be generated completely within a block of characters, an implementation would have near-linear complexity
in terms of the number of memory accesses. Similar algorithms involving reductions over characters in on-chip memory (merge, intersect)
can come very close to saturating GPU DRAM bandwidth (over 150GB/s).

I still have a few major concerns with the algorithm:
  1) I don't know what the complexity of the state transition table would look like for arbitrary grammars.   The larger this is, the more costly the
      state merging operation will be.  If these are larger than the GPU on-chip memory (similar in size to an L1 cache), it could significantly impact
  2) I don't have a proof that this can be done in general for arbitrary grammars.

The next step is to write down an algorithm for computing a state transition table from a grammar specification, although realistically I'm going to
work through several more examples by hand first.



From: anteusz at freemail.hu [anteusz at freemail.hu]
Sent: Thursday, April 19, 2012 6:25 AM
To: Greg Diamos
Subject: Re: [antlr-interest] Interest in CUDA Target?

On 4/19/2012 2:00 AM, Greg Diamos wrote:
> I'm curious whether people who know parsing and lexing better than me think that there would be interest in a CUDA backend for ANTLR.  I believe that
> I have developed an algorithm that will allow for fine-grained data-parallel parsing of arbitrary LL(*) grammars, and will likely be significantly faster than
> existing implementations on CUDA (it could probably also be adapted to other multi-core processors).
> However, I don't really have a sense of how performance bottlenecked applications that rely on parsing are.  My interest was more of a thought experiment than
> driven by the needs of a particular application, so I'd be interested in any feedback that members of this list could provide.
> Does anyone have an application that would benefit from an order of magnitude or more reduction in parsing time, or are existing backends fast enough?
> Regards,
> Greg


I am interested in parsers in general. I wrote couple of hand-made and
yacc based parsers.
What have you got? How much is it faster to a normal CPU approach?


Márton Papp

This email message is for the sole use of the intended recipient(s) and may contain
confidential information.  Any unauthorized review, use, disclosure or distribution
is prohibited.  If you are not the intended recipient, please contact the sender by
reply email and destroy all copies of the original message.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: state-transitions-bottom-up.png
Type: image/png
Size: 70288 bytes
Desc: state-transitions-bottom-up.png
Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20120419/86b5166e/attachment.png 

More information about the antlr-interest mailing list