[antlr-interest] ANTLR build process performance improvement

Thu Aug 11 14:12:07 PDT 2011

Really? Because I found a comment on BufferedTokenStream that seems to imply
that the Lexer is lazy so that the parser can pass state to it (presumably
to flip gated semantic predicates) and that some ST template feature already
depends on this behavior... Am I reading this wrong?

    /** Buffer all input tokens but do on-demand fetching of new tokens from
     *  lexer. Useful when the parser or lexer has to set context/mode info
     *  before proper lexing of future tokens. The ST template parser needs
     *  this, for example, because it has to constantly flip back and forth
     *  between inside/output templates. E.g., <names:{hi, <it>}> has to parse
     *  names as part of an expression but "hi, <it>" as a nested template.
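
To make the timing question concrete, here is a toy Python model of an
on-demand token stream sitting in front of a lexer with a mode flag. It is
only a sketch under stated assumptions: it does not use ANTLR's API, and
ModalLexer, LazyTokenStream and lex_after_pragma are made-up names. It shows
how a lookahead that happens before the parser flips the flag fixes the
token's type too early.

```python
import re


class ModalLexer:
    """Toy lexer whose rules depend on a flag the parser can flip."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.skip_section = False  # hypothetical stand-in for SkipSection

    def next_token(self):
        if self.pos >= len(self.text):
            return ("EOF", "")
        if self.skip_section:
            # Skip mode: swallow everything up to the end of the line.
            kind, pattern = "SKIPPED", r"[^\r\n]+"
        else:
            # Normal mode: ignore leading blanks, then read an integer.
            kind, pattern = "INTEGER", r"[ \t]*(\d+)"
        m = re.compile(pattern).match(self.text, self.pos)
        if m is None:
            # Anything else becomes a one-character token.
            ch = self.text[self.pos]
            self.pos += 1
            return ("CHAR", ch)
        self.pos = m.end()
        return (kind, m.group(m.lastindex or 0))


class LazyTokenStream:
    """Buffers tokens but only asks the lexer for them on demand."""

    def __init__(self, lexer):
        self.lexer = lexer
        self.buffer = []
        self.index = 0

    def lookahead(self, k=1):
        # Peeking forces the lexer to run in whatever mode it is in *now*.
        while len(self.buffer) < self.index + k:
            self.buffer.append(self.lexer.next_token())
        return self.buffer[self.index + k - 1]

    def consume(self):
        tok = self.lookahead(1)
        self.index += 1
        return tok


def lex_after_pragma(peek_before_flip):
    lexer = ModalLexer(" 10\r\n")
    lexer.skip_section = True       # we are inside the "#if false" region
    stream = LazyTokenStream(lexer)
    if peek_before_flip:
        stream.lookahead(1)         # prediction peeks while still in skip mode
    lexer.skip_section = False      # the pragma rule turns skipping off
    return stream.consume()


early = lex_after_pragma(True)      # ("SKIPPED", " 10")
late = lex_after_pragma(False)      # ("INTEGER", "10")
```

In this model the only difference between the two runs is whether something
peeked at the stream before the flag changed, which is the same difference
I see between the two traces quoted below.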

Thanks,
Chris

On Thu, Aug 11, 2011 at 2:03 PM, Sam Harwell <sharwell at pixelminegames.com> wrote:

> ANTLR is designed with the lexer and parser as independent units. Your
> lexer should be able to run to completion (the entire document) before the
> parser is even created.
>
> There are ways around it, but I intentionally avoid getting into situations
> where I have to think about what they might be.
>
> Sam
>
> *From:* chris king [mailto:kingces95 at gmail.com]
> *Sent:* Thursday, August 11, 2011 2:59 PM
> *To:* Sam Harwell; antlr-interest at antlr.org
> *Subject:* Re: ANTLR build process performance improvement
>
> Hey Sam, thanks! I'm using it now and it's faster. Love the VS tool-chain!
> I'd be dead in the water without it. And I haven't bumped into any bugs with
> the new version. So cool!
>
> I've bumped into a slightly different problem and no matter how I arrange
> my grammar (included) I can't seem to work around it. I think it might be a
> bug in the SpecialStateTransition logic...
>
> Below is what I'm trying to parse, along with the trace for my C#
> preprocessor produced by your enter-exit partial methods (wonderful addition
> BTW), interleaved with the points at which tokens get pulled from the
> stream. Also interleaved are my calls to toggle SkipSection, which tries to
> keep track of when code is #ifdefed out. So what we see is that [#if] is
> pulled and then [false] and [\r\n]. At that point in the parse I'm at the
> end of the preprocessor line, so I look up the pp_conditional stack to see
> if I'm in an #ifdefed out section of code. In this case I am, so I set
> SkipSection to True. That enables (via a semantic predicate) my lexer rule
> PP_SKIPPED_CHARACTERS=45, which should suck up any code that is not a pragma
> statement (doesn't start with #). That's all well and good, and so the next
> thing that gets tokenized is [#pragma warning disable], which is good.
>
> Now at this point I expect that no tokens should get pulled until I reach
> the pp_pragma production. I expect this because I figure ANTLR should be
> able to predict where it needs to go without pulling any more tokens --
> after all, the only thing that can follow a [#pragma warning disable] token
> is a list of integers. The actual behavior is that in my
> pp_conditional_section production ANTLR pulls [10] as
> PP_SKIPPED_CHARACTERS instead of an INTEGER because SkipSection is still set
> to True. Had things gone as expected, [10] would have been pulled in the
> pp_pragma production. By then SkipSection would have been set to False and
> [10] would have been pulled as an INTEGER.
>
> The code that's actually pulling the [10] is DFA.Predict in the
> SpecialStateTransition loop when trying to predict where to go for the
> pp_conditional_section production:
>
> pp_conditional_section
>     : { !SkipSection }? => input_section
>     | { SkipSection }? => pp_skipped_section
>     ;
>
> I'm guessing that this production is "special" because it's got those gated
> semantic predicates and that's why DFA.Predict enters into the
> SpecialStateTransition logic. What I don't understand is why it would need
> to pull any more tokens to know where to go next. Do you think that's a bug
> that it's pulling tokens in this case?
>
> After writing this e-mail it occurred to me that I might try to do the
> prediction manually. I did this by putting a breakpoint in the
> pp_conditional_section rule at the dfa.Predict line. But instead of asking
> the DFA to do the prediction I just set-ip to the case I wanted (e.g. {...}?
> => pp_skipped_section). Then I hit F5. And it works! I included the trace of
> that run below. Given that, I really do think that the SpecialStateTransition
> logic (or thereabouts) is being too aggressive about pulling tokens... what
> do you think?
>
> Thanks,
>
> Chris
>
> CSharpAst.Parse("#if false\r\n#pragma warning disable 10\r\n/*foo*/\r\n#endif");
>
> Enter start 1
> [@-1,0:2='#if',<38>,1:0]
>  Enter input_section 2
>   Enter input_section_part 3
>    Enter pp_directive 6
>     Enter pp_conditional 8
>      Enter pp_if_section 9
> [@-1,4:8='false',<4>,1:4]
>       Enter pp_expression 17
>        Enter pp_or_expression 18
>         Enter pp_and_expression 19
>          Enter pp_equality_expression 20
>           Enter pp_unary_expression 21
>            Enter pp_primary_expression 22
> [@-1,9:10='\\r\\n',<29>,1:9]
>            Leave pp_primary_expression 22
>           Leave pp_unary_expression 21
>          Leave pp_equality_expression 20
>         Leave pp_and_expression 19
>        Leave pp_or_expression 18
>       Leave pp_expression 17
>       Enter pp_conditional_block 12
>        Enter pp_new_line 31
> SkipSection = True
> [@-1,11:33='#pragma warning disable',<42>,2:0]
>        Leave pp_new_line 31
>        Enter pp_conditional_section 13
> [@-1,34:36=' 10',<45>,2:23]
> [@-1,37:38='\\r\\n',<29>,2:26]
>         Enter pp_skipped_section 14
>          Enter pp_skipped_section_part 15
>           Enter pp_directive 6
>            Enter pp_leaf_directive 7
>             Enter pp_pragma 29
> SkipSection = False
>              Enter pp_warning_list 30
>
> Here is the trace when I make the prediction by hand:
>
> Enter start 1
> [@-1,0:2='#if',<38>,1:0]
>  Enter input_section 2
>   Enter input_section_part 3
>    Enter pp_directive 6
>     Enter pp_conditional 8
>      Enter pp_if_section 9
> [@-1,4:8='false',<4>,1:4]
>       Enter pp_expression 17
>        Enter pp_or_expression 18
>         Enter pp_and_expression 19
>          Enter pp_equality_expression 20
>           Enter pp_unary_expression 21
>            Enter pp_primary_expression 22
> [@-1,9:10='\\r\\n',<29>,1:9]
>            Leave pp_primary_expression 22
>           Leave pp_unary_expression 21
>          Leave pp_equality_expression 20
>         Leave pp_and_expression 19
>        Leave pp_or_expression 18
>       Leave pp_expression 17
>       Enter pp_conditional_block 12
>        Enter pp_new_line 31
> SkipSection = True
> [@-1,11:33='#pragma warning disable',<42>,2:0]
>        Leave pp_new_line 31
>        Enter pp_conditional_section 13
>         Enter pp_skipped_section 14
>          Enter pp_skipped_section_part 15
>           Enter pp_directive 6
>            Enter pp_leaf_directive 7
>             Enter pp_pragma 29
> SkipSection = False
> [@-1,35:36='10',<28>,2:24]
>              Enter pp_warning_list 30
> [@-1,37:38='\\r\\n',<29>,2:26]
>              Leave pp_warning_list 30
>              Enter pp_new_line 31
> SkipSection = True
> [@-1,39:45='/*foo*/',<45>,3:0]
>              Leave pp_new_line 31
>             Leave pp_pragma 29
>            Leave pp_leaf_directive 7
>           Leave pp_directive 6
>          Leave pp_skipped_section_part 15
>          Enter pp_skipped_section_part 15
> [@-1,46:47='\\r\\n',<29>,3:7]
> [@-1,48:53='#endif',<35>,4:0]
>          Leave pp_skipped_section_part 15
>         Leave pp_skipped_section 14
>        Leave pp_conditional_section 13
>       Leave pp_conditional_block 12
>      Leave pp_if_section 9
>      Enter pp_endif 16
>       Enter pp_new_line 31
> SkipSection = True
>       Leave pp_new_line 31
>      Leave pp_endif 16
>     Leave pp_conditional 8
>    Leave pp_directive 6
>   Leave input_section_part 3
>  Leave input_section 2
> Leave start 1
>
> On Thu, Aug 11, 2011 at 9:31 AM, Sam Harwell <sharwell at pixelminegames.com>
> wrote:
>
> Hi “brave testers” :)
>
> I updated the MSBuild integration for the CSharp3 target to significantly
> improve its performance in several areas. I haven’t tested the update to see
> if it fixes the issues with ReSharper’s IntelliSense engine, but it sure
> would be sweet if it did!
>
> 1. Time to compile grammars should be reduced by 1-2 seconds per
>    project containing grammars.
> 2. The “lag” in the IDE when you change windows away from a modified
>    grammar file and when you save a grammar file should be reduced by 1-2
>    seconds each time.
> 3. When you open a project IntelliSense will be ready immediately as
>    opposed to waiting until you save a grammar or build the project.
> 4. When you add or remove a file from the project, IntelliSense
>    won’t break.
>
> If you’d like to test out the new tool, it’s available in the following 7z
> file. Simply close Visual Studio and replace your existing Antlr3.targets
> and AntlrBuildTask.dll with the ones from this archive and you’re ready to
> go.
>
> http://www.tunnelvisionlabs.com/downloads/antlr/AntlrBuildTask-experimental-9029.7z
>
> Thanks,
>
> Sam
>