[antlr-interest] ANTLR build process performance improvement

Thu Aug 11 14:51:49 PDT 2011

There is a non-buffered stream, but lexer/parser communication is fraught
with difficulties unless you have a scannerless parser. That does not
matter here anyway, because a C# pre-processor MUST be ENTIRELY lexer
based, as per the language specification (which also tells you why). Read
through that section; it will help you a lot.
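
As a rough sketch of what "entirely lexer based" can look like (Python
rather than ANTLR, purely for illustration): the lexer itself keeps the
#if stack, so whether a line is skipped never depends on when the parser
asks for the next token. The names lex_lines and evaluate and the
line-level token kinds are made up for this example, #else/#elif are left
out, and pp-expressions are reduced to true, false, or a single symbol.

```python
import re

# A directive line: optional whitespace, '#', then the directive name.
DIRECTIVE = re.compile(r"\s*#\s*(\w+)(.*)")


def evaluate(expr, symbols):
    """Evaluate a tiny subset of pp-expressions (assumption: no operators)."""
    expr = expr.strip()
    if expr == "true":
        return True
    if expr == "false":
        return False
    return expr in symbols


def lex_lines(source, symbols=frozenset()):
    """Classify each line as DIRECTIVE, CODE, or SKIPPED.

    The #if state lives entirely in the lexer; no parser feedback is needed.
    """
    stack = []  # one bool per open #if: did its condition evaluate to true?
    tokens = []
    for line in source.splitlines():
        match = DIRECTIVE.match(line)
        if match:
            name, rest = match.group(1), match.group(2)
            if name == "if":
                stack.append(evaluate(rest, symbols))
            elif name == "endif":
                if not stack:
                    raise SyntaxError("#endif without matching #if")
                stack.pop()
            tokens.append(("DIRECTIVE", line.strip()))
        else:
            # A line is live only if every enclosing #if is true.
            kind = "CODE" if all(stack) else "SKIPPED"
            tokens.append((kind, line))
    return tokens


if __name__ == "__main__":
    sample = "#if false\r\n#pragma warning disable 10\r\n/*foo*/\r\n#endif"
    for token in lex_lines(sample):
        print(token)
```

On the input from this thread, the /*foo*/ line comes out as SKIPPED
while the #pragma line is still recognised as a directive, no matter how
far ahead a parser looks.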

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org
> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of chris king
> Sent: Thursday, August 11, 2011 2:12 PM
> To: Sam Harwell
> Cc: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] ANTLR build process performance
> improvement
>
> Really? Because I found a comment on BufferedTokenStream that seems to
> imply that the Lexer is lazy so that the parser can pass state to it
> (presumably to flip gated semantic predicates) and that some ST
> template feature already depends on this behavior... Am I reading this
> wrong?
>
>     /** Buffer all input tokens but do on-demand fetching of new tokens from
>      *  lexer. Useful when the parser or lexer has to set context/mode info before
>      *  proper lexing of future tokens. The ST template parser needs this,
>      *  for example, because it has to constantly flip back and forth between
>      *  inside/output templates. E.g., <names:{hi, <it>}> has to parse names
>      *  as part of an expression but "hi, <it>" as a nested template.
>
> Thanks,
> Chris
>
> On Thu, Aug 11, 2011 at 2:03 PM, Sam Harwell
> <sharwell at pixelminegames.com> wrote:
>
> > ANTLR is designed with the lexer and parser as independent units.
> > Your lexer should be able to run to completion (the entire document)
> > before the parser is even created.
> >
> > There are ways around it, but I intentionally avoid getting into
> > situations where I have to think about what they might be.
> >
> > Sam
> >
> > From: chris king [mailto:kingces95 at gmail.com]
> > Sent: Thursday, August 11, 2011 2:59 PM
> > To: Sam Harwell; antlr-interest at antlr.org
> > Subject: Re: ANTLR build process performance improvement
> >
> > Hey Sam, thanks! I'm using it now and it's faster. Love the VS
> > tool-chain! I'd be dead in the water without it. And I haven't bumped
> > into any bugs with the new version. So cool!
> >
> > I've bumped into a slightly different problem, and no matter how I
> > arrange my grammar (included) I can't seem to work around it. I think
> > it might be a bug in the SpecialStateTransition logic...
> >
> > Below is what I'm trying to parse, along with the trace from your
> > enter/exit partial methods (wonderful addition, BTW) for my C#
> > preprocessor, interleaved with the points at which tokens get pulled
> > from the stream. Also interleaved are my calls to toggle SkipSection,
> > which tries to keep track of when code is #ifdefed out. So what we see
> > is that [#if] is pulled, and then [false] and [\r\n]. At that point in
> > the parse I'm at the end of the pre-processor line, so I look up the
> > pp_conditional stack to see if I'm in an #ifdefed-out section of code.
> > In this case I am, so I set SkipSection to True. That enables (via
> > semantic predicate) my lexer rule PP_SKIPPED_CHARACTERS=45, which
> > should suck up any code that is not a pragma statement (doesn't start
> > with #). That's all well and good, and so the next thing that gets
> > tokenized is [#pragma warning disable], which is good.
> >
> > Now at this point I expect that no tokens should get pulled until I
> > reach the pp_pragma production. I expect this because I figure ANTLR
> > should be able to predict where it needs to go without pulling any
> > more tokens. After all, the only thing that can follow a [#pragma
> > warning disable] token is a list of integers. The actual behavior is
> > that in my pp_conditional_section production ANTLR pulls [10] as a
> > PP_SKIPPED_CHARACTERS instead of an INTEGER, because SkipSection is
> > set to True. If things had gone as expected, [10] would have been
> > pulled in the pp_pragma production; by then SkipSection would have
> > been set to False and [10] would have been pulled as an INTEGER.
> >
> > The code that's actually pulling the [10] is DFA.Predict in the
> > SpecialStateTransition loop when trying to predict where to go for
> > the pp_conditional_section production:
> >
> > pp_conditional_section
> > : { !SkipSection }? => input_section
> > | { SkipSection }? => pp_skipped_section
> > ;
> >
> > I'm guessing that this production is "special" because it's got those
> > gated semantic predicates and that's why DFA.Predict enters into the
> > SpecialStateTransition logic. What I don't understand is why it would
> > need to pull any more tokens to know where to go next. Do you think
> > that's a bug that it's pulling tokens in this case?
> >
> > After writing this e-mail it occurred to me that I might try to do
> > the prediction manually. I did this by putting a breakpoint in the
> > pp_conditional_section rule at the dfa.Predict line. But instead of
> > asking the DFA to do the prediction, I just set the instruction
> > pointer to the case I wanted (e.g. {...}? => pp_skipped_section) and
> > hit F5. And it works! I included the trace of that run below. Given
> > that, I really do think that the SpecialStateTransition logic (or
> > thereabouts) is being too aggressive about pulling tokens... what do
> > you think?
> >
> > Thanks,
> > Chris
> >
> > CSharpAst.Parse("#if false\r\n#pragma warning disable 10\r\n/*foo*/\r\n#endif");
> >
> > Enter start 1
> > [@-1,0:2='#if',<38>,1:0]
> >  Enter input_section 2
> >   Enter input_section_part 3
> >    Enter pp_directive 6
> >     Enter pp_conditional 8
> >      Enter pp_if_section 9
> > [@-1,4:8='false',<4>,1:4]
> >       Enter pp_expression 17
> >        Enter pp_or_expression 18
> >         Enter pp_and_expression 19
> >          Enter pp_equality_expression 20
> >           Enter pp_unary_expression 21
> >            Enter pp_primary_expression 22
> > [@-1,9:10='\\r\\n',<29>,1:9]
> >            Leave pp_primary_expression 22
> >           Leave pp_unary_expression 21
> >          Leave pp_equality_expression 20
> >         Leave pp_and_expression 19
> >        Leave pp_or_expression 18
> >       Leave pp_expression 17
> >       Enter pp_conditional_block 12
> >        Enter pp_new_line 31
> > SkipSection = True
> > [@-1,11:33='#pragma warning disable',<42>,2:0]
> >        Leave pp_new_line 31
> >        Enter pp_conditional_section 13
> > [@-1,34:36=' 10',<45>,2:23]
> > [@-1,37:38='\\r\\n',<29>,2:26]
> >         Enter pp_skipped_section 14
> >          Enter pp_skipped_section_part 15
> >           Enter pp_directive 6
> >            Enter pp_leaf_directive 7
> >             Enter pp_pragma 29
> > SkipSection = False
> >              Enter pp_warning_list 30
> >
> > Here is the trace when I make the prediction by hand:
> >
> > Enter start 1
> > [@-1,0:2='#if',<38>,1:0]
> >  Enter input_section 2
> >   Enter input_section_part 3
> >    Enter pp_directive 6
> >     Enter pp_conditional 8
> >      Enter pp_if_section 9
> > [@-1,4:8='false',<4>,1:4]
> >       Enter pp_expression 17
> >        Enter pp_or_expression 18
> >         Enter pp_and_expression 19
> >          Enter pp_equality_expression 20
> >           Enter pp_unary_expression 21
> >            Enter pp_primary_expression 22
> > [@-1,9:10='\\r\\n',<29>,1:9]
> >            Leave pp_primary_expression 22
> >           Leave pp_unary_expression 21
> >          Leave pp_equality_expression 20
> >         Leave pp_and_expression 19
> >        Leave pp_or_expression 18
> >       Leave pp_expression 17
> >       Enter pp_conditional_block 12
> >        Enter pp_new_line 31
> > SkipSection = True
> > [@-1,11:33='#pragma warning disable',<42>,2:0]
> >        Leave pp_new_line 31
> >        Enter pp_conditional_section 13
> >         Enter pp_skipped_section 14
> >          Enter pp_skipped_section_part 15
> >           Enter pp_directive 6
> >            Enter pp_leaf_directive 7
> >             Enter pp_pragma 29
> > SkipSection = False
> > [@-1,35:36='10',<28>,2:24]
> >              Enter pp_warning_list 30
> > [@-1,37:38='\\r\\n',<29>,2:26]
> >              Leave pp_warning_list 30
> >              Enter pp_new_line 31
> > SkipSection = True
> > [@-1,39:45='/*foo*/',<45>,3:0]
> >              Leave pp_new_line 31
> >             Leave pp_pragma 29
> >            Leave pp_leaf_directive 7
> >           Leave pp_directive 6
> >          Leave pp_skipped_section_part 15
> >          Enter pp_skipped_section_part 15
> > [@-1,46:47='\\r\\n',<29>,3:7]
> > [@-1,48:53='#endif',<35>,4:0]
> >          Leave pp_skipped_section_part 15
> >         Leave pp_skipped_section 14
> >        Leave pp_conditional_section 13
> >       Leave pp_conditional_block 12
> >      Leave pp_if_section 9
> >      Enter pp_endif 16
> >       Enter pp_new_line 31
> > SkipSection = True
> >       Leave pp_new_line 31
> >      Leave pp_endif 16
> >     Leave pp_conditional 8
> >    Leave pp_directive 6
> >   Leave input_section_part 3
> >  Leave input_section 2
> > Leave start 1
> >
> > On Thu, Aug 11, 2011 at 9:31 AM, Sam Harwell
> > <sharwell at pixelminegames.com> wrote:
> >
> > Hi "brave testers" :)
> >
> > I updated the MSBuild integration for the CSharp3 target to
> > significantly improve its performance in several areas. I haven't
> > tested the update to see if it fixes the issues with ReSharper's
> > IntelliSense engine, but it sure would be sweet if it did!
> >
> > 1. Time to compile grammars should be reduced by 1-2 seconds per
> >    project containing grammars.
> > 2. The "lag" in the IDE when you change windows away from a modified
> >    grammar file and when you save a grammar file should be reduced by
> >    1-2 seconds each time.
> > 3. When you open a project, IntelliSense will be ready immediately, as
> >    opposed to waiting until you save a grammar or build the project.
> > 4. When you add or remove a file from the project, IntelliSense
> >    won't break.
> >
> > If you'd like to test out the new tool, it's available in the
> > following 7z file. Simply close Visual Studio and replace your
> > existing Antlr3.targets and AntlrBuildTask.dll with the ones from
> > this archive and you're ready to go.
> >
> > http://www.tunnelvisionlabs.com/downloads/antlr/AntlrBuildTask-experimental-9029.7z
> >
> > Thanks,
> > Sam
> >
>