[antlr-interest] ANTLR build process performance improvement

Thu Aug 11 14:03:15 PDT 2011

ANTLR is designed so with the lexer and parser as independent units. Your
lexer should be able to run to completion (the entire document) before the
parser is even created.

There are ways around it, but I intentionally avoid getting into situations
where I have to think about what they might be.

Sam

From: chris king [mailto:kingces95 at gmail.com] 
Sent: Thursday, August 11, 2011 2:59 PM
To: Sam Harwell; antlr-interest at antlr.org
Subject: Re: ANTLR build process performance improvement

Hey Sam, thanks! I'm using it now and it's faster. Love the VS tool-chain!
I'd be dead in the water without it. And haven't bumped into any bugs with
the new version. So cool! 

I've bummed into a slightly different problem and no matter how I arrange my
grammar (included) I can't seem to work around it. I think it might be a bug
in the SpecialStateTransition logic...

Below is what I'm trying to parse along with the trace using your enter-exit
partial methods (wonderful addition BTW) for my C# preprocessor interleaved
with when the tokens get pulled from the stream. Also interleaved are my
calls to toggle SkipSection which is trying to keep track of when code is
#ifdefed out. So what we see is that [#if] is pulled and then [false] and
[\r\n]. At that point in the parse I'm at the end of the pre-processor line
and so I look up the pp_conditional stack to see if I'm in a #ifdefed out
section of code. In this case I am so I set SkipSection to True. That
enables (via semantic predicate) my lexer rule PP_SKIPPED_CHARACTERS=45
which should suck up any code that is not a pragma statement (doesn't start
with #). That's all well and good and so the next thing that get tokenized
is [#pragma warning disable] which is good. 

Now at this point I expect that no tokens should get pulled until I reach
the pp_pragma production. I expect this because I figure ANLTER should be
able to predict where it needs to go without pulling any more tokens --
after all the only thing that can follow a [#pragma warning disable] token
is a list of integers. The actual behavior is that in my
pp_condition_section production ANTLR pulls [10] as a PP_SKIPPED_CHARACTERS
instead of an INTEGER because SkipSection is set to True. If things had gone
as expected and [10] had been pulled in the pp_pragma production. If that
had happened then SkipSection would have been set to False and [10] would be
pulled as an INTEGER. 

The code that's actually pulling the [10] is DFA.Predict in the
SpecialStateTransition loop when trying to predict where to go for the
pp_conditional_section production:

pp_conditional_section

: { !SkipSection }? => input_section

| { SkipSection }? => pp_skipped_section

;

I'm guessing that this production is "special" because it's got those gated
semantic predicates and that's why DFA.Predict enters into the
SpecialStateTransition logic. What I don't understand is why it would need
to pull any more tokens to know where to go next. Do you think that's a bug
that it's pulling tokens in this case?

After writing this e-mail it occurred to me that I might manually try to do
the prediction. I did this by putting a break point in the
pp_conditional_section rule at the dfa.Predict line. But instead of asking
the DFA to do the prediction I just set-ip to the case I wanted (e.g. {...}?
=> pp_skipped_section). Then hit F5. And it works! I included the trace of
that run below. Given that I really do think that the SpecialStateTransition
logic (or there abouts) is being to aggressive about pulling tokens... what
do you think? 

Thanks,

Chris

CSharpAst.Parse("#if false\r\n#pragma warning disable
10\r\n/*foo*/\r\n#endif");

Enter start 1

[@-1,0:2='#if',<38>,1:0]

 Enter input_section 2

  Enter input_section_part 3

   Enter pp_directive 6

    Enter pp_conditional 8

     Enter pp_if_section 9

[@-1,4:8='false',<4>,1:4]

      Enter pp_expression 17

       Enter pp_or_expression 18

        Enter pp_and_expression 19

         Enter pp_equality_expression 20

          Enter pp_unary_expression 21

           Enter pp_primary_expression 22

[@-1,9:10='\\r\\n',<29>,1:9]

           Leave pp_primary_expression 22

          Leave pp_unary_expression 21

         Leave pp_equality_expression 20

        Leave pp_and_expression 19

       Leave pp_or_expression 18

      Leave pp_expression 17

      Enter pp_conditional_block 12

       Enter pp_new_line 31

SkipSection = True

[@-1,11:33='#pragma warning disable',<42>,2:0]

       Leave pp_new_line 31

       Enter pp_conditional_section 13

[@-1,34:36=' 10',<45>,2:23]

[@-1,37:38='\\r\\n',<29>,2:26]

        Enter pp_skipped_section 14

         Enter pp_skipped_section_part 15

          Enter pp_directive 6

           Enter pp_leaf_directive 7

            Enter pp_pragma 29

SkipSection = False

             Enter pp_warning_list 30

 Here is the trace when I make the prediction by hand:

Enter start 1
[@-1,0:2='#if',<38>,1:0]
 Enter input_section 2
  Enter input_section_part 3
   Enter pp_directive 6
    Enter pp_conditional 8
     Enter pp_if_section 9
[@-1,4:8='false',<4>,1:4]
      Enter pp_expression 17
       Enter pp_or_expression 18
        Enter pp_and_expression 19
         Enter pp_equality_expression 20
          Enter pp_unary_expression 21
           Enter pp_primary_expression 22
[@-1,9:10='\\r\\n',<29>,1:9]
           Leave pp_primary_expression 22
          Leave pp_unary_expression 21
         Leave pp_equality_expression 20
        Leave pp_and_expression 19
       Leave pp_or_expression 18
      Leave pp_expression 17
      Enter pp_conditional_block 12
       Enter pp_new_line 31
SkipSection = True
[@-1,11:33='#pragma warning disable',<42>,2:0]
       Leave pp_new_line 31
       Enter pp_conditional_section 13
        Enter pp_skipped_section 14
         Enter pp_skipped_section_part 15
          Enter pp_directive 6
           Enter pp_leaf_directive 7
            Enter pp_pragma 29
SkipSection = False
[@-1,35:36='10',<28>,2:24]
             Enter pp_warning_list 30
[@-1,37:38='\\r\\n',<29>,2:26]
             Leave pp_warning_list 30
             Enter pp_new_line 31
SkipSection = True
[@-1,39:45='/*foo*/',<45>,3:0]
             Leave pp_new_line 31
            Leave pp_pragma 29
           Leave pp_leaf_directive 7
          Leave pp_directive 6
         Leave pp_skipped_section_part 15
         Enter pp_skipped_section_part 15
[@-1,46:47='\\r\\n',<29>,3:7]
[@-1,48:53='#endif',<35>,4:0]
         Leave pp_skipped_section_part 15
        Leave pp_skipped_section 14
       Leave pp_conditional_section 13
      Leave pp_conditional_block 12
     Leave pp_if_section 9
     Enter pp_endif 16
      Enter pp_new_line 31
SkipSection = True
      Leave pp_new_line 31
     Leave pp_endif 16
    Leave pp_conditional 8
   Leave pp_directive 6
  Leave input_section_part 3
 Leave input_section 2
Leave start 1

On Thu, Aug 11, 2011 at 9:31 AM, Sam Harwell <sharwell at pixelminegames.com>
wrote:

Hi "brave testers" :)

I updated the MSBuild integration for the CSharp3 target to significantly
improve its performance in several areas. I haven't tested the update to see
if it fixes the issues with ReSharper's IntelliSense engine, but it sure
would be sweet if it did!

1.       Time to compile grammars should be reduced by 1-2 seconds per
project containing grammars.

2.       The "lag" in the IDE when you change windows away from a modified
grammar file and when you save a grammar file should be reduced by 1-2
seconds each time.

3.       When you open a project IntelliSense will be ready immediately as
opposed to waiting until you save a grammar or build the project.

4.       When you add or remove a file from the project, IntelliSense won't
break.

If you'd like to test out the new tool, it's available in the following 7z
file. Simply close Visual Studio and replace your existing Antlr3.targets
and AntlrBuildTask.dll with the ones from this archive and you're ready to
go.

http://www.tunnelvisionlabs.com/downloads/antlr/AntlrBuildTask-experimental-
9029.7z

Thanks,

Sam