[antlr-interest] Recognizing Indentation as blocks

Daniels, Troy (US SSA) troy.daniels at baesystems.com
Wed Mar 26 13:21:02 PDT 2008


Not having the book, I can't look at the grammar.  But I'd guess you'd want something like:

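CHANGE_INDENTATION
  : EOL WHITE_SPACE*
    {
      // Assumes INDENT/DEDENT are declared in tokens{} and previousWhiteSpace is an int lexer member.
      // Indentation width = matched text minus the line break (a tab counts as 1 here).
      int width = getText().replaceAll("[\r\n]", "").length();
      if (width > previousWhiteSpace)
        emit(new CommonToken(INDENT, ""));
      else if (width < previousWhiteSpace)
        emit(new CommonToken(DEDENT, ""));  // just one DEDENT, however many blocks close
      else
        skip();  // same indentation: emit nothing
      previousWhiteSpace = width;
    }
  ;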
     
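Basically, when you find the end-of-line character, you want to look at the whitespace after it and emit the appropriate token if the indentation has changed. Since WHITE_SPACE has a * after it, this rule matches even when there is no whitespace at all, so a line like g() still triggers the check. Since it starts with an EOL, you don't need to worry about false triggers in the middle of a line, as you would with just WHITE_SPACE*. One caveat: with a single previousWhiteSpace counter, dropping back several levels at once still produces only one DEDENT. To emit one per closed block, you need a stack of indentation widths and a way to emit more than one token per rule.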

I'm not familiar with the API for emitting tokens, so the details of the above code are almost certainly wrong, but the general concept should be right.
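To make the "one or more DEDENT" case concrete, here is a standalone Java sketch of the bookkeeping. It is not ANTLR code and not the book's rule: it keeps a stack of open indentation widths, checks every line (including lines with no leading whitespace, such as g()), and emits one DEDENT per block that closes. The token names and the LINE(...) placeholder are hypothetical; in a real ANTLR lexer the same logic would sit in the rule's action, plus a token queue so that one rule can emit several tokens.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class IndentSketch {
    // Turns source lines into a flat list of hypothetical tokens,
    // inserting INDENT/DEDENT based on each line's leading whitespace.
    static List<String> tokenize(List<String> lines) {
        List<String> out = new ArrayList<>();
        Deque<Integer> widths = new ArrayDeque<>();
        widths.push(0); // column 0 is always "open"
        for (String line : lines) {
            if (line.trim().isEmpty()) continue; // blank lines don't change indentation
            int width = 0;
            while (width < line.length()
                    && (line.charAt(width) == ' ' || line.charAt(width) == '\t')) {
                width++; // a tab counts as one column in this sketch
            }
            if (width > widths.peek()) {
                widths.push(width);
                out.add("INDENT");
            } else {
                // Pop every level deeper than this line: one DEDENT per closed block.
                // A width of 0 (e.g. "g()") still gets here, because every line is checked.
                while (width < widths.peek()) {
                    widths.pop();
                    out.add("DEDENT");
                }
                // A width landing between two open levels is an error in Python; ignored here.
            }
            out.add("LINE(" + line.trim() + ")");
        }
        // Close any blocks still open at end of input.
        while (widths.peek() > 0) {
            widths.pop();
            out.add("DEDENT");
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> src = List.of(
                "if foo:",
                "\tprint \"foo is true\"",
                "\tf()",
                "g()");
        System.out.println(tokenize(src));
    }
}
```

Run on the four-line Python example from the original question, this prints [LINE(if foo:), INDENT, LINE(print "foo is true"), LINE(f()), DEDENT, LINE(g())], so the unindented g() line does produce the DEDENT.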

Troy

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org 
> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Sven Busse
> Sent: Wednesday, March 26, 2008 3:57 PM
> To: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Recognizing Indentation as blocks
> 
> Um, does anybody have an idea?
> 
> thanks
> Sven
> 
> ----------
> 
> From: Sven Busse [mailto:mail at ghost23.de]
> Sent: Monday, March 24, 2008 11:56
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] Recognizing Indentation as blocks
> 
> Hi,
> 
> I am currently reading Terence's book and am at the chapter
> "Emitting more than one token per Lexer rule". He gives an
> example from Python:
> 
> if foo:
> 	print "foo is true"
> 	f()
> g()
> 
> He then discusses an example INDENT lexer rule, which I am
> trying to understand.
> 
> His INDENT rule aims to match spaces and tabs at the
> beginning of a line. If the indentation is greater than on
> the previous line, an imaginary INDENT token is emitted. If
> it is smaller, one or more DEDENT tokens are emitted.
> 
> Now my question is: would this actually work with an example
> like the little Python script? The line with "g()" has no
> whitespace at all, so I would assume there would be no match,
> and thus the logic for emitting DEDENT would never even be
> invoked.
> 
> Is this correct, or am I missing something? I am referring to
> the book "The Definitive ANTLR Reference", page 95.
> 
> Thank you
> Sven
> 

