[antlr-interest] Recognizing Indentation as blocks

Thu Mar 27 14:29:30 PDT 2008

> -----Original Message-----
> From: Sven Busse [mailto:mail at ghost23.de] 
> Sent: Thursday, March 27, 2008 2:05 PM
> To: Daniels, Troy (US SSA); antlr-interest at antlr.org
> Subject: AW: [antlr-interest] Recognizing Indentation as blocks
> 
> Hi,
> 
> ok, thanks, that helps. So EOL is actually a Newline character, right?

EOL is whatever end-of-line sequence is appropriate for your language and operating system. It may include a carriage return as well as the newline character.
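
As a rough illustration only (a Python sketch rather than ANTLR syntax; the names EOL_PATTERN and split_lines are made up for this example), a pattern that accepts a bare newline, a carriage return followed by a newline, or a lone carriage return could look like this:

```python
import re

# Hypothetical end-of-line pattern: "\r\n" is listed first so that a
# carriage return followed by a newline counts as a single line ending.
EOL_PATTERN = re.compile(r"\r\n|\n|\r")


def split_lines(text):
    """Split text on any of the three end-of-line sequences."""
    return EOL_PATTERN.split(text)
```

In a grammar, the equivalent would be an EOL rule listing the same alternatives.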

Troy

> 
> Cheers
> Sven
> 
> -----Original Message-----
> From: Daniels, Troy (US SSA) [mailto:troy.daniels at baesystems.com]
> Sent: Wednesday, March 26, 2008 9:21 PM
> To: Sven Busse; antlr-interest at antlr.org
> Subject: RE: [antlr-interest] Recognizing Indentation as blocks
> 
> Not having the book, I can't look at the grammar.  But I'd 
> guess you'd want something like:
> 
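> // Pseudocode: sizeOf(ws) stands for the width of the matched whitespace,
> // and previousWhiteSpace is a lexer field holding the previous line's width.
> CHANGE_INDENTATION:  EOL ws+=WHITE_SPACE*
>   {
>      if (sizeOf(ws) > previousWhiteSpace)
>        emit(INDENT);   // deeper than the previous line
>      else if (sizeOf(ws) < previousWhiteSpace)
>        emit(DEDENT);   // shallower than the previous line
>      previousWhiteSpace = sizeOf(ws);
>   }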
>      
> Basically, when you find the end-of-line character, you want 
> to look at the whitespace after it, and emit the appropriate 
> token if it has changed.  Since WHITE_SPACE has a * after it, 
> the rule matches even when there is no white space.  Since it 
> starts with an EOL, you don't need to worry about false 
> triggers in the middle of a line, as you would with 
> WHITE_SPACE* alone.
> 
> I'm not familiar with the API for emitting tokens, so the 
> details of the above code are almost certainly wrong, but the 
> general concept should be right.
> 
> Troy
> 
> > -----Original Message-----
> > From: antlr-interest-bounces at antlr.org 
> > [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Sven Busse
> > Sent: Wednesday, March 26, 2008 3:57 PM
> > To: antlr-interest at antlr.org
> > Subject: Re: [antlr-interest] Recognizing Indentation as blocks
> > 
> > Um, does anybody have an idea?
> > 
> > thanks
> > Sven
> > 
> > ----------
> > 
> > From: Sven Busse [mailto:mail at ghost23.de]
> > Sent: Monday, March 24, 2008 11:56 AM
> > To: antlr-interest at antlr.org
> > Subject: [antlr-interest] Recognizing Indentation as blocks
> > 
> > Hi,
> > 
> > I am currently reading Terence's book and have reached the 
> > chapter "Emitting more than one token per Lexer rule". He gives 
> > an example from Python:
> > 
> > if foo:
> > 	print "foo is true"
> > 	f()
> > g()
> > 
> > He then discusses an example INDENT lexer rule, which I am 
> > trying to understand.
> > 
> > His INDENT rule aims to match spaces and tabs if they start at 
> > the beginning of the line. If the indentation is deeper than in 
> > the previous line, an imaginary INDENT token is emitted. If it 
> > is shallower than in the previous line, one or more DEDENT 
> > tokens are emitted.
> > 
> > Now my question is, would this actually work with an example 
> > like the little Python script? The line with "g()" has no 
> > whitespace at all, so I would assume there would be no match 
> > and thus the logic for emitting DEDENT would not even be 
> > invoked.
> > 
> > Is this correct or am I missing something? I am referring to 
> > the book "The Definitive ANTLR Reference", page 95.
> > 
> > Thank you
> > Sven
> > 
> > 
> > 
> > 
> 
>
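
For reference, the idea discussed in this thread can be sketched outside ANTLR. The following is a minimal Python sketch, not the book's rule and not the ANTLR API: the function name indentation_tokens and the (kind, value) token tuples are made up for illustration. It assumes that every space or tab counts as one column, and it does not check for inconsistent dedents. Unlike the single previousWhiteSpace variable in the pseudocode above, it keeps a stack of widths so that one line can close several blocks. The leading whitespace is measured on every line, even when it is empty, so a line such as "g()" still triggers the DEDENT logic.

```python
def indentation_tokens(source):
    """Yield (kind, value) pairs where kind is INDENT, DEDENT, or LINE."""
    stack = [0]  # indentation widths of the currently open blocks
    for line in source.splitlines():
        stripped = line.lstrip(" \t")
        if not stripped:
            continue  # blank lines do not change the block structure
        width = len(line) - len(stripped)  # may be zero, as for "g()"
        if width > stack[-1]:
            stack.append(width)
            yield ("INDENT", width)
        while width < stack[-1]:
            stack.pop()  # one DEDENT per block that this line closes
            yield ("DEDENT", width)
        yield ("LINE", stripped)
    while len(stack) > 1:  # close blocks still open at end of input
        stack.pop()
        yield ("DEDENT", 0)


if __name__ == "__main__":
    example = 'if foo:\n\tprint "foo is true"\n\tf()\ng()\n'
    for token in indentation_tokens(example):
        print(token)
```

Run on the script from the original question, the sketch yields a DEDENT immediately before the line "g()", which is the behaviour the question was asking about.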