[antlr-interest] Recognizing Indentation as blocks
Daniels, Troy (US SSA)
troy.daniels at baesystems.com
Thu Mar 27 14:29:30 PDT 2008
> -----Original Message-----
> From: Sven Busse [mailto:mail at ghost23.de]
> Sent: Thursday, March 27, 2008 2:05 PM
> To: Daniels, Troy (US SSA); antlr-interest at antlr.org
> Subject: RE: [antlr-interest] Recognizing Indentation as blocks
>
> Hi,
>
> ok, thanks, that helps. So EOL is actually a Newline character, right?
EOL is whatever end-of-line sequence is appropriate for your language and operating system, so it may include a carriage return as well as the newline (for example \r\n on Windows).
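To make that concrete, here is a minimal sketch in Python rather than ANTLR. It assumes EOL should cover the three common conventions (\r\n, \r and \n); the names EOL and split_lines are made up for illustration and are not part of any grammar discussed here.

```python
import re

# Assumed definition of EOL: Windows (\r\n), classic Mac (\r) or Unix (\n).
# \r\n is listed first so it is consumed as one line ending, not two.
EOL = re.compile(r"\r\n|\r|\n")

def split_lines(text):
    """Split text on any of the three line-ending styles."""
    return EOL.split(text)
```

In a lexer the same alternation would simply be the body of the EOL rule, with the longest alternative tried first.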
Troy
>
> Cheers
> Sven
>
> -----Original Message-----
> From: Daniels, Troy (US SSA) [mailto:troy.daniels at baesystems.com]
> Sent: Wednesday, March 26, 2008 21:21
> To: Sven Busse; antlr-interest at antlr.org
> Subject: RE: [antlr-interest] Recognizing Indentation as blocks
>
> Not having the book, I can't look at the grammar. But I'd
> guess you'd want something like:
>
> CHANGE_INDENTATION: EOL ws+=WHITE_SPACE*
>     {
>         if (sizeOf(ws) > previousWhiteSpace)
>             emit(INDENT);
>         else if (sizeOf(ws) < previousWhiteSpace)
>             emit(DEDENT);
>         previousWhiteSpace = sizeOf(ws);
>     }
>     ;
>
> Basically, when you find the end-of-line character, you want
> to look at the whitespace after it and emit the appropriate
> token if its length has changed. Since WHITE_SPACE has a * after it,
> the rule matches even when there is no whitespace. Since it
> starts with an EOL, you don't need to worry about false
> triggers in the middle of a line, which a bare WHITE_SPACE* would cause.
>
> I'm not familiar with the API for emitting tokens, so the
> details of the above code are almost certainly wrong, but the
> general concept should be right.
>
> Troy
>
> > -----Original Message-----
> > From: antlr-interest-bounces at antlr.org
> > [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Sven Busse
> > Sent: Wednesday, March 26, 2008 3:57 PM
> > To: antlr-interest at antlr.org
> > Subject: Re: [antlr-interest] Recognizing Indentation as blocks
> >
> > Um, does anybody have an idea?
> >
> > thanks
> > Sven
> >
> > ----------
> >
> > From: Sven Busse [mailto:mail at ghost23.de]
> > Sent: Monday, March 24, 2008 11:56
> > To: antlr-interest at antlr.org
> > Subject: [antlr-interest] Recognizing Indentation as blocks
> >
> > Hi,
> >
> > I am currently reading Terence's book, and I am at the chapter
> > "Emitting more than one token per Lexer rule". He gives an example
> > from Python:
> >
> > if foo:
> >     print "foo is true"
> >     f()
> > g()
> >
> > He then discusses an example INDENT lexer rule, which I am trying to
> > understand.
> >
> > His INDENT rule aims to match spaces and tabs if they start at the
> > beginning of the line. If the indentation is bigger than in the previous
> > line, an imaginary INDENT token is emitted. If it is smaller than in
> > the previous line, one or more DEDENT tokens are emitted.
> >
> > Now my question is, would this actually work with an example like the
> > little Python script? The line with "g()" has no whitespace at all,
> > so I would assume there would be no match and thus the logic for
> > emitting DEDENT would not even be invoked.
> >
> > Is this correct, or am I missing something? I am referring to the book
> > "The Definitive ANTLR Reference", page 95.
> >
> > Thank you
> > Sven
> >
> >
> >
> >
>
>
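The idea discussed in this thread (measure the whitespace after every end of line, even when it is empty, and compare it with the previous line) can be sketched outside ANTLR as plain Python. This is only an illustration under stated assumptions, not the rule from the book: the function name indentation_tokens and the token spellings are invented here, a tab is counted as one column, and a stack of widths is used instead of a single previousWhiteSpace value so that one line can close several blocks.

```python
def indentation_tokens(source):
    """Return a list of INDENT, DEDENT and ("LINE", text) tokens."""
    tokens = []
    stack = [0]  # indentation widths of the currently open blocks
    for line in source.splitlines():  # splitlines() accepts \n, \r\n and \r
        stripped = line.lstrip(" \t")
        if not stripped:
            continue  # blank lines do not change the block structure
        # Assumption: every space or tab counts as one column.
        width = len(line) - len(stripped)
        if width > stack[-1]:
            stack.append(width)
            tokens.append("INDENT")
        # A width of 0 is still compared, so an unindented line closes blocks.
        # Simplification: a width that matches no open block is not reported.
        while width < stack[-1]:
            stack.pop()
            tokens.append("DEDENT")
        tokens.append(("LINE", stripped))
    # Close any blocks that are still open at the end of the input.
    while len(stack) > 1:
        stack.pop()
        tokens.append("DEDENT")
    return tokens
```

Run on the four-line script from the original question, the "g()" line has width 0, which is smaller than the open block's width, so a DEDENT is produced before it even though no whitespace was matched.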