[antlr-interest] Parsing an indentation-specific language like Python.

Jim Idle jimi at temporal-wave.com
Tue Sep 9 15:15:34 PDT 2008


On Tue, 2008-09-09 at 16:40 -0400, Jarrod Roberson wrote:
> I am working on a domain-specific language and want to make indentation
> significant, similar to Python's syntax.


If it is your language, then the answer is "Don't do that!"

> But very, very simple compared to Python (i.e. no line
> continuation characters).
> 
> I have done some research on Google and only find resources talking
> about how difficult it is to do.
> Is this something that is really possible with ANTLR v3.x?


Yes, it is possible, but you need to keep track of indents and outdents
for the lexer - probably best to override the token stream to do that.
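
One common way to do that bookkeeping is a stack of indentation widths. The sketch below is plain Python rather than ANTLR code, and it makes assumptions that are not in the original post: the function name indent_tokens and the token names INDENT, DEDENT and LINE are hypothetical, indentation uses spaces only, and blank lines are ignored. A token stream override would presumably apply the same logic to the tokens coming out of the lexer.

```python
def indent_tokens(lines):
    """Yield (kind, text) pairs where kind is INDENT, DEDENT or LINE."""
    stack = [0]  # indentation widths of the blocks that are currently open
    for raw in lines:
        text = raw.rstrip("\n")
        if not text.strip():
            continue  # blank lines neither open nor close a block
        width = len(text) - len(text.lstrip(" "))
        if width > stack[-1]:
            # Deeper than the enclosing block: open a new block
            stack.append(width)
            yield ("INDENT", "")
        while width < stack[-1]:
            # Shallower: close blocks until a known width is reached
            stack.pop()
            yield ("DEDENT", "")
        if width != stack[-1]:
            # The outdent did not land on any enclosing block's column
            raise ValueError("inconsistent outdent: %r" % text)
        yield ("LINE", text.strip())
    while len(stack) > 1:
        # Close whatever is still open at end of input
        stack.pop()
        yield ("DEDENT", "")
```

With INDENT and DEDENT in the stream, the parser can treat them like ordinary opening and closing block terminals, and an outdent that matches no enclosing block is reported as an error instead of being silently accepted.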

> Or should I just resign myself to using block identifiers?


If you mean "Wisely choose formal terminals to identify the statement
block", then yes, you should ;-) This is my opinion, of course, but the
complexities of significant indentation make programs error prone. When
choosing a syntax, don't think about the language in its perfect form;
think "How can a parser identify the maximum possible set of error
conditions, give good error messages, and recover?" Any construct where
a line of code is syntactically and semantically valid whether at
column 8 or 16, but has a different actual meaning in terms of logical
flow, is basically broken: the parser just has to assume you meant what
you wrote. Combine those kinds of concerns with specifying a language
that is easy for your target audience to write, and you are good.
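
As a concrete illustration of the column problem, here is a small Python sketch. The function names are hypothetical and the example is not from the original post. The two functions differ only in the indentation of one line, both are accepted without complaint, and they compute different results.

```python
def sum_inner(values):
    total = 0
    for v in values:
        if v > 0:
            total += v
            total += 1  # at this column: runs only when v > 0
    return total


def sum_outer(values):
    total = 0
    for v in values:
        if v > 0:
            total += v
        total += 1  # one level out: runs on every iteration
    return total


print(sum_inner([1, -2, 3]))  # 6
print(sum_outer([1, -2, 3]))  # 7
```

No parser can flag either version, because nothing in the text says which one was intended. With explicit block terminals, a misplaced line usually produces an unbalanced or unexpected terminal that can be reported.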

Now, if you mean something as simple as "leading whitespace of any depth
is significant, and whitespace anywhere else isn't", then:

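fragment
LINEIN : ;   // imaginary token type, only ever assigned from the WS rule

WS
@init
{
   // Column at which this token starts within the current line
   int sPos = getCharPositionInLine();
}
: (' ' | '\t')+
  {
     if (sPos == 0) $type = LINEIN;   // leading whitespace is significant
     else $channel = HIDDEN;          // whitespace elsewhere is ignored
  }
;

line : configSegment* ;

configSegment
    : ID // Segment ID
        (LINEIN configValue?)*
    ;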

Might help you get going.
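
The idea behind that fragment, that a run of whitespace counts only when it starts a line, can also be sketched outside ANTLR. The Python below is an illustration under assumptions of my own: the names TOKEN and classify and the WORD token kind are hypothetical, and every run of non-whitespace characters is treated as one word.

```python
import re

# One alternative per token kind: whitespace run, newline, or a word
TOKEN = re.compile(r"(?P<WS>[ \t]+)|(?P<NL>\r?\n)|(?P<WORD>[^ \t\r\n]+)")


def classify(text):
    """Return (kind, text) pairs, keeping whitespace only at line start."""
    out = []
    line_start = 0  # index of the first character of the current line
    for m in TOKEN.finditer(text):
        kind = m.lastgroup
        if kind == "NL":
            line_start = m.end()  # the next line begins after the newline
        elif kind == "WS":
            if m.start() == line_start:
                out.append(("LINEIN", m.group()))  # leading whitespace
            # whitespace anywhere else is dropped
        else:
            out.append(("WORD", m.group()))
    return out
```

The test that matters is the position within the line, not the offset into the whole input, which is 0 only for the very first character of the file.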

> Any insight is appreciated.


I thought that there was a sample Python grammar for 3.1... yes:

http://www.antlr.org/grammar/1200715779785/Python.g


