[antlr-interest] Quick intro to Python backend

Ben misc7 at emerose.org
Mon Sep 7 08:18:07 PDT 2009


Hi all, I'm trying to use the Python backend.  I wanted to start with something simple and I didn't see an example like this either in Parr's book or on the wiki.

So, suppose I just want to recognize a string of (uppercase) letters and a string of numbers separated by a dot.  The parse function needs to return the letters and numbers separately, or else the value None if the expression doesn't parse.  Below is how it might be done with regular expressions, followed by my attempt to do the same with ANTLR3 using the AST generator.

============== Regular expression example
import re

def parse(s):
    """Parse string s and return the pair (letters, numbers), or None on failure"""
    m = re.match(r"([A-Z]+)\.([0-9]+)$", s)
    return (m.group(1), m.group(2)) if m else None

============== My ANTLR attempt: Example.g
grammar Example;

options {
	language=Python;
	output=AST;
	ASTLabelType=CommonTree;
}

@lexer::members {
def reportError(self, e):
   raise e
}
@members {
def mismatch(self, input, ttype, follow):
    raise MismatchedTokenException(ttype, input)

def recoverFromMismatchedSet(self, input, e, follow):
    raise e
}
@rulecatch {
except RecognitionException, e:
    raise
}

expr	:	LETTERS '.' NUMBERS EOF -> LETTERS NUMBERS;

LETTERS	:	'A'..'Z'+ ;

NUMBERS	:	'0'..'9'+;

============== My ANTLR attempt: example.py

import StringIO
import antlr3
import ExampleLexer, ExampleParser

def parse(s):
    """Parse string s and return the pair (letters, numbers), or None on failure"""
    # Feed the string through the lexer and parser generated from Example.g
    stringio = StringIO.StringIO(s)
    char_stream = antlr3.ANTLRInputStream(stringio)
    lexer = ExampleLexer.ExampleLexer(char_stream)
    tokens = antlr3.CommonTokenStream(lexer)
    parser = ExampleParser.ExampleParser(tokens)
    try:
        expr = parser.expr()
    except antlr3.RecognitionException:
        return None
    # The rewrite rule leaves LETTERS and NUMBERS as the children of a nil root
    return tuple(child.text for child in expr.tree.getChildren())

============== End code
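
To pin down the behaviour being asked for, here is a minimal sketch that uses the regular-expression version as the reference.  The names parse_re, CASES and check are made up for illustration, the sample inputs other than the one quoted in this message are assumptions, and the sketch is written for Python 3.  Any ANTLR-based parse function should give the same answers.

```python
import re

def parse_re(s):
    """Reference behaviour: return (letters, numbers), or None if s does not parse."""
    m = re.match(r"([A-Z]+)\.([0-9]+)$", s)
    return (m.group(1), m.group(2)) if m else None

# (input, expected result) pairs; the sample strings are illustrative.
CASES = [
    ("ABC.123", ("ABC", "123")),
    ("3452EOUSNTHO.32423AOE", None),  # junk around the match must be rejected
    ("ABC123", None),                 # missing dot
    ("abc.123", None),                # lowercase letters are not allowed
]

def check(parse):
    """Return the inputs for which parse() disagrees with the expected result."""
    return [s for s, expected in CASES if parse(s) != expected]

print(check(parse_re))  # an empty list means every case agrees
```

Passing the ANTLR-based parse to check() in place of parse_re would list exactly the inputs it gets wrong.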

So first, do I have the right basic idea here?  It seems to be getting pretty complicated and already into some undocumented stuff for a simple example.

Second, this example still doesn't really work: it accepts strings like "3452EOUSNTHO.32423AOE" that it should reject.  Also, this solution is hard to package, because the modules in the antlr3 runtime refer to other antlr3 modules by their full name ("antlr3.streams" rather than just "streams"), so I have to start messing with Python's module path.  How can I fix these two problems?
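
For concreteness, the kind of module-path juggling this implies looks roughly like the sketch below.  It assumes the runtime's antlr3 package directory has been copied into a hypothetical "vendor" directory under the current working directory; that layout and the helper name add_to_module_path are made up for illustration.

```python
import os
import sys

def add_to_module_path(directory):
    """Put directory at the front of sys.path (once) and return its absolute path."""
    directory = os.path.abspath(directory)
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return directory

# Hypothetical layout: ./vendor/antlr3/__init__.py, ./vendor/antlr3/streams.py, ...
# With ./vendor on the path, "import antlr3.streams" inside the runtime can resolve.
vendor_dir = add_to_module_path(os.path.join(os.getcwd(), "vendor"))
```

This works, but it has to run before the first "import antlr3", which is the part that makes packaging awkward.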

Thanks for any advice,
-- 
Ben


