[antlr-interest] Quick intro to Python backend

Martijn Reuvers martijn.reuvers at gmail.com
Mon Sep 7 11:46:22 PDT 2009


Hi Ben,

I can't help you with the Python part, but below is a quick & dirty
grammar for the Java target that does more or less what you want.

Most likely your errors come from the lexer, so I throw the exception
there to show the idea. Catching that exception in your Python program
should then tell you the input was invalid. The rest of the grammar
collects the TEXT and NUMBER tokens it parses into a list and prints
that list in the @after action of the begin rule (which exists only to
make a quick printout possible).

Martijn


grammar test001;

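@lexer::members {

	// Rethrow lexer errors instead of only reporting them and recovering,
	// so that the calling program can catch them.
	@Override
	public void reportError(RecognitionException e) {
		throw new RuntimeException(e);
	}

}

@members {

	// Collects the text of every TEXT and NUMBER token matched by the what rule.
	java.util.List<String> result = new java.util.ArrayList<String>();

	public java.util.List<String> getResult() {
		return result;
	}
}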

begin
@after {
	System.out.println("RESULT=" + result);
}
	:	expr
	;


expr
	:	what (DOT expr)?
	;
	
what
	:	TEXT
		{
			if($TEXT.text != null) {
				result.add($TEXT.text);
			}		
		}
	| NUMBER
		{
			if($NUMBER.text != null) {
				result.add($NUMBER.text);
			}
		}
	;	

DOT
	:	'.'
	;

TEXT
	:	'A'..'Z'+
	;

NUMBER
	:	'0'..'9'+
	;

WS  :   ( ' '
        | '\t'
        | '\r'
        | '\n'
        ) {$channel=HIDDEN;}
    ;
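
For the Python side, the idea behind the grammar above can be sketched
without ANTLR at all. The following is only a hand-written stand-in, not
generated code and not the antlr3 runtime API: the names LexError,
tokenize and parse are made up for illustration. It mirrors the lexer
rules (TEXT, NUMBER, DOT, WS), raises on a character the lexer cannot
match, and fills a list the way the what rule does. One deliberate
difference, which is my assumption about what Ben wants: it also
requires that all input is consumed, whereas the begin rule above has no
EOF and so, as far as I can tell, would stop quietly at the first token
that does not fit.

```python
import re

# One alternative per lexer rule in the grammar: TEXT, NUMBER, DOT, WS.
TOKEN_RE = re.compile(
    r"(?P<TEXT>[A-Z]+)|(?P<NUMBER>[0-9]+)|(?P<DOT>\.)|(?P<WS>[ \t\r\n])"
)


class LexError(Exception):
    """Hypothetical stand-in for the exception thrown from reportError."""


def tokenize(s):
    """Split s into (type, text) pairs; raise LexError on an unknown character."""
    tokens = []
    pos = 0
    while pos < len(s):
        m = TOKEN_RE.match(s, pos)
        if m is None:
            raise LexError("unexpected character %r at offset %d" % (s[pos], pos))
        if m.lastgroup != "WS":  # whitespace is skipped, like the hidden channel
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def parse(s):
    """Return the list of TEXT/NUMBER values, or None if s does not parse."""
    try:
        tokens = tokenize(s)
    except LexError:
        return None
    result = []
    i = 0
    while True:
        # what : TEXT | NUMBER
        if i >= len(tokens) or tokens[i][0] not in ("TEXT", "NUMBER"):
            return None
        result.append(tokens[i][1])
        i += 1
        # expr : what (DOT expr)?
        if i < len(tokens) and tokens[i][0] == "DOT":
            i += 1
            continue
        break
    # Assumption (not in the grammar above): reject leftover tokens.
    if i != len(tokens):
        return None
    return result
```

With this sketch, parse("ABC.123") gives the two values as a list, while
lowercase input (a lexer error) and "3452EOUSNTHO.32423AOE" (leftover
tokens) both give None.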


On Mon, Sep 7, 2009 at 5:18 PM, Ben <misc7 at emerose.org> wrote:
> Hi all, I'm trying to use the Python backend.  I wanted to start with something simple and I didn't see an example like this either in Parr's book or on the wiki.
>
> So, suppose I just want to recognize a string of (uppercase) letters and a string of digits separated by a dot.  The parse function needs to return the letters and numbers separately, or else the value None if the expression doesn't parse.  Below is how it might be done with regular expressions, followed by how I've tried to do it with ANTLR3 using the AST generator.
>
> ============== Regular expression example
> import re
>
> def parse(s):
>    """Parse string s and return the pair (letters, numbers), or None."""
>    m = re.match(r"([A-Z]+)\.([0-9]+)$", s)
>    return (m.group(1), m.group(2)) if m else None
>
> ============== My ANTLR attempt: Example.g
> grammar Example;
>
> options {
>        language=Python;
>        output=AST;
>        ASTLabelType=CommonTree;
> }
>
> @lexer::members {
> def reportError(self, e):
>   raise e
> }
> @members {
> def mismatch(self, input, ttype, follow):
>    raise MismatchedTokenException(ttype, input)
>
> def recoverFromMismatchedSet(self, input, e, follow):
>    raise e
> }
> @rulecatch {
> except RecognitionException, e:
>    raise
> }
>
> expr    :       LETTERS '.' NUMBERS EOF -> LETTERS NUMBERS;
>
> LETTERS :       'A'..'Z'+ ;
>
> NUMBERS :       '0'..'9'+;
>
> ============== My ANTLR attempt: example.py
>
> import StringIO
> import antlr3
> import ExampleLexer, ExampleParser
>
> def parse(s):
>    """Parse string s and return the pair (letters, numbers), or None."""
>    stringio = StringIO.StringIO(s)
>    char_stream = antlr3.ANTLRInputStream(stringio)
>    lexer = ExampleLexer.ExampleLexer(char_stream)
>    tokens = antlr3.CommonTokenStream(lexer)
>    parser = ExampleParser.ExampleParser(tokens)
>    try: expr = parser.expr()
>    except antlr3.RecognitionException: return None
>    return tuple(child.text for child in expr.tree.getChildren())
>
> ============== End code
>
> So first, do I have the right basic idea here?  It seems to be getting pretty complicated and already into some undocumented stuff for a simple example.
>
> Second, this example still doesn't really work: it accepts strings like "3452EOUSNTHO.32423AOE" that it should reject.  Also, this solution is hard to package, because the modules in the antlr3 runtime refer to other antlr3 modules as "antlr3.streams" rather than just "streams", so I have to start messing with Python's module path.  How can I fix this?
>
> Thanks for any advice,
> --
> Ben
