[antlr-interest] Broken examples? (Re simple converter/translator)

Fri Apr 11 08:52:48 PDT 2008

Actually, I found the filter / fuzzy-parsing / stream-filter examples.

However, some of the details are a bit rough.

This example from:
http://www.antlr.org/wiki/display/ANTLR3/Lexical+filters :

lexer grammar FuzzyJava;
options {filter=true;}

FIELD
    :   TYPE WS name=ID '[]'? WS? (';'|'=')
        {System.out.println("found var "+$name.text);}
    ;

fragment
QID :   ID ('.' ID)*
        ;

fragment
ID  :   ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*
    ;

WS  :   (' '|'\t'|'\n')+
    ;

The above has a few problems:
* The 'TYPE' rule is referenced by FIELD but never defined

* ANTLRWorks fails with this compile error:
[01:44:57] \tmp\antlrworks\__Test__.java:12: <identifier> expected
[01:44:57]  g = new (tokens, 49153);
[01:44:57]         ^
[01:44:57] 1 error

Does ANTLRWorks not like lexer-only grammars, or the filter option, or something?

I am also getting errors in the console like:
[01:51:05] error(100): C:\SAFE\svndev\vuitools\gsl2grxml\FilterTest.g:0:0: syntax error: codegen: <AST>:0:0: unexpected end of subtree
[01:51:05] error(10):  internal error: org.antlr.tool.Message.toString(Message.java:124): Assertion failed! Message ID 100 created but is not present in errorMsgIDs or warningMsgIDs.
Have I installed something incorrectly?
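
For anyone trying to follow what the FuzzyJava example is meant to do while the grammar itself will not build, here is a rough plain-Java sketch of the same idea using java.util.regex instead of ANTLR. It is not generated lexer code and not how ANTLR implements filter=true; it only mimics the visible behaviour of scanning the input and reporting anything that looks like FIELD. The wiki page never defines TYPE, so I am assuming it is just a qualified name (the QID fragment). The class and method names (FuzzyFieldScan, findFields) are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FuzzyFieldScan {
    // ID and QID mirror the fragment rules in the grammar above.
    static final String ID = "[A-Za-z_][A-Za-z_0-9]*";
    static final String QID = ID + "(?:\\." + ID + ")*";
    // ASSUMPTION: TYPE is a qualified name. The wiki example leaves TYPE undefined.
    static final String TYPE = QID;
    // Same character set as the WS rule: space, tab, newline.
    static final String WS = "[ \\t\\n]+";

    // FIELD : TYPE WS name=ID '[]'? WS? (';'|'=')
    static final Pattern FIELD = Pattern.compile(
        TYPE + WS + "(" + ID + ")(?:\\[\\])?(?:" + WS + ")?[;=]");

    // Returns the captured 'name' of every FIELD-like match, in input order.
    // Text that does not match is skipped, which is the "fuzzy" part.
    static List<String> findFields(String input) {
        List<String> names = new ArrayList<>();
        Matcher m = FIELD.matcher(input);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : findFields("int x; foo.Bar y = 1;")) {
            System.out.println("found var " + name);
        }
    }
}
```

Like the grammar, this is deliberately sloppy: a statement such as "return x;" would also be reported as a variable, because nothing here knows about Java keywords.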

> _____________________________________________ 
> From: Peter Nann
> Sent: Saturday, 12 April 2008 12:54 AM
> To: 'antlr-interest at antlr.org'
> Subject: Want to write a fairly simple syntax converter...
> 
> 
> I am new to all this language parsing, and I am struggling to
> understand 'how much I need to understand' (to use a Rumsfeldism).
> 
> If I just want to write a fairly simple converter, and keep whitespace
> fairly intact, how 'dirty' do I have to get my hands, language parsing
> and code wise?
> 
> To clarify, I want to convert a proprietary format into equivalent XML,
> something like:
>   "x [ a b c ]"  ->  "<rule name=x> <one-of> a b c </one-of> </rule>"
> (Obviously, it gets a little more complicated than that.)
> 
> My 2 biggest questions:
> 1) Do I need to worry about 'building trees', accessing the AST or
> anything like that? Or are the 'snippets' of code you can put in the
> grammar rules going to get me by?
> 2) Is maintaining whitespace easily doable? It seems to get gobbled
> up with little opportunity to keep it intact. It seems I could maybe
> tokenize it explicitly as meaningful input and then simply
> reconstitute it in the output. Or is that just crazy talk that will
> complicate my grammars too much (with 'WS?' sprinkled everywhere)?
> 
> ... Just trying to get a good idea of what I am in for...
> 
> Thanks for any replies!
> 
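
To make question 2 in the quoted message concrete, here is a minimal plain-Java sketch of the "tokenize whitespace explicitly and re-emit it" idea, written without ANTLR so it can be tried on its own. It assumes the input consists only of identifiers, '[' and ']' plus whitespace, and that the first identifier outside any brackets is the rule name; the real proprietary format is certainly richer than that. WsPreservingConverter and convert are names invented for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WsPreservingConverter {
    // Group 1: whitespace, group 2: identifier, group 3: '[', group 4: ']'
    private static final Pattern TOKEN =
        Pattern.compile("(\\s+)|([A-Za-z_][A-Za-z_0-9]*)|(\\[)|(\\])");

    static String convert(String input) {
        StringBuilder out = new StringBuilder();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;    // offset of the next unread character
        int depth = 0;  // current '[' nesting level
        while (pos < input.length()) {
            // Require a token that starts exactly at pos; anything else is an error.
            if (!m.find(pos) || m.start() != pos) {
                throw new IllegalArgumentException("unexpected character at offset " + pos);
            }
            if (m.group(1) != null) {
                out.append(m.group(1));            // whitespace is copied verbatim
            } else if (m.group(2) != null) {
                // ASSUMPTION: an identifier outside brackets names a rule.
                out.append(depth == 0 ? "<rule name=" + m.group(2) + ">" : m.group(2));
            } else if (m.group(3) != null) {
                out.append("<one-of>");
                depth++;
            } else {
                if (depth == 0) {
                    throw new IllegalArgumentException("unbalanced ']' at offset " + pos);
                }
                out.append("</one-of>");
                depth--;
                if (depth == 0) {
                    out.append(" </rule>");        // closing the outermost list ends the rule
                }
            }
            pos = m.end();
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: <rule name=x> <one-of> a b c </one-of> </rule>
        System.out.println(convert("x [ a b c ]"));
    }
}
```

The point of the sketch is only that whitespace kept as an ordinary token passes straight through to the output; whether the same approach stays manageable inside a full grammar is exactly the open question.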