[antlr-interest] GUnit Tree Parsing

Andrew Mains andrew.mains at oracle.com
Tue Aug 7 13:07:35 PDT 2012


Hi all,

Let me know if this isn't the right forum for this, but I'm wondering if 
there would be any interest in creating a "tree parsing mode" for GUnit. 
Currently (afaik), the only way to test a tree grammar is to enter the raw 
source text, run the lexer/parser on it to generate the tree, and then run 
the tree parser on that generated tree. This works fine (and is great for 
more end-to-end style testing), but is less suited to unit testing.

The problems as I see them are these:

1. Bugs in your lexer/parser can cause tests to fail in the tree grammar 
unit tests.

2. It's often easier (for me at least) to specify a test in terms of the 
tree structure directly, rather than deducing what source code will give 
me the sort of tree structure I want.
     a. Trees are (usually) more concise than the source they come from.

My idea to fix both of these problems is this: allow test cases which 
specify tree structures as input, using the same syntax as ANTLR's 
rewrite rules. Test cases would look like this:

my_tree_grammar_rule:
    ^(SOME_CONSTRUCT VAL1 VAL2 ^(SOME_NESTED_THING ...))
    -> (ALTERED_CONSTRUCT ....);
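
To make the idea concrete, here is a minimal sketch of how such a tree 
specification could be read into a nested structure before being handed to 
a tree parser. It is not the implementation mentioned below and uses no 
ANTLR classes; the names parse_tree and to_string_tree are hypothetical, 
and it assumes token names contain no whitespace, parentheses or carets. 
A leading ^ on a subtree is treated as optional, so both ^(A B) and (A B) 
are accepted.

```python
import re

# Hypothetical tokenizer for the tree notation: "^(" or "(", ")", or a bare
# token name. A stray "^" that is not followed by "(" is silently skipped.
TOKEN_RE = re.compile(r"\^?\(|\)|[^\s()^]+")


def parse_tree(text):
    """Parse '^(ROOT child ...)' into nested (root, [children]) tuples.

    A leaf is returned as a plain string.
    """
    tokens = TOKEN_RE.findall(text)
    pos = 0

    def node():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok in ("^(", "("):
            # The first token after the opening parenthesis is the subtree root.
            if pos >= len(tokens) or tokens[pos] in ("^(", "(", ")"):
                raise ValueError("subtree needs a root token")
            root = tokens[pos]
            pos += 1
            children = []
            while pos < len(tokens) and tokens[pos] != ")":
                children.append(node())
            if pos >= len(tokens):
                raise ValueError("missing ')'")
            pos += 1  # consume the closing ')'
            return (root, children)
        if tok == ")":
            raise ValueError("unexpected ')'")
        return tok

    tree = node()
    if pos != len(tokens):
        raise ValueError("trailing input after tree")
    return tree


def to_string_tree(tree):
    """Render a parsed tree in '(ROOT child ...)' form, without carets."""
    if isinstance(tree, str):
        return tree
    root, children = tree
    return "(" + " ".join([root] + [to_string_tree(c) for c in children]) + ")"
```

A test harness built on this would still have to turn the nested tuples 
into real tree nodes with the right token types and feed them to the tree 
grammar rule under test; that step depends on the generated code and is 
left out here.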

I have implemented something that does basically this for myself, but 
I'd love to integrate it fully with GUnit (I'm currently using it in 
JUnit tests).

Is there already development on something like this (maybe for v4?), and 
if not, would anyone be interested in seeing something like this happen? 
Feel free to email me with any questions, suggestions or concerns (and 
again, let me know if this is the wrong forum for this sort of question).

Thanks!

Andrew



