[antlr-interest] Simple Grammar Question

Sat Jan 17 11:34:49 PST 2009

    *Johannes*, hi

    Thank you for the answer,

    Now I know:
    *1) "If the lexer can match the same input via more than one rule,
    it chooses the rule which consumes the most input"
    2) "Do not call a token rule from another token rule; instead, call
    fragment rules from a token rule"
    *
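Rule 1 can be illustrated with a minimal Java sketch (not ANTLR's generated code) that applies "longest match wins" to the original DIGIT, LETTER and NAME rules. The class and method names are hypothetical. Breaking ties in favor of the rule declared first is an assumption of this simplified model; ANTLR's real lexer predicts the rule with lookahead rather than literally trying every rule.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of "the lexer picks the rule that matches the most input",
// applied to the ORIGINAL rules DIGIT, LETTER and NAME. Not ANTLR-generated code.
// Assumption of this model: on equal lengths, the rule declared first wins.
public class LongestMatchDemo {

    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }
    static boolean isLetter(char c) { return c >= 'A' && c <= 'Z'; }

    // DIGIT : '0'..'9' ;   returns the match length at pos (0 = no match)
    static int matchDigit(String s, int pos) {
        return pos < s.length() && isDigit(s.charAt(pos)) ? 1 : 0;
    }

    // LETTER : 'A'..'Z' ;
    static int matchLetter(String s, int pos) {
        return pos < s.length() && isLetter(s.charAt(pos)) ? 1 : 0;
    }

    // NAME : ( LETTER | DIGIT )+ ;   greedily consumes every letter and digit
    static int matchName(String s, int pos) {
        int end = pos;
        while (end < s.length() && (isLetter(s.charAt(end)) || isDigit(s.charAt(end)))) {
            end++;
        }
        return end - pos;
    }

    static List<String> tokenize(String input) {
        String[] ruleNames = {"DIGIT", "LETTER", "NAME"};  // declaration order
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int[] lengths = {matchDigit(input, pos), matchLetter(input, pos), matchName(input, pos)};
            int best = 0;
            for (int i = 1; i < lengths.length; i++) {
                if (lengths[i] > lengths[best]) {  // only a strictly longer match replaces an earlier rule
                    best = i;
                }
            }
            if (lengths[best] == 0) {
                throw new IllegalArgumentException("no rule matches at position " + pos);
            }
            tokens.add(ruleNames[best] + ":" + input.substring(pos, pos + lengths[best]));
            pos += lengths[best];
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("3B5A"));  // [NAME:3B5A]
        System.out.println(tokenize("7"));     // [DIGIT:7]
    }
}
```

Since NAME wins on 3B5A, the parser receives a NAME where record expects a DIGIT, which is consistent with the "missing DIGIT at '3B5A'" line in the test output.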
    I wonder if the ANTLR community has "10 commandments" (or 100?) posted
    anywhere? :-)

    John

-------- Original Message  --------
Subject: Re: [antlr-interest] Simple Grammar Question
From: Johannes Luber <jaluber at gmx.de>
To: John Gardener <John.Gardener at carrotgarden.com>
Cc: antlr-interest at antlr.org
Date: Sat 17 Jan 2009 11:38:33 AM CST
> John Gardener wrote:
>   
>>      *Hello;*
>>
>>     I am stuck with a simple grammar; any help is very welcome.
>>
>>     I want to parse two-term sentences, such as:
>>     <1: single digit > <2: name containing letters and digits > EOF
>>
>>     Below you will find:
>>     1) grammar
>>     2) test rig
>>     3) output
>>
>>     PROBLEM:
>>     The second term (name) seems to greedily consume the whole input.
>>
>>     What is the proper way to deal with this?
>>     
>
> If the lexer can match the same input via more than one rule, it chooses
> the rule that consumes the most input. Try the following rules instead:
>
> fragment NAME:;
>
> DIGIT : ( '0'..'9' | 'A'..'Z' {$type=NAME;} )
>         ( '0'..'9' {$type=NAME;} | 'A'..'Z' {$type=NAME;} )*
>       { out.println("+DIGIT: " + $text ); } ;
>
> It generates a DIGIT token only if exactly one character is matched
> and that character is a digit. But can names start with digits at all?
> If not, this may work, too:
>
> DIGIT : '0'..'9'
>       { out.println("+DIGIT: " + $text ); } ;
>
> NAME : 'A'..'Z' ( 'A'..'Z' | '0'..'9' ) *
>       { out.println("+NAME: " + $text ); } ;
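To compare the two suggestions on the test input 3B5A, here is a small Java sketch that hand-codes each rule set. The class and method names are hypothetical, and this is a simplified model of the rules, not the code ANTLR would generate.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-coded models of the two suggested rule sets (not ANTLR output).
public class TwoFixesDemo {

    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }
    static boolean isLetter(char c) { return c >= 'A' && c <= 'Z'; }

    // Model of the type-switching DIGIT rule: consume a maximal run of
    // digits/letters; the token stays DIGIT only if it is a single digit.
    static List<String> typeSwitching(String s) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < s.length()) {
            int end = pos;
            while (end < s.length() && (isDigit(s.charAt(end)) || isLetter(s.charAt(end)))) {
                end++;
            }
            if (end == pos) {
                throw new IllegalArgumentException("unexpected '" + s.charAt(pos) + "'");
            }
            String text = s.substring(pos, end);
            boolean singleDigit = text.length() == 1 && isDigit(text.charAt(0));
            tokens.add((singleDigit ? "DIGIT:" : "NAME:") + text);
            pos = end;
        }
        return tokens;
    }

    // Model of  DIGIT : '0'..'9' ;  NAME : 'A'..'Z' ( 'A'..'Z' | '0'..'9' )* ;
    static List<String> lettersFirst(String s) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < s.length()) {
            char c = s.charAt(pos);
            if (isDigit(c)) {
                tokens.add("DIGIT:" + c);  // a digit is always a one-character DIGIT
                pos++;
            } else if (isLetter(c)) {
                int end = pos + 1;     // NAME must start with a letter
                while (end < s.length() && (isDigit(s.charAt(end)) || isLetter(s.charAt(end)))) {
                    end++;
                }
                tokens.add("NAME:" + s.substring(pos, end));
                pos = end;
            } else {
                throw new IllegalArgumentException("unexpected '" + c + "'");
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(typeSwitching("3B5A"));  // [NAME:3B5A]
        System.out.println(lettersFirst("3B5A"));   // [DIGIT:3, NAME:B5A]
    }
}
```

Under this model, the type-switching rule still yields a single NAME token for 3B5A, because its loop keeps consuming letters and digits after the leading 3. Only the letters-first rules split the input into a DIGIT followed by a NAME, which is what record expects.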
>
> Calling other lexer rules from a lexer rule without declaring them as
> fragments may give strange results and is discouraged by experienced
> users. With fragments, the above version looks like this (the parser
> rule digit must then match NUMBER instead of DIGIT, because a fragment
> never produces a token of its own):
>
> fragment DIGIT : '0'..'9';
>
> fragment ALPHA : 'A'..'Z';
>
> NUMBER : DIGIT { out.println("+NUMBER: " + $text ); } ;
>
>
> NAME : ALPHA ( ALPHA | DIGIT ) *
>       { out.println("+NAME: " + $text ); } ;
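The fragment version maps naturally onto helper methods. In this Java sketch (hypothetical names, not ANTLR output), DIGIT and ALPHA are private helpers that the NUMBER and NAME token rules call, so they never appear in the token list. The record check is a flattened, hypothetical stand-in for the parser rules, written against NUMBER.

```java
import java.util.ArrayList;
import java.util.List;

// Fragments become private helpers; only NUMBER, NAME and EOF reach the "parser".
public class FragmentDemo {

    // fragment DIGIT : '0'..'9' ;
    private static boolean digit(char c) { return c >= '0' && c <= '9'; }

    // fragment ALPHA : 'A'..'Z' ;
    private static boolean alpha(char c) { return c >= 'A' && c <= 'Z'; }

    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < s.length()) {
            char c = s.charAt(pos);
            if (digit(c)) {
                // NUMBER : DIGIT ;
                tokens.add("NUMBER:" + c);
                pos++;
            } else if (alpha(c)) {
                // NAME : ALPHA ( ALPHA | DIGIT )* ;
                int end = pos + 1;
                while (end < s.length() && (alpha(s.charAt(end)) || digit(s.charAt(end)))) {
                    end++;
                }
                tokens.add("NAME:" + s.substring(pos, end));
                pos = end;
            } else {
                throw new IllegalArgumentException("no token rule matches '" + c + "'");
            }
        }
        tokens.add("EOF:");
        return tokens;
    }

    // Hypothetical flattened parser rule:  record : NUMBER NAME EOF ;
    static boolean parseRecord(List<String> tokens) {
        String[] expected = {"NUMBER", "NAME", "EOF"};
        if (tokens.size() != expected.length) {
            return false;
        }
        for (int i = 0; i < expected.length; i++) {
            String token = tokens.get(i);
            String type = token.substring(0, token.indexOf(':'));
            if (!type.equals(expected[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("3B5A");
        System.out.println(tokens);               // [NUMBER:3, NAME:B5A, EOF:]
        System.out.println(parseRecord(tokens));  // true
    }
}
```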
>
> Johannes
>
>   
>>     *1) GRAMMAR*
>>
>>     grammar Simple;  
>>
>>     options {
>>         language = Java;
>>     }
>>
>>     @parser::header {
>>         package simple;
>>         import static java.lang.System.out;
>>     }
>>
>>     @lexer::header{
>>       package simple;
>>       import static java.lang.System.out;
>>     }
>>
>>     // PARSER
>>
>>     record :
>>       digit name EOF
>>       { out.println( "+record: " +  $text );  };
>>
>>     digit : DIGIT
>>       { out.println( "+digit: " +  $text );  };
>>      
>>     name : NAME
>>       { out.println( "+name: " +  $text );  };
>>
>>
>>     // LEXER
>>
>>     DIGIT : '0'..'9'
>>       { out.println("+DIGIT: " + $text ); } ;
>>
>>     LETTER : 'A'..'Z'
>>       { out.println("+LETTER: " + $text ); } ;
>>
>>     NAME : ( LETTER | DIGIT ) + 
>>       { out.println("+NAME: " + $text ); } ;
>>
>>
>>     *2) TEST RIG*
>>
>>     package simple;
>>
>>     import java.io.ByteArrayInputStream;
>>
>>     import org.antlr.runtime.ANTLRInputStream;
>>     import org.antlr.runtime.CommonTokenStream;
>>
>>     import static java.lang.System.out;
>>
>>     public class SimpleTest {
>>
>>         public static void main(String[] args) throws Exception {
>>
>>             String record = "3B5A";
>>
>>             ByteArrayInputStream stream = new ByteArrayInputStream(record
>>                     .getBytes());
>>
>>             ANTLRInputStream input = new ANTLRInputStream(stream);
>>
>>             SimpleLexer lexer = new SimpleLexer(input);
>>
>>             CommonTokenStream tokens = new CommonTokenStream(lexer);
>>
>>             SimpleParser parser = new SimpleParser(tokens);
>>
>>             parser.record();
>>
>>             out.println(record);
>>
>>         }
>>
>>     }
>>
>>
>>     *3) TEST OUTPUT*
>>
>>     +DIGIT: 3
>>     +LETTER: 3B
>>     +DIGIT: 3B5
>>     +LETTER: 3B5A
>>     +NAME: 3B5A
>>     line 1:0 missing DIGIT at '3B5A'
>>     +digit: null
>>     +name: 3B5A
>>     +record: 3B5A
>>     3B5A
>>
>>
>>     *Thank you, *
>>
>>     John
>>