[antlr-interest] Simple Grammar Question

Sat Jan 17 09:38:33 PST 2009

John Gardener wrote:
>      *Hello;*
> 
>     I am stuck with a simple grammar; any help is very welcome.
> 
>     I want to parse two-term sentences, such as:
>     <1: single digit > <2: name containing letters and digits > EOF
> 
>     Below comes:
>     1) grammar
>     2) test rig
>     3) output
> 
>     PROBLEM:
>     The second term (name) seems to greedily consume the whole input.
> 
>     Please let me know what is the proper way to deal with this?

If the lexer can match the same input via more than one rule, it chooses
the rule that consumes the most input. With your grammar, NAME can match
all of "3B5A" while DIGIT can match only "3", so NAME wins. Try the
following rules instead:

fragment NAME:;

DIGIT : ( '0'..'9' | 'A'..'Z' {$type=NAME;} )
        ( '0'..'9' {$type=NAME;} | 'A'..'Z' {$type=NAME;} )*
      { out.println("+DIGIT: " + $text ); } ;
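
To make the effect of that combined rule concrete, here is a small
hand-written Java sketch of the decision it encodes. It is not
ANTLR-generated code and does not use the ANTLR runtime; the class name
CombinedRuleSketch and the "TYPE:text" output format are made up for
illustration, and it assumes the input contains only '0'..'9' and
'A'..'Z'.

```java
import java.util.ArrayList;
import java.util.List;

class CombinedRuleSketch {

    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }
    static boolean isUpper(char c) { return c >= 'A' && c <= 'Z'; }

    // Hand-written stand-in for the combined DIGIT rule (hypothetical helper).
    // Returns tokens as "TYPE:text" strings.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int start = pos;
            // Consume the longest run of characters in '0'..'9' or 'A'..'Z'.
            while (pos < input.length()
                    && (isDigit(input.charAt(pos)) || isUpper(input.charAt(pos)))) {
                pos++;
            }
            if (pos == start) {
                throw new IllegalArgumentException("unexpected character at " + start);
            }
            String text = input.substring(start, pos);
            // The type stays DIGIT only for a single digit; every other path
            // through the rule executes {$type=NAME;}.
            boolean singleDigit = text.length() == 1 && isDigit(text.charAt(0));
            tokens.add((singleDigit ? "DIGIT" : "NAME") + ":" + text);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("3"));    // [DIGIT:3]
        System.out.println(tokenize("B"));    // [NAME:B]
        System.out.println(tokenize("3B5A")); // [NAME:3B5A]
    }
}
```

In this sketch "3B5A" still comes out as a single NAME, because a name
is allowed to start with a digit. The same holds for the grammar rule
itself.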

It should emit a DIGIT token only if exactly one character is matched
and that character is a digit; anything else becomes a NAME. But can
names start with digits at all? If not, this may work, too:

DIGIT : '0'..'9'
      { out.println("+DIGIT: " + $text ); } ;

NAME : 'A'..'Z' ( 'A'..'Z' | '0'..'9' ) *
      { out.println("+NAME: " + $text ); } ;
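
As a rough check of why these two rules no longer compete, the
following hand-written Java sketch mimics them. Again, this is not what
ANTLR generates; the class name DisjointRulesSketch and the "TYPE:text"
output format are assumptions made for illustration. The point is that
the first character alone decides which rule applies: a digit can only
start DIGIT, and a letter can only start NAME.

```java
import java.util.ArrayList;
import java.util.List;

class DisjointRulesSketch {

    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }
    static boolean isUpper(char c) { return c >= 'A' && c <= 'Z'; }

    // Hand-written stand-in for the DIGIT and NAME rules (hypothetical helper).
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int start = pos;
            char first = input.charAt(pos);
            if (isDigit(first)) {
                // DIGIT : '0'..'9'   (exactly one character)
                pos++;
                tokens.add("DIGIT:" + input.substring(start, pos));
            } else if (isUpper(first)) {
                // NAME : 'A'..'Z' ( 'A'..'Z' | '0'..'9' )*
                pos++;
                while (pos < input.length()
                        && (isUpper(input.charAt(pos)) || isDigit(input.charAt(pos)))) {
                    pos++;
                }
                tokens.add("NAME:" + input.substring(start, pos));
            } else {
                throw new IllegalArgumentException("unexpected character at " + pos);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("3B5A")); // [DIGIT:3, NAME:B5A]
        System.out.println(tokenize("35A"));  // [DIGIT:3, DIGIT:5, NAME:A]
    }
}
```

For the test input "3B5A" this yields one DIGIT followed by one NAME,
which is the token sequence the parser rule "digit name EOF" expects.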

Calling non-fragment lexer rules from other lexer rules can give
strange results and is discouraged by experienced users; helper rules
that should not produce tokens of their own are better declared as
fragments. With fragments, the version above looks like this:

fragment DIGIT : '0'..'9';

fragment ALPHA : 'A'..'Z';

NUMBER : DIGIT { out.println("+NUMBER: " + $text ); } ;

NAME : ALPHA ( ALPHA | DIGIT ) *
      { out.println("+NAME: " + $text ); } ;

Johannes

> 
>     *1) GRAMMAR*
> 
>     grammar Simple;  
> 
>     options {
>         language = Java;
>     }
> 
>     @parser::header {
>         package simple;
>         import static java.lang.System.out;
>     }
> 
>     @lexer::header{
>       package simple;
>       import static java.lang.System.out;
>     }
> 
>     // PARSER
> 
>     record :
>       digit name EOF
>       { out.println( "+record: " +  $text );  };
> 
>     digit : DIGIT
>       { out.println( "+digit: " +  $text );  };
>      
>     name : NAME
>       { out.println( "+name: " +  $text );  };
> 
> 
>     // LEXER
> 
>     DIGIT : '0'..'9'
>       { out.println("+DIGIT: " + $text ); } ;
> 
>     LETTER : 'A'..'Z'
>       { out.println("+LETTER: " + $text ); } ;
> 
>     NAME : ( LETTER | DIGIT ) + 
>       { out.println("+NAME: " + $text ); } ;
> 
> 
>     *2) TEST RIG*
> 
>     package simple;
> 
>     import java.io.ByteArrayInputStream;
> 
>     import org.antlr.runtime.ANTLRInputStream;
>     import org.antlr.runtime.CommonTokenStream;
> 
>     import static java.lang.System.out;
> 
>     public class SimpleTest {
> 
>         public static void main(String[] args) throws Exception {
> 
>             String record = "3B5A";
> 
>             ByteArrayInputStream stream = new ByteArrayInputStream(record
>                     .getBytes());
> 
>             ANTLRInputStream input = new ANTLRInputStream(stream);
> 
>             SimpleLexer lexer = new SimpleLexer(input);
> 
>             CommonTokenStream tokens = new CommonTokenStream(lexer);
> 
>             SimpleParser parser = new SimpleParser(tokens);
> 
>             parser.record();
> 
>             out.println(record);
> 
>         }
> 
>     }
> 
> 
>     *3) TEST OUTPUT*
> 
>     +DIGIT: 3
>     +LETTER: 3B
>     +DIGIT: 3B5
>     +LETTER: 3B5A
>     +NAME: 3B5A
>     line 1:0 missing DIGIT at '3B5A'
>     +digit: null
>     +name: 3B5A
>     +record: 3B5A
>     3B5A
> 
> 
>     *Thank you, *
> 
>     John
> 
> 