[antlr-interest] rule vs. subrule conflict, or

Sun May 20 16:17:17 PDT 2007

Cameron,

You have told the grammar that 'I' is a keyword, hence that it what the
lexer is returning. Remember that lexers must be context free, so you
can't expect it to know anything about what the parser wants.

If you want i as both a keyword and an identifier, then everywhere you
allow an id, you will need to allow 'I', choosing to interpret it as an
ID where it should be that.'

I would recommend that you don't put the 'I' in the parser, but define
this is a proper lexer token as that might make things clearer.

STRUCT_I : 'i' ;

dsBody : (ID | STRUCT_I) COLON INT EOL ...

Jim

From: Cameron Palmer [mailto:cameron.palmer at gmail.com] 
Sent: Sunday, May 20, 2007 12:42 AM
To: Jim Idle
Cc: antlr-interest at antlr.org
Subject: rule vs. subrule conflict, or

Jim - Thank you very much. I believe I have conquered Comment handling,
and I also found a bit on the case insensitive issue at
http://www.antlr.org/wiki/display/ANTLR3/Migrating+from+ANTLR+2+to+ANTLR
+3 .

I am encountering an interesting conflict between a rule and a subrule.

In the dsBody rule if i choose an ID that matches one of the
iStructElement rules I get a MismatchTokenException or with a slightly
different formulation NoViableAltException. 

Example Input that causes error:
.data
i:12 <- problem if you change 'i' to 'b' all is well
i 12
i 13
i 14
.end

what i find stange is that I would assume that the ID COLON INT EOL in
the grammar below would be checked beforehand and therefore no
ambiguity. The syntax diagram in ANTLRworks and the source code
generated by ANTLR leads me to believe this is the correct
interpretation. 

Grammar fragment:
dataSection
    : DATABEGIN EOL
    dsBody
    DATAEND EOL
    ;

dsBody    : ID COLON INT EOL
    (iStructElement EOL)+
    ;

iStructElement
    : 'i' INT 
    | 'ri' INT INT
    | 'f'
    | 'rf' INT
    | 'a' INT
    | 'ra' INT INT
    | 'u'
    | 'ru' INT
    ;

Thank you,

Cameron. 

On 5/18/07, Jim Idle <jimi at temporal-wave.com> wrote:

Cameron,

Your comment rule for the lexer needs to change a little to terminate in
a  non-greedy fashion and not to attempt to embed one rule (EOL) in the
other (otherwise you need to use a separate "fragment" and reference
that (I prefer the formulation below, but you may or may not want the
EOL to be part of the comment token (I assume not)):

COMMENT
    : SEMI (~( '\r' | '\n'))* {$channel=HIDDEN};

Case sensitivity is normal for the ANTLR 3 token stream, however I have
posted (or someone posted my code at least if I don't post it ;-), a way
to override the default stream with a method that will compare in a case
insensitive way but preserve the original case in the token text (useful
in some cases such as formatters and for literals so on). See:

http://www.antlr.org/pipermail/antlr-interest/2007-January/019008.h
tml 

I need to move this code into the wiki, but then, I need to do a lot of
things ;-)

Jim

From: antlr-interest-bounces at antlr.org [mailto:
antlr-interest-bounces at antlr.org
<mailto:antlr-interest-bounces at antlr.org> ] On Behalf Of Cameron Palmer
Sent: Thursday, May 17, 2007 10:17 PM
To: antlr-interest at antlr.org
Subject: [antlr-interest] SDF assembly language grammar

I am a student rewriting the assembler for an ISA called SDF (Scheduled
Dataflow). You can check out our work at http://csrl.unt.edu/sdf .
Anyway I am just writing the grammar and am having trouble making the
leap to code generation. I had started with v2 but figure I might just
convert the grammar to v3 today. The goal of this grammar is to emit
opcodes for each instruction and maintain numeric translations for the
labels. I think I would rather do the label translation as an AST rather
than backpatching. I have attached my grammar and sample assembly. My
questions:
Why it seems to stop parsing about the comment line (what am i doing it
wrong?)
Where did the caseSensitiveLiterals option disappear?

Thank you,

Cameron.

A sample of the assembly:
version 2.0.0
.data
.end
code main
        putr1 main.1
        forkep r1
        stop
main.1:
        ; alloc sum frame
        put sum, r4
        put 4, r5 
        falloc r4, r5, r6
        ; alloc result frame
        put main_res, r4
        put 1, r5
        falloc r4, r5, r7
        ;
        putr1 main.2
        forksp r1
        stop

The Grammar:
grammar SDF;

@header {
package edu.unt.csrl.sdf.asm;
import java.util.HashMap();
}
@members {
HashMap memory = new HashMap(); // Something like this to hold label
values that will be rewritten 
}
program
    : (EOL)? 
    versionSection
    dataSection
    (codeBlock)*
    ;

versionSection
    : 'version' (.)+ EOL;

dataSection
    : DATABEGIN EOL
        dsBody 
      DATAEND EOL;

dsBody
    : ID COLON INT EOL
        iStructBody
        dsBody
    | ;

iStructBody
    : ('i' | 'ri' | 'f' | 'rf' | 'a' | 'ra' | 'u' | 'ru')=> 
        iStructElement
        iStructBody
    | ;

iStructElement
    : 'i' INT EOL
    | 'ri' INT INT EOL
    | 'f' FLOAT EOL
    | 'rf' INT FLOAT EOL
    | 'a' INT EOL 
    | 'ra' INT INT EOL
    | 'u' EOL
    | 'ru' INT EOL;

codeBlock
    : 'code' ID EOL
        code;

code
    : label code
    | instruction code
    ; 

label
    : ID COLON (EOL)?;

instruction
    : 'nop' EOL
    | 'add' reg['r'] COMMA reg['r'] COMMA reg['r'] EOL
    | 'sub' reg['r'] COMMA reg['r'] COMMA reg['r'] EOL 
    | 'mul' reg['r'] COMMA reg['r'] COMMA reg['r'] EOL
    | 'div' reg['r'] COMMA reg['r'] COMMA reg['r'] EOL
    | 'mod' reg['r'] COMMA reg['r'] COMMA reg['r'] EOL 
    | 'and' reg['r'] COMMA reg['r'] COMMA reg['r'] EOL
    | 'or' reg['r'] COMMA reg['r'] COMMA reg['r'] EOL
    | 'not' reg['r'] COMMA reg['r'] EOL 
    | 'xor' reg['r'] COMMA reg['r'] COMMA reg['r'] EOL
    | 'band' reg['r'] COMMA reg['r'] COMMA reg['r'] EOL
    | 'bor' reg['r'] COMMA reg['r'] COMMA reg['r'] EOL 
    | 'bnot' reg['r'] COMMA reg['r'] EOL
    | 'shl' reg['r'] COMMA reg['r'] COMMA reg['r'] EOL
    | 'shr' reg['r'] COMMA reg['r'] COMMA reg['r'] EOL 
    | 'sar' reg['r'] COMMA reg['r'] COMMA reg['r'] EOL
    | 'neg' reg['r'] COMMA reg['r'] EOL
    | 'max' reg['r'] COMMA reg['r'] COMMA reg['r'] EOL 
    | 'min' reg['r'] COMMA reg['r'] COMMA reg['r'] EOL
    | 'abs' reg['r'] COMMA reg['r'] EOL
    | 'lt' reg['r'] COMMA reg['r'] COMMA reg['r'] EOL 
    | 'le' reg['r'] COMMA reg['r'] COMMA reg['r'] EOL
    | 'eq' reg['r'] COMMA reg['r'] COMMA reg['r'] EOL
    | 'ne' reg['r'] COMMA reg['r'] COMMA reg['r'] EOL 
    | 'ge' reg['r'] COMMA reg['r'] COMMA reg['r'] EOL
    | 'gt' reg['r'] COMMA reg['r'] COMMA reg['r'] EOL
    | 'tbl' reg['r'] COMMA reg['r'] EOL 
    | 'tch' reg['r'] COMMA reg['r'] EOL
    | 'fadd' reg['f'] COMMA reg['f'] COMMA reg['f'] EOL
    | 'fsub' reg['f'] COMMA reg['f'] COMMA reg['f'] EOL 
    | 'fmul' reg['f'] COMMA reg['f'] COMMA reg['f'] EOL
    | 'fdiv' reg['f'] COMMA reg['f'] COMMA reg['f'] EOL
    | 'flr' reg['f'] COMMA reg['f'] EOL 
    | 'ceil' reg['f'] COMMA reg['f'] EOL
    | 'flt' reg['f'] COMMA reg['f'] COMMA reg['r'] EOL
    | 'fle' reg['f'] COMMA reg['f'] COMMA reg['r'] EOL 
    | 'feq' reg['f'] COMMA reg['f'] COMMA reg['r'] EOL
    | 'fne' reg['f'] COMMA reg['f'] COMMA reg['r'] EOL
    | 'fge' reg['f'] COMMA reg['f'] COMMA reg['r'] EOL 
    | 'fgt' reg['f'] COMMA reg['f'] COMMA reg['r'] EOL
    | 'trl' reg['r'] COMMA reg['f'] EOL
    | 'tdb' EOL // TODO
    | 'tin' reg['f'] COMMA reg['r'] EOL 
    | 'gput' INT COMMA reg['g'] EOL
    | 'gadd' reg['g'] COMMA reg['g'] COMMA reg['g'] EOL
    | 'gsub' reg['g'] COMMA reg['g'] COMMA reg['g'] EOL 
    | 'gmul' reg['g'] COMMA reg['g'] COMMA reg['g'] EOL
    | 'gtl' reg['g'] COMMA reg['r'] EOL
    | 'ltg' reg['r'] COMMA reg['g'] EOL 
    | 'move' reg['r'] COMMA reg['r'] EOL
    | 'putr1' (INT | ID) EOL
    | 'put' (INT | ID) COMMA reg['r'] EOL
    | 'load' reg['r'] PIPE (reg['r'] | INT) COMMA reg['r'] EOL 
    | 'store' reg['r'] COMMA reg['r'] PIPE (reg['r'] | INT) EOL
    | 'store1' EOL // TODO
    | 'rstore' EOL // TODO
    | 'ialloc' reg['r'] COMMA reg['r'] EOL 
    | 'ifree' reg['r'] EOL
    | 'ifetch' reg['r'] PIPE (reg['r'] | INT) COMMA reg['r'] EOL
    | 'istore' reg['r'] COMMA reg['r'] PIPE (reg['r'] | INT) EOL 
    | 'forksp' reg['r'] ( | COMMA reg['r']) EOL
    | 'forkep' reg['r'] ( | COMMA reg['r']) EOL
    | 'stop' EOL
    | 'falloc' reg['r'] COMMA reg['r'] COMMA reg['r'] EOL 
    | 'ralloc' EOL // TODO
    | 'ffree' ( | reg['r']) EOL
    | 'input' INT COMMA reg['r'] EOL
    | 'output' reg['r'] COMMA INT EOL
    | 'skip' EOL // TODO 
    | 'ifchcl' EOL // TODO
    | 'pow' reg['r'] COMMA reg['r'] COMMA reg['r'] EOL
    | 'sfalloc' EOL // TODO
    | 'fput' FLOAT COMMA reg['f'] EOL 
    | 'finput' INT COMMA reg['f'] EOL
    | 'halt' EOL
    | 'fmove' reg['f'] COMMA reg['f'] EOL
    | 'fneg' reg['f'] COMMA reg['f'] EOL 
    | 'fabs' reg['f'] COMMA reg['f'] EOL
    | 'fmath' INT COMMA reg['f'] COMMA reg['f'] EOL
    | 'sforkep' EOL // TODO
    | 'sforksp' EOL // TODO 
    | 'scforksp' EOL // TODO
    | 'sffree' EOL // TODO
    | 'sread' EOL
    ;

reg[char type] returns[int rNum = -1]
    : rStr=ID {
        try {
            if(Character.toLowerCase (rStr.charAt(0)) != type)
                throw new SemanticException("expected " + type + "x,
found " + rStr);

            if(rStr.equals("RFP"))
                rNum = 63; 
            else
                rNum = Integer.parseInt(rStr.substring(1));
        } catch(Exception ex) {
            throw new SemanticException("expected " + type + "x, found "
+ rStr); 
        }
    };

DATABEGIN:
    '.data';
DATAEND:
    '.end';
COMMENT
    : SEMI (.)* EOL {$channel=HIDDEN}; 
fragment DIGIT
    : ('0'..'9');
INT 
    : ('-')? (DIGIT)+;        
FLOAT    
    : INT '.' (DIGIT)+;
SEMI    
    : ';';
COLON
    : ':';
COMMA
    : ',';
PIPE
    : '|';

ID 
    : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'.'|'0'..'9')*;

EOL
    : (('\r')? '\n') {newline();}; 

WS
    : (
    | (' ' | '\t')
    ) { $channel=HIDDEN; }
    ;

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.antlr.org/pipermail/antlr-interest/attachments/20070520/aed77028/attachment-0001.html