[antlr-interest] Problem with an underline and semantic action

Nicola Cuomo ncuomo at gmail.com
Thu Sep 15 18:18:57 PDT 2005


Hi,
   I'm an ANTLR newbie trying to write a translator.

After reading the manual and looking at the examples on the site, I
still have a lot of problems and questions.

So let's start :)

First problem

I have this grammar:

-------------------------------------------------------------------
options
{
        language = "Cpp";
}

class TestParser extends Parser;

options 
{
        buildAST = false;
        k = 3;
}

spec
        : UNDERLINE CONST_IDENT
        ;


class TestLexer extends Lexer;

options 
{
        charVocabulary='\u0000'..'\u00ff';
        k = 3;                  
}

tokens
{
        UNDERLINE               = "under_line";
}

/* Whitespaces */
WS
  : ( ' '
    | '\t'
    | '\f'

    // handle newlines
    | ( "\r\n"  // DOS/Windows
      | '\r'    // Macintosh
      | '\n'    // Unix
      )
      { newline(); }
    )
    { $setType(antlr::Token::SKIP); }
  ;

COMMENT
  : "%" (~('\n'|'\r'))*
    { $setType(antlr::Token::SKIP); }
  ;
  
CONST_IDENT
  options { testLiterals=true; }
        : ('a'..'z') ('a'..'z'|'A'..'Z'|'0'..'9')*
        ;
-------------------------------------------------------------------

It's a test that should parse input like "under_line a123123".

When I run the program I get:

$ ./main
under_line a123123
line 1:1: expecting "under_line", found 'under'
Parse exception: line 1:6: unexpected char: '_'

It seems to stop reading characters when it hits the underscore,
returning an "under" token and breaking the parse. My first thought
was to extend the charVocabulary, but I have no clue how to do it.

Shouldn't charVocabulary='\u0000'..'\u00ff'; already include all the
ASCII characters?

charVocabulary='\u0000'..'\ufffe'; which someone suggested on this
list for a similar problem, doesn't work in Cpp mode. I get "warning:
underline.g: Vocabularies of this size still experimental in C++ mode"
and the compilation that follows fails.
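
A possible explanation, given only as an untested assumption: since
"under_line" is declared in the tokens section, it may only be looked
up in the literals table after CONST_IDENT has matched some text
(that is what testLiterals=true appears to do). If so, CONST_IDENT
would itself have to accept the '_' character, for example:

-----
CONST_IDENT
  options { testLiterals=true; }
        : ('a'..'z') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
        ;
-----

With such a rule the lexer could read "under_line" as one piece of
text and then retype it as UNDERLINE, at the cost of also allowing
underscores inside ordinary identifiers.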

The second problem is about semantic actions.

I have the following grammar fragment:
-----
formula
        : expression (EQUAL|LESST) expression
        ;
expression
        : CONST_IDENT
        | VAR_IDENT (PRIME)?
         ... and so on ...
        ;
-----

I would like to get the whole string matched by the first expression
rule in formula.

I've written something like:

-----
formula
        : exp:expression (EQUAL|LESST) expression { std::cout << exp->getText() << std::endl; }
        ;
expression
        : CONST_IDENT
        | VAR_IDENT (PRIME)?
        ... and so on ...
        ;
-----

But the compilation fails, saying that exp is not defined. From what
I've seen, labels seem to work with terminal tokens like EQUAL.
Is there a way to get all the text matched by a rule without having to
build it from the sub-expressions?
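
One possible workaround, as an untested sketch: it assumes that
without buildAST a label on a rule reference declares no variable, so
the rule has to return its text itself. The names lhs, rhs, s, c, v
and p are made up for illustration:

-----
formula
{ std::string lhs, rhs; }   // hypothetical locals for the two sides
        : lhs=expression (EQUAL|LESST) rhs=expression
          { std::cout << lhs << std::endl; }
        ;
expression returns [std::string s]
        : c:CONST_IDENT { s = c->getText(); }
        | v:VAR_IDENT   { s = v->getText(); }
          (p:PRIME      { s += p->getText(); })?
        ... and so on ...
        ;
-----

This still builds the string from the sub-expressions, so it does not
really answer the question above.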

Sorry for my English :)

Thanks in advance for any answers :P
-- 
 Nicola                          mailto:ncuomo at gmail.com


