[antlr-interest] missing tokens and strange behaviour regarding some chars

Nieves.Salor.Moral at esa.int
Thu Aug 5 02:39:53 PDT 2010


Thanks to both Jim and Kevin.

Kevin, I tried to use more lexer rules, but the problem when parsing was
that the token type the lexer sends is different from that of the more
general rule, because they are not fragments but full lexer rules, so it
was not working. And yes, it is giving me a really hard time.

Jim, I am doing something similar to what you suggested. But I found the
main error was in how I was referencing some tokens (full lexer rules, not
only fragments) inside other lexer rules, so the token types being sent
were not the ones I thought they would be; I expected the more general
ones. The two problems that I had are now solved, and I am extending the
grammar and will keep on testing it.

Example

a: TERMINAL1 rule2 ;

TERMINAL1: TERMINAL2 | 'b' ;

TERMINAL2: 'c' ;

If I tried to send c rule2, I thought that it was going to work correctly,
but it does not because, as I discovered while debugging (I don't know if
this is the general case), the lexer decides that 'c' is a TERMINAL2 token,
and so the input doesn't match rule a.

Is this assumption correct in general? It may have worked for me until
now, but I could run into another problem when extending the grammar, and
I want to build a robust compiler.
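
For reference, a minimal sketch of one way the example could be rearranged
so that 'c' reaches rule a as TERMINAL1. It assumes ANTLR 3 syntax; the
grammar name and the body of rule2 are placeholders, since the real rule2
is not shown here. A fragment rule never produces a token of its own, so
only TERMINAL1 is ever sent to the parser:

```antlr
grammar Example;          // hypothetical grammar name

a     : TERMINAL1 rule2 ;

rule2 : TERMINAL1 ;       // placeholder body, not the real rule2

// Full lexer rule: the only token type the parser sees for 'c' or 'b'.
TERMINAL1 : TERMINAL2 | 'b' ;

// fragment rules are only callable from other lexer rules;
// the lexer never emits TERMINAL2 as a token by itself.
fragment TERMINAL2 : 'c' ;
```

If TERMINAL2 also had to be used directly in the parser, the alternative
would be to keep it as a full lexer rule and move the choice up into a
parser rule instead, e.g. terminal1: TERMINAL2 | 'b' ;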

Thanks for everything

Nieves




"Jim Idle" <jimi at temporal-wave.com> 
03/08/2010 18:18

To
<Nieves.Salor.Moral at esa.int>, <antlr-interest at antlr.org>
cc

Subject
RE: [antlr-interest] missing tokens and strange behaviour regarding some chars






Your expression is still defined in an LALR manner, hence it will get
confused. You need to define it as a cascading set of rules, with higher
precedence towards the bottom of the nest. That probably does not make a
lot of sense to you as words, so the best thing to do is to read through
the grammar for, say, Java or C and look at the expression rules. Then
basically copy them and adapt them to your own operators.
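
As a rough illustration of that shape, here is a sketch only, assuming
ANTLR 3 syntax with the default Java target. The grammar name, term,
MULTIPLICATION_OPERATOR, and WS are made-up names for the example and are
not taken from the posted grammar, and simple_factor is cut down to three
alternatives:

```antlr
grammar ExprSketch;       // hypothetical grammar name

// Lowest precedence sits at the top of the nest ...
expression
        :       term (ADDITION_OPERATOR term)*
        ;

// ... multiplication binds tighter, so it is one level down ...
term
        :       simple_factor (MULTIPLICATION_OPERATOR simple_factor)*
        ;

// ... and the unary sign and the atoms bind tightest, at the bottom.
simple_factor
        :       ADDITION_OPERATOR simple_factor
        |       UNSIGNED_INTEGER
        |       '(' expression ')'
        ;

ADDITION_OPERATOR
        :       '+' | '-'
        ;

MULTIPLICATION_OPERATOR   // assumed operators, for illustration only
        :       '*' | '/'
        ;

UNSIGNED_INTEGER
        :       ('0'..'9')+
        ;

WS      :       (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; }
        ;
```

Each rule only refers to the next tighter level, so there is no left
recursion, and an input such as -500 is handled by the first alternative
of simple_factor.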

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Nieves.Salor.Moral at esa.int
> Sent: Tuesday, August 03, 2010 12:37 AM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] missing tokens and strange behaviour regarding
> some chars
> 
> Hello to everyone!
> 
> I am new to ANTLR but not to compilers. Before I explain the problem,
> I'll try to explain a little of the background.
> 
> I am trying to design, for a custom language, first a syntax highlighter
> and second a module that can store the information in a DB (so in essence
> I would be creating a compiler with SQL queries as its output).
> My input language is defined in EBNF, and it has left recursion and
> ambiguity. To solve this, I have changed it a little to avoid those
> problems, and I have mostly managed without using predicates or
> backtracking.
> 
> Working with ANTLRWorks, I am debugging the grammar with different
> examples (just the parser) before adding the highlighting code in the
> StringTemplate, but I get these strange errors, mostly
> NoViableAltException.
> 
> One problem, for example, is trying to define negative expressions with
> the simple_factor rule.
> So when I debug expressions like 500 or +500 in simple_factor, I don't
> get an error. But if I try -500, I get the NoViableAltException. Also, if
> I change - for another symbol like @, it also works when I try @500. I
> have traced all the possibilities through the different alternatives in
> simple_factor, but in none of them can the first symbol be a negative
> symbol.
> And I am lost as to why this can happen. I am attaching the whole grammar
> because it is too big to paste here.
> 
> Another problem is that tokens are sometimes missed when reading. For
> example, if I have an input beginning with 'initiate and confirm', ANTLR
> reads 'conf' and loses the first characters, with the same grammar that I
> have posted. One example of this problem is the input 'initiate and
> confirm sys_stop of SCOE_1553 of LLCS of EGSE of System of ODB' with the
> rule initiate_and_confirm_step_statement.
> 
> Thanks in advance for any input
> 
> Nieves Salor Moral
> 
> addition_operator:  ADDITION_OPERATOR
>         ;
> 
> ADDITION_OPERATOR
>         :       '+'|'-'
>         ;
> 
>  UNSIGNED_INTEGER
>         :       DIGIT+
>         ;
> 
> simple_factor
>         :       addition_operator simple_factor
>         |       NEGATION_BOOLEAN_OPERATOR simple_factor
>         |       constant
>         |       '('expression ')'
>         |       function
>         |       object_property_request
>         |       OBJECT_TYPE partial_path
>         |       'ask user' '(' expression ('default' expression)? ')'
> ('expect' predefined_type)?
>         ;
> 
> constant:       BOOLEAN_CONSTANT
>         |       UNSIGNED_INTEGER ( numeric_constant|
> RELATIVE_TIME_CONSTANT)
>         |       RELATIVE_TIME_CONSTANT
>         |       string_constant
>         |       HEXADECIMAL_CONSTANT
>         ;
> real_constant
>         :       ('.' UNSIGNED_INTEGER)? ('e' addition_operator?
> UNSIGNED_INTEGER)?
>         ;
> 
> numeric_constant
>         :        real_constant engineering_units?
>         ;
> 




