[antlr-interest] missing tokens and strange behaviour regarding some chars

Jim Idle jimi at temporal-wave.com
Thu Aug 5 09:34:22 PDT 2010


Those rules are ambiguous, so your lexer is broken. The sequence 'c' cannot
both be a TERMINAL1 and a TERMINAL2, as this grammar is context free. Hence
ANTLR happens to decide that TERMINAL2 is what you want. The lexer has to
return the same sequence of characters as the same token type every time. It
is not driven by the parser. The lexer runs and produces all the tokens, then
the parser runs.

 

So, you can only have:

 

TERMINAL : 'b' | 'c';

 

Or your TERMINAL2 should be a fragment rule, in which case you will always
get TERMINAL1. 
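
As a minimal sketch of the fragment option, assuming ANTLR 3 syntax: the
grammar name Terminals and the body of rule2 are placeholders (rule2 is not
shown in this thread). Because TERMINAL2 is marked as a fragment, the lexer
never emits it as a token of its own, so both 'b' and 'c' reach the parser as
TERMINAL1.

```antlr
grammar Terminals;   // hypothetical grammar name

a     : TERMINAL1 rule2 ;

// Placeholder body: the real rule2 is not shown in the thread.
rule2 : TERMINAL1 ;

// The only token type the parser ever sees for 'b' or 'c'.
TERMINAL1 : TERMINAL2 | 'b' ;

// A fragment can be referenced from other lexer rules,
// but never produces a token by itself.
fragment
TERMINAL2 : 'c' ;
```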

 

If a particular sequence of characters means something, then you should
process this with semantics, either during the parse or, better yet, after
you have produced an AST that you can walk and analyze.
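
A rough sketch of that idea, assuming ANTLR 3 with the Java target: the lexer
emits a single TERMINAL token type, and the parser inspects the matched text
in an action. The helper handleC(), the grammar name, and the body of rule2
are hypothetical placeholders, not part of the original grammar.

```antlr
grammar Terminals;   // hypothetical grammar name

@members {
    // Hypothetical hook for whatever 'c' is supposed to mean.
    void handleC() { System.out.println("saw c"); }
}

a     : t=TERMINAL rule2
        {
            // The token type is always TERMINAL; the meaning is
            // decided here from the text that was matched.
            if ($t.text.equals("c")) { handleC(); }
        }
      ;

// Placeholder body: the real rule2 is not shown in the thread.
rule2 : TERMINAL ;

TERMINAL : 'b' | 'c' ;
```

In a larger grammar the same check could instead live in a tree walker that
runs over the AST after parsing.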

 

Jim

 

From: Nieves.Salor.Moral at esa.int [mailto:Nieves.Salor.Moral at esa.int] 
Sent: Thursday, August 05, 2010 2:40 AM
To: Jim Idle; cummings at kjchome.homeip.net
Cc: antlr-interest at antlr.org
Subject: RE: [antlr-interest] missing tokens and strange behaviour regarding
some chars

 


Thanks both Jim and Kevin 

Kevin, I tried to use more LEXER expressions, but the problem when parsing
was that the TOKEN code that the LEXER sends is different from that of the
more general rule, as they are not fragments but full lexer rules, so it was
not working. And yes, it is giving me a real hard time.

Jim, I am doing something similar to what you suggested. But I found the
main error was in how I was mixing some TOKENS inside other LEXER rules
and not only fragments, so the codes that were being sent were not the
ones that I thought would work, because they were more general. The two
problems that I had are now solved, and I am extending the grammar and
continuing to test it.

Example 

a: TERMINAL1 rule2 ;

TERMINAL1: TERMINAL2 | 'b' ;

TERMINAL2: 'c' ;

If I tried to send c rule2, I thought that it was going to work correctly, but
it does not because, as I discovered while debugging (I don't know if this is
a general case), it finds that 'c' is a TERMINAL2 TOKEN, and so it doesn't
match the rule a.

Is this assumption correct in general? Maybe it has worked for me until now,
but I could run into another problem when extending, and I want to build a
robust compiler.

Thanks for everything 

Nieves





"Jim Idle" <jimi at temporal-wave.com> 

03/08/2010 18:18
To: <Nieves.Salor.Moral at esa.int>, <antlr-interest at antlr.org>
cc:
Subject: RE: [antlr-interest] missing tokens and strange behaviour regarding
some chars

Your expression is still defined in an LALR manner, hence it will get
confused. You need to define it as a cascading set of rules, with higher
precedence towards the bottom of the nest. That probably does not make a lot
of sense to you as words, so the best thing to do is to read through the
grammar for, say, Java or C and look at the expression rules. Then basically
copy them and adapt them to your own operators.
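
For illustration, a cascade of this kind might look like the following
sketch, assuming ANTLR 3 syntax with the Java target. The grammar name, the
parser rule names, MULTIPLICATION_OPERATOR, DIGIT, and WS are assumptions made
for the example; only ADDITION_OPERATOR and UNSIGNED_INTEGER are taken from
the posted grammar. Each rule refers only to the next, tighter-binding level,
so no rule is left-recursive.

```antlr
grammar Expr;   // hypothetical grammar name

expression
    :   additive_expression
    ;

// Lowest precedence: binary + and -
additive_expression
    :   multiplicative_expression
        (ADDITION_OPERATOR multiplicative_expression)*
    ;

// Higher precedence: binary * and /
multiplicative_expression
    :   unary_expression
        (MULTIPLICATION_OPERATOR unary_expression)*
    ;

// A unary sign binds tighter than any binary operator.
unary_expression
    :   ADDITION_OPERATOR unary_expression
    |   primary
    ;

// Highest precedence: atoms and parenthesised expressions
primary
    :   UNSIGNED_INTEGER
    |   '(' expression ')'
    ;

ADDITION_OPERATOR       : '+' | '-' ;
MULTIPLICATION_OPERATOR : '*' | '/' ;   // assumed, not in the posted grammar
UNSIGNED_INTEGER        : DIGIT+ ;

fragment
DIGIT : '0'..'9' ;

// Skip whitespace between tokens (Java target action).
WS : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; } ;
```

In this sketch an input such as -500 is matched by the first alternative of
unary_expression, and 1 + 2 * 3 groups the multiplication first because
multiplicative_expression sits below additive_expression in the nest.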

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org
> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Nieves.Salor.Moral at esa.int
> Sent: Tuesday, August 03, 2010 12:37 AM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] missing tokens and strange behaviour regarding
> some chars
> 
> Hello to everyone!
> 
> I am new with ANTLR but not with compilers. Before I explain the problem
> I'll try to explain a little of the background.
>
> I am trying to design, for a custom language, first a syntax highlighter and
> second a module that can store the information in a DB (so in essence I would
> be creating a compiler with SQL queries as its output).
> My input language is defined in EBNF, thus it has left-recursion and
> ambiguity. To solve this, I have changed it a little to avoid those
> problems, and mostly I have managed it without using predicates or
> backtracking.
>
> Working with ANTLRWorks, I am debugging the grammar with different
> examples (just the parser) before adding the highlighting code in the
> StringTemplate, but I get these strange errors, mostly
> NoViableAltException.
>
> One problem, for example, is trying to define negative expressions with the
> simple_factor rule.
> When I debug expressions like 500 or +500 in simple_factor, I don't get
> an error. But if I try -500, I get the NoViableAltException. Also, if I
> change - for another symbol like @, it also works when I try @500. I have
> traced all the different alternatives in simple_factor, but in none of them
> can the first symbol be a negative symbol.
> I am lost as to why this can happen. I attach the whole grammar because it
> is too big to paste here.
>
> Another problem that appears is that sometimes tokens are missed when
> reading. For example, if I have an input beginning with 'initiate and
> confirm', ANTLR reads 'conf' and loses the first characters. This happens
> with the same grammar that I have posted. One example of this problem is the
> input 'initiate and confirm sys_stop of SCOE_1553 of LLCS of EGSE of System
> of ODB' with the rule initiate_and_confirm_step_statement.
> 
> Thanks in advance for any input
> 
> Nieves Salor Moral
> 
> addition_operator:  ADDITION_OPERATOR
>         ;
> 
> ADDITION_OPERATOR
>         :       '+'|'-'
>         ;
> 
>  UNSIGNED_INTEGER
>         :       DIGIT+
>         ;
> 
> simple_factor
>         :       addition_operator simple_factor
>         |       NEGATION_BOOLEAN_OPERATOR simple_factor
>         |       constant
>         |       '(' expression ')'
>         |       function
>         |       object_property_request
>         |       OBJECT_TYPE partial_path
>         |       'ask user' '(' expression ('default' expression)? ')'
>                 ('expect' predefined_type)?
>         ;
> 
> constant:       BOOLEAN_CONSTANT
>         |       UNSIGNED_INTEGER ( numeric_constant | RELATIVE_TIME_CONSTANT )
>         |       RELATIVE_TIME_CONSTANT
>         |       string_constant
>         |       HEXADECIMAL_CONSTANT
>         ;
> real_constant
>         :       ('.' UNSIGNED_INTEGER)? ('e' addition_operator? UNSIGNED_INTEGER)?
>         ;
> 
> numeric_constant
>         :        real_constant engineering_units?
>         ;
> 





