[antlr-interest] Spaces issues

Loring Craymer lgcraymer at yahoo.com
Tue Mar 29 22:46:28 PDT 2011


Now you are getting confused by COMBINED grammars.  ANTLR generates a separate
lexer and parser from a combined grammar: the lexer (the rules with capitalized
names in the "combined grammar") tokenizes the input, and the parser operates
only on the tokens the lexer produces.  The lexer knows nothing about which
token the parser expects next, so a lexer rule such as INNERCONTENT can break
litteralRange even though litteralRange never references it.  Regroup your
rules to separate the parser rules from the lexer rules (this only improves
readability and has no functional impact), and then consider whether the lexer
rules do what you want (clearly, they don't in their current incarnation).
Lexer rules should be considered as alternatives for tokens; in fact, ANTLR
generates a master lexer rule which basically takes the form
Tokens : A | B | ... | Z ;
where A, B, and so forth are the names of the lexer productions.
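
As a rough sketch of that regrouping (not run through ANTLR here; the grammar
name RangeSketch and the token names LBRACK, RBRACK and DOTDOT are made up for
illustration and do not appear in the original grammar), a reduced grammar for
litteralRange with the parser rule first and the lexer rules after it might
look like this:

```antlr
grammar RangeSketch;   // hypothetical name, not from the original grammar

// ---- parser rules (lowercase names): these see tokens only ----
litteralRange
    : LBRACK NUMBER DOTDOT NUMBER RBRACK
    ;

// ---- lexer rules (uppercase names): the alternatives of Tokens ----
// LBRACK, RBRACK and DOTDOT are illustrative names; naming every literal
// keeps the full set of token alternatives visible in one place.
LBRACK : '[' ;
RBRACK : ']' ;
DOTDOT : '..' ;

// Integers only: hex (0x...), octal (leading 0), or decimal.
NUMBER
    : '0' ( ('x'|'X') ('0'..'9'|'a'..'f'|'A'..'F')+
          | ('0'..'7')+
          )?
    | '1'..'9' ('0'..'9')*
    ;

// Whitespace goes to the hidden channel, so the parser never sees it.
WS  : (' '|'\t'|'\r'|'\n')+ { $channel = HIDDEN; } ;
```

With only these rules, the master rule would be roughly
Tokens : LBRACK | RBRACK | DOTDOT | NUMBER | WS ;
and NUMBER is the only alternative that can start at the '2' in "[2..3]".
Adding a rule like INNERCONTENT back introduces a second alternative that can
also start with a digit, which is the kind of overlap to look for.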

--Loring



----- Original Message ----
> From: Fabien Hermenier <hermenierfabien at gmail.com>
> To: antlr-interest at antlr.org
> Sent: Tue, March 29, 2011 7:59:54 PM
> Subject: Re: [antlr-interest] Spaces issues
> 
> Hi
> 
> I have reduced the number of fragments to zero for test purposes, but it
> does not solve the problem.
> So I have reduced the grammar to a minimum, just enough to parse the
> input I gave you.
> It now appears that the lexer rule "INNERCONTENT" was causing the issue.
> This is strange to me, as it is not used in the rule "litteralRange".
>
> Does anyone know how this is possible?
> 
> Thanks for your help
> Fabien.
> 
> On 29/03/11 19:53, Loring Craymer wrote:
> > The likely cause of your problems is the extensive use of fragment rules.
> > ANTLR 3 does not use follow sets in lexers, and invocation of fragment rules
> > usually disables LL* processing.  Inline your fragment rules, and your
> > current problems should disappear, although others may still lurk.
> >
> >  --Loring
> >
> >
> > ----- Original Message ----
> >> From:  Fabien Hermenier<hermenierfabien at gmail.com>
> >>  To: antlr-interest at antlr.org
> >>  Sent: Tue, March 29, 2011 12:51:47 PM
> >> Subject: Re: [antlr-interest]  Spaces issues
> >>
> >> Here is my entire grammar
> >> There are a lot of commented-out rules, and "litteralRange" does not have
> >> its complete definition because easier patterns do not work yet.
> >> Currently, litteralRange should accept inputs such as "[2..3]" or
> >> "[ 2 .. 0xFF]".
> >>
> >> Thanks for  your  help!
> >>
> >> ---
> >> grammar ANTLRVJob5;
> >>
> >> options {
> >>     language = Java;
> >>     output = AST;
> >> }
> >> fragment Digit: '0'..'9';
> >> fragment Letter: 'a'..'z'|'A'..'Z';
> >> fragment Name: Domain ('.' Domain)*;
> >> fragment Domain: Letter ('-'?(Letter|Digit))*;
> >> fragment VarPrefix: '$';
> >> fragment EnumSep: ',';
> >> fragment InnerContent: (Letter
> >>                 |Digit
> >>                 |'_'
> >>                 |'-'
> >>                 |'.'(Letter|Digit));
> >> fragment RRange: ']' (InnerContent*(Letter|Digit))?;
> >> fragment LRange: (Letter (Digit|Letter|'-'|'_'|'.')*)? '[';
> >>
> >> //Number litteral section
> >> fragment HEX_LITERAL: ;
> >> fragment OCTAL_LITERAL: ;
> >> fragment DECIMAL_LITERAL: ;
> >> NUMBER: '0' (
> >>         ('x'|'X') { $type = HEX_LITERAL; }
> >>         (Digit|'a'..'f'|'A'..'F')+
> >>         |
> >>         ('0'..'7')+ { $type = OCTAL_LITERAL; }
> >>         |
> >>         )
> >>       |
> >>         '1'..'9' Digit* { $type = DECIMAL_LITERAL; }
> >>       ;
> >>
> >> NAME: Name;
> >> ENUMSEP: EnumSep;
> >> EQUALS: '=';
> >> ENDL: ';';
> >> PLUS: '+';
> >> MINUS: '-';
> >> TIMES: '*';
> >> VARIABLE: VarPrefix(Letter|'_')(Letter|Digit|'_')*;
> >>
> >> COMMENT
> >>     : '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
> >>     | '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
> >>     ;
> >>
> >> WS: ('\n'|'\r'|'\t'|' ') {$channel=HIDDEN;};
> >>
> >>
> >> INNERCONTENT: InnerContent+;
> >> RRANGE: RRange;
> >> LRANGE: LRange;
> >> LVRANGE: VarPrefix LRange;
> >> CONSTRAINTIDENTIFIER: Letter(Letter|Digit|'_')*'(';
> >>
> >> litteral: NAME|NUMBER;
> >> operator: PLUS|TIMES;
> >>
> >> //litteralRange: LRANGE INTEGER '..' INTEGER RRANGE;
> >> litteralRange: '[' NUMBER '..' NUMBER ']';
> >>
> >> litteralEnum: LRANGE INNERCONTENT /*(ENUMSEP INNERCONTENT)+']' RRANGE*/;
> >>
> >> variableEnum: LVRANGE INNERCONTENT (ENUMSEP INNERCONTENT)+ RRANGE;
> >> variableRange: LVRANGE NUMBER '..' NUMBER RRANGE;
> >>
> >> explodedSet: ('{}'| '{'expression (ENUMSEP expression)*'}');
> >>
> >> atom: '(' expression ')'
> >>     |litteral
> >> //  |VARIABLE
> >>     |litteralRange
> >> //  |litteralEnum
> >> //  |variableRange
> >> //  |variableEnum
> >> //  |explodedSet
> >> ;
> >>
> >>
> >> expression: atom/* (operator expression)?*/;
> >>
> >> var_decl: VARIABLE EQUALS expression ';';
> >>
> >> /*forEachStatement:
> >>     'foreach' VARIABLE 'in' expression '{'
> >>     instruction*
> >>     '}';
> >>
> >> constraintCallStatement: CONSTRAINTIDENTIFIER expression (','
> >> expression)* ')' ';';
> >> */
> >> instruction: var_decl;
> >>     //|forEachStatement
> >> //  |constraintCallStatement;
> >>
> >> vjob_decl: instruction*;
> >> ---
> >>
> >> On 29/03/11 12:47, Jim Idle wrote:
> >>> Looks like you might be looking for a token that you have not defined, but
> >>> post your grammar as it stands now and we can work it out.
> >>>
> >>>    Jim
> >>>
> >>>> -----Original  Message-----
> >>>> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> >>>> bounces at antlr.org] On Behalf Of Fabien   Hermenier
> >>>> Sent: Tuesday, March 29, 2011 11:25  AM
> >>>> To: antlr-interest at antlr.org
> >>>>    Subject: Re: [antlr-interest] Spaces  issues
> >>>>
> >>>> Yes, and in this situation it seems to ignore the first number and
> >>>> the range delimiter.
> >>>> Here is a sample of the event list with the input "[2..3]" and the
> >>>> starting rule "litteralRange":
> >>>>
> >>>> Consume   [[/<32>,1:0, at 0]
> >>>> Create node 2(0)
> >>>>  Add child 2 to  1
> >>>> Location (64,20)
> >>>>  LT 1 (3)
> >>>> LT 1  (3)
> >>>> LT 2  (])
> >>>> LT 1 (3)
> >>>> LT 1  (3)
> >>>>   LT 1 (3)
> >>>> RecognitionException: MismatchedTokenException(0!=0)
> >>>> Begin resync
> >>>> LT 1 (3)
> >>>> Consume [3/<15>,1:4, at 1]
> >>>> LT 1 (])
> >>>> Consume []/<35>,1:5 at 2]
> >>>> LT 1 (;)
> >>>> ...
> >>>>    ...
> >>>>
> >>>> On 29/03/11 12:16, Jim Idle wrote:
> >>>>> Did you use the debugger instead of the interpreter?
> >>>>>
> >>>>>    Jim
> >>>>>
> >>>>>> -----Original   Message-----
> >>>>>> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> >>>>>> bounces at antlr.org] On Behalf Of Fabien   Hermenier
> >>>>>> Sent: Tuesday, March 29, 2011 10:37   AM
> >>>>>> To: antlr-interest at antlr.org
> >>>>>>    Subject: Re: [antlr-interest] Spaces   issues
> >>>>>>
> >>>>>> On 29/03/11 07:36, John B. Brodie wrote:
> >>>>>>>    Greetings!
> >>>>>>>
> >>>>>>> On  Tue, 2011-03-29 at  00:47 -0600, Fabien Hermenier  wrote:
> >>>>>>>>    Hi
> >>>>>>>>
> >>>>>>>> I am starting to use ANTLR3 with AntlrWorks 3.4.1 on OS X and I have
> >>>>>>>> some issues with spaces. I've attached a sample ANTLR file describing
> >>>>>>>> my grammar (see 1st grammar).
> >>>>>>>>
> >>>>>>>> I'm trying to test 'litteralRange'. Using the interpreter, I write
> >>>>>>>> "[2 ..3]" or "[2 .. 3]" as input and it works fine. However, if I
> >>>>>>>> give the string "[2..3]" it does not work. I have followed the
> >>>>>>>> tutorial and declared the lexer rule WS with the hidden channel to
> >>>>>>>> ignore spaces, but I still have strange issues with this.
> >>>>>>>>
> >>>>>>>> Another strange fact is that if I write a reduced grammar that just
> >>>>>>>> isolates the rule I want to test, it works fine (see 2nd grammar).
> >>>>>>>>
> >>>>>>>> Does anyone have a solution or a hint?
> >>>>>>>>
> >>>>>>> ....good  stuff   snipped....
> >>>>>>>
> >>>>>>> see Jim Idle's wiki entry:
> >>>>>>>
> >>>>>>> http://www.antlr.org/wiki/display/ANTLR3/Lexer+grammar+for+floating+point%2C+dot%2C+range%2C+time+specs
> >>>>>>>
> >>>>>>> Hope  this  helps...
> >>>>>>>          -jbb
> >>>>>>>
> >>>>>>>
> >>>>>> Thanks, I still have a question. I understand how difficult it is to
> >>>>>> capture '..' while also having to handle float numbers such as ".3".
> >>>>>> But in my case, I only have to deal with integer values, so currently
> >>>>>> I don't see why I need to help the lexer.
> >>>>>> I have reduced the number of fragments following the principle of the
> >>>>>> link you sent me (to catch numbers in base 10, 16 or 8 in a single
> >>>>>> rule), but it hasn't solved my problem yet.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> >>>>>> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address

