[antlr-interest] Re: Skipping grammar

Anthony W Youngman Anthony.Youngman at ECA-International.com
Wed Oct 8 04:09:33 PDT 2003


Firstly, DON'T stop posting here ... what matters is that you show that
you are trying to understand. What pisses people off is when they think
students are trying to skip homework ...

Secondly, the best way to learn is to teach. I probably don't know much
more than you, so you're helping me learn (plus all those other lurkers
who are watching and haven't dived in :-) I've posted similarly "dumb"
posts and been grateful to everyone who's helped me - I owe it to the
list to help when I can.

Okay. In your "file to parse" you have "packet", "model", "method". Are
these keywords? If so, your life is nice and simple. Similarly, if the
nesting is fixed, so that the method braces are always at the third
level down, it's just as simple, only slightly different. So how do
you handle that?

At the start of the lexer, you can declare an initialisation code block.
In it, declare a state variable (an enum) with the values NULL,
IN_PACKET, IN_MODEL, and IN_METHOD.

Your lexer will now contain something like this ...

packet: {state == NULL}?   // semantic predicate: only match at top level
      "PACKET" lcurly {state = IN_PACKET;}  // set the state variable
      model rcurly {state = NULL;}          // reset the state variable
   ;

model: {state == IN_PACKET}?
      "MODEL" lcurly {state = IN_MODEL;}
      method rcurly {state = IN_PACKET;}
   ;

method: ... I'll leave it to you :-)

I'm sure I've messed up my ANTLR syntax good and proper, and other
people will help you with how to do this, but this looks like the
approach you want. Particularly, look at predicates and in-line code.
And WATCH OUT !!! because predicates *can* get you into trouble with
look-ahead. It looks like what you're doing is pretty simple and won't
be any trouble, but it does happen ...
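
To make the state idea concrete outside ANTLR, here is a minimal Python
sketch of the same approach. It is an illustration under assumptions, not
ANTLR code: the names State, KEYWORD and extract_method_bodies are made up
for this example, keywords are matched case-insensitively, and the nesting
is assumed to be exactly packet > model > method as in the "file to parse".
Outside a method the scanner only follows the keyword nesting; once it is
IN_METHOD it counts braces and keeps the raw text.

```python
import re
from enum import IntEnum


class State(IntEnum):
    NULL = 0
    IN_PACKET = 1
    IN_MODEL = 2
    IN_METHOD = 3


# Keyword allowed in each state; its "{" moves one level deeper.
KEYWORD = {State.NULL: "packet", State.IN_PACKET: "model", State.IN_MODEL: "method"}

SKIP = re.compile(r"[\s;]*")            # whitespace and statement terminators
OPEN = re.compile(r"(\w+)[^{}]*\{")     # keyword, optional name/arguments, "{"


def extract_method_bodies(text):
    """Return the raw text of every method body, nested braces included."""
    state, pos, bodies = State.NULL, 0, []
    while True:
        if state is State.IN_METHOD:
            # Raw mode: count braces until the one that closes the method.
            start, depth = pos, 1
            while depth > 0:
                if pos == len(text):
                    raise ValueError("unterminated method body")
                depth += (text[pos] == "{") - (text[pos] == "}")
                pos += 1
            bodies.append(text[start:pos - 1])
            state = State.IN_MODEL
            continue
        pos = SKIP.match(text, pos).end()
        if pos == len(text):
            break
        if text[pos] == "}":
            if state is State.NULL:
                raise ValueError("unexpected '}' at offset %d" % pos)
            state = State(state - 1)    # leave the current level
            pos += 1
            continue
        m = OPEN.match(text, pos)
        if m is None or m.group(1).lower() != KEYWORD[state]:
            raise ValueError("expected '%s' at offset %d" % (KEYWORD[state], pos))
        state = State(state + 1)        # enter the next level
        pos = m.end()
    if state is not State.NULL:
        raise ValueError("unclosed block at end of input")
    return bodies
```

The point of the state variable is that a "{" only switches to raw
capture when the scanner is already inside a model and has just seen the
method keyword, so the packet and model braces never trigger it.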

Cheers,
Wol

-----Original Message-----
From: pwolleba [mailto:pwolleba at yahoo.no] 
Sent: 08 October 2003 11:36
To: antlr-interest at yahoogroups.com
Subject: [antlr-interest] Re: Skipping grammar


Hello!

I am starting to dominate this newsgroup with my problem, so I guess
I have to stop after this post!
Anyway, I will paste some of the code from my parser; if you can spot
where my thinking goes wrong, I would appreciate your comments!



PARSER
 
//---------------------------------------------- METHODE -------------
methodeNode         : (METHOD^) declarationName methodeDecleration methodBody;

methodeDecleration  : (LPAREN!) (methodArguments)? (RPAREN!)
                      {#methodeDecleration=#([ARGUMENTS,"Arguments"],#methodeDecleration);};

methodArguments     : (methodArgument (COMMA! methodArguments)?);

methodArgument      : declarationName;

methodBody          : (METHOD_BODY)
                      {#methodBody=#([EXPRESSION,"Expression"],#methodBody);};
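
The tree shape these parser rules aim for (a METHOD root whose children
are the name, an Arguments node and an Expression node) can be pictured
with a small Python sketch. Everything here is an assumption made for
illustration: the tuple layout, the HEADER pattern and the function name
method_node are hypothetical, and ANTLR builds real AST nodes rather than
tuples.

```python
import re

# "Method <name> ( <comma-separated arguments> )" as in the examples in this thread.
HEADER = re.compile(r"\s*Method\s+(\w+)\s*\(([^)]*)\)\s*$")


def method_node(header, body):
    """Build a (root, name, arguments, expression) tuple for one method."""
    m = HEADER.match(header)
    if m is None:
        raise ValueError("not a method header: %r" % header)
    name = m.group(1)
    # Split the argument list, ignoring empty entries such as in "()".
    args = [a.strip() for a in m.group(2).split(",") if a.strip()]
    # The body is kept as one opaque string; it is never parsed.
    return ("METHOD", name, ("Arguments", args), ("Expression", body))
```

The body text goes into the Expression node untouched, which matches the
goal of consuming the method body without parsing it.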


LEX

METHOD_BODY : '{'! (BracedExpr | ~'}')* "};"!;

protected
BracedExpr : '{' (BracedExpr | ~'}')* "}";
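
For readers less used to recursive lexer rules, the two rules above
behave roughly like the following Python sketch. It is only an analogy
under assumptions: match_braced and method_body are hypothetical names,
and the sketch always treats a "{" as the start of a nested group.

```python
def match_braced(text, pos):
    """Return the index just past the "}" that closes the "{" at text[pos]."""
    if pos >= len(text) or text[pos] != "{":
        raise ValueError("expected '{' at offset %d" % pos)
    pos += 1
    while pos < len(text):
        ch = text[pos]
        if ch == "{":
            pos = match_braced(text, pos)   # nested group: recurse, like BracedExpr
        elif ch == "}":
            return pos + 1                  # this brace closes the current group
        else:
            pos += 1                        # any other character is simply consumed
    raise ValueError("unbalanced braces")


def method_body(text, pos):
    """Mimic METHOD_BODY: match "{ ... };" at pos and return (body, end)."""
    end = match_braced(text, pos)
    if not text.startswith(";", end):
        raise ValueError("expected ';' after closing brace")
    # Drop the outer braces and the ';', as the '!' suffixes do in the rule.
    return text[pos + 1:end - 1], end + 1
```

Like the lexer rule, this matches at any "{", which is exactly why it
also fires on the packet and model braces unless something else decides
when to call it.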



FILE TO PARSE

Packet name{
Model name {
Method{
	Expressiontext;
	If/else and so on
};
};
};

As you can see, the method is built up much like a method in C++
or Java. What makes it difficult is that I don't want to
parse the method body text, I just want to consume it.

As you can see, my lexer won't work, since it will trigger on the
Packet brace as well as the Model brace. If I could somehow make
it start only when it is a method, I would be really happy.

Best regards,
Per




--- In antlr-interest at yahoogroups.com, "Anthony W Youngman" 
<Anthony.Youngman at E...> wrote:
> Hmmm ...
> 
> You should be able to declare that in the lexer.
> 
> method: lcurly method_body rcurly ;
> 
> protected method_body: name arguments expression ;
> 
> Do the curly brackets always indicate a method? If not, how do you tell
> whether it's the start of a method or the start of something else? If
> you can unambiguously identify the start of a method (eg it's flagged by
> an lcurly, which is the only use of an lcurly) then what you appear to
> want is pretty simple to achieve.
> 
> Solve the problem of how to identify "this is a method", and the rest of
> it should just fall into place. If the lexer can recognise "this is a
> method" then the lexer can handle methods for you. The parser will then
> build your tree for you the way you want it.
> 
> I think your original comment about ";" being used to terminate both IFs
> and methods is a red herring. Have you grasped why it's not a problem?
> If you have, then you should be able to work out the rest of the
> solution fairly easily. If you haven't, then you need to get that
> straight because it shows a fundamental misunderstanding of ANTLR. Don't
> forget, both the lexer and parser are recursive (they "drill down"), so
> context-dependent semantics shouldn't be a problem ...
> 
> Cheers,
> Wol
> 
> -----Original Message-----
> From: pwolleba [mailto:pwolleba at y...] 
> Sent: 08 October 2003 10:13
> To: antlr-interest at yahoogroups.com
> Subject: [antlr-interest] Re: Skipping grammar
> 
> 
> Hello again
> 
> Thanks for helping me out Arnar, your solutions are really good!
> Still, I think I will have problems implementing them, mostly because I
> have not given you enough information.
> I need to make a method tag in my tree that contains information,
> such as the arguments to the method (see example).
> 
> 
> Method testMethod (Args,Args....){
> 	Expression text
> }
> 
> method
> |
> |--------Name
> |
> |--------Arguments
> |
> |-------- Expression
> 
> 
> If I solve this in my lexer I will not be able to create this node
> tree; it will just be one method node that contains all the text. If
> I drop the "method" tag from my METHOD_BODY rule, it will trigger at all
> the other brackets in my document.
> Can I somehow write my lexer rule without the "method" tag, and then
> make it trigger only when I need the method body?
> 
> best regards,
> Per
> 
> --- In antlr-interest at yahoogroups.com, "Arnar Birgisson" 
> <arnarb at o...> wrote:
> > Hello Per,
> > 
> > Perhaps you could make "method {" a single token in the parser, and set
> > the nestingLevel variable to zero when that one matches.
> > 
> > The solution I posted uses the parser to eat up the stuff inside {...},
> > another possibility might be to make the lexer do this:
> > 
> > METHOD_BODY
> >   : "method"! '{'! ( BracedExpr | ~'}' )* "};"!
> >   ;
> > 
> > protected
> > BracedExpr
> >   : '{' ( BracedExpr | ~'}' )* "}"
> >   ;
> > 
> > Overall, this might be a better solution. The token METHOD_BODY will
> > then contain as its text whatever was inside the {...}.
> > 
> > As a side note, this is possible in ANTLR lexers because they are LL(k)
> > and can thus handle (a subset of) context-free grammars. Conventional
> > lexers are limited to regular grammars (represented by regular
> > expressions, which are equivalent to finite automata) and cannot, for
> > example, match nested braces, parentheses etc. See
> > http://www.antlr.org/doc/lexer.html#Predicated-LL(k)_Lexing for more
> > information on this.
> > 
> > Arnar
> > 
> > ps. yes, the "i" should have been "nestingLevel" :o)
> > pps. again, I haven't tried this, it might not even be syntactically
> > correct
> > 
> > >>> pwolleba at y... 10/07/03 5:34 PM >>>
> > Hello again!
> > 
> > I am looking at your example Arnar, and I have some questions.
> > When I wrote my example I should have included some more information.
> > The methode node is inside another node called member (see
> > example) and there can be more than one!
> > 
> > Member{
> > Methode {
> > 	Sometext;
> > };
> > };
> > 
> > This makes your example a bit more difficult to implement, since the
> > counter will start at zero at the first bracket, which is the member
> > bracket. I must somehow be able to set nestingLevel = 0 from the
> > parser when the method node is starting.
> > How do I do that?
> > 
> > best regards,
> > Per
> > 
> > Ps: I guess it should be nestingLevel++ instead of i++. Correct?
> > 
> > --- In antlr-interest at yahoogroups.com, "pwolleba" <pwolleba at y...> 
> > wrote:
> > > Yes that is correct, what is inside the bracket is a different
> > > language which I at the moment don't want to write a parser for (it
> > > is pretty complex and big). Anyway I have just come back to work, and
> > > I am going to try out your solution Arnar, hopefully it will work!
> > > 
> > > I just want to thank the community for trying to find a solution to
> > > my question, and I must say it came really fast!
> > > 
> > > Best regards,
> > > 
> > > Per
> > > 
> > > 
> > > --- In antlr-interest at yahoogroups.com, "Arnar Birgisson" 
> > > <arnarb at o...> wrote:
> > > > Hi..
> > > > 
> > > > In my earlier post, I understood Per differently. I think he wants to
> > > > parse "method name{ <whatever> };" and just eat up <whatever>, including
> > > > any nested braces, and put it in a variable, completely without lexing
> > > > and/or parsing it. Per, is this correct?
> > > > 
> > > > The result of all this being a tree something like this:
> > > > 
> > > > METHOD
> > > >  |
> > > > name-body
> > > > 
> > > > where the body node contains anything inside the {..} as its text.
> > > > 
> > > > Arnar
> > > > 
> > > > >>> Anthony.Youngman at E... 10/07/03 1:33 PM >>>
> > > > I think you're missing the point. Define a ; as SEMI. The way I'd do it
> > > > (and this is all pseudocode) is
> > > > 
> > > > if_statement: "IF" lcurly (method)* rcurly "ELSE" lcurly (method)*
> > > > rcurly SEMI ;
> > > > method: blah_blah SEMI ;
> > > > 
> > > > That way, the lexer doesn't care whether ; is ending a method or an if
> > > > clause, and the parser won't get confused because when it hits a
> > > > right-curly it will be expecting an ELSE or a SEMI, and not a method.
> > > > And if the ELSE is optional you just mark it as such so when the parser
> > > > hits the right-curly after the if, it's expecting an ELSE or a SEMI and
> > > > nothing else.
> > > > 
> > > > Cheers,
> > > > Wol
> > > > 
> > > > -----Original Message-----
> > > > From: pwolleba [mailto:pwolleba at y...] 
> > > > Sent: 07 October 2003 08:19
> > > > To: antlr-interest at yahoogroups.com
> > > > Subject: [antlr-interest] Skipping grammar
> > > > 
> > > > 
> > > > I am pretty new to ANTLR so maybe this question is very trivial; if
> > > > so, even better, since then there may be a simple solution to my
> > > > problem. Anyway, I am struggling with writing a new parser in ANTLR to
> > > > replace an old implementation in Flex/Bison, in order to make a
> > > > product that is open to implementation in both C++ and Java.
> > > > 
> > > > The parser will parse a language that we are using to build
> > > > databases, and it must support this language 100% to be accepted.
> > > > 
> > > > Here is the code snippet that I am struggling with.
> > > > 
> > > > method name{
> > > >   SomeText!()text[];
> > > >   if(a < b && b < c){
> > > >      SomeText()!()[];
> > > >   }
> > > >   else{
> > > >      SomeText()!()[];
> > > >   };
> > > > };
> > > > 
> > > > I am not interested in the expression that is inside the
> > > > method; I just want ANTLR to grab the text for me and put it as a
> > > > node inside the tree. The problem is that the if/else
> > > > statement ends with a "};", which is the same token as the method
> > > > end token, and there could be more than one
> > > > inside the method. A solution would be to make a counter that will
> > > > increase for each "{" and decrease for each "}"; then I would know
> > > > when the method ends. To my frustration I don't know how I should
> > > > make such a counter in ANTLR that still supports implementation in
> > > > both Java and C++ code.
> > > > I would be really really happy if someone could help me with this
> > > > problem!
> > > > 
> > > > Best regards,
> > > > 
> > > > Per
> > > > 
> > > > 
> > > > 
> > > >  
> > > > 
> > > > Your use of Yahoo! Groups is subject to
> > > > http://docs.yahoo.com/info/terms/ 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 