[antlr-interest] Re: Skipping grammar

Wed Oct 8 09:03:09 PDT 2003

I haven't spent too much time analyzing this but in general it sounds good!

Monty

-----Original Message-----
From: Arnar Birgisson [mailto:arnarb at oddi.is] 
Sent: Wednesday, October 08, 2003 4:48 AM
To: antlr-interest at yahoogroups.com
Subject: RE: [antlr-interest] Re: Skipping grammar

Per: Anthony is on the money here... do not stop posting here! I'm taking
a graduate course in compiler design and implementation and I chose
ANTLR as my tool for the term project. I first saw ANTLR no more than
4-5 weeks ago, so in fact you are doing me (and probably others) a big
favour in helping me learn and understand this myself.

Does "method {...}" always appear inside "model {...}", and does "model
{...}" always appear inside "packet {...}"? Can a packet contain another
packet, and can a model contain another model? If the answers are yes,
yes, no and no, respectively, the nesting level of the starting { for a
method is fixed and you can adapt the first solution we discussed.
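
A minimal sketch of that fixed-depth idea, written in Python rather than as
ANTLR actions so it can be run on its own. It assumes the nesting is always
packet > model > method, so a method's opening brace is always the third one
down. The function name and the method_depth parameter are made up for
illustration and are not part of ANTLR:

```python
def find_method_bodies(text, method_depth=2):
    """Return the raw text of every brace block opened at method_depth.

    Depth 0 is where the packet brace opens, 1 the model brace and 2 the
    method brace (assumed fixed nesting: packet > model > method).
    """
    bodies = []
    depth = 0      # number of braces currently open
    start = None   # index just past the '{' that opened a method body
    for i, ch in enumerate(text):
        if ch == '{':
            if depth == method_depth and start is None:
                start = i + 1          # a method body begins here
            depth += 1
        elif ch == '}':
            depth -= 1
            if depth == method_depth and start is not None:
                bodies.append(text[start:i])   # matching '}' reached
                start = None
    return bodies
```

Nested braces inside the body only move the counter above the method depth,
so they end up inside the captured text untouched. In an ANTLR lexer the same
counter would presumably live in the class-member block and be updated from
the actions of the brace rules.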

If the grammar is more general, i.e. packets can contain other packets
etc., you can do fancier stuff, like keeping a stack in your lexer: each
time you see a "{", determine its type from the keyword appearing
before it, and push the token id for the corresponding closing "}" on
the stack. Then, upon seeing a "}" in the input, pop the type off the
stack and use it with "setType". That way, matching braces will have
matching token types which the parser can use. Example (pseudo-code):

class MyLexer extends Lexer;

tokens { OPEN_PACKET; CLOSE_PACKET; OPEN_MODEL; CLOSE_MODEL;
OPEN_METHOD; CLOSE_METHOD; }

{
	stack braces = new stack();
	int nextBrace = OPEN_PACKET;
	bool readingMethodBody = false;

	int getMatchingToken(int open) {
		if (open == OPEN_PACKET) return CLOSE_PACKET;
		if (open == OPEN_MODEL) return CLOSE_MODEL;
		// etc.
	}
}

PACKET: "packet" { nextBrace = OPEN_PACKET; };
MODEL: "model" { nextBrace = OPEN_MODEL; };
METHOD: "method" { nextBrace = OPEN_METHOD; };

OPEN_BRACE /* except method opening braces */
	{ nextBrace != OPEN_METHOD }?
	: '{'
	{
		$setType(nextBrace);
		braces.push(getMatchingToken(nextBrace));
	}
	;

METHOD_BODY
	{ nextBrace == OPEN_METHOD }?
	: '{'! ( BracedExpr | ~('{' | '}') )* '}'!
	;

protected
BracedExpr
  : '{' ( BracedExpr | ~('{' | '}') )* '}'
  ;

CLOSE_BRACE
	: '}' { $setType(braces.pop()); }
	;

/* plus other tokens you need. */

The token stream for this input

Packet name{
Model name {
Method{
 	Expressiontext;
	If/else and so on
};
};
};

would be something like:

[PACKET,"packet"]
[ID,"name"]
[OPEN_PACKET,"{"]
[MODEL,"model"]
[ID,"name"]
[OPEN_MODEL,"{"]
[METHOD,"method"]
[METHOD_BODY,"Expressiontext;\nIf/else and so on"]
              ^ note that the '{' and '}' were discarded with !
[SEMI,";"]
[CLOSE_MODEL,"}"]
[SEMI,";"]
[CLOSE_PACKET,"}"]
[SEMI,";"]
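
To make the stack idea concrete, here is a small hand-written simulation of
it in Python. It is only a sketch under assumptions, not ANTLR output: it
knows nothing but identifiers, braces and semicolons, it matches keywords
case-insensitively, and the names tokenize, skip_braced, CLOSERS and KEYWORDS
are invented for this example.

```python
import re

WORD = re.compile(r"[A-Za-z_]\w*")
KEYWORDS = {"packet": "PACKET", "model": "MODEL", "method": "METHOD"}
CLOSERS = {"OPEN_PACKET": "CLOSE_PACKET", "OPEN_MODEL": "CLOSE_MODEL"}

def skip_braced(text, i):
    """text[i] is '{'; return the index of its matching '}'."""
    depth = 0
    for j in range(i, len(text)):
        if text[j] == '{':
            depth += 1
        elif text[j] == '}':
            depth -= 1
            if depth == 0:
                return j
    raise ValueError("unbalanced braces")

def tokenize(text):
    tokens = []
    braces = []                 # stack of pending closing-token types
    next_brace = "OPEN_PACKET"  # type the next '{' will get
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch == '{' and next_brace == "OPEN_METHOD":
            # Swallow the whole body, nested braces included, as one token.
            end = skip_braced(text, i)
            tokens.append(("METHOD_BODY", text[i + 1:end].strip()))
            i = end + 1
        elif ch == '{':
            tokens.append((next_brace, "{"))
            braces.append(CLOSERS[next_brace])
            i += 1
        elif ch == '}':
            tokens.append((braces.pop(), "}"))  # type chosen by the opener
            i += 1
        elif ch == ';':
            tokens.append(("SEMI", ";"))
            i += 1
        else:
            m = WORD.match(text, i)
            if m is None:
                raise ValueError("unexpected character %r" % ch)
            word = m.group()
            kind = KEYWORDS.get(word.lower())
            if kind is not None:
                tokens.append((kind, word.lower()))
                next_brace = "OPEN_" + kind     # remember the keyword
            else:
                tokens.append(("ID", word))
            i = m.end()
    return tokens
```

Running tokenize on the sample input yields the same sequence of token types
as listed above; the method braces never reach the stack because the whole
body is consumed in one step.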

Do any of you gurus see a problem with this?

My next suggestion was using token multiplexing, and just now I received
Ric's post on that :o)

This allows you to add in, for example, tokens for "(param1,param2)" in
between "method" and its opening brace.

If I'm still way off course: Do you have a formal BNF, EBNF or equivalent
grammar for your input? If so, it would help to see it.

Arnar

> -----Original Message-----
> From: Anthony W Youngman 
> [mailto:Anthony.Youngman at ECA-International.com] 
> Sent: 8 October 2003 11:10
> To: antlr-interest at yahoogroups.com
> Subject: RE: [antlr-interest] Re: Skipping grammar
> 
> 
> Firstly, DON'T stop posting here ... what matters is that you show that
> you are trying to understand. What pisses people off is when they think
> students are trying to skip homework ...
>
> Secondly, the best way to learn is to teach. I probably don't know much
> more than you, so you're helping me learn (plus all those other lurkers
> who are watching and haven't dived in :-) I've posted similarly "dumb"
> posts and been grateful to everyone who's helped me - I owe it to the
> list to help when I can.
> 
> Okay. In your "file to parse" you have "packet", "model", "method". Are
> these keywords? If so, your life is nice and simple. Similarly, if
> that's your nesting such that the method braces are always at the third
> level down, it's equally simple, just slightly different. So how do
> you handle that?
>
> At the start of the lexer, you can declare an initialisation code block.
> You want to declare a state enum with the values NULL, IN_PACKET,
> IN_MODEL, and IN_METHOD.
> 
> Your lexer will now contain something like this ...
> 
> packet: {state == NULL}?   // this is a predicate
>       "PACKET" lcurly {state = IN_PACKET;}  // set the state variable
>       model rcurly {state = NULL;}          // reset the state variable
>    ;
>
> model: {state == IN_PACKET}?
>       "MODEL" lcurly {state = IN_MODEL;}
>       method rcurly {state = IN_PACKET;}
>    ;
> 
> method: ... I'll leave it to you :-)
> 
> I'm sure I've messed up my ANTLR syntax good and proper, and other
> people will help you with how to do this, but this looks like the
> approach you want. Particularly, look at predicates and in-line code.
> And WATCH OUT !!! because predicates *can* get you into trouble with
> look-ahead. It looks like what you're doing is pretty simple and won't
> be any trouble, but it does happen ...
> 
> Cheers,
> Wol
> 
> -----Original Message-----
> From: pwolleba [mailto:pwolleba at yahoo.no] 
> Sent: 08 October 2003 11:36
> To: antlr-interest at yahoogroups.com
> Subject: [antlr-interest] Re: Skipping grammar
> 
> 
> Hello!
> 
> I am starting to dominate this newsgroup with my problem, so I guess 
> I have to stop after this post!
> Anyway, I will paste some of my code from my parser and if you could 
> find where I am thinking wrong I would appreciate if you could 
> comment it!
> 
> 
> 
> PARSER
>  
> //---------------------------------------------- METHODE -------------
> methodeNode         : (METHOD^) declarationName methodeDecleration methodBody;
>
> methodeDecleration  : (LPAREN!) (methodArguments)? (RPAREN!)
>                       {#methodeDecleration=#([ARGUMENTS,"Arguments"],#methodeDecleration);};
>
> methodArguments     : (methodArgument (COMMA! methodArguments)?);
>
> methodArgument      : declarationName;
>
> methodBody          : (METHOD_BODY)
>                       {#methodBody=#([EXPRESSION,"Expression"],#methodBody);};
> 
> 
> LEX
> 
> METHOD_BODY : '{'! (BracedExpr | ~'}')* "};"!;
> 
> protected
> BracedExpr : '{' (BracedExpr | ~'}')* "}";
> 
> 
> 
> FILE TO PARSE
> 
> Packet name{
> Model name {
> Method{
> 	Expressiontext;
> 	If/else and so on
> };
> };
> };
> 
> As you can see, the method is built up much like a method in C++ or
> Java. What makes it difficult is the fact that I don't want to parse
> the method body text, I just want to consume it.
>
> As you can see, my Lex won't work, since it will react to both the
> Packet bracket and the Model bracket. If I could somehow make it start
> only when it is a method I would be really happy.
> 
> Best regards,
> Per
> 
> 
> 
> 
> --- In antlr-interest at yahoogroups.com, "Anthony W Youngman" 
> <Anthony.Youngman at E...> wrote:
> > Hmmm ...
> > 
> > You should be able to declare that in the lexer.
> > 
> > method: lcurly method_body rcurly ;
> > 
> > protected method_body: name arguments expression ;
> > 
> > Do the curly brackets always indicate a method? If not, how do you tell
> > whether it's the start of a method or the start of something else? If
> > you can unambiguously identify the start of a method (eg it's flagged by
> > an lcurly, which is the only use of an lcurly) then what you appear to
> > want is pretty simple to achieve.
> >
> > Solve the problem of how to identify "this is a method", and the rest of
> > it should just fall into place. If the lexer can recognise "this is a
> > method" then the lexer can handle methods for you. The parser will then
> > build your tree for you the way you want it.
> >
> > I think your original comment about ";" being used to terminate both IFs
> > and methods is a red herring. Have you grasped why it's not a problem?
> > If you have, then you should be able to work out the rest of the
> > solution fairly easily. If you haven't, then you need to get that
> > straight because it shows a fundamental misunderstanding of ANTLR. Don't
> > forget, both the lexer and parser are recursive (they "drill down"), so
> > context-dependent semantics shouldn't be a problem ...
> > 
> > Cheers,
> > Wol
> > 
> > -----Original Message-----
> > From: pwolleba [mailto:pwolleba at y...] 
> > Sent: 08 October 2003 10:13
> > To: antlr-interest at yahoogroups.com
> > Subject: [antlr-interest] Re: Skipping grammar
> > 
> > 
> > Hello again
> > 
> > Thanks for helping me out Arnar, your solutions are really good!
> > Still, I think I will have problems implementing them, mostly because I
> > have not given you enough information.
> > I need to make a method tag in my tree that contains information,
> > such as the arguments to the method (see example).
> > 
> > 
> > Method testMethod (Args,Args....){
> > 	Expression text
> > }
> > 
> > method
> > |
> > |--------Name
> > |
> > |--------Arguments
> > |
> > |-------- Expression
> > 
> > 
> > If I solve this in my lexer I will not be able to create this node
> > tree; it will just be one method node that contains all the text. If
> > I drop the "method" tag in my METHOD_BODY rule, it will trigger at all
> > the other brackets in my document.
> > Can I somehow write my lexer rule without the "method" tag, and then
> > make it trigger only when I need the method body?
> > 
> > best regards,
> > Per
> > 
> > --- In antlr-interest at yahoogroups.com, "Arnar Birgisson" 
> > <arnarb at o...> wrote:
> > > Hello Per,
> > > 
> > > Perhaps you could make "method {" a single token in the parser, and set
> > > the nestingLevel variable to zero when that one matches.
> > >
> > > The solution I posted uses the parser to eat up the stuff inside {...},
> > > another possibility might be to make the lexer do this:
> > > 
> > > METHOD_BODY
> > >   : "method"! '{'! ( BracedExpr | ~'}' )* "};"!
> > >   ;
> > > 
> > > protected
> > > BracedExpr
> > >   : '{' ( BracedExpr | ~'}' )* "}"
> > >   ;
> > > 
> > > Overall, this might be a better solution. The token METHOD_BODY will
> > > then contain as its text whatever was inside the {...}.
> > >
> > > As a side note, this is possible in ANTLR lexers because they are LL(k)
> > > and can thus handle context-free constructs. Conventional lexers are
> > > limited to regular grammars (represented by regular expressions, which
> > > are equivalent to finite automata) and cannot, for example, match nested
> > > braces, parentheses etc. See
> > > http://www.antlr.org/doc/lexer.html#Predicated-LL(k)_Lexing for more
> > > information on this.
> > > 
> > > Arnar
> > > 
> > > ps. yes, the "i" should have been "nestingLevel" :o)
> > > pps. again, I haven't tried this, it might not even be syntactically
> > > correct
> > > 
> > > >>> pwolleba at y... 10/07/03 5:34 PM >>>
> > > Hello again!
> > > 
> > > I am looking at your example Arnar, and I have some questions.
> > > When I wrote my example I should have included some more information.
> > > The methode node is inside another node called member (see
> > > example) and there can be more than one!
> > > 
> > > Member{
> > > Methode {
> > > 	Sometext;
> > > };
> > > };
> > > 
> > > This makes your example a bit more difficult to implement, since the
> > > counter will start at zero at the first bracket, which is the member
> > > bracket. I must somehow be able to set nestingLevel = 0 from the
> > > parser when the method node is starting.
> > > How do I do that?
> > > 
> > > best regards,
> > > Per
> > > 
> > > Ps: I guess it should be nestingLevel++ instead of i++. Correct?
> > > 
> > > --- In antlr-interest at yahoogroups.com, "pwolleba" <pwolleba at y...> 
> > > wrote:
> > > > Yes that is correct, what is inside the bracket is a different
> > > > language which I at the moment don't want to write a parser for (it
> > > > is pretty complex and big). Anyway I have just come back to work, and
> > > > I am going to try out your solution Arnar, hopefully it will work!
> > > >
> > > > I just want to thank the community for trying to find a solution to
> > > > my question, and I must say it came really fast!
> > > > 
> > > > Best regards,
> > > > 
> > > > Per
> > > > 
> > > > 
> > > > --- In antlr-interest at yahoogroups.com, "Arnar Birgisson" 
> > > > <arnarb at o...> wrote:
> > > > > Hi..
> > > > > 
> > > > > In my earlier post, I understood Per differently. I think he wants to
> > > > > parse "method name{ <whatever> };" and just eat up <whatever>, including
> > > > > any nested braces, and put it in a variable, completely without lexing
> > > > > and/or parsing it. Per, is this correct?
> > > > >
> > > > > The result of all this being a tree something like this:
> > > > >
> > > > > METHOD
> > > > >  |
> > > > > name-body
> > > > >
> > > > > where the body node contains anything inside the {..} as its text.
> > > > > 
> > > > > Arnar
> > > > > 
> > > > > >>> Anthony.Youngman at E... 10/07/03 1:33 PM >>>
> > > > > I think you're missing the point. Define a ; as SEMI. The way I'd do it
> > > > > (and this is all pseudocode) is
> > > > >
> > > > > if_statement: "IF" lcurly (method)* rcurly "ELSE" lcurly (method)*
> > > > > rcurly SEMI ;
> > > > > method: blah_blah SEMI ;
> > > > >
> > > > > That way, the lexer doesn't care whether ; is ending a method or an if
> > > > > clause, and the parser won't get confused because when it hits a
> > > > > right-curly it will be expecting an ELSE or a SEMI, and not a method.
> > > > > And if the ELSE is optional you just mark it as such so when the parser
> > > > > hits the right-curly after the if, it's expecting an ELSE or a SEMI and
> > > > > nothing else.
> > > > > 
> > > > > Cheers,
> > > > > Wol
> > > > > 
> > > > > -----Original Message-----
> > > > > From: pwolleba [mailto:pwolleba at y...] 
> > > > > Sent: 07 October 2003 08:19
> > > > > To: antlr-interest at yahoogroups.com
> > > > > Subject: [antlr-interest] Skipping grammar
> > > > > 
> > > > > 
> > > > > I am pretty new to ANTLR so maybe this question is very trivial; if
> > > > > so, even better, since then maybe there is a simple solution to my
> > > > > problem. Anyway, I am struggling with writing a new parser in ANTLR to
> > > > > replace an old implementation in Flex/Bison, in order to make a product
> > > > > that is open to implementation from both C++ and Java.
> > > > >
> > > > > The parser will parse a language that we are using to build
> > > > > databases, and it must support this language 100% if it is to be
> > > > > accepted.
> > > > >
> > > > > Here is the code snippet that I am struggling with.
> > > > > 
> > > > > method name{
> > > > >   SomeText!()text[];
> > > > >   if(a < b && b < c){
> > > > >      SomeText()!()[];
> > > > >   }
> > > > >   else{
> > > > >      SomeText()!()[];
> > > > >   };
> > > > > };
> > > > > 
> > > > > I am not interested in the expression that is inside the name
> > > > > method, I just want ANTLR to grab the text for me, and put it as a
> > > > > node inside the tree. The problem is the fact that the if/else
> > > > > statement ends with a "};" which is the same token as the method
> > > > > end token, and I have no guarantee that there won't be more than
> > > > > one inside the method. A solution would be to make a counter that will
> > > > > increase for each "{" and decrease for each "}", then I would know
> > > > > when the method ends. To my frustration I don't know how I should
> > > > > make such a counter in ANTLR that still supports implementation in
> > > > > both Java and C++ code.
> > > > > I would be really really happy if someone could help me with this
> > > > > problem!
> > > > >
> > > > > Best regards,
> > > > > 
> > > > > Per
> > > > > 
> > > > > 
> > > > > 
> > > > >  
> > > > > 
> > > > > Your use of Yahoo! Groups is subject to
> > > > > http://docs.yahoo.com/info/terms/ 