[antlr-interest] Re: Skipping grammar

pwolleba pwolleba at yahoo.no
Wed Oct 8 05:07:17 PDT 2003


Here I am again! :o)

After reading the multi-lexer solution, I do think that is the 
solution to my problem. I actually didn't know that it was possible 
to use more than one lexer (it is just brilliant!). I guess it takes 
more than a couple of days to learn this tool. :o)
Anyway, I will try it out, see if it solves my problem, and post 
the result on this board. I am a bit curious whether it is still 
possible to make a C++ parser after I have implemented the 
multiplexer; I had hoped I could use the same parser for both platforms. 

I just want to thank you all for trying to help me out here, I really 
really appreciate it!


Best regards,
Per




--- In antlr-interest at yahoogroups.com, "Arnar Birgisson" 
<arnarb at o...> wrote:
> Per: Anthony is on the money here. Do not stop posting here! I'm taking
> a graduate course in compiler design and implementation and I choose
> (or is it chose?) ANTLR as my tool for the term project. I first saw
> ANTLR no more than 4-5 weeks ago, so in fact you are doing me (and
> probably others) a big favour in helping me learn and understand this
> myself.
> 
> Does "method {...}" always appear inside "model {...}", and does
> "model {...}" always appear inside "packet {...}"? Can a packet contain
> another packet, and can a model contain another model? If the answers
> are yes to the first two and no to the last two, the nesting level of
> the starting { for a method is fixed and you can adapt the first
> solution we discussed.
> 
> If the grammar is more general, i.e. packets can contain other packets
> etc., you can do fancier stuff, like having a stack in your lexer.
> Each time you see a "{", determine its type from the keyword appearing
> before it, and push the token ID for the corresponding closing "}" on
> the stack. Then, upon seeing a "}" in the input, pop the type off the
> stack and use it with "setType". That way, matching braces will have
> matching token types which the parser can use. Example (pseudo-code):
> 
> class MyLexer extends Lexer;
> 
> tokens { OPEN_PACKET; CLOSE_PACKET; OPEN_MODEL; CLOSE_MODEL;
> OPEN_METHOD; CLOSE_METHOD; }
> 
> {
> 	// closing-brace token types for the braces currently open
> 	java.util.Stack braces = new java.util.Stack();
> 	// token type for the next '{', set by the keyword seen before it
> 	int nextBrace = OPEN_PACKET;
> 	boolean readingMethodBody = false;
> 
> 	int getMatchingToken(int open) {
> 		if (open == OPEN_PACKET) return CLOSE_PACKET;
> 		if (open == OPEN_MODEL) return CLOSE_MODEL;
> 		// etc.
> 		return CLOSE_METHOD;
> 	}
> }
> 
> PACKET: "packet" { nextBrace = OPEN_PACKET; };
> MODEL: "model" { nextBrace = OPEN_MODEL; };
> METHOD: "method" { nextBrace = OPEN_METHOD; };
> 
> OPEN_BRACE /* except method opening braces */
> 	: { nextBrace != OPEN_METHOD }? '{'
> 	{
> 		$setType(nextBrace);
> 		braces.push(new Integer(getMatchingToken(nextBrace)));
> 	}
> 	;
> 
> METHOD_BODY
> 	: { nextBrace == OPEN_METHOD }? '{'! ( BracedExpr | ~'}' )* '}'!
> 	;
> 
> protected
> BracedExpr
>   : '{' ( BracedExpr | ~'}' )* '}'
>   ;
> 
> CLOSE_BRACE
> 	: '}' { $setType(((Integer) braces.pop()).intValue()); }
> 	;
> 
> /* plus other tokens you need. */
> 
> The token stream for this input
> 
> Packet name{
> Model name {
> Method{
>  	Expressiontext;
> 	If/else and so on
> };
> };
> };
> 
> would be something like:
> 
> [PACKET,"packet"]
> [ID,"name"]
> [OPEN_PACKET,"{"]
> [MODEL,"model"]
> [ID,"name"]
> [OPEN_MODEL,"{"]
> [METHOD,"method"]
> [METHOD_BODY,"Expressiontext;\nIf/else and so on"]
>               ^ note that the '{' and '}' were discarded with !
> [SEMI,";"]
> [CLOSE_MODEL,"}"]
> [SEMI,";"]
> [CLOSE_PACKET,"}"]
> [SEMI,";"]
> 
> Do any of you gurus see a problem with this?
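
The stack idea above can be tried outside ANTLR before wiring it into lexer actions. What follows is only a minimal Java sketch under stated assumptions: the class BraceTyper, its label method and the string type names are hypothetical and not part of ANTLR, the input is assumed to be already split into words, and method braces are typed like any other brace instead of being swallowed into a METHOD_BODY token.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for the lexer actions; not ANTLR API.
class BraceTyper {
    // Closing-brace types for the braces that are currently open.
    private final Deque<String> closers = new ArrayDeque<>();
    // Type the next "{" will get; set by the keyword seen before it.
    private String nextBrace = "PACKET";

    // Labels each word of an already-split input with a token type.
    List<String> label(List<String> words) {
        List<String> types = new ArrayList<>();
        for (String w : words) {
            switch (w) {
                case "packet": nextBrace = "PACKET"; types.add("PACKET"); break;
                case "model":  nextBrace = "MODEL";  types.add("MODEL");  break;
                case "method": nextBrace = "METHOD"; types.add("METHOD"); break;
                case "{":
                    types.add("OPEN_" + nextBrace);
                    closers.push("CLOSE_" + nextBrace);
                    break;
                case "}":
                    // An unmatched "}" is reported instead of popping an empty stack.
                    types.add(closers.isEmpty() ? "UNMATCHED" : closers.pop());
                    break;
                default:
                    types.add("ID");
            }
        }
        return types;
    }

    public static void main(String[] args) {
        List<String> in = Arrays.asList("packet", "p", "{", "model", "m", "{", "}", "}");
        System.out.println(new BraceTyper().label(in));
    }
}
```

Because each "{" pushes its own closing type, the two "}" at the end come back as CLOSE_MODEL and then CLOSE_PACKET, which is the pairing the parser needs.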
> 
> My next suggestion was using token multiplexing, and just now I
> received Ric's post on that :o)
> 
> This allows you to add in, e.g., tokens for "(param1,param2)" in
> between "method" and its opening brace.
> If I'm still way off course: do you have a formal BNF, EBNF or
> equivalent grammar for your input? If so, it would help to see it.
> 
> Arnar
> 
> > -----Original Message-----
> > From: Anthony W Youngman 
> > [mailto:Anthony.Youngman at E...] 
> > Sent: 8. október 2003 11:10
> > To: antlr-interest at yahoogroups.com
> > Subject: RE: [antlr-interest] Re: Skipping grammar
> > 
> > 
> > Firstly, DON'T stop posting here ... what matters is that you show
> > that you are trying to understand. What pisses people off is when
> > they think students are trying to skip homework ...
> > 
> > Secondly, the best way to learn is to teach. I probably don't know
> > much more than you, so you're helping me learn (plus all those other
> > lurkers who are watching and haven't dived in :-) I've posted
> > similarly "dumb" posts and been grateful to everyone who's helped
> > me - I owe it to the list to help when I can.
> > 
> > Okay. In your "file to parse" you have "packet", "model", "method".
> > Are these keywords? If so, your life is nice and simple. Similarly,
> > if that's your nesting such that the method braces are always at the
> > third level down, it's equally as simple, just slightly different.
> > So how do you handle that?
> > 
> > At the start of the lexer, you can declare an initialisation 
> > code block.
> > You want to declare a state enum with the values NULL, IN_PACKET,
> > IN_MODEL, and IN_METHOD.
> > 
> > Your lexer will now contain something like this ...
> > 
> > packet: state == NULL   // this is a predicate
> >    {
> >       "PACKET" lcurly {state = IN_PACKET}  // set the state variable
> >       model rcurly {state = NULL}  // reset the state variable
> >    } ;
> > 
> > model: state == IN_PACKET
> >    {
> >       "MODEL" lcurly {state = IN_MODEL}
> >       method rcurly {state = IN_PACKET}
> >    } ;
> > 
> > method: ... I'll leave it to you :-)
> > 
> > I'm sure I've messed up my ANTLR syntax good and proper, and other
> > people will help you with how to do this, but this looks like the
> > approach you want. Particularly, look at predicates and in-line code.
> > And WATCH OUT !!! because predicates *can* get you into trouble with
> > look-ahead. It looks like what you're doing is pretty simple and
> > won't be any trouble, but it does happen ...
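
The state-variable idea can be written down as plain Java to see how the predicates gate each keyword. This is a rough sketch only, not ANTLR syntax: the class NestingState and its open and close methods are hypothetical names, and the NULL state from the message is called NONE here to avoid confusion with Java's null.

```java
// Hypothetical sketch of the state-variable idea; names are illustrative.
class NestingState {
    enum State { NONE, IN_PACKET, IN_MODEL, IN_METHOD }

    private State state = State.NONE;

    State current() { return state; }

    // Plays the role of the predicate: a keyword is accepted only in the
    // state where it is allowed, and accepting it moves one level down.
    boolean open(String keyword) {
        if (keyword.equals("packet") && state == State.NONE) {
            state = State.IN_PACKET;
        } else if (keyword.equals("model") && state == State.IN_PACKET) {
            state = State.IN_MODEL;
        } else if (keyword.equals("method") && state == State.IN_MODEL) {
            state = State.IN_METHOD;
        } else {
            return false;
        }
        return true;
    }

    // A closing brace moves one level back up; returns false at top level.
    boolean close() {
        switch (state) {
            case IN_METHOD: state = State.IN_MODEL;  return true;
            case IN_MODEL:  state = State.IN_PACKET; return true;
            case IN_PACKET: state = State.NONE;      return true;
            default:        return false;
        }
    }

    public static void main(String[] args) {
        NestingState s = new NestingState();
        System.out.println(s.open("model"));   // rejected: no packet is open yet
        System.out.println(s.open("packet"));
        System.out.println(s.open("model"));
        System.out.println(s.current());
    }
}
```

A fixed enum like this only works while the nesting is packet, then model, then method; if packets can contain packets, a stack is the more general tool.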
> > 
> > Cheers,
> > Wol
> > 
> > -----Original Message-----
> > From: pwolleba [mailto:pwolleba at y...] 
> > Sent: 08 October 2003 11:36
> > To: antlr-interest at yahoogroups.com
> > Subject: [antlr-interest] Re: Skipping grammar
> > 
> > 
> > Hello!
> > 
> > I am starting to dominate this newsgroup with my problem, so I guess
> > I have to stop after this post!
> > Anyway, I will paste some of the code from my parser, and if you can
> > find where I am thinking wrong I would appreciate your comments!
> > 
> > 
> > 
> > PARSER
> >  
> > //---------------------------------------------- METHODE -------------
> > methodeNode         : (METHOD^) declarationName methodeDecleration
> >                       methodBody;
> > 
> > methodeDecleration  : (LPAREN!) (methodArguments)? (RPAREN!)
> >                       {#methodeDecleration =
> >                          #([ARGUMENTS,"Arguments"],#methodeDecleration);};
> > 
> > methodArguments     : (methodArgument (COMMA! methodArguments)?);
> > 
> > methodArgument      : declarationName;
> > 
> > methodBody          : (METHOD_BODY)
> >                       {#methodBody =
> >                          #([EXPRESSION,"Expression"],#methodBody);};
> > 
> > 
> > LEX
> > 
> > METHOD_BODY : '{'! (BracedExpr | ~'}')* "};"!;
> > 
> > protected
> > BracedExpr : '{' (BracedExpr | ~'}')* "}";
> > 
> > 
> > 
> > FILE TO PARSE
> > 
> > Packet name{
> > Model name {
> > Method{
> > 	Expressiontext;
> > 	If/else and so on
> > };
> > };
> > };
> > 
> > As you can see, the method is built up much like a method in C++
> > or Java. What makes it difficult is that I don't want to parse the
> > method body text, I just want to consume it.
> > 
> > As you can see, my lexer rule won't work, since it will react to the
> > Packet bracket as well as the Model bracket. If I could somehow make
> > it start only when it is a method, I would be really happy.
> > 
> > Best regards,
> > Per
> > 
> > 
> > 
> > 
> > --- In antlr-interest at yahoogroups.com, "Anthony W Youngman" 
> > <Anthony.Youngman at E...> wrote:
> > > Hmmm ...
> > > 
> > > You should be able to declare that in the lexer.
> > > 
> > > method: lcurly method_body rcurly ;
> > > 
> > > protected method_body: name arguments expression ;
> > > 
> > > Do the curly brackets always indicate a method? If not, how do you
> > > tell whether it's the start of a method or the start of something
> > > else? If you can unambiguously identify the start of a method (eg
> > > it's flagged by an lcurly, which is the only use of an lcurly) then
> > > what you appear to want is pretty simple to achieve.
> > > 
> > > Solve the problem of how to identify "this is a method", and the
> > > rest of it should just fall into place. If the lexer can recognise
> > > "this is a method" then the lexer can handle methods for you. The
> > > parser will then build your tree for you the way you want it.
> > > 
> > > I think your original comment about ";" being used to terminate
> > > both IFs and methods is a red herring. Have you grasped why it's
> > > not a problem? If you have, then you should be able to work out the
> > > rest of the solution fairly easily. If you haven't, then you need
> > > to get that straight because it shows a fundamental
> > > misunderstanding of ANTLR. Don't forget, both the lexer and parser
> > > are recursive (they "drill down"), so context-dependent semantics
> > > shouldn't be a problem ...
> > > 
> > > Cheers,
> > > Wol
> > > 
> > > -----Original Message-----
> > > From: pwolleba [mailto:pwolleba at y...] 
> > > Sent: 08 October 2003 10:13
> > > To: antlr-interest at yahoogroups.com
> > > Subject: [antlr-interest] Re: Skipping grammar
> > > 
> > > 
> > > Hello again
> > > 
> > > Thanks for helping me out Arnar, your solutions are really good!
> > > Still, I think I will have problems implementing them, mostly
> > > because I have not given you enough information.
> > > I need to make a method tag in my tree that contains information
> > > such as the arguments to the method (see example).
> > > 
> > > 
> > > Method testMethod (Args,Args....){
> > > 	Expression text
> > > }
> > > 
> > > method
> > > |
> > > |--------Name
> > > |
> > > |--------Arguments
> > > |
> > > |-------- Expression
> > > 
> > > 
> > > If I solve this in my lexer I will not be able to create this node
> > > tree; it will just be one method node that contains all the text.
> > > If I drop the "method" tag from my METHOD_BODY rule, it will
> > > trigger at all the other brackets in my document.
> > > Can I somehow write my lexer rule without the "method" tag, and
> > > make it trigger only when I need the method body?
> > > 
> > > best regards,
> > > Per
> > > 
> > > --- In antlr-interest at yahoogroups.com, "Arnar Birgisson" 
> > > <arnarb at o...> wrote:
> > > > Hello Per,
> > > > 
> > > > Perhaps you could make "method {" a single token in the parser,
> > > > and set the nestingLevel variable to zero when that one matches.
> > > > 
> > > > The solution I posted uses the parser to eat up the stuff inside
> > > > {...}; another possibility might be to make the lexer do this:
> > > > 
> > > > METHOD_BODY
> > > >   : "method"! '{'! ( BracedExpr | ~'}' )* "};"!
> > > >   ;
> > > > 
> > > > protected
> > > > BracedExpr
> > > >   : '{' ( BracedExpr | ~'}' )* "}"
> > > >   ;
> > > > 
> > > > Overall, this might be a better solution. The token METHOD_BODY
> > > > will then contain as its text whatever was inside the {...}.
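
The recursion in METHOD_BODY and BracedExpr can also be read as a small hand-written scanner. The following is a hedged Java sketch of that reading, not code generated by ANTLR: BodyScanner and bracedExpr are hypothetical names, and braces inside string literals or comments of the embedded language are not treated specially.

```java
// Hypothetical hand-written analogue of METHOD_BODY and BracedExpr.
class BodyScanner {
    private final String src;
    private int pos;

    BodyScanner(String src, int start) {
        this.src = src;
        this.pos = start;
    }

    // Expects src.charAt(pos) == '{'; returns the text between that brace
    // and its matching '}', leaving pos just past the closing brace.
    String bracedExpr() {
        if (pos >= src.length() || src.charAt(pos) != '{') {
            throw new IllegalStateException("expected '{' at " + pos);
        }
        int bodyStart = ++pos;
        while (pos < src.length() && src.charAt(pos) != '}') {
            if (src.charAt(pos) == '{') {
                bracedExpr();   // nested block: recurse, like BracedExpr
            } else {
                pos++;          // any other character, like ~'}'
            }
        }
        if (pos >= src.length()) {
            throw new IllegalStateException("unbalanced braces");
        }
        String body = src.substring(bodyStart, pos);
        pos++;                  // consume the matching '}'
        return body;
    }

    public static void main(String[] args) {
        String text = "method{ a; if(x){ b; }; };";
        BodyScanner s = new BodyScanner(text, text.indexOf('{'));
        System.out.println(s.bracedExpr());
    }
}
```

The outer call returns everything between the method's own braces, nested blocks included, which corresponds to the text the METHOD_BODY token would carry.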
> > > > 
> > > > As a side note, this is possible in ANTLR lexers because they are
> > > > LL(k) and can thus handle context-free (LL(k)) grammars.
> > > > Conventional lexers are limited to regular grammars (represented
> > > > by regular expressions, which are equivalent to finite automata)
> > > > and cannot, for example, match nested braces, parentheses etc. See
> > > > http://www.antlr.org/doc/lexer.html#Predicated-LL(k)_Lexing for
> > > > more information on this.
> > > > 
> > > > Arnar
> > > > 
> > > > ps. yes, the "i" should have been "nestingLevel" :o)
> > > > pps. again, I haven't tried this, it might not even be
> > > > syntactically correct
> > > > 
> > > > >>> pwolleba at y... 10/07/03 5:34 PM >>>
> > > > Hello again!
> > > > 
> > > > I am looking at your example Arnar, and I have some questions.
> > > > When I wrote my example I should have included some more
> > > > information. The methode node is inside another node called
> > > > member (see example), and there can be more than one!
> > > > 
> > > > Member{
> > > > Methode {
> > > > 	Sometext;
> > > > };
> > > > };
> > > > 
> > > > This makes your example a bit more difficult to implement, since
> > > > the counter will start at zero at the first bracket, which is the
> > > > member bracket. I must somehow be able to set nestingLevel = 0
> > > > from the parser when the method node is starting.
> > > > How do I do that?
> > > > 
> > > > best regards,
> > > > Per
> > > > 
> > > > Ps: I guess it should be nestingLevel++ instead of i++. Correct?
> > > > 
> > > > --- In antlr-interest at yahoogroups.com, "pwolleba"
> > > > <pwolleba at y...> wrote:
> > > > > Yes, that is correct. What is inside the bracket is a different
> > > > > language which I at the moment don't want to write a parser for
> > > > > (it is pretty complex and big). Anyway, I have just come back
> > > > > to work, and I am going to try out your solution Arnar;
> > > > > hopefully it will work!
> > > > > 
> > > > > I just want to thank the community for trying to find a
> > > > > solution to my question, and I must say it came really fast!
> > > > > 
> > > > > Best regards,
> > > > > 
> > > > > Per
> > > > > 
> > > > > 
> > > > > --- In antlr-interest at yahoogroups.com, "Arnar Birgisson" 
> > > > > <arnarb at o...> wrote:
> > > > > > Hi..
> > > > > > 
> > > > > > In my earlier post, I understood Per differently. I think he
> > > > > > wants to parse "method name{ <whatever> };" and just eat up
> > > > > > <whatever>, including any nested braces, and put it in a
> > > > > > variable, completely without lexing and/or parsing it. Per,
> > > > > > is this correct?
> > > > > > 
> > > > > > The result of all this being a tree something like this:
> > > > > > 
> > > > > > METHOD
> > > > > >  |
> > > > > > name-body
> > > > > > 
> > > > > > where the body node contains anything inside the {..} as its
> > > > > > text.
> > > > > > 
> > > > > > Arnar
> > > > > > 
> > > > > > >>> Anthony.Youngman at E... 10/07/03 1:33 PM >>>
> > > > > > I think you're missing the point. Define a ; as SEMI. The way
> > > > > > I'd do it (and this is all pseudocode) is
> > > > > > 
> > > > > > if_statement: "IF" lcurly (method)* rcurly "ELSE" lcurly (method)*
> > > > > >               rcurly SEMI ;
> > > > > > method: blah_blah SEMI ;
> > > > > > 
> > > > > > That way, the lexer doesn't care whether ; is ending a method
> > > > > > or an if clause, and the parser won't get confused because
> > > > > > when it hits a right-curly it will be expecting an ELSE or a
> > > > > > SEMI, and not a method. And if the ELSE is optional you just
> > > > > > mark it as such so when the parser hits the right-curly after
> > > > > > the if, it's expecting an ELSE or a SEMI and nothing else.
> > > > > > 
> > > > > > Cheers,
> > > > > > Wol
> > > > > > 
> > > > > > -----Original Message-----
> > > > > > From: pwolleba [mailto:pwolleba at y...] 
> > > > > > Sent: 07 October 2003 08:19
> > > > > > To: antlr-interest at yahoogroups.com
> > > > > > Subject: [antlr-interest] Skipping grammar
> > > > > > 
> > > > > > 
> > > > > > I am pretty new to ANTLR so maybe this question is very
> > > > > > trivial; if so, even better, since then there may be a simple
> > > > > > solution to my problem. Anyway, I am struggling with writing
> > > > > > a new parser in ANTLR to replace an old implementation in
> > > > > > Flex/Bison, in order to make a product that is open for
> > > > > > implementation from both C++ and Java.
> > > > > > 
> > > > > > The parser will parse a language that we are using to build
> > > > > > databases, and it must support this language 100% to be
> > > > > > accepted.
> > > > > > 
> > > > > > Here is the code snippet that I am struggling with.
> > > > > > 
> > > > > > method name{
> > > > > >   SomeText!()text[];
> > > > > >   if(a < b && b < c){
> > > > > >      SomeText()!()[];
> > > > > >   }
> > > > > >   else{
> > > > > >      SomeText()!()[];
> > > > > >   };
> > > > > > };
> > > > > > 
> > > > > > I am not interested in the expression that is inside the
> > > > > > method; I just want ANTLR to grab the text for me and put it
> > > > > > as a node inside the tree. The problem is that the if/else
> > > > > > statement ends with a "};", which is the same token as the
> > > > > > method end token, and I have no guarantee that there won't be
> > > > > > more than one inside the method. A solution would be to make
> > > > > > a counter that increases for each "{" and decreases for each
> > > > > > "}"; then I would know when the method ends. To my
> > > > > > frustration I don't know how to make such a counter in ANTLR
> > > > > > that still supports implementation in both Java and C++.
> > > > > > I would be really really happy if someone could help me with
> > > > > > this problem!
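
The counter described here is easy to state in plain code before deciding where to put it in a grammar. Below is a minimal Java sketch under assumptions: BraceCounter and findClose are hypothetical names, the whole input is assumed to be available as one string, and braces inside strings or comments are counted like any others.

```java
// Hypothetical helper illustrating the brace counter; not generated by ANTLR.
class BraceCounter {
    // Returns the index of the '}' that closes the '{' at openIndex,
    // or -1 if openIndex is not a '{' or the braces never balance.
    static int findClose(String text, int openIndex) {
        if (openIndex < 0 || openIndex >= text.length()
                || text.charAt(openIndex) != '{') {
            return -1;
        }
        int depth = 0;
        for (int i = openIndex; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '{') {
                depth++;            // one level deeper
            } else if (c == '}') {
                depth--;            // one level back up
                if (depth == 0) {
                    return i;       // this '}' ends the method body
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String text = "method name{ if(a){ x; } else{ y; }; };";
        int open = text.indexOf('{');
        int close = findClose(text, open);
        System.out.println(text.substring(open + 1, close));
    }
}
```

The "};" that ends the if/else is skipped because the depth is still above zero at that point; only the brace that brings the count back to zero ends the method.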
> > > > > > 
> > > > > > Best regards,
> > > > > > 
> > > > > > Per
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > >  
> > > > > > 
> > > > > > Your use of Yahoo! Groups is subject to
> > > > > > http://docs.yahoo.com/info/terms/ 



