[antlr-interest] Re: on parsers look and feel

Cristian Amitroaie cristian at amiq.ro
Thu Nov 27 04:16:39 PST 2003


Hi Tom,

> Is that going to work? I think you'll need to have a rule that
> matches those particular characters. Literal matching only happens
> to the match of rules, not to every character on the input stream.

Nothing changes in the way the lexer looks. Of course the lexer will have a rule
---
EQUALS: '=';
---

and since it imports the parser's vocabulary and tests the literals on this rule,
it will return the right token for "=".
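
For illustration, a minimal sketch of this setup could look like the
following. The names LookParser, LookLexer and Look are made up for the
example, and the two grammars are assumed to live in separate files so that
the parser (and its vocabulary file) is generated before the lexer:

---
// LookParser.g (hypothetical file name)
class LookParser extends Parser;
options {
    exportVocab = Look;    // exports the parser's tokens, including "=" and ";"
    buildAST = true;
}

assign
    :   ID "="^ ID ";"!
    ;

// LookLexer.g (hypothetical file name)
class LookLexer extends Lexer;
options {
    importVocab = Look;    // picks up the literals defined by the parser
    testLiterals = true;   // already the default, spelled out for clarity
}

EQUALS : '=' ;             // text "=" is looked up in the literals table
SEMI   : ';' ;
ID     : ('a'..'z')+ ;
WS     : (' ' | '\t' | '\n' | '\r') { $setType(Token.SKIP); } ;
---

The point of the sketch is only that the lexer rules stay ordinary rules; the
literal test on their matched text is what maps "=" back to the parser's
token type.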

Unfortunately, the token table will then contain something like:

---
"="=6
";"=7
EQ=8
SEMI=9
---

If you assign a token name to "=" in the parser

---
tokens {
   ASSIGN="=";
}
---

then the token table (produced by the lexer, which imports the parser's
vocabulary) will look like:
---
ASSIGN="="=6
";"=7
EQ=8
SEMI=9
---

It's kind of strange indeed...

> Having a rule which matched the characters used as literals would
> mean you'd have to maintain it. So you probably want to add a rule
> that matches all characters that aren't matched in other rules,
> probably just all non alpha-numeric, less maintenance, though you'd
> have to have your own literal matching that threw an
> exception if there was no match, or you'd have weird invalid tokens
> coming through. Then there's the issue of making that informative.

I was thinking about leaving the lexer as it is. Nothing changes there. If
there is no rule ASSIGN, then the "=" token in the parser vocabulary will
never show up.

> And when something breaks in your parser with handling those it
> could be weirdness related to the literals handling.

This is what I am interested in. What if something breaks in the parser?
Missing tokens are the only thing that comes to my mind, and missing "=" is 
clearly reported.... Unless I am missing something here...

> True you don't have to remember the token name, but
> only in the parser, so you've got this weird stuff going on in the
> parser, somehow these tokens don't really have labels, you can't do
> this in your tree parsers

1) in your tree parsers you can refer to "=" the same way you would refer to
any literal ("try", "class"...)
2) if you want to create new nodes using the "=" token in the parser,
apparently you can give it a token name in the tokens section:

---
tokens {
    ASSIGN="=";
}
---

And it works, even if you get a warning at compile time:

---
ANTLR Parser Generator   Version 2.7.2   1989-2003 jGuru.com
LookWalker.g:14:12: warning:Redefinition of token in tokens {...}: ASSIGN
---

3) anyway, creating nodes like 'a new for loop' in your AST means using 
'LITERAL_for' which is not a very short name to type. 

Ideally one could build a node specifying the string (at least in case of 
literals). Something like #["class"], not limited to #[LITERAL_class] or 
#[LITERAL_class, "class"].
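
To make points 1) and 2) concrete, here is a rough sketch, not taken from the
attached files. The rule and grammar names are hypothetical, the parser rule
builds its tree by hand instead of using "="^, and the walker assumes the
lexer's vocabulary is exported under the name LookLexer:

---
// In the parser: give "=" a name so it can be used in a node constructor.
// This is the declaration that triggers the redefinition warning above.
tokens {
    ASSIGN="=";
}

// '!' on the rule turns off automatic tree construction for it
assign!
    :   lhs:ID "=" rhs:ID ";"
        { #assign = #(#[ASSIGN, "="], #lhs, #rhs); }
    ;

// In the tree walker: the literal is matched like any other literal
class LookWalker extends TreeParser;
options {
    importVocab = LookLexer;   // assumed export name of the lexer vocabulary
}

assign
    :   #( "=" ID ID )
    ;
---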

>  But I guess it's partly a matter of taste,
> but I'd be worried about how maintainable any code like that is, for
> you and then more so for others.

This is my concern as well, and I guess I'll have to look deeper into this
before starting (if I start) to use this approach.

Thanks,
Cristian

On Thursday 27 November 2003 03:02, Thomas Brandon wrote:
> Is that going to work? I think you'll need to have a rule that
> matches those particular characters. Literal matching only happens
> to the match of rules, not to every character on the input stream.
> Having a rule which matched the characters used as literals would
> mean you'd have to maintain it. So you probably want to add a rule
> that matches all characters that aren't matched in other rules,
> probably just all non alpha-numeric, less maintenance, though you'd
> have to have your own literal matching that threw an
> exception if there was no match, or you'd have weird invalid tokens
> coming through. Then there's the issue of making that informative.
> And when something breaks in your parser with handling those it
> could be weirdness related to the literals handling.
>
> As to whether it's better, even without the implementation problems,
> I don't know. True you don't have to remember the token name, but
> only in the parser, so you've got this weird stuff going on in the
> parser, somehow these tokens don't really have labels, you can't do
> this in your tree parsers... But I guess it's partly a matter of taste,
> but I'd be worried about how maintainable any code like that is, for
> you and then more so for others.
>
> Tom.
> --- In antlr-interest at yahoogroups.com, Cristian Amitroaie
> <cristian at a...> wrote:
> > Hello guys,
> >
> > Case:
> >    o sometimes I kind of forget what name I gave to the "=" token from the
> > Lexer (EQ/EQUAL/EQUALS/ASSIGN) when I want to add a new rule to a parser.
> >    o sometimes I get bored of writing LCURLEY instead of "{" or '{'
> >    o sometimes it's hard for me to follow rules full of SEMI, LCURL(E)?Y,
> > LBRACK, LPARENS and so on
> >
> > For example, I would like to see my parser rules look like:
> >
> > assign:
> >         ID "="^ ID ";"!
> >     ;
> >
> > I browsed through the documentation/big examples, yet I couldn't find any
> > similar approach as a guideline or something.
> >
> > Yet, it doesn't seem impossible (see the attached files).
> >
> > Although the parser's token table won't have a token type attached to "=" (I
> > assume LITERAL_= is not a valid id in almost any language), it reserves a
> > number for it. Now importing the parser's vocabulary in the lexer, and leaving
> > testLiterals true (default value) it seems that the lexer's token table keeps
> > the number from the parser for "=" and adds to it a token type
> > (EQUAL/EQ/ASSIGN, oops I don't remember).
> >
> > Are there any disadvantages/risks related to this approach?
> >
> > Of course, in the parser, if somebody likes to build new AST nodes using "=",
> > it may attach it a token type in the tokens section and use it...
> >
> > Either I am a maniac, or the parser grammar looks much clearer to me...
> > And the walkers import the lexer's vocabulary (see the attached files).
> > Or it's just a matter of taste?
> > Cristian
> >
> >
> >
> >
> > ana = mihai;
> > mihai = maria;
> > ana = maria;
>


 
