[antlr-interest] Re: Switching between different lexers from within parser

Helmut Neukirchen neukirchen at itm.mu-luebeck.de
Fri Jan 25 02:45:46 PST 2002


antlr-interest at yahoogroups.com wrote:
>    Date: Wed, 23 Jan 2002 09:01:45 -0800
>    From: mzukowski at bco.com
> Subject: RE: Switching between different lexers from within parser
> 
> > is there a possibility to switch in a clean and deterministic way
> > between different lexers from within the *parser*?
> 
> Not really, since you always have already consumed at least k tokens.  To
> switch from the parser would require some sort of rewind mechanism on your
> input stream and then some synchronization from within the parser.  The real
> problem is that where you are in the parser is dependent on what was in the
> lookahead, and now you've just switched out from under it.  Doing this
> within, say, a series of alternates could really confuse the parser.

I thought so. (Moreover, there must be some features left for ANTLR 3.0 ;-)
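
The lookahead problem described above can be shown with a toy model.
The sketch below is not ANTLR code: ToyLexer and LookaheadBuffer are
hypothetical names, and the lexer "mode" is only a label attached to
each token. It assumes a parser with k=2 that pre-fetches its lookahead.

```python
from collections import deque


class ToyLexer:
    """Splits on whitespace; 'mode' only changes the token type label."""

    def __init__(self, text):
        self.words = deque(text.split())
        self.mode = "MSC"

    def next_token(self):
        if not self.words:
            return ("EOF", "")
        return (self.mode, self.words.popleft())


class LookaheadBuffer:
    """Keeps k tokens pre-fetched, roughly as an LL(k) parser would."""

    def __init__(self, lexer, k):
        self.lexer = lexer
        self.buf = deque(lexer.next_token() for _ in range(k))

    def consume(self):
        tok = self.buf.popleft()
        self.buf.append(self.lexer.next_token())
        return tok


lexer = ToyLexer("action u := pow(u,1.2) ;")
parser_input = LookaheadBuffer(lexer, k=2)

first = parser_input.consume()   # the parser matches 'action'
lexer.mode = "DATA"              # the parser now asks for the data lexer
second = parser_input.consume()  # 'u' was already lexed in MSC mode
third = parser_input.consume()   # ':=' was fetched before the switch too
```

By the time the parser has matched `action` and requests the switch,
`u` and `:=` are already in the lookahead buffer, tokenized by the old
lexer. Only tokens fetched after the switch carry the new mode.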

> > In contrast to the Javadoc example, where switching between different
> > lexers is done from within the lexers, I have to deal with a language
> > where this is only possible from within the parser.
 
> Post a couple of worst case examples so we have something to chew on.  If it
> is mostly a problem with different sets of literals then it may be easy to
> solve.  How different are the lexers?  If the tokens are always broken up at
> the same boundaries then there may be a way to have multiple token types
> explicitly checked with semantic predicates.
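
One way to read the suggestion above: a single lexer emits neutral
word tokens, and the parser decides from context whether a word counts
as a keyword. The sketch below is a plain-Python illustration, not
ANTLR predicate syntax. MSC_KEYWORDS, classify and parse_event are
hypothetical names, the keyword set is a small assumed subset, and the
message named "time" is made up to show the ambiguity.

```python
MSC_KEYWORDS = {"in", "out", "from", "to", "time", "label"}


def classify(word, expecting_keyword):
    """Predicate-style check: the token text decides the type, but only
    at positions where the grammar allows a keyword."""
    if expecting_keyword and word in MSC_KEYWORDS:
        return "KEYWORD"
    return "IDENT"


def parse_event(words):
    """Parse '<in|out> <name> <from|to> <name>' from neutral word tokens."""
    direction, name, prep, peer = words
    kinds = [
        classify(direction, True),
        classify(name, False),
        classify(prep, True),
        classify(peer, False),
    ]
    if kinds != ["KEYWORD", "IDENT", "KEYWORD", "IDENT"]:
        raise SyntaxError("not a message event: " + " ".join(words))
    return {"direction": direction, "message": name, "peer": peer}


# 'time' appears where a name is expected, so it is treated as an identifier.
event = parse_event("in time from env".split())
```

This only works when both languages split the input at the same token
boundaries, which is the condition stated in the quoted reply.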
 
For those who are more interested in this problem:

The language is MSC-2000 (ITU-T recommendation Z.120), which is a
specification language (like UML's Sequence Diagrams) that has
no pre-defined language for describing data. Instead, it allows the
data language (describing valid expressions evaluating to some data)
of any other arbitrary language to be plugged in at *parse time*.

An example:

mscdocument SampleDocumentUsingCplusplusAsExternalDataLanguage;
language Cplusplus;
data "#include <math.h>";
inst calling_party variables "flag" : "bool";
inst called_party variables "u", "s", "t" : "float";
utilities reference ref1 reference ref2 reference ref1


msc connection;
inst calling_party;
inst called_party; 

calling_party: instance;
condition when ("flag==true");
in off_hook from env;
in digit from env;
out seizure_int to called_party; time answer_in [ , "10"];
in ack from called_party;
label answer_in; in answer from called_party;
reference ( ref1 alt ref2 ) opt ref2;
endinstance;

called_party: instance;
in seizure_int from calling_party; time "u";
action "u":="pow(u, 1.2)";
out ack to calling_party; time answer_out ("2*u", "(2*u)++"], answer_out &"s" ;
in off_hook from env;
label answer_out; out answer to calling_party; time @"t";
endinstance;

endmsc;

Expressions from the plugged-in data language (here C++) are marked by
double quotes, but according to the MSC-2000 standard you are allowed
to drop those quotes, since a good lexer/parser would still be able to
distinguish MSC language and external data language from the context.

The *.g files of a parser (without lexer switching, of course) for MSC-2000
can be found at:
http://www.itm.mu-luebeck.de/english/research/specification/msc2000parser/
(Lexer switching is necessary each time an evil production named
*[Ee]xpression, *[Ss]tring or *[Pp]attern is referenced.)


My approach will be to copy some productions from the parser to the lexer,
so that the lexer does some of the parser's work and the responsibility
for switching moves from the parser to the lexer.
(This needs some brainwork to identify the unique prefixes of
data language expressions.)
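
A minimal sketch of that idea, in Python rather than ANTLR and under
heavy assumptions: the lexer itself knows that a "(" directly after
the keyword `when` introduces a data expression, and reads it up to
the matching ")". DATA_INTRODUCERS, read_balanced and tokenize are
hypothetical names, and the single-keyword rule stands in for the
real set of prefixes that would have to be worked out from the grammar.

```python
import re

# Assumed: keywords after which a parenthesised data expression follows.
DATA_INTRODUCERS = {"when"}

# Toy MSC tokens: identifiers/keywords and a little punctuation.
MSC_TOKEN = re.compile(r"\s*([A-Za-z_]\w*|[();:,])")


def read_balanced(text, pos):
    """Return (expr, pos_of_closing_paren) for an already-opened '('."""
    depth, start = 1, pos
    while pos < len(text):
        if text[pos] == "(":
            depth += 1
        elif text[pos] == ")":
            depth -= 1
            if depth == 0:
                return text[start:pos], pos
        pos += 1
    raise SyntaxError("unbalanced parenthesis in data expression")


def tokenize(text):
    tokens, pos, prev = [], 0, None
    while True:
        m = MSC_TOKEN.match(text, pos)
        if m is None:
            break
        tok, pos = m.group(1), m.end()
        tokens.append(("MSC", tok))
        # The lexer does a bit of parser work: it checks its own context.
        if tok == "(" and prev in DATA_INTRODUCERS:
            expr, pos = read_balanced(text, pos)
            tokens.append(("DATA", expr.strip()))
        prev = tok
    if text[pos:].strip():
        raise SyntaxError("unexpected input at offset %d" % pos)
    return tokens


tokens = tokenize("condition when (flag==(a||b)); in off_hook from env;")
```

The data expression is kept as one opaque token, so the quotes can be
dropped without the MSC lexer having to understand C++. Counting
parentheses is a simplification: a ")" inside a C++ string or
character literal would end the expression too early.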

Helmut
-- 
Helmut Neukirchen                   mailto:neukirchen at itm.mu-luebeck.de
Institute for Telematics                   http://www.itm.mu-luebeck.de
Medical University of Luebeck                   phone: +49 451 500 4867
Ratzeburger Allee 160, D-23538 Luebeck, Germany   fax: +49 451 500 3722

 
