[antlr-interest] Best practice to handle Lexer backtracking demand

Joachim Schrod jschrod at acm.org
Fri Aug 13 18:41:08 PDT 2010


Gerald Rosenberg wrote:
> 
> 
> Doubt that there is any one true best practice.  A truism that may help 
> is "try not to do too much in the lexer."

Well, at the moment it looks more like "do nothing in the lexer
except deliver one char at a time", which is quite an interesting
interpretation of "not too much"... ;-)

> Given that your valid input text is pretty much defined by
> 
>>    CHAR : . ;
> 
> likely best to defer key word matching to the parser
> 
>>    name : n=text ( { helper.isKeyword($n) }? text )?  ;
>>    text : CHAR+ ;
>>
>>    CHAR : . ;
> and provide for the helper in the parser::members block.

That won't work. For example, CHAR+ would match all of 'prenameabc'
as one text, and the keyword inside it would not be detected. And if
I have a five-char keyword and a three-char keyword, I can't see how
to construct that `collect' text token at all: if it has five chars,
the three-char keyword will never match because of the longer
five-char text token; if it has three chars, I have the backtracking
problem again as soon as the input matches only the first four chars
of the five-char keyword.
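
To make the problem concrete, here is a minimal sketch in plain
Python, not ANTLR. The keywords "names" and "nam" and both function
names are made up for illustration. It shows what a greedy CHAR+
does, and what a scanner would have to do instead: test for a keyword
at every position, longest keyword first, and fall back to the
shorter one when the longer one fails partway.

```python
# Hypothetical keywords, one of five chars and one of three chars.
# They are assumptions for this sketch, not part of the real DSL.
KEYWORDS = ["names", "nam"]


def greedy_text(s):
    """Mimic `text : CHAR+`: a single token covering the whole input."""
    return [("TEXT", s)] if s else []


def scan(s, keywords=KEYWORDS):
    """Look for a keyword at every position, longest keyword first.

    Characters that do not start a keyword are collected into TEXT tokens.
    """
    by_length = sorted(keywords, key=len, reverse=True)
    tokens, buf, i = [], [], 0
    while i < len(s):
        kw = next((k for k in by_length if s.startswith(k, i)), None)
        if kw is None:
            # No keyword starts here: this char is ordinary text.
            buf.append(s[i])
            i += 1
            continue
        if buf:
            # Flush the text collected before the keyword.
            tokens.append(("TEXT", "".join(buf)))
            buf = []
        tokens.append(("KEYWORD", kw))
        i += len(kw)
    if buf:
        tokens.append(("TEXT", "".join(buf)))
    return tokens
```

With these assumed keywords, greedy_text("prenameabc") yields one
TEXT token for the whole input, while scan("prenameabc") yields
'pre', the keyword 'nam', and 'eabc': the input matched the first
four chars of 'names', failed on the fifth, and had to fall back to
the three-char keyword.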

Please note: There are no delimiting chars that help to detect
that `keyword'. I.e., no white space or such.

And, in case it's not clear, the example was made up by me to have
a minimal example to discuss. The real DSL is simple, but not as
simple: I have several marker strings that delimit character
sequences w/o any marker string in them, need to determine just
that text between the markers, and need to check if the marker
strings come in the right order and what is between them. I.e.,
lexical filters are not sufficient, I need input validation as well.
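
As a rough illustration of that requirement, here is a sketch in
plain Python, not ANTLR. The marker strings, the function name, and
the rules that each marker occurs exactly once and that the input
starts with the first marker are all assumptions made for this
example; the real DSL differs. It splits the input at the markers and
rejects input whose markers are missing or out of order.

```python
import re

# Hypothetical marker strings in their required order (an assumption).
MARKERS = ["BEGIN", "MID", "END"]


def split_on_markers(s, markers=MARKERS):
    """Return [(marker, text following it), ...].

    Raises ValueError if the markers do not occur exactly once each in
    the required order, or if text precedes the first marker.
    """
    # Try longer markers first so one marker cannot shadow another
    # that starts with the same characters.
    longest_first = sorted(markers, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(m) for m in longest_first))
    hits = list(pattern.finditer(s))
    seen = [h.group(0) for h in hits]
    if seen != list(markers):
        raise ValueError("markers missing or out of order: %r" % (seen,))
    if hits[0].start() != 0:
        raise ValueError("text before the first marker")
    sections = []
    for hit, nxt in zip(hits, hits[1:] + [None]):
        # The text runs up to the next marker, or to the end of input.
        end = nxt.start() if nxt is not None else len(s)
        sections.append((hit.group(0), s[hit.end():end]))
    return sections
```

For example, split_on_markers("BEGINabcMIDdeENDf") returns the three
markers paired with 'abc', 'de' and 'f', while "MIDxBEGINyENDz" is
rejected with a ValueError.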

>> Using syntactic predicates? I tried that but did not succeed.
>> AFAIU, they are a parser feature, but I'd need them in the lexer.
> No, they work in the lexer as well.  It is just that the lexer defaults 
> effectively k=1, so the predicated alternatives need to be specified.  
> Or, use an embedded options block to push up the value of k.

How does one do this?

And thanks for the quick answer, though I don't see how it helps me
right now.

	Joachim

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Joachim Schrod				Email: jschrod at acm.org
Roedermark, Germany
