[antlr-interest] Best practice to handle Lexer backtracking demand

Joachim Schrod jschrod at acm.org
Fri Aug 13 17:13:47 PDT 2010


Hi,

So today was my 1st ANTLR day, having used PCCTS more than a 15
years ago. So please bear with me; I'm proficient with other
scanner/parser tools and formal language theory, but not with ANTLR.

I learned, from searching the mailing list, that the lexer does not
support backtracking. I.e., the token specification must be
prefix-free. (I didn't find that information in the reference book,
btw. It cost me quite a few hours to discover that it's the root
cause of my problem and dig out the relevant information.) What I
did not found, even after several hours of Google and mailing list
searches, is some pointers for ANTLR best practice how to specify a
lexical grammar for a language where tokens are not prefix-free.

E.g., let's suppose I have the grammar

  name : PRENAME ? text ;
  text : CHAR+ ;

  PRENAME : 'prename' ;
  CHAR : . ;

which should accept the input string "pr", but doesn't, owing to
the missing lexer backtracking. How is one supposed to formulate
such a grammar in ANTLR? (V3 if it matters.)

Having lex token defs for all chars of all keywords and
constructing the keywords in the parser? After all, the parser
knows how to backtrack. But that's going to be ugly and not really
maintainable in the long run.

Creating prefix tokens for all keywords? I.e., having additional
tokens for 'prenam', 'prena', 'pren', 'pren', and 'pr'? Ugly as
well, IMNSHO.

Using syntactic predicates? I tried that but did not succeed.
AFAIU, they are a parser feature, but I'd need them in the lexer.

If you can point me to a text that explains the best practice for
this (IMHO common) use case, I'd be very happy.

Thanks very much in advance for helping out a newbie to ANTLR,

	Joachim

PS: Part of my confusion was the text of "What is the intended
behavior of the lexer?" at
http://www.antlr.org/wiki/pages/viewpage.action?pageId=4882470. It
implies that the lexer is backtracking and that is really confusing.

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Joachim Schrod				Email: jschrod at acm.org
Roedermark, Germany



More information about the antlr-interest mailing list