[antlr-interest] Making common prefixes work

Wed Dec 28 11:38:07 PST 2011

Hi,

My task is to build a parser for a language for minutes (using the
Python backend).
This comes with a nasty requirement: use as few characters as possible
for special tokens. An input I'd like to parse looks like this:

#Date;28.12.2011

#TOP Foo
Some text
{
	Some list item;
	Some other list item;
}

A main feature is the transformation into LaTeX code. The input above
would produce:

\section*{Foo}
Some text
\begin{itemize}
\item Some list item
\item Some other list item
\end{itemize}
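
To make the intended mapping concrete, here is a minimal hand-written
sketch in plain Python. It is not the ANTLR parser, only an illustration
of the transformation. The name minutes_to_latex is made up, and dropping
metadata lines such as #Date is an assumption based on the sample output
containing no date.

```python
def minutes_to_latex(source):
    """Translate the minutes format into LaTeX, one input line at a time."""
    out = []
    in_list = False
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines are dropped in this sketch
        if line.startswith("#TOP "):
            # "#TOP Foo" becomes an unnumbered section heading
            out.append("\\section*{%s}" % line[len("#TOP "):])
        elif line.startswith("#"):
            continue  # assumption: metadata like "#Date;..." emits no LaTeX
        elif line == "{":
            out.append("\\begin{itemize}")
            in_list = True
        elif line == "}":
            out.append("\\end{itemize}")
            in_list = False
        elif in_list:
            # list items are terminated by ';', which is not part of the text
            out.append("\\item " + line.rstrip(";"))
        else:
            out.append(line)
    return "\n".join(out)
```

Feeding it the sample input should give LaTeX of the shape shown above.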

The main problem is the lexer. I tried the following:

HASH : '#';
SEMICOLON : ';';
TOPBEGIN : '#TOP ';
BLOCKBEGIN : '{';
BLOCKEND : '}';

LINE : ~( HASH | SEMICOLON | NEWLINE | BLOCKBEGIN |
BLOCKEND | TAGBEGIN | TAGEND | '\n' | '\r' )+
;

While this works in general, it breaks as soon as a metadata line
starts with a T:

#Temp;Some Foo

This produces:
Lexer error: line 6:2 mismatched character u'e' expecting 'O'
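
My guess at what happens, written as a toy model in Python. This is an
assumption about how the generated lexer chooses between HASH and
TOPBEGIN, not the actual generated code, and next_token is a made-up
name: the lexer seems to pick a rule from the first two characters and
then commit to it without falling back.

```python
def next_token(text):
    """Toy model: choose a rule from two characters of lookahead, then commit."""
    literal = "#TOP "
    if text[:2] == "#T":
        # Committed to TOPBEGIN: every remaining character must now match,
        # and there is no fallback to HASH if one of them does not.
        for i, expected in enumerate(literal):
            actual = text[i:i + 1]
            if actual != expected:
                raise ValueError(
                    "mismatched character %r expecting %r" % (actual, expected))
        return "TOPBEGIN"
    if text[:1] == "#":
        return "HASH"
    return "LINE"
```

In this model "#TOP Foo" and "#Date;28.12.2011" both tokenize, while
"#Temp;Some Foo" fails on the 'e' because the rule was already chosen
after seeing "#T".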

Any idea how to fix this?

Thanks,
nafur