[antlr-interest] Making common prefixes work

Wed Dec 28 14:14:36 PST 2011

Instead of LINE, try:

CHAR: . ;

Then your parser deals with CHAR+

However, you might need something more complicated to go with it:

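// Declared as a fragment so it exists only as a token type; HASH assigns it.
fragment TOPBEGIN : ;

HASH : '#'
    (   ('TOP')=> 'TOP' { $type = TOPBEGIN; }   // '#TOP' becomes TOPBEGIN
    |                                           // otherwise it stays a plain HASH
    )
    ;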

Note that, as posted, your lexer will error out on \n or \r. Catch them
in a rule of their own and skip() them.
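
To make the intended behaviour concrete, here is a minimal hand-written
sketch in Python of the decisions such a lexer would make. It is not
ANTLR-generated code and does not use the ANTLR runtime; the names
tokenize and SINGLE_CHAR_TOKENS are made up for illustration, and the
single-character token names are taken from the grammar in the original
message. The point is only that '#' checks for a following 'TOP' before
committing, that line breaks are dropped, and that everything else
becomes a one-character CHAR token.

```python
# Hand-written illustration only; all names here are hypothetical.
SINGLE_CHAR_TOKENS = {";": "SEMICOLON", "{": "BLOCKBEGIN", "}": "BLOCKEND"}


def tokenize(text):
    """Return a list of (token_type, token_text) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch in "\r\n":
            # Counterpart of a newline rule that calls skip(): emit nothing.
            pos += 1
        elif ch == "#":
            # Counterpart of the ('TOP')=> predicate: look ahead first,
            # and only then decide which token type this '#' starts.
            if text.startswith("TOP", pos + 1):
                tokens.append(("TOPBEGIN", "#TOP"))
                pos += 4
            else:
                tokens.append(("HASH", "#"))
                pos += 1
        elif ch in SINGLE_CHAR_TOKENS:
            tokens.append((SINGLE_CHAR_TOKENS[ch], ch))
            pos += 1
        else:
            # Counterpart of CHAR: any other single character.
            tokens.append(("CHAR", ch))
            pos += 1
    return tokens
```

With this logic, "#Temp" starts with a plain HASH followed by CHAR tokens
instead of failing on the 'e', while "#TOP Foo" still starts with
TOPBEGIN. The parser then assembles text from CHAR+.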

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org]
> On Behalf Of none
> Sent: Wednesday, December 28, 2011 11:38 AM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] Making common prefixes work
>
> Hi,
>
> My task is to build a parser for a language for meeting minutes (using
> the Python backend).
> This results in a nasty requirement: use as few characters as possible
> for special tokens. An input I'd like to parse would look like this:
>
> #Date;28.12.2011
>
> #TOP Foo
> Some text
> {
> 	Some list item;
> 	Some other list item;
> }
>
>
> A main feature is the transformation into LaTeX code. The input above
> would produce:
>
> \section*{Foo}
> Some text
> \begin{itemize}
> \item Some list item
> \item Some other list item
> \end{itemize}
>
> The main problem is the lexer. I tried the following:
>
> HASH : '#';
> SEMICOLON : ';';
> TOPBEGIN : '#TOP ';
> BLOCKBEGIN : '{';
> BLOCKEND : '}';
>
> LINE : ~( HASH | SEMICOLON | NEWLINE | BLOCKBEGIN | BLOCKEND | TAGBEGIN
> | TAGEND | '\n' | '\r' )+ ;
>
> While this works in general, it breaks as soon as a metadata line
> starts with a T:
>
> #Temp;Some Foo
>
> This will produce:
> Lexer error: line 6:2 mismatched character u'e' expecting 'O'
>
> Any idea how to fix this?
>
> Thanks,
> nafur