[antlr-interest] Antlr lexers - implementing Here documents

Sat Sep 17 06:23:32 PDT 2005

Consider a second lexer that works in "eat a line" mode, returning
each input line as a single LINE token, and a parser rule that just
consumes LINE tokens until the text of a line matches the identifier
in question.  This is similar to the model for javadoc and other such
comment-parsing scanners.  Something like the following, in the
parser:
///
///  result is tree node with many children: #( "<<" ( LINE )* )
///
heredoc:
{ string terminator; }
   "<<"^
   id:IDENTIFIER!
   {
      terminator = id->getText();
      /**** switch to "eat-a-line" lexer here ****/
   }
   ( { LT(1)->getText() != terminator }? LINE )*
   { /*** switch back to prior lexer ***/ }
   eod:IDENTIFIER!
   { /*** make sure "eod" identifier matches, if you're paranoid ***/ }
   ;
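
The "eat lines until one equals the terminator" loop can also be
sketched outside of ANTLR, which may help when checking the logic on
its own.  The following is only a minimal sketch in plain C++, not
part of any ANTLR API: HereDoc, readHereDoc and
readHereDocFromString are hypothetical names, and it assumes the
"<<IDENTIFIER" introducer and the rest of its line have already been
consumed.  In an ANTLR 2 grammar the lexer switch itself would more
likely go through something like a TokenStreamSelector.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type: the body lines, plus a flag saying whether
// the terminator line was actually seen before end of input.
struct HereDoc {
    std::vector<std::string> lines;
    bool terminated;
};

// Reads whole lines from `in` until one is exactly equal to `terminator`.
// The terminator line is consumed but not stored.  Assumes the caller has
// already consumed the "<<IDENTIFIER" introducer and the rest of its line.
HereDoc readHereDoc(std::istream& in, const std::string& terminator) {
    assert(!terminator.empty());
    HereDoc doc{{}, false};
    std::string line;
    while (std::getline(in, line)) {
        // Tolerate CRLF input by dropping a trailing carriage return.
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();
        }
        if (line == terminator) {
            doc.terminated = true;
            break;
        }
        doc.lines.push_back(line);
    }
    return doc;
}

// Convenience wrapper for trying the function on an in-memory string.
HereDoc readHereDocFromString(const std::string& text,
                              const std::string& terminator) {
    std::istringstream in(text);
    return readHereDoc(in, terminator);
}
```

Because the comparison is against the whole line, an occurrence of
the identifier in the middle of a line does not end the document,
which matches the "identifier by itself on a line" requirement.  The
stream is left positioned just after the terminator line, so normal
scanning can resume from there.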

On 9/17/05, Tommy Nordgren <tommy.nordgren at chello.se> wrote:
> For an application I need to implement a token similar to here
> documents in bash and perl.
> Each token is introduced by a special symbol, and an identifier. The
> token is ended by the
> start identifier occurring by itself on a line.
> Everything after the introducing identifier, and before the
> terminating identifier should be copied
> verbatim to the generated token.
> Like this:
> <<ENDRULE
> "This should be copied verbatim"
> ENDRULE
> 
> After this pattern occurs in the input, a token should be generated
> with the content "This should be copied verbatim",
> and an appropriate token code.
> 
> Any ideas, folks, on how to implement this?
> 
> "Home is not where you are born, but where your heart finds peace" -
> Tommy Nordgren, "The dying old crone"