[antlr-interest] pass state from parser to lexer

Scobie Smith (Insight Global) v-scobis at microsoft.com
Tue Jul 3 20:46:52 PDT 2012

Thanks, Benjamin and Bart. That helps a lot. However, there is one more complication that makes this worse: the BODY lexer rule must apply only in the context of one particular parser rule. The language has only one statement that involves this sort of BODY; elsewhere such a string should be tokenized piecemeal. (Left unguarded, this BODY token would grab any string with "bookends".) If there were some sort of dynamic scope, state, or flag that I could set in the parser at a given rule and then check in lexer rules, I could use semantic predicates to decide whether a lexer rule should apply. But there seems to be nothing I can pass between parser and lexer, especially from parser to lexer.

So, what I am doing now is using an elaborate gated semantic predicate in the lexer that effectively checks if it can parse out this odd statement. The statement is not recursive, so I can get away with this in the lexer (at least this time). If the predicate detects the statement, it also grabs the relevant pieces, and then I emit those pieces manually as separate tokens.

The lexer rule is basically this:

ExecStatement
	:	{ DetectExec() }?=> ExecCommand
		{
			MatchExecStatement();  // Just calls input.Consume() to move the char position along for all the pieces.
			EmitExecStatement();  // Emits the three pieces (exec, mode, body).
		} ;

ExecCommand: { CharPosition == 0 }? 'exec' ;

The DetectExec() method just looks down the input using LA(i) to parse out the pieces, saving them for the emit. This is looking like it is going to work.
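For illustration only, here is a minimal sketch of that kind of lookahead detection, written in Python rather than as ANTLR target code. Everything in it is an assumption made for the example: the statement shape (`exec`, a mode word made of letters, then a body bracketed by whatever character comes first), and the names `CharStream`, `la`, `ExecPieces`, and `detect_exec`. It is not the actual DetectExec() described above; it only shows the idea of peeking ahead without consuming, and saving the pieces for a later emit.

```python
from typing import NamedTuple, Optional

EOF = ""  # sentinel returned by la() past the end of the input


class CharStream:
    """Hypothetical stand-in for a lexer input stream with 1-based lookahead."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0  # index of the next unconsumed character

    def la(self, i: int) -> str:
        """Return the i-th character ahead (1-based) without consuming it."""
        idx = self.pos + i - 1
        return self.text[idx] if idx < len(self.text) else EOF

    def consume(self, n: int = 1) -> None:
        """Advance past n characters."""
        self.pos += n


class ExecPieces(NamedTuple):
    mode: str
    body: str
    length: int  # number of characters the whole statement spans


def detect_exec(stream: CharStream) -> Optional[ExecPieces]:
    """Peek ahead only; return the saved pieces if an exec statement starts here."""
    # Assumed rule: the statement must begin at the start of a line.
    if stream.pos != 0 and stream.text[stream.pos - 1] != "\n":
        return None
    i = 1
    for ch in "exec":
        if stream.la(i) != ch:
            return None
        i += 1
    if stream.la(i) != " ":  # at least one space after the keyword
        return None
    while stream.la(i) == " ":
        i += 1
    start = i
    while stream.la(i).isalpha():  # assumed: the mode is a run of letters
        i += 1
    mode = "".join(stream.la(k) for k in range(start, i))
    if not mode:
        return None
    while stream.la(i) == " ":
        i += 1
    delim = stream.la(i)  # the first non-space character is the delimiter
    if delim in (EOF, "\n"):
        return None
    i += 1
    start = i
    while stream.la(i) not in (EOF, delim):
        i += 1
    if stream.la(i) != delim:  # ran off the end without a closing delimiter
        return None
    body = "".join(stream.la(k) for k in range(start, i))
    return ExecPieces(mode, body, i)
```

A caller would test `detect_exec(stream)` first and, only on success, call `stream.consume(pieces.length)` and emit the three tokens from the saved pieces, which mirrors the predicate / match / emit split in the rule above.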

So, in effect, this is just a hack to have the lexer predicate do the syntactic parsing. If the parsing works, then the lexer rule succeeds, and that "token" (ExecStatement) will then trigger the right parser rule.

If anyone knows a hack to pass state from parser to lexer, let me know. :)


-----Original Message-----
From: Benjamin S Wolf [mailto:jokeserver at gmail.com] 
Sent: Tuesday, July 03, 2012 6:45 PM
To: Bart Kiers
Cc: Scobie Smith (Insight Global); antlr-interest at antlr.org
Subject: Re: [antlr-interest] pass state from parser to lexer

On Tue, Jul 3, 2012 at 10:31 AM, Bart Kiers <bkiers at gmail.com> wrote:
> On Tue, Jul 3, 2012 at 6:13 PM, Benjamin S Wolf <jokeserver at gmail.com>
> wrote:
>> I believe you can also use ~ as a negation, eg.
>> BODY : '#' ~'#'* '#' ;
>> (if # is your delimiter, as an example)
> The delimiter is variable and is provided (at runtime?) by the user, 
> as indicated by Scobie.

Oh, I see. Sorry I missed that. If the user gets to specify then you can't hard-code it into the Lexer, which is what those rules will assume you can do.

Bart's suggestion works for specifying the delimiter at lexer initialization time. Here's a similar way to use a delimiter specified at lex time.

BODY : delimiter=. ( {input.LA(1) != $delimiter}?=> . )* . ;


@init { char delimiter; }
BODY : a=. {delimiter=$a;} ( {input.LA(1) != delimiter}?=> . )* . ;

Use delimiter=~WS or similar if you don't want your grammar using whitespace characters as delimiters.
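The matching logic those rules aim for can be sketched outside ANTLR as well. The following Python function is only an illustration under assumed names (`match_body` is hypothetical, not part of any ANTLR runtime): it treats the first character as the delimiter, refuses whitespace as a delimiter, and scans up to and including the next occurrence of that character.

```python
from typing import Optional


def match_body(text: str, start: int = 0) -> Optional[str]:
    """Return the delimited body starting at `start`, or None if there is none.

    The character at `start` is taken as the delimiter, mirroring the
    `a=.` capture in the grammar rule; whitespace is rejected as a delimiter.
    """
    if start >= len(text):
        return None
    delimiter = text[start]
    if delimiter.isspace():
        return None
    # Everything up to the next delimiter belongs to the body.
    end = text.find(delimiter, start + 1)
    if end == -1:
        return None  # no closing delimiter found
    return text[start:end + 1]
```

For example, `match_body("#abc# rest")` yields `"#abc#"`, while an input with no closing delimiter yields `None`.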
