[antlr-interest] Lexer Predicates?

Sun Aug 3 13:24:46 PDT 2008

Matt Palmer wrote:
> I am having similar issues - I'm having to encode the parser state into 
> the lexer.  This is because I have character sequences that are subsets 
> of one another, etc., that should only match in certain places.  The 
> high level parser rules determine this very nicely - but I have to 
> explicitly push down those rules into the lexer using predicates.
> 
> I was wondering myself if it would be possible to automate this process.
> 
> Matt.

There is an XQuery project whose lexer only creates tokens on demand.
I don't know the project page, but searching the archives should give
you clues.
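
To make "tokens on demand" concrete, here is a minimal sketch in plain Java. It is not ANTLR code and not the XQuery project's lexer; the names OnDemandLexer, Mode, setMode and OnDemandDemo are made up for illustration. The idea is only that the lexer produces one token per call, so the consumer can change lexer state between calls, e.g. to read a config value with embedded spaces up to EOL.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not ANTLR API: a lexer that produces one token per call. */
class OnDemandLexer {
    enum Mode { DEFAULT, VALUE }   // VALUE: read everything up to end of line

    private final String input;
    private int pos = 0;
    private Mode mode = Mode.DEFAULT;

    OnDemandLexer(String input) { this.input = input; }

    /** The consumer (the "parser") switches modes between token requests. */
    void setMode(Mode mode) { this.mode = mode; }

    /** Returns the next token text, or null at end of input. */
    String nextToken() {
        if (mode == Mode.VALUE) {
            // Spaces are significant here: take the rest of the line as one token.
            int eol = input.indexOf('\n', pos);
            if (eol < 0) eol = input.length();
            String value = input.substring(pos, eol).trim();
            pos = eol;
            mode = Mode.DEFAULT;   // the mode applies to a single token only
            return value;
        }
        // DEFAULT mode: whitespace is skipped (the analogue of a hidden channel).
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) pos++;
        if (pos >= input.length()) return null;
        if (input.charAt(pos) == '=') { pos++; return "="; }
        int start = pos;
        while (pos < input.length()
                && !Character.isWhitespace(input.charAt(pos))
                && input.charAt(pos) != '=') pos++;
        return input.substring(start, pos);
    }
}

class OnDemandDemo {
    /** Tiny "parser": each line is NAME '=' VALUE, where VALUE runs to end of line. */
    static List<String> parse(String text) {
        OnDemandLexer lexer = new OnDemandLexer(text);
        List<String> out = new ArrayList<>();
        String name;
        while ((name = lexer.nextToken()) != null) {
            String eq = lexer.nextToken();
            if (!"=".equals(eq)) throw new IllegalStateException("expected '=' after " + name);
            lexer.setMode(OnDemandLexer.Mode.VALUE);  // parser context drives the lexer
            out.add(name + " -> " + lexer.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        for (String entry : parse("title = hello big world\nsize=42\n")) {
            System.out.println(entry);
        }
    }
}
```

This only works because no token is produced before it is requested. With a token stream that lexes the whole input up front, the setMode call would come too late, which matches the static-flag problem described further down in the quoted thread.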

Johannes
> 
> On Sun, Aug 3, 2008 at 7:34 PM, Foust <javafoust at gmail.com 
> <mailto:javafoust at gmail.com>> wrote:
> 
>      > At 9:40pm, August 02, 2008 Gavin wrote:
>      >
>      > At 11:06 3/08/2008, Foust wrote:
>      > >Do lexer predicates work in v3?
>      >
>      > That depends on what you mean.  You can certainly use both
>      > syntactic and semantic predicates within the lexer, but they can
>      > only use lexer state.
> 
>     That would explain why setting a static flag in the Lexer from the
>     Parser has no effect -- the Lexer has already run to completion before
>     the parser receives the first token.
> 
>      >
>      > Also, while I'm not entirely sure about this, I think predicates
>      > in the lexer can only be used to decide between alts within a
>      > single lexer rule.  I vaguely recall some trouble when trying to
>      > use them to decide between multiple lexer rules (at the top
>      > level).
> 
>     I'll keep that in mind. I've had nothing but trouble trying to get
>     the Lexer to return tokens based on context (as best determined by the
>     Parser).
> 
> 
>      > Generally speaking, you should keep your lexer fairly
>      > straightforward and unambiguous, and defer semantic decisions (and
>      > ambiguity resolution) until the parsing phase.
> 
>     Yes... it started out that way. But to allow spaces to be part of a
>     config value (read up to EOL), the Lexer needs to honor state. (Place
>     spaces in the HIDDEN channel for all other cases - outside of a special
>     config/preprocessor rule).
> 
>     The parser shouldn't have to go through contortions because of lexer
>     design. In fact, it seems as though the lexer itself is fine, if only it
>     would get tokens as required, rather than all at once.
> 
>     Might I suggest the lexer be endowed with a state mechanism that can be
>     controlled from the parser?
> 
> 
>