[antlr-interest] Determining context in lexer?

Wed Nov 10 06:01:34 PST 2004

Hi all:

In the language I'm parsing, square brackets are used for two completely
different things.  The first usage is as an array access operator, e.g.:

  x := id[expr]
  x := id[expr, expr]
  x := id[expr][expr]
  x := f()[expr]

and so on.  No problem.  BUT, square brackets can also delimit literal
strings, e.g.:

  x := [Hello World]   // equivalent to x := "Hello World"
  x := f( [Hello], "World" )

... etc.  The problem is that the lexer tokenizes text such as:

  [x + 1]

into five individual tokens which will eventually match a parser rule such
as:

   arraySubscr: LBRKT expr ( COMMA expr )* RBRKT;

but if the brackets delimit a string, I want the text to be parsed into a
single STRING_LITERAL token, which would eventually match a rule such as:

   literalValue:  STRING_LITERAL | INT_LITERAL | FLOAT_LITERAL | ... etc. ;

The problem is that the lexer has no context information for deciding how to
tokenize a "[" ... "]" sequence of characters.  I don't think the use of "["
is actually ambiguous: if I knew what the prior token was, I could probably
use a semantic predicate in the lexer rule for "[".  Syntactic and semantic
predicates can look ahead, but I need to look backwards, and I didn't find
anything in the docs that addresses this kind of problem.
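
To make the "look at the prior token" idea concrete, here is a rough sketch
in plain Python rather than ANTLR.  It is only an illustration: the token
names other than LBRKT, RBRKT, COMMA and the *_LITERAL ones, the regular
expressions, and above all the guess that "[" is a subscript only when it
follows an identifier, ")" or "]" are my assumptions, not something the
language definition or ANTLR gives me.

```python
import re

# ASSUMPTION: "[" opens a subscript only right after one of these token kinds;
# anywhere else it is taken to open a bracket-delimited string literal.
SUBSCRIPT_PREDECESSORS = {"ID", "RPAREN", "RBRKT"}

# Hypothetical token definitions, just enough for the examples in this post.
TOKEN_SPEC = [
    ("WS", r"\s+"),
    ("STRING_LITERAL", r'"[^"]*"'),
    ("FLOAT_LITERAL", r"\d+\.\d+"),
    ("INT_LITERAL", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("ASSIGN", r":="),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("LBRKT", r"\["),
    ("RBRKT", r"\]"),
    ("COMMA", r","),
    ("OP", r"[+\-*/]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return (kind, lexeme) pairs, using the previous token kind to decide "["."""
    tokens = []
    prev = None  # kind of the last non-whitespace token emitted
    pos = 0
    while pos < len(text):
        if text[pos] == "[" and prev not in SUBSCRIPT_PREDECESSORS:
            # Not in subscript position: consume up to the next "]" as one token.
            end = text.find("]", pos)
            if end == -1:
                raise SyntaxError(f"unterminated bracket string at offset {pos}")
            tokens.append(("STRING_LITERAL", text[pos:end + 1]))
            prev = "STRING_LITERAL"
            pos = end + 1
            continue
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        pos = match.end()
        kind = match.lastgroup
        if kind == "WS":
            continue  # whitespace is skipped and does not change the context
        tokens.append((kind, match.group()))
        prev = kind
    return tokens


if __name__ == "__main__":
    for line in ("x := id[x + 1]", "x := [Hello World]", 'x := f( [Hello], "World" )'):
        print(line, "->", tokenize(line))
```

With this, "id[x + 1]" still comes out as separate LBRKT ... RBRKT tokens,
while "[Hello World]" after ":=" comes out as a single STRING_LITERAL.  What
I don't know is how to keep the equivalent of "prev" in an ANTLR lexer.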

-- 
Don
