[antlr-interest] Determining context in lexer?
Don Caton
dcaton at shorelinesoftware.com
Wed Nov 10 06:01:34 PST 2004
Hi all:
In the language I'm parsing, square brackets are used for two completely
different things. The first use is as an array access operator, e.g.:
x := id[expr]
x := id[expr, expr]
x := id[expr][expr]
x := f()[expr]
and so on. No problem. BUT, square brackets can also delimit literal
strings, e.g.:
x := [Hello World] // equivalent to x := "Hello World"
x := f( [Hello], "World" )
... etc. The problem is that the lexer tokenizes text such as:
[x + 1]
into five individual tokens which will eventually match a parser rule such
as:
arraySubscr: LBRKT expr ( COMMA expr )* RBRKT;
but if the brackets delimit a string, I want the text to be parsed into a
single STRING_LITERAL token, which would eventually match a rule such as:
literalValue: STRING_LITERAL | INT_LITERAL | FLOAT_LITERAL | ... etc. ;
The problem is that the lexer has no context information with which to decide
how to tokenize a "[" ... "]" sequence of characters. I don't think the use of
"[" is actually ambiguous: if I knew what the previous token was, I could
probably use a semantic predicate in the lexer rule for "[". Syntactic and
semantic predicates can look ahead, but I need to look backwards, and I didn't
find anything in the docs that addresses this kind of problem.
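To make the "look backwards" idea concrete, here is a rough sketch of the
behaviour I'm after. It is plain Python rather than ANTLR, and everything in it
is an assumption made for illustration: the token names, the tokenize()
function, and above all the OPERAND_END set, which is my guess at which
preceding tokens mean "[" must be a subscript. It also assumes no whitespace
rules beyond "skip it", and that keywords are not lexed as ID.

```python
import re

# Token kinds after which "[" is taken to be a subscript, because an operand
# has just ended. This set is a guess for illustration, not a real grammar.
OPERAND_END = {"ID", "RPAREN", "RBRKT"}

TOKEN_RE = re.compile(r'''
    (?P<WS>\s+)
  | (?P<COMMENT>//[^\n]*)
  | (?P<ASSIGN>:=)
  | (?P<ID>[A-Za-z_]\w*)
  | (?P<INT_LITERAL>\d+)
  | (?P<STRING_LITERAL>"[^"\n]*")
  | (?P<LPAREN>\()
  | (?P<RPAREN>\))
  | (?P<LBRKT>\[)
  | (?P<RBRKT>\])
  | (?P<COMMA>,)
  | (?P<OP>[-+*/])
''', re.VERBOSE)


def tokenize(text):
    """Return (kind, value) pairs, deciding what "[" means from the previous token."""
    tokens = []
    prev = None  # kind of the last token emitted; whitespace and comments are ignored
    pos = 0
    while pos < len(text):
        # The "look backwards" step: a "[" that does not follow an operand
        # starts a bracket-delimited string running up to the next "]".
        if text[pos] == "[" and prev not in OPERAND_END:
            end = text.find("]", pos)
            if end < 0:
                raise ValueError("unterminated [ ] string at offset %d" % pos)
            tokens.append(("STRING_LITERAL", text[pos + 1:end]))
            prev = "STRING_LITERAL"
            pos = end + 1
            continue
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError("unexpected character %r at offset %d" % (text[pos], pos))
        kind = m.lastgroup
        pos = m.end()
        if kind in ("WS", "COMMENT"):
            continue  # skipped tokens must not change the remembered context
        value = m.group()
        if kind == "STRING_LITERAL":
            value = value[1:-1]  # drop the surrounding double quotes
        tokens.append((kind, value))
        prev = kind
    return tokens
```

With this, tokenize("x := [Hello World]") yields a single STRING_LITERAL after
the ASSIGN, while tokenize("x := id[expr]") yields LBRKT, ID, RBRKT. What I
can't see is how to get the equivalent of "prev" inside an ANTLR lexer rule.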
--
Don