[antlr-interest] Lexer problem

Mon May 24 15:05:20 PDT 2004

Hi All,
I'm stuck with a VHDL lexer. Deciding what the ' character means is
highly context-sensitive. Consider some examples:

c <= '(';         -- '(' should be lexed as a single token, representing a character literal
D <= vector'(A);  -- ' should be lexed as a QUOTE token
D <= string'('''&'('&')'); -- should produce the following token sequence:

D      : IDENTIFIER
"<="
string : IDENTIFIER
'      : QUOTE
"("
'''    : CHAR_LITERAL, representing '
&      : AMPERSAND
'('    : CHAR_LITERAL, representing (
...

I think I have found a rule that will satisfy all conditions:

QUOTE: '\'' (
    {LA(2)=='(' && LA(3)=='\'' && LA(5)=='\''}? {$setType(QUOTE);}
    | {LA(3)=='\''}? . "'"                      {$setType(CHAR_LIT);}
    |                                           {$setType(QUOTE);}
    )
    ;
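To make the intended decision concrete, here is a plain-Java sketch (not ANTLR-generated code) of the rule's three alternatives tested in the order they are written, next to the same tests with the CHAR_LIT check moved first. It assumes LA(1) is the apostrophe itself, which is the reading under which the predicates line up with the examples above; `la`, `classify` and `classifyCharLitFirst` are hypothetical helpers.

```java
public class QuoteRuleSketch {

    // Stand-in for ANTLR's LA(i). ASSUMPTION: LA(1) is the apostrophe itself.
    // Past the end of the input we return '\0' as a hypothetical EOF marker.
    static char la(String s, int quotePos, int i) {
        int p = quotePos + i - 1;
        return p < s.length() ? s.charAt(p) : '\0';
    }

    // The three alternatives of the QUOTE rule, tested in the order written.
    static String classify(String s, int q) {
        if (la(s, q, 2) == '(' && la(s, q, 3) == '\'' && la(s, q, 5) == '\'') {
            return "QUOTE";      // e.g. string'(''' ... (qualified expression)
        }
        if (la(s, q, 3) == '\'') {
            return "CHAR_LIT";   // e.g. '(' or '''
        }
        return "QUOTE";          // e.g. vector'(A)
    }

    // CHAR_LIT test first. Once it fails, the first alternative cannot match
    // either (it also needs LA(3)=='\''), so only the default QUOTE remains.
    static String classifyCharLitFirst(String s, int q) {
        if (la(s, q, 3) == '\'') {
            return "CHAR_LIT";
        }
        return "QUOTE";
    }

    public static void main(String[] args) {
        String a = "c <= '(';";
        String b = "D <= vector'(A);";
        String c = "D <= string'('''&'('&')');";
        System.out.println(classify(a, a.indexOf('\'')));             // CHAR_LIT
        System.out.println(classify(b, b.indexOf('\'')));             // QUOTE
        System.out.println(classify(c, c.indexOf('\'')));             // QUOTE
        System.out.println(classify(c, c.indexOf('\'') + 2));         // CHAR_LIT (the ''' literal)
        System.out.println(classifyCharLitFirst(c, c.indexOf('\''))); // CHAR_LIT, i.e. the wrong choice
    }
}
```

The last line shows why the order matters: with the CHAR_LIT test first, the quote after `string` is taken as the start of the literal `'('`. Under the same LA(1) reading, `classify(c, 17)` (the `'('` literal inside `&'('&')'`) also returns "QUOTE", because LA(5) there is the opening quote of `')'`.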

However, the generated code always tests the CHAR_LIT alternative first,
before evaluating the first (QUOTE) alternative.

I've tried a number of variations, but they are not leading anywhere. 
Any suggestions?

Thanks,
Tom
