[antlr-interest] spaces between tokens

Wed Nov 10 08:38:34 PST 2004

On Nov 10, 2004, at 7:39 AM, Anakreon wrote:

>
> silverio.di at qem.it wrote:
>>
>> Hi,
>> I have a big problem.
>>
>> In my grammar, as in many others, whitespace is skipped in the lexer,
>> but in some circumstances I need to check that no spaces are present
>> between tokens.
>>
>> Example:
>> WeekJobHour@Monday = 8
>>
>> would mean: assign 8 (hours) to parameter Monday of structure
>> WeekJobHour.
>> I would like my lexer to extract the following tokens:
>>
>> IDENT ATSIGN IDENT
>>
>> but my problem is checking that no WS is present between IDENT and
>> ATSIGN, or between ATSIGN and IDENT, so:
>>
>> WeekJobHour@Monday = 8       // is OK
>> WeekJobHour @Monday = 8      // is BAD
>> WeekJobHour@ Monday = 8      // is BAD
>> WeekJobHour  @ Monday = 8    // is BAD too!
>>
>> I could use the following lexer rule:
>>
>> STRUCT_PARAMETER
>>       :     ('A'..'Z' | 'a'..'z')+
>>             '@'
>>             ('A'..'Z' | 'a'..'z')+
>>       ;
>>
>> but in the parser, how can I extract the structure name (WeekJobHour)
>> and the structure parameter (Monday) from the STRUCT_PARAMETER
>> token?
>>
>> I think a similar issue arises with the C/C++ structure construct.
>>
>> Thank you for your suggestions,
>> Silverio Diquigiovanni
> Make a class which implements TokenStream and uses the Lexer.
> In the nextToken method, if the lexer returns a token of type
> STRUCT_PARAMETER, split it into 3 tokens: the first of type
> STRUCT_NAME, the second STRUCT_AT and the third STRUCT_DAY, with
> the texts WeekJobHour, @ and Monday respectively.
> Return the first token from the method and store the other 2.
> In the next 2 calls of nextToken, return the stored ones.
>
> Pass the implementor of TokenStream instead of your Lexer to the
> parser.
>
> Anakreon
>

I agree with the above approach. Also see the ParserFilter paper on my
website, http://www.codetransform.com/filterexample.html
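
A minimal Java sketch of the token-splitting TokenStream described in the quoted reply. To keep it self-contained it does not use ANTLR: Token, TokenStream, fromList and the token-type constants are hypothetical stand-ins that only roughly mirror antlr.Token, antlr.TokenStream and the lexer's generated token types. With the real ANTLR 2 classes you would wrap the generated lexer and create tokens through its API instead.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

public class SplitFilterDemo {
    // Hypothetical token types; a real ANTLR 2 grammar generates its own constants.
    static final int EOF = 1;
    static final int STRUCT_PARAMETER = 4;
    static final int STRUCT_NAME = 5;
    static final int STRUCT_AT = 6;
    static final int STRUCT_DAY = 7;

    // Stand-in for antlr.Token: just a type and its text.
    static final class Token {
        final int type;
        final String text;
        Token(int type, String text) { this.type = type; this.text = text; }
    }

    // Stand-in for antlr.TokenStream (the real one declares TokenStreamException).
    interface TokenStream {
        Token nextToken();
    }

    // Wraps the lexer; each STRUCT_PARAMETER token comes out as three tokens.
    static final class SplittingFilter implements TokenStream {
        private final TokenStream lexer;
        private final Deque<Token> pending = new ArrayDeque<>();

        SplittingFilter(TokenStream lexer) { this.lexer = lexer; }

        @Override
        public Token nextToken() {
            // Hand out tokens stored by an earlier split before reading more input.
            if (!pending.isEmpty()) {
                return pending.removeFirst();
            }
            Token t = lexer.nextToken();
            if (t.type != STRUCT_PARAMETER) {
                return t;
            }
            // The lexer rule guarantees one '@' with letters (and no spaces) on each side.
            int at = t.text.indexOf('@');
            pending.addLast(new Token(STRUCT_AT, "@"));
            pending.addLast(new Token(STRUCT_DAY, t.text.substring(at + 1)));
            return new Token(STRUCT_NAME, t.text.substring(0, at));
        }
    }

    // Hypothetical helper: replays a fixed token list, then returns EOF forever.
    static TokenStream fromList(List<Token> tokens) {
        Iterator<Token> it = tokens.iterator();
        return () -> it.hasNext() ? it.next() : new Token(EOF, "");
    }
}
```

The parser is then constructed on new SplittingFilter(lexer) rather than on the lexer itself, so its rules can match STRUCT_NAME STRUCT_AT STRUCT_DAY and read the two names from the token texts.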

I would recommend an alternative approach: don't skip whitespace in the
lexer; instead, discard it in the parser filter. That filter can still
check that no whitespace occurs before or after an @ between IDENTs.
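
A minimal Java sketch of that filter, assuming the lexer emits WS as ordinary tokens instead of skipping them. Token, TokenStream, fromList, the token-type constants and WhitespaceAroundAtException are hypothetical stand-ins (a real ANTLR 2 filter would probably report the problem through TokenStreamException or a subclass). The filter drops WS tokens but remembers whether the last raw token was WS, and reads one raw token ahead whenever it passes on an ATSIGN.

```java
import java.util.Iterator;
import java.util.List;

public class WsFilterDemo {
    // Hypothetical token types for illustration.
    static final int EOF = 1, IDENT = 4, ATSIGN = 5, WS = 6, ASSIGN = 7, INT = 8;

    static final class Token {
        final int type;
        final String text;
        Token(int type, String text) { this.type = type; this.text = text; }
    }

    interface TokenStream {
        Token nextToken();
    }

    // Hypothetical error type for whitespace next to '@'.
    static final class WhitespaceAroundAtException extends RuntimeException {
        WhitespaceAroundAtException(String msg) { super(msg); }
    }

    // Discards WS tokens, rejecting any WS directly before or after an ATSIGN.
    static final class WsFilter implements TokenStream {
        private final TokenStream lexer;
        private Token lookahead;        // raw token read ahead after an ATSIGN
        private boolean previousWasWs;  // was the last raw token WS?

        WsFilter(TokenStream lexer) { this.lexer = lexer; }

        private Token nextRaw() {
            if (lookahead != null) {
                Token t = lookahead;
                lookahead = null;
                return t;
            }
            return lexer.nextToken();
        }

        @Override
        public Token nextToken() {
            Token t = nextRaw();
            while (t.type == WS) {      // drop whitespace, but remember we saw it
                previousWasWs = true;
                t = nextRaw();
            }
            if (t.type == ATSIGN) {
                if (previousWasWs) {
                    throw new WhitespaceAroundAtException("whitespace before '@'");
                }
                lookahead = lexer.nextToken();  // peek at the raw token after '@'
                if (lookahead.type == WS) {
                    throw new WhitespaceAroundAtException("whitespace after '@'");
                }
            }
            previousWasWs = false;
            return t;
        }
    }

    // Hypothetical helper: replays a fixed token list, then returns EOF forever.
    static TokenStream fromList(List<Token> tokens) {
        Iterator<Token> it = tokens.iterator();
        return () -> it.hasNext() ? it.next() : new Token(EOF, "");
    }
}
```

The parser still sees IDENT ATSIGN IDENT and never receives WS tokens; checking that the neighbours of the @ really are IDENTs is left to the parser rule.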

Alternatively, you could keep track of state in the lexer. Set a boolean
variable in the makeToken() method recording whether the token just made
was WS. To see what comes after the @, inspect LA(1). Assuming @ is not
used in any other way, you would have a rule similar to this, where
previousWasWhitespace is the variable set in makeToken():

AT : { !previousWasWhitespace }? '@' { LA(1)!=' ' && LA(1)!='\t' }? ;
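
The lexer-state idea can be tried outside ANTLR with a small hand-written scanner. In this sketch (class and method names are hypothetical, and only the characters needed for the example are recognized) skipped whitespace is remembered in previousWasWhitespace, and LA(i) follows the ANTLR convention that LA(1) is the character about to be matched. So while the scanner is still positioned on an '@', the character after it is LA(2).

```java
import java.util.ArrayList;
import java.util.List;

public class AtLexerDemo {
    private static final char EOF_CHAR = '\uFFFF';

    private final String input;
    private int pos = 0;
    // True when the last thing consumed was (skipped) whitespace.
    private boolean previousWasWhitespace = false;

    AtLexerDemo(String input) { this.input = input; }

    // ANTLR-style character lookahead: LA(1) is the current character.
    private char LA(int i) {
        int p = pos + i - 1;
        return p < input.length() ? input.charAt(p) : EOF_CHAR;
    }

    private static boolean isWs(char c) { return c == ' ' || c == '\t'; }

    List<String> tokenize() {
        List<String> tokens = new ArrayList<>();
        while (pos < input.length()) {
            char c = LA(1);
            if (isWs(c)) {
                // Skip the whitespace run, but remember that it was there.
                while (isWs(LA(1))) pos++;
                previousWasWhitespace = true;
                continue;
            }
            if (c == '@') {
                // No whitespace before '@' (state) and none after it (LA(2)).
                if (previousWasWhitespace || isWs(LA(2))) {
                    throw new IllegalArgumentException("whitespace around '@' at offset " + pos);
                }
                pos++;
                tokens.add("AT");
            } else if (Character.isLetter(c)) {
                int start = pos;
                while (Character.isLetter(LA(1))) pos++;
                tokens.add("IDENT(" + input.substring(start, pos) + ")");
            } else if (Character.isDigit(c)) {
                int start = pos;
                while (Character.isDigit(LA(1))) pos++;
                tokens.add("INT(" + input.substring(start, pos) + ")");
            } else if (c == '=') {
                pos++;
                tokens.add("ASSIGN");
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "'");
            }
            previousWasWhitespace = false;  // a non-WS token was just made
        }
        return tokens;
    }
}
```

As with the rule above, this assumes @ has no other use, and it treats only spaces and tabs as whitespace.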

Monty

ANTLR & Java Consultant -- http://www.codetransform.com
ANSI C/GCC transformation toolkit -- 
http://www.codetransform.com/gcc.html
Embrace the Decay -- http://www.codetransform.com/EmbraceDecay.html
