[antlr-interest] lexer rule for string

Brian Smith brian-l-smith at uiowa.edu
Tue Oct 15 23:10:33 PDT 2002


stephane brossier wrote:
> Hi,
> 
> I am trying to recognize some strings in a C program.
> 
> I first had a lexer rule defined as follows:
> 
> STRING: '"' ~'"' '"';
> 
> This worked pretty well until I had some traces like:
>  printf("The string is  \" the_string \" ");
> 
> How can I make the lexer understand that \" is not the
> end of my string but is actually part of my string,
> since there is an escape character before it?
> 
> Thanks,
> S.

Here is what I use to match both double-quoted strings (QUOTED_NAME) and
single-quoted strings (STRING_LITERAL) with escape sequences like \n, \",
and \t, including octal escape sequences. Note that these rules also
reject any string that contains a raw newline or carriage return.

options { k=3; }

QUOTED_NAME
    :   ( '"' ( QUOTED_CHARACTER | '\'' )* '"' )
    ;

STRING_LITERAL
    :   ( '\'' ( QUOTED_CHARACTER | '"' )* '\'' )
    ;


// Note that QUOTED_CHARACTER doesn't allow unescaped single OR double quotes;
// each string rule above adds back the quote that is not its own delimiter.
protected QUOTED_CHARACTER
    :   (~( '\'' | '"' | '\r' | '\n' | '\\' ))
    |   '\\' ( ( '\'' | '"' | 'n' | 'r' | 't' | 'b' | 'f' | '\\' )
             | OCTAL_DIGIT
               (options {greedy=true;} : OCTAL_DIGIT)?
               (options {greedy=true;} : OCTAL_DIGIT)?
             )
    ;

protected OCTAL_DIGIT
    :   '0'..'7'
    ;
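
To make the matching logic of the rules above easier to follow, here is a
rough hand-written Java sketch of the same idea. It is an illustration only,
not the code ANTLR generates. The class name QuotedStringSketch and the
methods isOctalDigit, quotedCharacter, and matchQuoted are hypothetical names
chosen for this sketch.

```java
// Hypothetical illustration of the grammar rules; not ANTLR-generated code.
final class QuotedStringSketch {

    private QuotedStringSketch() {}

    /** Mirrors OCTAL_DIGIT: '0'..'7'. */
    public static boolean isOctalDigit(char c) {
        return c >= '0' && c <= '7';
    }

    /**
     * Mirrors QUOTED_CHARACTER. Returns how many chars are consumed
     * starting at pos, or 0 if nothing matches there.
     */
    public static int quotedCharacter(String s, int pos) {
        if (pos >= s.length()) {
            return 0;
        }
        char c = s.charAt(pos);
        if (c != '\\') {
            // First alternative: any char except quotes, CR, LF, backslash.
            boolean excluded = c == '\'' || c == '"' || c == '\r' || c == '\n';
            return excluded ? 0 : 1;
        }
        // Second alternative: a backslash followed by a known escape.
        if (pos + 1 >= s.length()) {
            return 0;
        }
        char e = s.charAt(pos + 1);
        if ("'\"nrtbf\\".indexOf(e) >= 0) {
            return 2;
        }
        if (!isOctalDigit(e)) {
            return 0;
        }
        // Octal escape: one required digit, then up to two more, greedily.
        int len = 2;
        while (len < 4 && pos + len < s.length()
                && isOctalDigit(s.charAt(pos + len))) {
            len++;
        }
        return len;
    }

    /**
     * Mirrors QUOTED_NAME (quote == '"') and STRING_LITERAL (quote == '\'').
     * Returns the token length including both delimiters, or -1 on failure.
     */
    public static int matchQuoted(String s, int pos, char quote) {
        if (pos >= s.length() || s.charAt(pos) != quote) {
            return -1;
        }
        char otherQuote = (quote == '"') ? '\'' : '"';
        int i = pos + 1;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == quote) {
                return i - pos + 1;   // closing delimiter found
            }
            if (c == otherQuote) {
                i++;                  // the other quote is allowed unescaped
                continue;
            }
            int n = quotedCharacter(s, i);
            if (n == 0) {
                return -1;            // raw newline, bad escape, etc.
            }
            i += n;
        }
        return -1;                    // ran out of input before closing quote
    }

    public static void main(String[] args) {
        // The text  "a \" b"  as it would appear in a C source file.
        String input = "\"a \\\" b\"";
        System.out.println(matchQuoted(input, 0, '"'));
    }
}
```

The escaped quote is consumed as a two-character unit before the loop ever
checks for the closing delimiter, which is why \" no longer ends the string.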

- Brian


 
