[antlr-interest] Quoted Strings in ANTLR 3.0.1 and 3.1.1

Fromm, Stefan Fromm at dresden-informatik.de
Thu Oct 23 23:00:28 PDT 2008


 I tested it again with ANTLR version 3.0.1, 3.1 and 3.1.1. Furthermore
I added tests with a slightly changed lexer rule for quoted strings.
Here are my results.

grammar QuotedStringTest1;

QUOTED_STRING
   : '\'' ( '\'\'' | ~('\'') )* '\''
   ;
	
NUMBER
  :	 ('+' | '-')? '0'..'9'+ ('.' '0'..'9'+)?
  ;

value_expr
  :	 (QUOTED_STRING | NUMBER) EOF
  ;

Pattern         ANTLR 3.0.1                  ANTLR 3.1
ANTLR 3.1.1
                ANTLRworks 1.1.7             ANTLRworks 1.2
ANTLRworks 1.2.1
------------------------------------------------------------------------
---------------------------
''              matched as expected          matched as expected
matched as expected
''''            matched as expected          matched as expected
matched as expected
''''''          matched as expected          matched as expected
matched as expected
'''a'           matched as expected          matched as expected
matched as expected
'''a'''         matched as expected          matched as expected
matched as expected
'''a''b'        matched as expected          matched as expected
matched as expected
'''a''b'''      matched as expected          matched as expected
matched as expected
 

'a'''           matched as expected          no match, expected match
no match, expected match
'a''b'          matched as expected          matched partly, only 'a''
matched partly, only 'a''
'a''b''c'       matched as expected          no match, expected match
no match, expected match



grammar QuotedStringTest2;

QUOTED_STRING
   : '\'' ( ~('\'') | '\'\'')* '\''
   ;
	
NUMBER
  :	 ('+' | '-')? '0'..'9'+ ('.' '0'..'9'+)?
  ;

value_expr
  :	 (QUOTED_STRING | NUMBER) EOF
  ;

Pattern         ANTLR 3.0.1                  ANTLR 3.1
ANTLR 3.1.1
                ANTLRworks 1.1.7             ANTLRworks 1.2
ANTLRworks 1.2.1
------------------------------------------------------------------------
---------------------------
''              matched as expected          matched as expected
matched as expected
''''            matched as expected          no match, expected match
no match, expected match
''''''          matched as expected          no match, expected match
no match, expected match
'''a'           matched as expected          no match, expected match
no match, expected match
'''a'''         matched as expected          no match, expected match
no match, expected match
'''a''b'        matched as expected          no match, expected match
no match, expected match
'''a''b'''      matched as expected          no match, expected match
no match, expected match
                
'a'''           matched as expected          matched as expected
matched as expected
'a''b'          matched as expected          matched as expected
matched as expected
'a''b''c'       matched as expected          matched as expected
matched as expected

Interestingly changing the order of ('\'\'' | ~('\'')) to (~('\'') |
'\'\'') makes exactly the opposite test cases fail in 3.1 and 3.1.1.
First variant lets pass all expressions where first escaped quotes ''
come before any letter. Second variant lets pass all expressions where
first escaped quotes come after any letter.



I have also tried the grammar of Sam with all three versions:

grammar TestQuotedString;

value
	:	(	QUOTED_STRING
		|	NUMBER
		)*
		EOF
	;

QUOTED_STRING
	:	'\'' ( '\'\'' | ~('\'') )* '\''
	;

NUMBER
	:	('+' | '-')?
		('0'..'9')+
		('.' ('0'..'9')+)?
	;



Pattern         ANTLR 3.0.1                  ANTLR 3.1
ANTLR 3.1.1
                ANTLRworks 1.1.7             ANTLRworks 1.2
ANTLRworks 1.2.1
------------------------------------------------------------------------
---------------------------
''              matched as expected          matched as expected
matched as expected
''''            matched as expected          matched as expected
matched as expected
''''''          matched as expected          matched as expected
matched as expected
'''a'           matched as expected          matched as expected
matched as expected
'''a'''         matched as expected          matched as expected
matched as expected
'''a''b'        matched as expected          matched as expected
matched as expected
'''a''b'''      matched as expected          matched as expected
matched as expected
                
'a'''           matched as expected          matched only EOF
matched only EOF
'a''b'          matched as expected          matched only 'a''
matched only 'a''
'a''b''c'       matched as expected          matched 'a'' and 'b''
matched 'a'' and 'b''

The long test
'''a'0'''a'''0'''a''b'''0'a'''0'a''b'0'a''b''c'0''0''''0''''''0'''a''b'
matched exactly as expected in all three versions. In fact Sams grammar
would not be usable for my use case, because SQL does not allow mixing
of quoted strings/numbers in value expressions.

Stefan


More information about the antlr-interest mailing list