[antlr-interest] [newbie] Quoted identifiers vs. string literals

Charles Daniels cjdaniels4 at gmail.com
Sun Mar 18 12:01:20 PDT 2012


Hi Eric,

Thanks for the quick response. I have downloaded ANTLRWorks 1.4.2 and
created a fresh grammar using some of the defaults that the tool generates.
Below is my grammar.

This grammar successfully parses the following input:

name String "value"


However, I want to modify this grammar so that it will successfully parse
the following input instead:

"name" String "value"


In attempting to do this, I modified the grammar below by adding double
quotes around ID, like so:

ID  : '"' ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* '"'
    ;


However, parsing fails (MissingTokenException) for the desired input. I'm
guessing that's because "value" is matched to ID rather than to STRING,
when I add the quotes around ID.
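
To illustrate my guess, here is a rough Python sketch. It is not ANTLR
itself: the two regexes and the helper matching_rules are my own
approximations of the quoted ID rule and the STRING rule, with escape
handling simplified. It only shows that the text "value" fits both
patterns, so nothing in the text alone tells the two token types apart:

```python
import re

# Regex stand-ins for the two lexer rules (assumed approximations,
# not generated by ANTLR; escape sequences are simplified to "\\.").
QUOTED_ID = re.compile(r'"[A-Za-z_][A-Za-z0-9_]*"')
STRING = re.compile(r'"(?:\\.|[^\\"])*"')


def matching_rules(text):
    """Return the names of the rules whose pattern matches the whole text."""
    rules = [("ID", QUOTED_ID), ("STRING", STRING)]
    return [name for name, pattern in rules if pattern.fullmatch(text)]


for token in ['"name"', '"value"', '"two words"']:
    print(token, matching_rules(token))
```

Both "name" and "value" are reported as matching ID and STRING, while
"two words" matches only STRING, which is consistent with "value" being
tokenized as ID when the quoted ID rule is in play.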

Is there any way to get "value" to match STRING instead of matching ID when
I add quotes to ID? Will backtracking help? If so, how would I specify it?

Thanks,
Chuck

--- BEGIN GRAMMAR ---
grammar Config;

triplet : ID type STRING
    ;

type : 'Boolean' | 'Integer' | 'String'
    ;

ID  : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;

COMMENT
    :   '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
    |   '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
    ;

WS  :   ( ' '
        | '\t'
        | '\r'
        | '\n'
        ) {$channel=HIDDEN;}
    ;

STRING
    :  '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
    ;

fragment
HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;

fragment
ESC_SEQ
    :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
    |   UNICODE_ESC
    |   OCTAL_ESC
    ;

fragment
OCTAL_ESC
    :   '\\' ('0'..'3') ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7')
    ;

fragment
UNICODE_ESC
    :   '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
    ;
--- END GRAMMAR ---


On Sun, Mar 18, 2012 at 12:27 PM, Eric <researcher0x00 at gmail.com> wrote:

> Hi Chuck,
>
> Off the top of my head I would guess that STRINGLITERAL is trumping
> IDENTIFIER. In other words, the lexer generates tokens, and the tokens are
> created based on the rules in the lexer. Since STRINGLITERAL comes before
> IDENTIFIER, anything that matches STRINGLITERAL will make a
> STRINGLITERAL token even if STRINGLITERAL defines the same character
> string patterns as IDENTIFIER, i.e. '"' ( ~('\\'|'"') )* '"' trumps '"'
> IdentifierStart IdentifierPart* '"'.
>
> Can you post your full grammar? I am having to guess at the parts marked
> (copied from Java.g) and believe I have something different.
>
> Also, I strongly recommend ANTLRWorks 1.4.2 for a new user. Note that this
> is not the latest version of ANTLRWorks, which is 1.4.3. I am not
> recommending ANTLRWorks 1.4.3 because it cannot draw the NFA and DFA
> diagrams due to a bug, and when a new user tries to do this and it doesn't
> work they think they did something wrong, when in fact it is a bug from
> ANTLR 3.4, which is used by ANTLRWorks 1.4.3.
>
> Also, you can search previous posts to the list by using
> http://antlr.markmail.org/
>
> Hope that helps, Eric
>
>
>
>
> On Sun, Mar 18, 2012 at 11:22 AM, Charles Daniels <cjdaniels4 at gmail.com> wrote:
>
>> I am very new to ANTLR and I am having trouble properly defining part of a
>> grammar that I am constructing to recognize a specialized configuration
>> file syntax (already defined, so I cannot change it).
>>
>> The trouble I am having is recognizing the following type of entry in the
>> config file:
>>
>> "name" type "value"
>>
>>
>> where the following rules apply:
>>
>>   1. The double quotes are a required part of the syntax, both for the
>>   name and the value.
>>   2. A "name" is essentially a Java identifier
>>   3. A "value" is a string literal
>>
>>
>> I have the following grammar so far:
>>
>> triplet : IDENTIFIER type STRINGLITERAL ;
>>
>> type : 'Boolean' | 'Integer' | 'String' ;
>>
>> STRINGLITERAL : (copied from Java.g)
>>
>> IDENTIFIER : '"' IdentifierStart IdentifierPart* '"' ;
>>
>> IdentifierStart : (copied from Java.g)
>>
>> IdentifierPart : (copied from Java.g)
>>
>> When I compile this grammar, ANTLR hangs. When I remove the double quotes
>> from IDENTIFIER, it compiles successfully. I am guessing that when I
>> include the double quotes in IDENTIFIER they are somehow causing the
>> compilation to hang due to the double quotes that are also in
>> STRINGLITERAL.
>>
>> Does anybody have any suggestions on how to define this so that I can use
>> double quotes around names and values and the compiler doesn't hang?
>>
>> Thanks,
>> Chuck
>>
>
>

