[antlr-interest] Eliminate characters in TOKEN
Ruslan Zasukhin
ruslan_zasukhin at valentina-db.com
Wed Nov 23 08:05:48 PST 2011
On 11/23/11 11:59 AM, "Bart Kiers" <bkiers at gmail.com> wrote:
> Hi Rampon,
>
>
> On Wed, Nov 23, 2011 at 10:54 AM, Rampon Jerome <ramponjerome at yahoo.fr> wrote:
>
>> ...
>> it complained that the output option must be AST.
>> If I add it to my grammar options, it complains and still returns an error.
>> It seems it adds the option automatically if it is not there, but later on
>> it still returns an error ???
>>
>> Is that normal ?
>>
>
> Yes, the `!` to exclude characters from lexer rules (as was possible in v2)
> is no longer valid in v3 grammars.
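Yes, I also ran into this change in v3.
Here are examples from our Valentina SQL grammar (C target), where we use a
new trick to keep e.g. the wrapper quotes out of the token text: we move the
token start past the opening quote and call EMIT() before the closing quote
is matched.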
//-----------------------------------------------------------------------
// String literals:
// caseSensitive = false, so we use only small chars.
fragment
Letter
: 'a'..'z'
| '@'
;
fragment
EscapeSequence
: '\\' ( QUOTE|'\\'|'b'|'t'|'n'|'f'|'r' )
;
STRING_LITERAL
@init
{
int escape_count = 0;
int theStart = $start;
}
: QUOTE
{ theStart = GETCHARINDEX(); } // skip first quote
( EscapeSequence { ++escape_count; }
| QUOTE QUOTE { ++escape_count; }
| ~( QUOTE | '\\' )
)*
{
$start = theStart;
EMIT();
// Optimization: the lexer has already found the escaped chars, so we
// count them and pass the count to the parser/tree parser inside the
// token. Later algorithms can then skip an extra scan of the literal
// just to check whether anything needs unescaping. Knowing how many
// escapes there are, an algorithm can also return as soon as all of
// them are resolved, and can calculate exactly how much RAM the
// unescaped string needs.
//
LTOKEN->user1 = escape_count;
}
QUOTE // and skip last quote
;
//-----------------------------------------------------------------------
IDENT
: ( Letter | '_' ) ( Letter | '_' | Digit )*
;
DELIMITED // delimited_identifier
@init
{
$type = IDENT;
int theStart = $start;
}
:
( DQUOTE { theStart = GETCHARINDEX(); }
( ~(DQUOTE) | DQUOTE DQUOTE )+
{ $start = theStart; EMIT(); }
DQUOTE
| BQUOTE { theStart = GETCHARINDEX(); }
( ~(BQUOTE) | BQUOTE BQUOTE )+
{ $start = theStart; EMIT(); }
BQUOTE
// valentina/oracle extension: [asasas '' " sd "]
| LBRACK { theStart = GETCHARINDEX(); }
( ~(']') )+
{ $start = theStart; EMIT(); }
RBRACK
)
;
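
To make the index arithmetic behind the trick easier to see, here is a rough
stand-alone C sketch that does by hand what the DELIMITED rule does: remember
the position right after the opening delimiter, stop the token before the
closing one, then consume the closing one anyway. Nothing here is ANTLR API:
scan_delimited and Span are hypothetical names, and DQUOTE, BQUOTE, LBRACK
and RBRACK are assumed to be '"', '`', '[' and ']' (their definitions are not
shown above).

```c
#include <stddef.h>

/* Half-open range [start, stop) of the token text inside the input. */
typedef struct { size_t start; size_t stop; } Span;

/* Hypothetical hand-written equivalent of the DELIMITED rule.
 * Tries to match a delimited identifier at in[pos]. On success returns 1,
 * stores the span WITHOUT the wrapper characters in *inner, and stores the
 * position just after the closing delimiter in *next. Returns 0 otherwise. */
int scan_delimited(const char *in, size_t len, size_t pos,
                   Span *inner, size_t *next)
{
    if (pos >= len)
        return 0;

    char closer;
    int doubling;  /* 1 if a doubled closing char stays inside the token */
    switch (in[pos]) {
    case '"': closer = '"'; doubling = 1; break;   /* assumed DQUOTE */
    case '`': closer = '`'; doubling = 1; break;   /* assumed BQUOTE */
    case '[': closer = ']'; doubling = 0; break;   /* assumed LBRACK/RBRACK */
    default:  return 0;
    }

    size_t i = pos + 1;
    size_t start = i;             /* like: theStart = GETCHARINDEX(); */
    for (;;) {
        if (i >= len)
            return 0;             /* unterminated identifier */
        if (in[i] == closer) {
            if (doubling && i + 1 < len && in[i + 1] == closer) {
                i += 2;           /* doubled delimiter is part of the text */
                continue;
            }
            break;                /* real closing delimiter */
        }
        ++i;
    }
    if (i == start)
        return 0;                 /* the rule uses ( ... )+, so no empty body */

    inner->start = start;         /* like: $start = theStart; EMIT(); */
    inner->stop = i;
    *next = i + 1;                /* closing delimiter consumed, not emitted */
    return 1;
}
```

Note that a doubled quote stays doubled inside the span, just as it does in
the token text produced by the rule; collapsing it is left to a later stage.
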
--
Best regards,
Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc
Valentina - Joining Worlds of Information
http://www.paradigmasoft.com
[I feel the need: the need for speed]