[antlr-interest] Help controlling parser decisions

Wed Jul 25 23:51:01 PDT 2007

On 7/26/07, Ted Villalba <ted.villalba at gmail.com> wrote:
> Thanks for the input Gavin.
>
> I'm still finding that no matter how I tweak the parser rules, it's really
> the order of the lexer rules that determine how a token is evaluated,
> semantic predicates be damned.
>
> It seems if I have more than one lexer rule that share a common token, then
> the rule that comes first wins all the time.
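Yes. Lexing is independent of parsing: the lexer turns characters into
tokens without any knowledge of the parser, so how a token is used in
parser rules has no influence on what the lexer matches. When two lexer
rules match the same text, the one defined first wins.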
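To make that concrete, here is a rough Java approximation of the decision your generated lexer makes for each word. It is a hand-written sketch, not ANTLR's generated code. The class name and the EQ label for the '=' literal are made up. Because OP is listed before WCHAR, NEAR always comes out as OP, whether it starts a sequence or sits between two words, and the parser only ever sees that result.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written approximation of the lexer rules in Ted's grammar
// (OP listed before WCHAR). Not ANTLR-generated code, but it makes the same
// kind of decision: token types come from the characters alone.
class ContextFreeLexer {

    // WCHAR : ~('='|'('|')'|'"'|' '|'\t'|'\n'|'\r'|'#')+ ;
    static boolean isWordChar(char c) {
        return "=()\" \t\n\r#".indexOf(c) < 0;
    }

    // Returns the token type of each token in the input, in order.
    static List<String> types(String input) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') { // WS : ... {skip();}
                i++;
                continue;
            }
            if (c == '=') {                                         // the '=' literal
                out.add("EQ");
                i++;
                continue;
            }
            if (!isWordChar(c)) {
                throw new IllegalArgumentException("no rule matches '" + c + "' at " + i);
            }
            int start = i;
            while (i < input.length() && isWordChar(input.charAt(i))) i++;
            String text = input.substring(start, i);
            // OP : 'NEAR'|'near' and WCHAR both match this exact text; the rule
            // listed first (OP) wins. The word's position in the query is never consulted.
            out.add(text.equals("NEAR") || text.equals("near") ? "OP" : "WCHAR");
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(types("TAG=APPLES NEAR ORANGES")); // [WCHAR, EQ, WCHAR, OP, WCHAR]
        System.out.println(types("TAG=NEAR APPLES ORANGES")); // [WCHAR, EQ, OP, WCHAR, WCHAR]
    }
}
```

Both of your example queries get an OP token for NEAR, which is why no amount of parser-rule tweaking changes the outcome.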
>
> My objective is to allow the term NEAR to be entered as a boolean, except if
> it begins or ends a sequence of  terms, then recognize it as a WCHAR.
> TAG=APPLES NEAR ORANGES //this should parse NEAR as an OP
> TAG=NEAR APPLES ORANGES//this should treat it as a WCHAR
>
There are two options. The first is to recognise keywords (e.g. near)
as identifiers (WCHAR in your case) in the lexer and then use a
predicate in the parser to check the text of identifier tokens to see
if they are keywords. This is the option I gave code for before. In
this case you do not have lexer rules for your keywords, so you want
something like:
grammar WQL;
options{
   output=AST;
   ASTLabelType=CommonTree;
}

tokens {
	NEAR;
	FAR;
}

query :  tag '=' terms ;

tag    : WCHAR ;

terms
	:	WCHAR
		(	(op WCHAR)=>(op^ terms)
		|	terms
		)?
	;

op: near|far;

near
    : {input.LT(1).getText().toLowerCase().equals("near")}? w=WCHAR
      {$w.setType(NEAR);}
    ;

far
    : {input.LT(1).getText().toLowerCase().equals("far")}? w=WCHAR
      {$w.setType(FAR);}
    ;

WS      : (' '|'\t'|'\r'|'\n')+ {skip();};
WCHAR   : ~('='|'('| ')'|'"'|' '|'\t'|'\n'|'\r'|'#')+;

Add rules like near and far for each keyword and use them instead of
lexer rules.
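Roughly what the first option does at parse time, as a Java sketch. It assumes the words after "TAG=" have already been split on whitespace and all lexed as WCHAR; the class and method names are hypothetical. keywordType plays the role of the near/far predicates, and the i + 1 check stands in for the (op WCHAR)=> syntactic predicate: a keyword is only treated as an operator when it follows a word and another word follows it. The returned list is the type each token is left with after the setType actions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of option 1: every word is lexed as WCHAR, and the parser
// decides from the token text (plus one token of lookahead) whether a word is
// acting as an operator.
class KeywordByPredicate {

    // Plays the role of {input.LT(1).getText().toLowerCase().equals("near")}?
    // and its "far" counterpart; returns null for ordinary words.
    static String keywordType(String text) {
        String lower = text.toLowerCase();
        if (lower.equals("near")) return "NEAR";
        if (lower.equals("far")) return "FAR";
        return null;
    }

    // Classifies the words after "TAG=" and returns the type each token ends up
    // with once the setType actions have run.
    static List<String> classify(String terms) {
        List<String> words = Arrays.asList(terms.trim().split("\\s+"));
        if (words.get(0).isEmpty()) throw new IllegalArgumentException("expected WCHAR");
        List<String> types = new ArrayList<>();
        types.add("WCHAR");                     // a sequence always starts with a plain word
        for (int i = 1; i < words.size(); i++) {
            String op = keywordType(words.get(i));
            // Stand-in for (op WCHAR)=> : an operator needs a word after it.
            if (op != null && i + 1 < words.size()) {
                types.add(op);                  // $w.setType(NEAR) / $w.setType(FAR)
                i++;                            // the following word is consumed as a term
            }
            types.add("WCHAR");
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(classify("APPLES NEAR ORANGES")); // [WCHAR, NEAR, WCHAR]
        System.out.println(classify("NEAR APPLES ORANGES")); // [WCHAR, WCHAR, WCHAR]
        System.out.println(classify("apples near oranges")); // [WCHAR, NEAR, WCHAR]
    }
}
```

Because the test is on the token text, the case-insensitive check covers "NEAR", "near" and "Near" without any lexer changes.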

Or the other alternative, as Gavin presented, is to match keywords in
the lexer and have a parser rule that accepts keywords as well as
identifiers and use this when keywords are allowed. So, you'd have
something like:
grammar WQL;
options{
   output=AST;
   ASTLabelType=CommonTree;
}

query :  tag '=' terms ;

tag    : WCHAR ;

terms
	:	term
		(	(op term)=>(op^ terms)
		|	terms
		)?
	;

term:	t=(WCHAR|NEAR|FAR) { $t.setType(WCHAR); };
op: NEAR|FAR;

NEAR:	'NEAR';
FAR	:	'FAR';
WS      : (' '|'\t'|'\r'|'\n')+ {skip();};
WCHAR   : ~('='|'('| ')'|'"'|' '|'\t'|'\n'|'\r'|'#')+;

So, wherever a keyword may also appear as an ordinary word, use term
instead of WCHAR.

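In both versions the setType actions retype each token to match how it
was used (NEAR/FAR for operators, WCHAR for ordinary words), so a tree
parser walking the resulting AST need not deal with the issue.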
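For the second option, here is the retyping seen from the token stream, as a hypothetical Java sketch. Token is a minimal stand-in for ANTLR's Token (just a mutable type and the text), and the other names are made up. The lexer tags the exact strings NEAR and FAR as keyword tokens; the parser keeps a keyword as an operator only when a term follows it and otherwise mutates it to WCHAR, as term's { $t.setType(WCHAR); } action does. Note that, as written, the NEAR and FAR lexer rules are case-sensitive, so "near" is lexed as a plain WCHAR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of option 2: keyword tokens from the lexer, retyped to
// WCHAR by the parser wherever they are not used as operators.
class KeywordTokens {

    // Minimal stand-in for an ANTLR Token: a mutable type plus the matched text.
    static final class Token {
        String type;
        final String text;
        Token(String type, String text) { this.type = type; this.text = text; }
        void setType(String type) { this.type = type; }
    }

    // NEAR : 'NEAR'; FAR : 'FAR'; are listed before WCHAR, so they win on exact
    // matches. Being case-sensitive, "near" stays a WCHAR.
    static List<Token> lex(String words) {
        if (words.trim().isEmpty()) throw new IllegalArgumentException("no terms");
        List<Token> out = new ArrayList<>();
        for (String w : words.trim().split("\\s+")) {
            String type = w.equals("NEAR") ? "NEAR" : w.equals("FAR") ? "FAR" : "WCHAR";
            out.add(new Token(type, w));
        }
        return out;
    }

    static boolean isOp(Token t) {
        return t.type.equals("NEAR") || t.type.equals("FAR");
    }

    // term : t=(WCHAR|NEAR|FAR) { $t.setType(WCHAR); } ;
    static void term(Token t) {
        t.setType("WCHAR");
    }

    // A run of terms in which an operator must sit between two terms.
    static void parseTerms(List<Token> toks) {
        term(toks.get(0));
        for (int i = 1; i < toks.size(); i++) {
            if (isOp(toks.get(i)) && i + 1 < toks.size()) {
                i++;              // skip the operator: it keeps its NEAR/FAR type
            }
            term(toks.get(i));    // everything else becomes a plain WCHAR
        }
    }

    static List<String> types(List<Token> toks) {
        List<String> out = new ArrayList<>();
        for (Token t : toks) out.add(t.type);
        return out;
    }

    public static void main(String[] args) {
        List<Token> toks = lex("NEAR APPLES ORANGES");
        System.out.println(types(toks));  // [NEAR, WCHAR, WCHAR]  (straight from the lexer)
        parseTerms(toks);
        System.out.println(types(toks));  // [WCHAR, WCHAR, WCHAR] (what a tree parser would see)
    }
}
```

Since the token object itself is changed, any tree node built from it reports the corrected type as well.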

Which method you use is a matter of preference. The second option
probably performs better, since it needs no semantic predicate, but
it requires changes in several places to add a keyword.

Note that this grammar doesn't necessarily do quite what you want due
to the predicate in terms. Something like "TAG=APPLES NEAR NEAR" will
give an error as it will match the second NEAR as an operator and want
some more terms for it. Here your language is somewhat ambiguous.
Given input such as "TAG=NEAR NEAR NEAR NEAR" what should happen?
Which NEARs are operators and which are WCHARs? But the basic stuff
of handling keywords as identifiers is there.
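One way to see the ambiguity is to enumerate the readings. The hypothetical Java sketch below assumes only that an operator needs an ordinary word immediately on each side, which is the shape the terms rule builds, and lists every way a run of words can be read under that constraint (W for a word, OP for an operator). For "NEAR NEAR NEAR NEAR" it finds three readings, so any grammar for this language has to commit to one of them or reject the input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: every reading of a run of words in which each
// NEAR/FAR may be either an operator or a plain word, assuming an operator
// needs a plain word immediately on each side.
class NearReadings {

    static boolean isKeyword(String w) {
        return w.equalsIgnoreCase("near") || w.equalsIgnoreCase("far");
    }

    static List<String> readings(String[] words) {
        List<String> out = new ArrayList<>();
        int n = words.length;
        for (int mask = 0; mask < (1 << n); mask++) {   // bit i set => word i is an operator
            boolean ok = true;
            for (int i = 0; i < n && ok; i++) {
                if ((mask & (1 << i)) == 0) continue;
                boolean wordOnLeft = i > 0 && (mask & (1 << (i - 1))) == 0;
                boolean wordOnRight = i < n - 1 && (mask & (1 << (i + 1))) == 0;
                ok = isKeyword(words[i]) && wordOnLeft && wordOnRight;
            }
            if (!ok) continue;
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < n; i++) {
                if (i > 0) sb.append(' ');
                sb.append((mask & (1 << i)) != 0 ? "OP" : "W");
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints: W W W W / W OP W W / W W OP W (one per line)
        for (String r : readings("NEAR NEAR NEAR NEAR".split(" "))) System.out.println(r);
    }
}
```

Even "APPLES NEAR ORANGES" has two readings under this constraint ("W W W" and "W OP W"); the syntactic predicate in terms is what makes the grammar prefer the operator reading.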

Tom.
> Here is my simplified grammar. Is it easy enough for someone to recognize
> the changes that need to be made to this grammar to meet that objective?
> Sorry if I'm just not getting it, this has been a challenging exercise.
>
> grammar WQL;
>
> options{
>    output=AST;
>    ASTLabelType=CommonTree;
> }
>
>
> query :  tag '=' terms ;
>
> tag    : WCHAR ;
>
> terms  : WCHAR+  (OP^ WCHAR+)*  ;
>
> OP    : 'NEAR'|'near';
> WS      : (' '|'\t'|'\r'|'\n')+ {skip();};
> WCHAR   : ~('='|'('| ')'|'"'|' '|'\t'|'\n'|'\r'|'#')+;
>
>
>
> Thank you,
> Ted
>
>