[antlr-interest] Re : Re : How to get a list of all valid options for the next token?

Niemeijer, R.A. r.a.niemeijer at tue.nl
Wed Sep 3 07:13:05 PDT 2008


Unfortunately that doesn't work. Both computeContextSensitiveRuleFOLLOW
and computeErrorRecoverySet call combineFollows. combineFollows uses
state.following and state.followingStackPointer, which don't contain the
information I need.
Perhaps an example is in order.

One of the possible rules in my system is a definition, which is roughly
comparable to a variable assignment. As a simple example, let's use "h
is its height". The associated text that the parser receives is
":TEXTh:IS:ITS:PROPNheight"

When I parse ":TEXTh" I get a MismatchedTokenException that says it
expects the token ':IS'. No problem there.
When I parse ":TEXTh:IS" I get a NoViableAltException. Let's look at (a
condensed version of) the code just before that:

switch ( input.LA(1) ) 
{
  case 28:
  case 33:
  case 36: { alt3 = 1; } break;
  case 11: { ... } break;
  case 12: { ... } break;
  case 34:
  case 39:
  case 40:
  case 41:
  case 42:
  case 43: { alt3 = 2; } break;
  default:
    NoViableAltException nvaes_d3s0 =
        new NoViableAltException("", 3, 0, input);
    throw nvaes_d3s0;
}

Since the input ends, input.LA(1) returns -1, so the switch statement
goes to the default case and throws an error. The numbers in the case
statements (28, 33, 36, etc.) are the tokens that can legally follow the
':IS' token, and hence the ones I would like to put in the description
of the NoViableAltException so I can present these options to the user.
This information cannot be found in the following bitset, as
followingStackPointer is 0 and following[0] is an empty bitset (none of
the other entries in following have anything approaching the correct
sequence either).
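
To make the goal concrete, here is a minimal sketch of the message I would like to end up with, assuming the token types from the case labels were somehow available as a list. ExpectedTokens, Describe and the token table are made up for illustration; only the idea of indexing tokenNames by token type comes from the parser.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ExpectedTokens
{
    // Turns token types into a space-separated list of display names.
    // tokenNames is assumed to be indexed by token type, as in the parser.
    public static string Describe(IEnumerable<int> tokenTypes, string[] tokenNames)
    {
        var names = tokenTypes
            .Where(t => t >= 0 && t < tokenNames.Length)  // skip EOF (-1) and unknown types
            .Select(t => tokenNames[t].Trim('\''));       // strip quotes from literal tokens
        return string.Join(" ", names.ToArray());
    }
}

static class ExpectedTokensDemo
{
    static void Main()
    {
        // Hypothetical token table; a real one comes from the generated parser.
        var tokenNames = new[]
        {
            "<invalid>", "<EOR>", "<DOWN>", "<UP>", "TEXT", "':IS'", "':ITS'", "PROPN"
        };
        Console.WriteLine(ExpectedTokens.Describe(new[] { 6, 7 }, tokenNames));
        // prints ":ITS PROPN"
    }
}
```

The resulting string could then be passed as the first argument of the NoViableAltException constructor in place of the empty string.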

Today I managed to produce the following code that seems to give me the
right tokens in several cases:

var methodName0 = new System.Diagnostics.StackFrame().GetMethod().Name;
var followBits0 =
    from field in GetType().GetFields()
    let followWhat = input.LA(1) > -1 ? input.LA(1) : input.LA(-1)
    let followWhatFixed = TokenNames[followWhat].StartsWith("'")
        ? followWhat.ToString()
        : TokenNames[followWhat]
    where field.Name.StartsWith("FOLLOW_" + followWhatFixed + "_")
    select field;
var filtered0 = followBits0.Where(field =>
    field.Name.Contains("in_" + methodName0));
var followTokens0 =
    from field in (filtered0.Any() ? filtered0 : followBits0)
    from bit in ((BitSet)field.GetValue(this)).ToArray()
    select tokenNames[bit].Trim('\'');

NoViableAltException nvaes_d4s0 = new NoViableAltException(
    followTokens0.Aggregate("", (t1, t2) => t1 + " " + t2), 4, 0, input);

throw nvaes_d4s0;

As you can see, it's hardly elegant due to all the special cases, but it
sort of works... provided that input has one or more tokens. It works
for definitions, as they can only start with a :TEXT token, so I get a
MismatchedTokenException. Conditions (e.g. "its height must be more than
100 mm"), however, have multiple valid starting tokens. When I try to
provide IntelliSense for the first token of a condition, input.LA(1)
returns -1 and input.LA(-1) throws a NullReferenceException. Even if it
didn't throw, it wouldn't help much: with no preceding token there is
nothing to follow, so none of the FOLLOW_ bitsets are usable. Using
state.following here doesn't work either, as all it contains is the
':OR' token, which can combine multiple conditions into one.
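
The NullReferenceException at least can be avoided with a guard. The sketch below mirrors the followWhat expression from the code above; FollowAnchor, Pick and tokensSeen are hypothetical names, and the delegate stands in for input.LA so the snippet does not depend on the ANTLR runtime.

```csharp
using System;

static class FollowAnchor
{
    // Returns the token type whose FOLLOW_ sets should be consulted, or
    // null when there is no token to follow (empty input).
    // 'la' stands in for input.LA; 'tokensSeen' is a hypothetical count
    // of the tokens before the current position.
    public static int? Pick(Func<int, int> la, int tokensSeen)
    {
        int next = la(1);
        if (next > -1) return next;        // a real token lies ahead
        if (tokensSeen == 0) return null;  // empty input: never call la(-1)
        return la(-1);                     // at EOF: use the last token
    }
}
```

A null result only signals that the FOLLOW_ lookup has to be skipped. It does not supply the valid starting tokens, which is the part I still have no source for.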

So that brings me back to my previous question:
Is there a way to process the edges variable (say, replacing all the
non-numeric characters with spaces) before spitting it out into the
generated code or, preferably, is there a way to get the list of numbers
used for creating the preceding case/if statements directly?
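
To show the kind of processing I mean, here is a rough sketch of the text transformation on its own, written as ordinary C# over a string. CaseLabels and Extract are made-up names. It says nothing about whether StringTemplate can do this at code generation time, which is the actual question, and it only covers the switch form produced by dfaStateSwitch, not the if/else form.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class CaseLabels
{
    // Returns the integer label of every "case N:" found in a piece of source text.
    public static int[] Extract(string generatedCode)
    {
        return Regex.Matches(generatedCode, @"\bcase\s+(\d+)\s*:")
                    .Cast<Match>()
                    .Select(m => int.Parse(m.Groups[1].Value))
                    .ToArray();
    }
}

static class CaseLabelsDemo
{
    static void Main()
    {
        string code = "case 28: case 33: case 36: { alt3 = 1; } break; case 11: { } break;";
        int[] labels = CaseLabels.Extract(code);
        Console.WriteLine(string.Join(" ", labels.Select(n => n.ToString()).ToArray()));
        // prints "28 33 36 11"
    }
}
```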

Thanks again for all your help.



-----Original Message-----
From: Thomas Brandon [mailto:tbrandonau at gmail.com] 
Sent: dinsdag 2 september 2008 18:48
To: Niemeijer, R.A.
Cc: antlr-interest at antlr.org
Subject: Re: [antlr-interest] Re : Re : How to get a list of all
valid options for the next token?

Can't you reuse the ANTLR routines for this, e.g.
BaseRecognizer.computeContextSensitiveRuleFOLLOW and
BaseRecognizer.computeErrorRecoverySet? Failing that, I think you want
to be operating on the follow set stack maintained by ANTLR
(RecognizerSharedState.following) rather than the static follow set
variables.
A few comments based on a quick reading are below.
On Wed, Sep 3, 2008 at 1:12 AM, Niemeijer, R.A. <r.a.niemeijer at tue.nl>
wrote:
> Hello,
>
> I've given editing the templates a go and it is indeed not as hard as
> I imagined. However, I have some difficulty getting the data I need.
>
> The only exceptions I've found so far that need to be changed are
> NoViableAltExceptions, since MismatchedTokenExceptions already have an
> Expecting variable. NoViableAltExceptions are generated in two places
> in the codegen template: dfaState and dfaStateSwitch.
>
> As for the one in dfaStateSwitch:
> By replacing <description> (which always seems to be empty) with
>
> (from field in GetType().GetFields()
>  where field.Name.StartsWith("FOLLOW_" + input.LA(1))
>  from bit in ((BitSet)field.GetValue(this)).ToArray()
>  select tokenNames[bit].Trim('\'')).Aggregate((t1, t2) => t1 + " " + t2)
>
> I can get a space-separated list of tokens. It is by no means a
> perfect solution though. Some issues:
> - This gets all the tokens that could ever follow the last token, not
>   just the ones in the current rule. Depending on the grammar this may
>   or may not be a problem.
The follow set names contain the rule they apply to, you can filter on
that.
> - It only works on non-named tokens (input.LA(1) returns an int). This
>   should be fairly trivial to fix though.
You need to use the token name, not the token type as returned by LA.
Use the tokenNames array from the parser (or reflection if you don't
have access to the parser).
> - Throws an error if it can't find any FOLLOW_ variables. Also easily
>   fixed.
> - It works in the one test case I tried it with. I'm not guaranteeing
>   it works in others.
>
> This doesn't work in dfaState though, since there input.LA(1) is
> always -1.
LA returning -1 indicates the next token is EOF. Assuming you weren't
at the end of the stream I can't see why this would occur. Also, it
should be LA(<k>) not LA(1). And you should have access to the cached
value via LA<decisionNumber>_<stateNumber>.
> What I need (and it would work for dfaStateSwitch as well) is the list
> of the numbers in the case statements preceding the default/else
> statement containing the NoViableAltException. The only way to sort of
> get at this info I've found is the edges parameter that is passed to
> the two functions, but it contains the entire generated source code
> for those parts. Using it directly or in quotes results in tons of
> compile errors.
>
> So my question: is there a way to process the edges variable (say,
> replacing all the non-numeric characters with spaces) before spitting
> it out into the generated code or, preferably, is there a way to get
> the list of numbers used for creating the case statements directly?
>
> Thanks again.
>

Tom.

