[antlr-interest] Re : Re : How to get a list of all valid options for the next token?

Thomas Brandon tbrandonau at gmail.com
Tue Sep 2 09:48:28 PDT 2008


Can't you reuse the ANTLR routines for this, e.g.
BaseRecognizer.computeContextSensitiveRuleFOLLOW and
BaseRecognizer.computeErrorRecoverySet? Failing that, I think you want to
operate on the follow set stack maintained by ANTLR
(RecognizerSharedState.following) rather than on the static follow set
variables.
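
As a rough illustration of what walking the follow stack involves, here is
a minimal, self-contained Java sketch. It is not the ANTLR source:
java.util.BitSet stands in for ANTLR's own BitSet, the list stands in for
RecognizerSharedState.following, and the EOR constant, class name and
method name are assumptions made for the example.

```java
import java.util.BitSet;
import java.util.List;

final class FollowStackSketch {
    // Assumption: stand-in for ANTLR's end-of-rule marker token type.
    static final int EOR = 1;

    // Walk the follow stack from the innermost rule invocation outwards,
    // OR-ing each follow set into the result. Stop at the first set that
    // cannot see the end of its rule: nothing further out is reachable.
    static BitSet contextSensitiveFollow(List<BitSet> followStack) {
        BitSet result = new BitSet();
        for (int i = followStack.size() - 1; i >= 0; i--) {
            BitSet local = followStack.get(i);
            result.or(local);
            if (!local.get(EOR)) {
                break; // this rule cannot end here, so outer sets do not apply
            }
            if (i > 0) {
                result.clear(EOR); // keep EOR only for the outermost entry
            }
        }
        return result;
    }
}
```

The point of using the stack is that the result only contains tokens valid
in the current invocation context, unlike the union of every static
FOLLOW_ set for a token.
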
A few comments based on a quick reading are inline below.
On Wed, Sep 3, 2008 at 1:12 AM, Niemeijer, R.A. <r.a.niemeijer at tue.nl> wrote:
> Hello,
>
> I've given editing the templates a go and it is indeed not as hard as I imagined. However, I have some difficulty getting the data I need.
>
> The only exceptions I've found so far that need to be changed are NoViableAltExceptions, since MismatchedTokenExceptions already have an Expecting variable. NoViableAltExceptions are generated in two places in the codegen template: dfaState and dfaStateSwitch.
>
> As for the one in dfaStateSwitch:
> By replacing <description> (which always seems to be empty) with
>
> (from field in GetType().GetFields()
>  where field.Name.StartsWith("FOLLOW_" + input.LA(1))
>  from bit in ((BitSet)field.GetValue(this)).ToArray()
>  select tokenNames[bit].Trim('\'')).Aggregate((t1, t2) => t1 + " " + t2)
>
> I can get a space-separated list of tokens. It is by no means a perfect solution though. Some issues:
> - This gets all the tokens that could ever follow the last token, not just the ones in the current rule. Depending on the grammar this may or may not be a problem.
The follow set names contain the rule they apply to, so you can filter on that.
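
A minimal sketch of such a filter, in Java for illustration. It assumes
the generated fields are named FOLLOW_<element>_in_<rule><number> (check
your generated parser, the exact scheme may differ between versions); the
class and method names are made up for the example, and the field names
would come from reflection in real use.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class FollowNameFilter {
    // Keep only the follow set fields for the given element inside the
    // given rule. Assumed naming scheme: FOLLOW_<element>_in_<rule><number>.
    static List<String> forRule(List<String> fieldNames, String element, String rule) {
        Pattern wanted = Pattern.compile(
            "FOLLOW_" + Pattern.quote(element) + "_in_" + Pattern.quote(rule) + "\\d+");
        return fieldNames.stream()
            .filter(name -> wanted.matcher(name).matches())
            .collect(Collectors.toList());
    }
}
```

Matching the whole name rather than using StartsWith keeps a rule called
stat from also picking up fields for a rule called statement. Rule names
that themselves end in digits would still be ambiguous with this pattern.
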
> - It only works on non-named tokens (input.LA(1) returns an int). This should be fairly trivial to fix though.
You need to use the token name, not the token type as returned by LA.
Use the tokenNames array from the parser (or reflection if you don't
have access to the parser).
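
For example, something along these lines (a Java sketch; the helper name
is hypothetical and the array in the usage note is made-up sample data,
not output from a real grammar):

```java
final class TokenNameLookup {
    // Translate a token type, as returned by LA, into its display name
    // using the parser's tokenNames array. -1 is treated as EOF, and
    // out-of-range types get a placeholder instead of throwing.
    static String nameOf(String[] tokenNames, int tokenType) {
        if (tokenType == -1) {
            return "EOF";
        }
        if (tokenType < 0 || tokenType >= tokenNames.length) {
            return "<unknown " + tokenType + ">";
        }
        return tokenNames[tokenType];
    }
}
```

With a sample array such as {"<invalid>", "<EOR>", "<DOWN>", "<UP>", "ID",
"INT"}, nameOf(tokenNames, 4) gives "ID", which is what you would then
append to "FOLLOW_" when looking up the fields.
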
> - Throws an error if it can't find any FOLLOW_ variables. Also easily fixed.
> - It works in the one test case I tried it with. I'm not guaranteeing it works in others.
>
> This doesn't work in dfaState though, since there input.LA(1) is always -1.
LA returning -1 indicates that the next token is EOF. Assuming you weren't
at the end of the stream, I can't see why this would occur. Also, it
should be LA(<k>), not LA(1), and you should have access to the cached
value via LA<decisionNumber>_<stateNumber>.
> What I need (and it would work for dfaStateSwitch as well) is the list of the numbers in the case statements preceding the default/else statement containing the NoViableAltException. The only way to sort of get at this info I've found is the edges parameter that is passed to the two functions, but it contains the entire generated source code for those parts. Using it directly or in quotes results in tons of compile errors.
>
> So my question: is there a way to process the edges variable (say, replacing all the non-numeric characters with spaces) before spitting it out into the generated code or, preferably, is there a way to get the list of numbers used for creating the case statements directly?
>
> Thanks again.
>

Tom.

