[antlr-interest] Re : Re : How to get a list of all valid options for the next token?

Niemeijer, R.A. r.a.niemeijer at tue.nl
Tue Sep 2 08:12:37 PDT 2008


Hello,

I've given editing the templates a go and it is indeed not as hard as I imagined. However, I have some difficulty getting the data I need.

The only exceptions I've found so far that need to be changed are NoViableAltExceptions, since MismatchedTokenExceptions already have an Expecting variable. NoViableAltExceptions are generated in two places in the codegen template: dfaState and dfaStateSwitch.

As for the one in dfaStateSwitch:
By replacing <description> (which always seems to be empty) with

(from field in GetType().GetFields()
 where field.Name.StartsWith("FOLLOW_" + input.LA(1))
 from bit in ((BitSet)field.GetValue(this)).ToArray()
 select tokenNames[bit].Trim('\'')).Aggregate((t1, t2) => t1 + " " + t2)

I can get a space-separated list of tokens. It is by no means a perfect solution though. Some issues:
- This gets all the tokens that could ever follow the last token, not just the ones in the current rule. Depending on the grammar this may or may not be a problem.
- It only works on non-named tokens (input.LA(1) returns an int). This should be fairly trivial to fix though.
- Throws an error if it can't find any FOLLOW_ variables. Also easily fixed.
- It works in the one test case I tried it with. I'm not guaranteeing it works in others.
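
To illustrate the quote-trimming and empty-result points from the list above, here is a minimal sketch in Java rather than C#. The class ExpectedTokens and its describe method are hypothetical names made up for this example, not ANTLR API, and the FOLLOW sets are passed in as plain int arrays instead of being read from the generated parser by reflection. I assume the error on an empty result comes from aggregating an empty sequence; joining with a separator avoids that.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ANTLR: joins the display names of the
// token types found in a collection of FOLLOW sets.
class ExpectedTokens {
    // tokenNames stands in for the generated parser's tokenNames array;
    // followSets holds the token types of each matching FOLLOW_ set.
    static String describe(String[] tokenNames, List<int[]> followSets) {
        List<String> names = new ArrayList<>();
        for (int[] set : followSets) {
            for (int type : set) {
                // Strip the single quotes around literal token names.
                names.add(tokenNames[type].replaceAll("^'+|'+$", ""));
            }
        }
        // String.join returns "" for an empty list instead of throwing.
        return String.join(" ", names);
    }
}
```

With no matching FOLLOW sets this returns an empty string, which the error message can then handle however it likes.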

This doesn't work in dfaState, though, since input.LA(1) is always -1 there. What I need (and it would work for dfaStateSwitch as well) is the list of the numbers in the case statements preceding the default/else branch that contains the NoViableAltException. The only way I've found to get at this information is the edges parameter that is passed to the two templates, but it contains the entire generated source code for those parts. Emitting it directly or in quotes results in tons of compile errors.

So my question: is there a way to process the edges variable (say, replacing all the non-numeric characters with spaces) before spitting it out into the generated code or, preferably, is there a way to get the list of numbers used for creating the case statements directly?
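
For clarity, the kind of processing meant here would look roughly like the Java sketch below if it were done on the generated text outside the template. This is only an illustration of the text processing, not something StringTemplate is known to offer: CaseLabels and extract are hypothetical names, and the sketch assumes the edges text contains numeric labels of the form "case N:" (labels written as token names would not be matched).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical post-processing helper, not a StringTemplate or ANTLR feature.
class CaseLabels {
    // Matches labels such as "case 12:" and captures the number.
    private static final Pattern CASE_LABEL =
        Pattern.compile("\\bcase\\s+(\\d+)\\s*:");

    // Returns the numeric case labels in the order they appear in the text.
    static List<Integer> extract(String generatedCode) {
        List<Integer> labels = new ArrayList<>();
        Matcher m = CASE_LABEL.matcher(generatedCode);
        while (m.find()) {
            labels.add(Integer.parseInt(m.group(1)));
        }
        return labels;
    }
}
```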

Thanks again.

-----Original Message-----
From: Thomas Brandon [mailto:tbrandonau at gmail.com] 
Sent: dinsdag 26 augustus 2008 15:32
To: Niemeijer, R.A.
Cc: antlr-interest at antlr.org
Subject: Re: [antlr-interest] Re : Re : How to get a list of all valid options for the next token?

It isn't especially hard to alter the codegen templates, assuming (I
think correctly in this case) that you just need trivial alterations
to the templates and don't need to add extra template parameters.
Adding parameters requires altering the analysis parts of ANTLR, which
are more complicated. You will need a basic understanding of
StringTemplate syntax, though; check the wiki.
You could just alter the existing
src/org/antlr/codegen/templates/Java.stg but I think it's better to
create your own target to keep your changes separate. This requires a
bit of setup but isn't that hard, and I think it is worthwhile for
easier maintenance and for leaving the existing Java target untouched
for testing.
Here I'll create a new target called MyJava. To do this, in the
extracted ANTLR distribution:
1) In the src/org/antlr/codegen folder create a MyJavaTarget.java file
whose class extends JavaTarget. You shouldn't need to change anything
in it; you just need a target class named after your target, and by
extending JavaTarget we avoid duplicating the existing version.
2) Go to the src/org/antlr/codegen/templates folder. Here you will
find the templates for each target; the Java target templates live in
the Java subfolder. Create a copy of this folder named after your
target, MyJava here. (Unfortunately, due to the way ANTLR is set up, I
don't think you can reference the existing copies, so you need to
create a new copy of the files.)
3) In the src/org/antlr/codegen/templates/MyJava folder, create a new
file called MyJava.stg with the contents:
group MyJava: Java;
@dfaState.noViableAltException() ::= <<
// My code
>>
4) Modify your grammar to include the option "language=MyJava;" and
regenerate your grammar.

Now, if you look at the generated code where the NoViableAltException
is thrown, you should see the "// My code" comment between the
construction of the exception and the throw. You can then replace the
comment with any code you need. Thanks to the noViableAltException
region you can probably get away without altering the existing
dfaState template and without getting too messy. For example, to use a
custom exception type you could subclass NoViableAltException and, in
the region, replace the existing instance in nvae with an instance of
your exception, which will then be thrown. That is a little messy but
probably better than duplicating the existing template to alter it.
Some changes may require you to duplicate and alter the existing
templates if there isn't a region you can override. The same applies,
due to an ST limitation, if a region already has contents you want to
add to: you can't reference the contents defined in the parent
template, so you must duplicate them.
When doing this I actually create two new templates. The first extends
the existing Java template, copies the existing contents and adds
regions for the bits I need to change. The second then adds the new
stuff I need into those regions. This separates the changes from the
simple duplication and makes it easier to carry the changes across to
new ANTLR versions. If the added regions are generically useful there
is also a chance that Ter will accept them for addition to the ANTLR
codebase.
For example, say you wanted to alter the actual creation of the
exception instead. I would create JavaMods.stg with:
group JavaMods: Java;

dfaState(k,edges,eotPredictsAlt,description,stateNumber,semPredState) ::= <<
int LA<decisionNumber>_<stateNumber> = input.LA(<k>);<\n>
<edges; separator="\nelse ">
else {
<if(eotPredictsAlt)>
    alt<decisionNumber>=<eotPredictsAlt>;
<else>
    <ruleBacktrackFailure()>
    <@throwNoViableAltException> // <---- ADDED
    NoViableAltException nvae =
        new NoViableAltException("<description>", <decisionNumber>,
            <stateNumber>, input);<\n>
    <@noViableAltException()>
    throw nvae;<\n>
    <@end> // <----- ADDED
<endif>
}
>>
where the comments indicate what was added to the dfaState template I
copied from Java.stg.
Then in MyJava.stg you put something like:
group MyJava: JavaMods;
@dfaState.throwNoViableAltException() ::= <<
MyNoViableAltException nvae = ...;
throw nvae;
>>
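
As a rough idea of what a class like MyNoViableAltException might look like, here is a minimal sketch. Everything in it is an assumption for illustration: StandInNoViableAltException is a stand-in for the real ANTLR exception so the example compiles without the runtime (it mirrors only the description, decision number and state number arguments seen in the template above and omits input), and the expected-token list and getExpected accessor are hypothetical additions. A real target would extend org.antlr.runtime.NoViableAltException instead.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Stand-in for ANTLR's NoViableAltException (assumption: used only so this
// sketch compiles on its own; the input stream argument is left out).
class StandInNoViableAltException extends Exception {
    final int decisionNumber;
    final int stateNumber;

    StandInNoViableAltException(String description, int decisionNumber, int stateNumber) {
        super(description);
        this.decisionNumber = decisionNumber;
        this.stateNumber = stateNumber;
    }
}

// Hypothetical subclass that also records which tokens would have been valid.
class MyNoViableAltException extends StandInNoViableAltException {
    private final List<String> expected;

    MyNoViableAltException(String description, int decisionNumber,
                           int stateNumber, List<String> expected) {
        super(description, decisionNumber, stateNumber);
        // Defensive copy so later changes to the caller's list don't leak in.
        this.expected = Collections.unmodifiableList(new ArrayList<>(expected));
    }

    List<String> getExpected() {
        return expected;
    }

    @Override
    public String getMessage() {
        return super.getMessage() + " (expected: " + String.join(" ", expected) + ")";
    }
}
```

Code catching the exception can then call getExpected() to list the valid options, which is what the original question was after.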

That's all off the top of my head, so the syntax may be off. As I said,
you'll need to familiarise yourself with StringTemplate syntax a bit,
but beyond that and figuring out the changes to make to the ANTLR code,
it's all pretty straightforward I think.

Tom.
On Fri, Aug 22, 2008 at 8:50 PM, Niemeijer, R.A. <r.a.niemeijer at tue.nl> wrote:
> I'm afraid that unless somebody else offers to help me I'm a bit stuck at the moment. As Kay Röpke said: "right now i don't see any way of getting at the necessary information, without altering the codegen templates.". By now I've more or less learned how to work with Antlr, but I'm afraid hacking the way Antlr itself works is still a bit beyond me. So if anyone wishes to implement this I'd be very grateful, but otherwise I fear I will have to abandon the idea.
>

