[antlr-interest] Error I can't get my head around at all...

Tue Mar 3 12:39:09 PST 2009

Jim Idle wrote:
> Sam Barnett-Cormack wrote:
<SNIP>
> I was trying to look at your grammar for you but the one you have 
> attached is missing some token defs:
>
> ...
> [08:08:06] warning(105): ASN_1.g:456:7: no lexer rule corresponding to 
> token: BSTRING
> [08:08:06] warning(105): ASN_1.g:273:24: no lexer rule corresponding to 
> token: NUMBER
> [08:08:06] warning(105): ASN_1.g:420:16: no lexer rule corresponding to 
> token: NUMBER
> [08:08:06] warning(105): ASN_1.g:621:9: no lexer rule corresponding to 
> token: NUMBER
> [08:08:06] warning(105): ASN_1.g:465:30: no lexer rule corresponding to 
> token: HSTRING
> 
> ...lots of these. You won't get anywhere until you have defined all the 
> tokens you are trying to use in the parser.

They're defined in the tokens section, and it only gives warnings - it 
was done that way because I didn't realise that fragment lexer rules 
still created the token definition.

> If you can post one that has all the tokens defined correctly, I will 
> try to help. Here are some more tips:
> 
> Take your tokens out of the tokens section and make them real rules, as in:
> 
>  BY='BY';
> 
> Becomes a keyword token in the lexer def:
> 
> // Tokens...
> //
>  BY:  'BY';
> 
> etc..

I've gone back and forth on which of those ways to do it - there's 
contradictory advice out there. The decider was something saying that 
using the tokens section ensured that they were given priority.

> // Existing lexer
> 
> Then resolve your token confusion - I think that you need:
> 
> fragment NUMBER:;
> fragment HSTRING:;
> 
> To define those tokens, but you could instead just rename NUMBERf to NUMBER.

As I mentioned above, I'll be making that change.

> You are trying to use backtrack in a lexer rule, and that does not work; 
> backtrack is for the parser. However, you are also using skip() in CSTRING 
> (which will drop the whole token) and then setting $channel=DEFAULT. I 
> can't tell what you are trying to do there, but the rule is incorrect. 
> Are you trying to remove the opening and closing " marks? skip() means 
> don't produce a token, not skip the current character.

Sorry, I was working on the basis of the book saying that, for most 
purposes, both work in the same way, so backtrack would work. Predicates 
still work, though?

> Format your rules so that the () : | and ; align. This will make sure 
> you don't merge alts by accident, and it means you can follow the alts 
> with your eyes much more easily.
> 
> Once I add the missing fragments, then I reproduce your error. However, 
> I think that your CSTRING lexer rule is neither matching what you think 
> it is, nor will return the text you want.
> 
> 
> With formatting, it looks like this:
> 
> fragment
> CSTRINGNL : WSNONL* NL WSNONL*;
> 
> CSTRING
>     : '"'
>       (
>                    (WS)=> (      (CSTRINGNL)=>CSTRINGNL 
>                            | WSNONL+
>                          )
>                | '"' '"'
>                | ~'"'
>           )*
>         '"'
>        {$channel=DEFAULT;}
>        ;
> 
> 
> Which means that your second predicate (CSTRINGNL)=> does not 
> disambiguate anything and the first WS is redundant. I think I see what 
> you are trying to do:
> 
> CSTRING
>     :    '"'
>             (
>                   (CSTRINGNL)=>CSTRINGNL
>                 | ('""')=>'""'
>                 | ~'"'
>             )*
>         '"'
>     ;

That's actually what I had before I started trying to debug the thing. 
That's my next task to tackle. I'm getting rid of the skip()s (which 
were there because someone told me that C: A B {skip();} A was 
equivalent to the old C: A !B A), and writing code to strip the unwanted 
bits in Java.
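For illustration, here is a minimal sketch of that post-processing step in plain Java, with no ANTLR dependency. The class and method names are hypothetical. It assumes the only clean-up needed is dropping the delimiting quotes and collapsing each doubled quote ("") into one, as in the CSTRING rule above; it deliberately leaves alone the whitespace around line breaks that CSTRINGNL matches, since what should happen to it is a separate question about the ASN.1 rules.

```java
// Hypothetical helper: turns the raw text matched by CSTRING, e.g. "say ""hi""",
// into the string value it denotes, e.g. say "hi".
final class CStringText {

    private CStringText() {}

    /**
     * Removes the opening and closing quote and collapses each doubled quote ("")
     * into a single quote ("). Assumes the input is exactly the text the CSTRING
     * rule matched, delimiters included. Whitespace and newlines are kept as-is.
     */
    static String unquote(String raw) {
        if (raw == null || raw.length() < 2
                || raw.charAt(0) != '"' || raw.charAt(raw.length() - 1) != '"') {
            throw new IllegalArgumentException("not a quoted cstring: " + raw);
        }
        String body = raw.substring(1, raw.length() - 1);
        StringBuilder out = new StringBuilder(body.length());
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            out.append(c);
            // A doubled quote stands for one quote character: skip the second one.
            if (c == '"' && i + 1 < body.length() && body.charAt(i + 1) == '"') {
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unquote("\"say \"\"hi\"\"\"")); // prints: say "hi"
        System.out.println(unquote("\"\""));               // prints an empty line
    }
}
```

Keeping this outside the lexer means the CSTRING rule only has to recognise the token, which avoids the skip()/channel confusion altogether.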

> Next, your type rule. As mentioned in prior responses, this is happening 
> because the normative grammar does not specify how to write a parser, and 
> so you have dangling ambiguity with the optional constraint sequence. 
> Because constraint starts with '(' and yet is optional, it is ambiguous 
> after sequenceOfType builtinType constraint (which ANTLRWorks shows you, 
> but there are other possibilities).
> 
> I don't know how you want this to bind, of course, but assuming that 
> nearest match wins, and that you really want between 0 and infinite 
> constraints and not just one, then you can have the constraints bind to 
> the type using:
> 
> type
>     : (typeWithConstraint | builtinType | referencedType)
>       (('(')=>constraint)*
>     ;
> 
> Given that the sequence is 0 or more, if this is not what you want, then 
> your grammar is incorrect and the constraint is too low down in the 
> rules ;-) However, given that this is probably here because you typed it 
> in from some spec, then it doubles the chances that this is what you want.

That is indeed what I want. Thanks.
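To make the "nearest match wins" binding concrete, here is a small hand-written recursive-descent sketch in Java. The grammar in it is a hypothetical simplification (not ASN.1 as specified), and the class name is made up. The while loop plays the role of `(('(')=>constraint)*`: because the innermost `type` call keeps consuming constraints as long as the next token is '(', a constraint written after `SEQUENCE OF INTEGER` attaches to INTEGER, the nearest type, rather than to the SEQUENCE OF.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, heavily simplified grammar:
//   type       : base ( '(' constraint ')' )*   // greedy: loop while next token is '('
//   base       : 'SEQUENCE' 'OF' type | IDENT
//   constraint : any tokens up to the matching ')'
final class GreedyConstraints {
    private final List<String> tokens;
    private int pos = 0;

    GreedyConstraints(List<String> tokens) {
        this.tokens = tokens;
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : null;
    }

    private String next() {
        if (pos >= tokens.size()) {
            throw new IllegalStateException("unexpected end of input");
        }
        return tokens.get(pos++);
    }

    private void expect(String want) {
        String got = next();
        if (!got.equals(want)) {
            throw new IllegalStateException("expected " + want + " but got " + got);
        }
    }

    /** Parses a type; brackets in the result show which type each constraint bound to. */
    String type() {
        StringBuilder sb = new StringBuilder(base());
        // Equivalent of (('(')=>constraint)*: any '(' that follows is taken as a
        // constraint on the type we are currently inside.
        while ("(".equals(peek())) {
            sb.append(' ').append(constraint());
        }
        return "[" + sb + "]";
    }

    private String base() {
        String t = next();
        if (t.equals("SEQUENCE")) {
            expect("OF");
            return "SEQUENCE OF " + type(); // recursion: the inner type is parsed first
        }
        return t; // treat anything else as a type name
    }

    private String constraint() {
        expect("(");
        StringBuilder sb = new StringBuilder("(");
        int depth = 1;
        while (true) {
            String t = next();
            if (t.equals("(")) {
                depth++;
            }
            if (t.equals(")") && --depth == 0) {
                break; // reached the ')' matching the opening '('
            }
            sb.append(t);
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        List<String> toks = Arrays.asList("SEQUENCE", "OF", "INTEGER", "(", "0", "..", "7", ")");
        System.out.println(new GreedyConstraints(toks).type());
        // prints [SEQUENCE OF [INTEGER (0..7)]]
    }
}
```

An ANTLR-generated parser is structured differently, but the loop decision the predicate forces is the same greedy one.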

> Next your globalModuleReference rule, which is causing problems again 
> because (objectIdentifierValue | definedValue) is optional so the start 
> set for these rules is ambiguous. definedValue can be 
> externalValueReference | valueReference and the valueReference is LCID. 
> However if that option is not taken then you can still get to LCID via 
> valueAssignment, which starts with valueReference. The reason for this 
> appears to be the fact that there are no statement terminators in this 
> grammar. This may be because you have hidden NL but the language is 
> actually NL sensitive, which means all rules with trailing optional 
> elements cannot be disambiguated if they happen to match the start of a 
> new statement. Are you sure that you should not be passing the NL through?

Nope, newlines are not significant, and there's not really such a thing 
as a statement - it's more declarative than procedural, just definitions 
of types and values.

> Again, I cannot tell where this should be associating but I strongly 
> suspect that it should associate immediately. The problem is though, 
> that without the statement terminator, then adding a predicate here will 
> rid you of the error, but may consume something that looks like a value 
> reference, and as I get to typing in this line, I think that the main 
> problem is that you cannot hide NLs for this 'language' as they are 
> significant as statement terminators. I tried to look up the specs but 
> every link on http://www.itu.int is broken, which figures as it matches 
> the language ;-)
> 
> So, you can solve the globalModuleReference ambiguities by creating 
> predicates that include the start tokens, but this may not be correct. 
> I suspect that this is not ambiguous in the language because of the NL. 
> Read the spec and see if it says something like "A statement is 
> terminated with a newline...." If it does, then you cannot hide the NLs, 
> at least not without lots of tricks that you do not know about yet.

It actually says nothing about statements whatsoever; it's not 
procedural. It also says that, outside of cstrings (and even there in 
some senses), whitespace is not significant, and newlines (which are 
defined quite broadly) are counted as whitespace.

Most blocks are ended by either an END or a relevant closing bracket 
(parentheses, square, or braces), and "statement" level stuff is mostly 
ended simply by finishing... it's not the most handy language for parsing.

> Two final things:
> 
> Once again, I urge you to start again and go a little more slowly - the 
> reason that you cannot solve these things is that they are nested too 
> deep in the makeup now for you to see what is going on. Even trying to 
> retrofit the newline stuff (assuming this is the issue) is going to be 
> difficult for you at this point.
> 
> Finally, I see that your email address is from Lancaster Poly. This 
> isn't your homework/graduate project/thesis is it? I have been caught by 
> this before with someone from Sheffield Poly who offered to pay me then 
> disappeared after receiving the grammars. I await my Comp. Sci. degree 
> from there with some impatience ;-)

Well, it's not a poly for starters, and never has been ;p

However, it's not any of those things. Plus I've not offered anyone 
money ;).

I see this more as help understanding ANTLR than help implementing 
ASN.1, although it helps with that too. There is an expectation that 
this will end up as part of a new open source project, but we'll see how 
it goes.

Actually, I'm currently a student (part time) on an MA in Educational 
Research. I already got my CompSci BSc and MSc. I can understand that I 
might have given a misleading impression, but I'm really not as clueless 
as it seems you assume :)

-- 
Sam Barnett-Cormack