[antlr-interest] How to distinguish between an assembler comment and a literal integer

Kirby Bohling kirby.bohling at gmail.com
Tue Sep 28 12:36:34 PDT 2010


Does the '#' that starts a comment have to be the first character of
the line?  Can a '#' that introduces a literal ever be the first
character of a line?

If the answers to those questions are Yes and No respectively, then you
can use a predicate to determine whether the '#' is the first character
of the line, and pick the token type based upon that.  I'd also use the
approach of spotting a '#' first and then deciding comment or not a
comment, rather than the approach you are taking.
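
As a rough illustration of that decision (plain Python, not ANTLR; the
names tokenize_hashes and HASH_HEX are made up for this sketch), the
scanner below treats a '#' in column 0 as the start of a comment and any
later '#' as the prefix of a four-digit hex literal, mirroring the
HEX_LITERAL rule in the quoted grammar.  It assumes the answers above
really are Yes and No.

```python
import re

# Hypothetical pattern: '#' followed by '0x' and exactly four hex digits,
# matching the HEX_LITERAL rule from the quoted grammar.
HASH_HEX = re.compile(r"#0x[0-9a-fA-F]{4}")


def tokenize_hashes(line):
    """Return (kind, text) pairs for the '#' constructs found on one line."""
    tokens = []
    pos = 0
    while pos < len(line):
        if line[pos] == "#":
            if pos == 0:
                # '#' in the first column: the whole line is a comment.
                tokens.append(("COMMENT", line.rstrip("\r\n")))
                break
            match = HASH_HEX.match(line, pos)
            if match is None:
                raise ValueError(f"unexpected '#' at column {pos}")
            # Drop the leading '#' and keep only the literal itself.
            tokens.append(("HEX_LITERAL", match.group()[1:]))
            pos = match.end()
            continue
        pos += 1
    return tokens
```

In an ANTLR lexer the same column test would sit in a predicate on the
comment rule instead of in hand-written scanning code.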

I think what you want is some type of predicate or lookahead that has
two alternatives: one to use if you think it is a hex value, and one if
you think it is a comment.
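
A minimal sketch of the lookahead variant, again in plain Python rather
than ANTLR (classify_hash is a hypothetical helper, and the "exactly
four hex digits" rule is taken from the quoted HEX_LITERAL definition):
peek at the characters after the '#' and choose between the two
alternatives based on what is there.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def classify_hash(text, pos):
    """Decide what the '#' at text[pos] starts by peeking ahead."""
    if text[pos] != "#":
        raise ValueError("classify_hash must be called on a '#'")
    # Look ahead six characters: '0x' plus four hex digits.
    lookahead = text[pos + 1:pos + 7]
    looks_like_hex = (
        len(lookahead) == 6
        and lookahead[:2] == "0x"
        and all(ch in HEX_DIGITS for ch in lookahead[2:])
    )
    return "HEX_LITERAL" if looks_like_hex else "COMMENT"
```

Note that this version would misclassify a comment whose text happens to
begin with something like "0x1234", which is one reason the
first-column test may be the safer choice if your input allows it.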

The example at the link below is far more powerful than what you
actually need, but it has what you want buried in it if you dig it out:

http://www.antlr.org/wiki/display/ANTLR3/Lexer+grammar+for+floating+point%2C+dot%2C+range%2C+time+specs

That example uses a lot of lookahead and has more advanced error
handling.  Understanding it is very useful if you want to produce a
more useful, end-user-friendly parser/grammar.

Kirby


2010/9/28 Antonio Martínez Álvarez <amartinez at atc.ugr.es>:
> Hi all,
> I'm working on an MSP430 assembly parser and I have this problem:
>
> First of all, this is a possible input for my grammar:
>
> ###################
> #  Theese are comments     #
> #                                            #
> #    bla bla bla                      #
> ###################
> labelA:
> MOV.W  #0x1234, R5
>
>
>
> As you can see, '#' is used both for an introductory log and to
> express a literal hex integer.
>
> I'm trying something like (without success):
>
>
> HEX_LITERAL    :   '0x' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT ;
> fragment HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;
>
> COMMENT
>     :   '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
>     |    '#' '0x' => HEX_LITERAL
>     |   '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
>     ;
>
>
> Result: [20:06:35] error(100): msp430.g:111:11: syntax error: antlr:
> msp430.g:111:11: unexpected token: '0x'
>
>
> Could you please help me? How can I capture a literal integer within
> this grammar?
>

