[antlr-interest] Order of token matching
Jenny Balfer
ai06087 at Lehre.BA-Stuttgart.De
Wed Sep 3 09:34:10 PDT 2008
No, the full grammar is too long, but I reproduced the error in a shorter one.
An example input that triggers the error is the following string:
isWorking = function(param1,param2) {
some implementation;
some expressions;
}
function throwsError(param1, param2) {
// this is a nasty comment {
something else
}
function isIgnored() {
// lexer is still searching for a closing brace
}
***********
* GRAMMAR *
***********
program : statement*
;
statement
: '(' statement ')'
| declaration
;
declaration
: ID '=' FUNCTION '(' paramList ')'
| FUNCTION ID '(' paramList ')'
;
paramList
: ID (',' ID)*
;
fragment LDOC
: '/**'
;
fragment MLCOM
: '/*'
;
fragment SLCOM
: '//'
;
fragment RCOM
: '*/'
;
FUNCTION: 'function'
;
COMMENT
: SLCOM (options{greedy=false;}: .)* NL {skip();}
| MLCOM (options{greedy=false;}: .)* RCOM {skip();}
;
IMPL
: '{' (IMPL|~'}')* '}' {skip();}
;
NL
: '\r' {skip();}
| '\n' {skip();}
;
WS
: ' ' {$channel=HIDDEN;}
| '\t' {skip();}
;
ID : ( LETTER | '$' | '_' ) ( LETTER | '$' | '_' | DIGIT )*
;
fragment LETTER
: 'A'..'Z'
| 'a'..'z'
;
fragment DIGIT
: '0'..'9'
;
DEFAULT : .
;
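
The behaviour wanted from IMPL can be sketched outside ANTLR as a small
Python scanner. This is only an illustration under stated assumptions, not
ANTLR code and not a fix for the grammar itself: the function name skip_impl
is hypothetical, and it knows only the // and /* */ comment styles used in
the grammar above. The point it shows is the ordering: a comment is consumed
as one unit before any brace is counted, so a '{' inside a comment never
changes the nesting depth.

```python
def skip_impl(text, start):
    """Return the index just past the '}' that closes the '{' at text[start].

    Comments are consumed whole, so braces inside them are not counted.
    (Hypothetical helper for illustration; not part of ANTLR.)
    """
    if text[start] != "{":
        raise ValueError("expected '{' at position %d" % start)
    depth = 0
    i = start
    n = len(text)
    while i < n:
        if text.startswith("//", i):
            # Single-line comment: jump past the end of the line.
            end = text.find("\n", i)
            i = n if end == -1 else end + 1
        elif text.startswith("/*", i):
            # Multi-line comment: jump past the closing '*/'.
            end = text.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated comment")
            i = end + 2
        else:
            ch = text[i]
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    return i + 1
            i += 1
    raise ValueError("reached end of input while expecting '}'")


source = (
    "function throwsError(param1, param2) {\n"
    "// this is a nasty comment {\n"
    "something else\n"
    "}\n"
    "function isIgnored() {\n"
    "}\n"
)
body_start = source.index("{")
body_end = skip_impl(source, body_start)
rest = source[body_end:]
```

With the input from the example above, the scan stops at the brace that
really closes throwsError, and isIgnored is left for the next match. In the
grammar, the analogous idea would be to let IMPL try COMMENT before it falls
back to matching single characters.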
On Wed, 03 Sep 2008 09:21:00 -0700, Jim Idle <jimi at temporal-wave.com>
wrote:
> On Wed, 2008-09-03 at 18:14 +0200, Jenny Balfer wrote:
>
>> Thanks for that, but unfortunately this does not solve the problem. I
>> declared MLCOM etc. as fragment, but COMMENT and IMPL must not be
>> fragments in order to skip them.
>
>
> Have you shown all of your grammar here?
>
> Jim
>
>>
>> On Wed, 03 Sep 2008 09:05:48 -0700, Jim Idle <jimi at temporal-wave.com>
>> wrote:
>> > On Wed, 2008-09-03 at 18:00 +0200, Jenny Balfer wrote:
>> >
>> >> Hello guys,
>> >>
>> >> I think I understand too little about how my lexer works. I thought
>> >> the rules that are specified first are matched first, but in my
>> >> grammar this is not the case.
>> >> What I am trying to do is first skip all comments in my source
>> >> files, and then skip everything between curly braces:
>> >>
>> >
>> >
>> > Make sure that any token that you don't want returned to the parser is
>> > a fragment:
>> >
>> > fragment
>> > MLCOM : '/*' ;
>> >
>> > etc. Then you should have more luck. Your comment lead-ins are
>> > matching the MLCOM and SLCOM rules and then likely throwing
>> > recognition errors for the rest up until the '{'.
>> >
>> > Jim
>> >
>> >
>> >> MLCOM : '/*'
>> >> ;
>> >> SLCOM : '//'
>> >> ;
>> >> RCOM : '*/'
>> >> ;
>> >> NL : '\r' {skip();}
>> >> | '\n' {skip();}
>> >> ;
>> >> WS : ' ' {$channel=HIDDEN;}
>> >> | '\t' {skip();}
>> >> ;
>> >>
>> >> COMMENT : SLCOM (options{greedy=false;}: .)* NL {skip();}
>> >> | MLCOM (options{greedy=false;}: .)* RCOM {skip();}
>> >> ;
>> >> IMPL : '{' (IMPL|'}')* '}' {skip();}
>> >> ;
>> >>
>> >> Rule IMPL matches everything between curly braces, and keeps track
>> >> of nested braces along the way (by recursively calling itself).
>> >> Now the problem appears if there are braces in comments:
>> >>
>> >> someFunction = function(a,b) {
>> >> // this is one brace too much: {
>> >> }
>> >>
>> >> My lexer now sees the opening brace in the comment and searches for
>> >> the closing one until the end of file, which results in:
>> >> mismatched character '<EOF>' expecting '}'
>> >>
>> >> What I want my lexer to do is first sort out all comments, and second
>> >> sort out everything between curly braces. Are there any predicates
>> >> that could achieve this?
>> >>
>> >> Thanks!