[antlr-interest] Handling range-limited tokens
A Z
asicaddress at gmail.com
Tue Sep 7 11:22:25 PDT 2010
Hello all,
The grammar I am trying to implement has many cases where the terminals
are special cases of identifiers. Below is an excerpt from the EBNF:
seq_input_list ::= level_input_list | edge_input_list
level_input_list ::= level_symbol { level_symbol }
edge_input_list ::= { level_symbol } edge_indicator { level_symbol }
edge_indicator ::= ( level_symbol level_symbol ) | edge_symbol
current_state ::= level_symbol
next_state ::= output_symbol | -
output_symbol ::= 0 | 1 | x | X
level_symbol ::= 0 | 1 | x | X | ? | b | B
edge_symbol ::= r | R | f | F | p | P | n | N | *
simple_identifier ::= [ a-zA-Z_ ] { [ a-zA-Z0-9_$ ] }
My ANTLR grammar is coded like this:
edge_input_list :
    level_symbol* edge_indicator level_symbol*;
edge_indicator :
    LPARAN level_symbol level_symbol RPARAN
    | edge_symbol;
current_state :
    level_symbol;
next_state :
    output_symbol
    | MINUS;
output_symbol :
    BINNUM; // 0 | 1 | x | X
level_symbol :
    BINNUM
    | SIMPLE_IDENT; // 0 | 1 | x | X | ? | b | B
edge_symbol :
    ASTERISK
    | SIMPLE_IDENT; // r | R | f | F | p | P | n | N | *
I now have a problem: ANTLR can't resolve level_symbol* in rule
edge_input_list, because both level_symbol and edge_indicator (through
edge_symbol) can match a SIMPLE_IDENT token. However, the actual characters
allowed are distinct for each terminal. What is the best way to handle this?
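To make the overlap concrete: the two terminals share a token type, but their character sets from the EBNF are disjoint, so the token text alone is enough to tell them apart. Below is a minimal sketch of that check in Java (ANTLR's default target). The class SymbolSets and its method names are hypothetical and not part of the grammar above; only the two character sets come from the EBNF excerpt.

```java
// Hypothetical helper: classifies a token's text using the character
// sets from the EBNF excerpt. Not part of the original grammar.
public final class SymbolSets {
    // level_symbol ::= 0 | 1 | x | X | ? | b | B
    private static final String LEVEL_SYMBOLS = "01xX?bB";
    // edge_symbol ::= r | R | f | F | p | P | n | N | *
    private static final String EDGE_SYMBOLS = "rRfFpPnN*";

    private SymbolSets() {}

    /** True if text is exactly one character from the level_symbol set. */
    public static boolean isLevelSymbol(String text) {
        return isSingleCharFrom(text, LEVEL_SYMBOLS);
    }

    /** True if text is exactly one character from the edge_symbol set. */
    public static boolean isEdgeSymbol(String text) {
        return isSingleCharFrom(text, EDGE_SYMBOLS);
    }

    private static boolean isSingleCharFrom(String text, String allowed) {
        return text != null
                && text.length() == 1
                && allowed.indexOf(text.charAt(0)) >= 0;
    }
}
```

A check like this could presumably be called from a semantic predicate placed in front of SIMPLE_IDENT in level_symbol and edge_symbol, for example {SymbolSets.isEdgeSymbol(input.LT(1).getText())}? in an ANTLR 3 parser rule. That wiring is an untested assumption about this grammar, not a confirmed fix.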
Originally I had separate tokens for each of the characters and made
simple_identifier a parser rule as follows:
ANYCASER : 'r' | 'R';
ANYCASEB : 'b' | 'B';
SIMPLE_IDENT : (Alpha | '_') ('0'..'9' | 'a'..'z' | 'A'..'Z' | '_' | '$')*;
simple_identifier : SIMPLE_IDENT | ANYCASEB | ANYCASER | ...;
This works, but it quickly becomes unwieldy, as there are other places in the
grammar that have similar situations using overlapping character sets.
Thanks