[antlr-interest] Is this the best way to have an ordered series of optional tokens with a minimum size?
Alex Chaphiv
achaphiv at gmail.com
Sat Jun 23 15:07:36 PDT 2012
Hi,
I have a working solution, but I'm wondering if I can represent the
following in a smaller grammar.
I have a series of ISO 3166 country codes that must come in order.
Here are the defined tokens:
ABW:'ABW'; AFG:'AFG'; AGO:'AGO'; AIA:'AIA'; ALA:'ALA'; ALB:'ALB';
AND:'AND'; ANT:'ANT'; ARE:'ARE'; ARG:'ARG'; ARM:'ARM'; ASM:'ASM'; ATA:'ATA';
ATF:'ATF'; ATG:'ATG'; AUS:'AUS'; AUT:'AUT'; AZE:'AZE'; BDI:'BDI'; BEL:'BEL';
BEN:'BEN'; BFA:'BFA'; BGD:'BGD'; BGR:'BGR'; BHR:'BHR'; BHS:'BHS'; BIH:'BIH';
(over 200 of these)
So now I want them to come in order, and there can be at most one of each
option:
countriesInOrder:
ABW? AFG? AGO? AIA? ALA? ALB? AND? ANT? ARE? ARG? ARM? ASM? ATA? ATF?
ATG?
AUS? AUT? AZE? BDI? BEL? BEN? BFA? BGD? BGR? BHR? BHS? BIH? BLM? BLR?
BLZ?
BMU? BOL? BRA? BRB? BRN? BTN? BVT? BWA? CAF? CAN? CCK? CHE? CHL? CHN?
CIV?
(and so on)
This is fine, but now at least 2 countries must appear, so I use the
following:
atLeast2CountriesInOrder:
(country country) => countriesInOrder
| INVALID_UNKNOWN_TOKEN // this part seems especially wrong to me
;
country:
ABW|AFG|AGO|AIA|ALA|ALB|AND|ANT|ARE|ARG|ARM|ASM|ATA|ATF|ATG|AUS|AUT|AZE|BDI
|BEL|BEN|BFA|BGD|BGR|BHR|BHS|BIH|BLM|BLR|BLZ|BMU|BOL|BRA|BRB|BRN|BTN|BVT|BWA
|CAF|CAN|CCK|CHE|CHL|CHN|CIV|CMR|COD|COG|COK|COL|COM|CPV|CRI|CUB|CXR|CYM|CYP
(and so on)
// defined at the very bottom, after all other tokens; should never match
INVALID_UNKNOWN_TOKEN: ('A'..'Z')+;
Now I have to do this 5+ more times for similar lists of tokens. It seems
like an awful lot of duplication and I'm concerned about the use of
INVALID_UNKNOWN_TOKEN.
So is there a better way?
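One direction I can imagine, as a rough sketch only: let the grammar accept any run of country tokens (something like "country country+") and enforce the order, uniqueness and minimum count in target-language code instead of in the rule itself. The Java helper below is hypothetical and not part of ANTLR. The class name OrderedCodes, the method isValid and the truncated ORDER list are all made up for illustration, and I am assuming the matched token texts have already been collected into a list.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not an ANTLR API): checks the constraints described
// above on a list of already-matched country token texts.
final class OrderedCodes {
    // Canonical order. Truncated here to the first few codes from the token
    // list; the real list would hold every code, in the required order.
    static final List<String> ORDER =
            Arrays.asList("ABW", "AFG", "AGO", "AIA", "ALA", "ALB");

    // Returns true if there are at least minCount codes, every code is known,
    // and each code sits strictly later in ORDER than the one before it
    // (which also rules out repeats).
    static boolean isValid(List<String> codes, int minCount) {
        if (codes.size() < minCount) {
            return false;
        }
        int previous = -1;
        for (String code : codes) {
            int index = ORDER.indexOf(code);
            // index is -1 for an unknown code, and <= previous for a
            // repeated or out-of-order code.
            if (index <= previous) {
                return false;
            }
            previous = index;
        }
        return true;
    }
}
```

With this, each of the 5+ similar lists would only need its own ORDER list rather than its own long rule of optional tokens plus a syntactic predicate. The trade-off is that a violation shows up as a failed check rather than as an ordinary syntax error from the parser.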