[antlr-interest] Is this the best way to have an ordered series of optional tokens with a minimum size?

Alex Chaphiv achaphiv at gmail.com
Sat Jun 23 15:07:36 PDT 2012


Hi,

I have a working solution, but I'm wondering if I can represent the
following in a smaller grammar.

I have a series of ISO 3166 country codes that must come in order.
Here are the defined tokens:

ABW:'ABW'; AFG:'AFG'; AGO:'AGO'; AIA:'AIA'; ALA:'ALA'; ALB:'ALB';
AND:'AND'; ANT:'ANT'; ARE:'ARE'; ARG:'ARG'; ARM:'ARM'; ASM:'ASM'; ATA:'ATA';
ATF:'ATF'; ATG:'ATG'; AUS:'AUS'; AUT:'AUT'; AZE:'AZE'; BDI:'BDI'; BEL:'BEL';
BEN:'BEN'; BFA:'BFA'; BGD:'BGD'; BGR:'BGR'; BHR:'BHR'; BHS:'BHS'; BIH:'BIH';
(over 200 of these)


So now I want them to come in order, and there can be at most one of each
option:

countriesInOrder:
    ABW? AFG? AGO? AIA? ALA? ALB? AND? ANT? ARE? ARG? ARM? ASM? ATA? ATF? ATG?
    AUS? AUT? AZE? BDI? BEL? BEN? BFA? BGD? BGR? BHR? BHS? BIH? BLM? BLR? BLZ?
    BMU? BOL? BRA? BRB? BRN? BTN? BVT? BWA? CAF? CAN? CCK? CHE? CHL? CHN? CIV?
    (and so on)

This is fine, but now at least 2 countries must appear, so I use the
following:

atLeast2CountriesInOrder:
    (country country) => countriesInOrder
    | INVALID_UNKNOWN_TOKEN // this part seems especially wrong to me
    ;

country:
    ABW|AFG|AGO|AIA|ALA|ALB|AND|ANT|ARE|ARG|ARM|ASM|ATA|ATF|ATG|AUS|AUT|AZE|BDI
    |BEL|BEN|BFA|BGD|BGR|BHR|BHS|BIH|BLM|BLR|BLZ|BMU|BOL|BRA|BRB|BRN|BTN|BVT|BWA
    |CAF|CAN|CCK|CHE|CHL|CHN|CIV|CMR|COD|COG|COK|COL|COM|CPV|CRI|CUB|CXR|CYM|CYP
    (and so on)

// defined at the very bottom, after all other tokens; should never occur
INVALID_UNKNOWN_TOKEN: ('A'..'Z')+;

Now I have to do this 5+ more times for similar lists of tokens.  It seems
like an awful lot of duplication and I'm concerned about the use of
INVALID_UNKNOWN_TOKEN.
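
To make the constraint concrete, here is a minimal sketch in plain Java of
what each of these rules is meant to accept: codes in list order, each at
most once, and at least a minimum number of them. This is only an
illustration of the check, not part of the grammar above. The names
OrderedCodes and isValid are made up for this example, and COUNTRIES holds
only the first few codes.

```java
import java.util.Arrays;
import java.util.List;

public class OrderedCodes {
    // Hypothetical sample: only the first few codes; the real list has over 200.
    static final List<String> COUNTRIES =
        Arrays.asList("ABW", "AFG", "AGO", "AIA", "ALA", "ALB");

    // Returns true if every code in `seen` is in `allowed`, the codes appear
    // in the same relative order as in `allowed` with no repeats, and there
    // are at least `min` of them.
    static boolean isValid(List<String> allowed, List<String> seen, int min) {
        int last = -1;
        for (String code : seen) {
            int pos = allowed.indexOf(code);
            if (pos <= last) {
                return false; // unknown code (-1), repeated, or out of order
            }
            last = pos;
        }
        return seen.size() >= min;
    }

    public static void main(String[] args) {
        System.out.println(isValid(COUNTRIES, Arrays.asList("ABW", "AGO"), 2)); // true
        System.out.println(isValid(COUNTRIES, Arrays.asList("AGO", "ABW"), 2)); // false: out of order
        System.out.println(isValid(COUNTRIES, Arrays.asList("ABW"), 2));        // false: fewer than 2
        System.out.println(isValid(COUNTRIES, Arrays.asList("ABW", "ABW"), 2)); // false: repeated
    }
}
```

Because the allowed list is a parameter, the same check would cover the
other 5+ token lists without being written out again for each one.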

So is there a better way?

