[antlr-interest] [Fwd: Ignoring tokens in AnTLR+Python]
Daniel Hernandez Bahr
dbahr at estudiantes.uci.cu
Fri Mar 5 06:37:34 PST 2010
I should add that I DO have a whitespace rule, defined as follows:
protected WS: ( ' '
    | '\t'
    | '\f'
    // handle newlines
    | ( ("\r\n") => "\r\n" // Evil DOS
      | '\r' // Macintosh
      | '\n' // Unix (the right way)
      )
      { $newline; }
    )
    { _ttype = SKIP; }
    ;
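To make the intent concrete, here is a minimal Python sketch of the behaviour this rule is supposed to have: consume spaces, tabs, form feeds and any of the three newline conventions, count each newline once, and hand nothing to the parser. The names WS_PATTERN and skip_whitespace are hypothetical and are not part of ANTLR; this only illustrates the matching order, not how the generated lexer actually works.

```python
import re

# Hypothetical stand-in for the WS rule: matches one whitespace unit at a time.
# "\r\n" is listed before "\r" and "\n" so a DOS line ending is counted once.
WS_PATTERN = re.compile(r"\r\n|\r|\n|[ \t\f]")
NEWLINES = ("\r\n", "\r", "\n")


def skip_whitespace(text, pos, line):
    """Advance past whitespace starting at pos; return (new_pos, new_line)."""
    while True:
        match = WS_PATTERN.match(text, pos)
        if match is None:
            # First non-whitespace character (or end of input): stop here.
            return pos, line
        if match.group() in NEWLINES:
            line += 1  # plays the role of $newline in the grammar action
        pos = match.end()
```

Called on the text that follows an assignment, it should step over the trailing 0xA and stop at the first character of the next sentence.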
Is there anything wrong with the rule? I have no clue why I am
getting the UNEXPECTED CHAR error.
Does anyone?
Best regards,
D.H. Bahr
Daniel Hernandez Bahr wrote:
> Hi all.
>
> Sorry for posting the whole "parsing files" thing; it was rather stupid
> and I figured it out later.
>
> The thing is, I have defined the rules for the assignments and macros,
> and made a sample input file containing only such instructions. I'm
> posting the first lines so you have a clearer idea:
>
> SWIG_LDFLAGS="$LDFLAGS"
> INSTALL="$abs_srcdir/$INSTALL"
> APR_VER_REGEXES=["0\.9\.[7-9] 0\.9\.1[0-9] 1\."]
> APU_VER_REGEXES=["0\.9\.[7-9] 0\.9\.1[0-9] 1\."]
>
> As can be seen, there are only assignments in the first four lines, and
> the assignment rule looks like this:
>
> sentences : sentence (sentences)?;
>
> sentence : assignment | macro;
>
> assignment : w:WORD EQUAL^ v:value
>     {
>         w = w.getText()
>         e = Exception("%s is not a valid identifier" % (w))
>         print w, "::",
>         if (not w[0].isalpha()) or ("." in w) or ("-" in w):
>             raise e
>     }
>     ;
>
> value : WORD | s:STRINGLIT
> {
> print s.getText()
> }
> ;
>
> Yet when I run the lexer/parser script I get this:
>
> "$LDFLAGS"
> SWIG_LDFLAGS :: UNEXPECTED CHAR: 0xA
>
> Does anyone know what I am doing wrong here?
>
> Best regards,
>
> D.H. Bahr
> Daniel Hernandez Bahr wrote:
>
>> I am back.
>>
>> I've just realized that the ignoring should be done in the parser (not in
>> the lexer), so I made some adjustments and tried the construction again:
>>
>> sentence: assignment | macro | other;
>> other: ~(assignment | macro);
>>
>> and now I get an error saying that the subrule cannot be inverted. Only subrules
>> of the form:
>> (T1|T2|T3...) or
>> ('c1'|'c2'|'c3'...)
>> may be inverted (ranges are also allowed).
>>
>> So I am back to the same problem:
>>
>> How do I ignore the other sentences I don't need?
>>
>> Best regards,
>>
>> D.H. Bahr.
>>
>> -------- Original Message --------
>> Subject: [antlr-interest] Ignoring tokens in AnTLR+Python
>> Date: Thu, 04 Mar 2010 10:14:23 -0500
>> From: Daniel Hernandez Bahr <dbahr at estudiantes.uci.cu>
>> To: antlr-interest at antlr.org <antlr-interest at antlr.org>
>> References:
>> <4a051d931003031537ib220a57jf896cd43fbb5d319 at mail.gmail.com>
>> <eae205eee3744a458a11b871a47d2bfe at temporal-wave.com>
>> <9362e74e1003040511s48ff2e25h828466dc5639aea1 at mail.gmail.com>
>>
>>
>>
>> Hello everyone!
>>
>> I am fairly new to ANTLR. I am working on an interpreter for
>> configuration files ('configure.ac' files, I should say), but I don't
>> need to scan every single token in the files, only variable assignments
>> and a few macros, so my question is:
>>
>> How can I ignore every other sentence in the files?
>>
>> At first I intended to do something like
>>
>> SENTENCE: ASSIGNMENT | MACRO | OTHER;
>> OTHER: ~(ASSIGNMENT | MACRO)
>>
>> but I get an error saying that ~TOKEN is not allowed in the lexer. Is there a way to achieve
>> this without having to define the entire grammar of 'configure.ac' files?
>>
>> Best regards,
>>
>> D.H. Bahr
>>
>> PS: As remarked in the subject, I am using Python and not Java or C.
>>
>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address