Newsgroups: php.internals
Xref: news.php.net php.internals:43286
Message-ID: <49B5C242.8080504@tekrat.com>
Date: Mon, 09 Mar 2009 18:28:34 -0700
To: Matt Wilmas
CC: PHP Internals List, Lukas Kahwe Smith
References: <49B57F4F.9080901@tekrat.com> <033E05F2D7264057AEE4FCFFD7E827AE@pc1> <49B5AD5B.908@tekrat.com> <5B7B7BA5BF8C40598CABBB39CD24EE8E@pc1>
In-Reply-To: <5B7B7BA5BF8C40598CABBB39CD24EE8E@pc1>
Subject: Re: [PHP-DEV] 5.3 items
From: shire@tekrat.com (shire)

Matt Wilmas wrote:
>
> I don't have much time right now, but looked at it quick, and see that
> you're actually trying to work around the re2c issues in general.
> :-) I was only thinking of putting a "band-aid" on the comment symptom(s),
> since those are about the only ones that occur with valid code (is the
> tokenizer ext. *supposed to* handle all tokens in code that wouldn't
> really compile?).

Yeah, I figured I should try to fix as much as I could; specifically, YYLIMIT not enforcing the availability of 'n' chars makes me nervous. ;-) I would expect the tokenizer to handle all tokens in code as long as they pass the scanner phase (not the parser phase), but I'm not sure what the intention is here.

> And yeah, about excluding \x00 from ANY_CHAR, it could
> change things, since it's always been allowed, although it seems strange
> that code would have literal NULLs in it (generated eval()'d code?).
> That was part of the reason I couldn't come up with a generic fix while
> keeping all behavior. If re2c would just remember the last matching
> state it was in at EOF like Flex!
>

It seems to me that the crux of the problem is that we can't integrate an EOF check (such as checking the length of the data) into the regular expression. While flex provides the <<EOF>> rule, with re2c we are expected to supply a unique character/token to match on. That assumes we either have a unique character or that the data is well-formed enough for us to detect a token, etc. Perhaps a good feature to add to re2c would be a special regex/token match that identifies such conditions programmatically, e.g. (YYCURSOR == YYLIMIT). In defense of re2c, I think having to handle EOF explicitly can be useful, as it gives you more freedom when processing different kinds of data.

I'll have to look more closely at the multi-byte processing as well. I don't see many cases where we would run into \x00 values in code. (Perhaps someone can suggest a use case we need to watch out for?) Perhaps someone is including binary data strings within code?

> Otherwise, I don't know what to do.
> :-/ I'm going to do something else
> before trying to implement what I was going to do, so there's no patch
> yet...
>

OK, I guess I'll keep working on this then, as there are a couple more tests I want to run and some things I want to fix before I commit (like making sure YYLIMIT actually guarantees that 'n' bytes are available to read, etc.).

> As far as the Warning, with " comment ..." ? Of course your patch would restore it, because it's
> missing last I checked (not able to right now).
>

I didn't see this in the current, unpatched php-5.3 build, but I'll double-check to make sure I wasn't still using my new binaries.

>>> And that applies to the case Lukas gave in the bug report: WHITESPACE
>>> pattern is variable length.
>>
>> Didn't see/find this; is there a bug # or link?
>
> I meant the "could be related if not the same problem" comment added the
> other day in Bug #46817.
>

Ah, I see. Yes, that was actually my friend, who is the one who drew my attention to getting this fixed. ;-)

-shire
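A minimal sketch in C of the position-based EOF handling discussed in this thread. It is not PHP's scanner and not re2c output: the `scanner` struct, `have_bytes()`, `next_token()`, `scan_all()` and the `TOK_*` values are hypothetical names used only for illustration. `cursor`/`limit` play the roles of re2c's `YYCURSOR`/`YYLIMIT`. re2c-generated code calls `YYFILL(n)` when fewer than `n` bytes remain before `YYLIMIT` and then reads ahead, so `have_bytes(s, n)` stands for the guarantee a bounds-respecting `YYFILL(n)` would have to give. Because end of input is decided by comparing positions rather than by matching a `\0` sentinel, a literal NUL byte stays an ordinary character, and both an unterminated `/* ...` comment and a trailing (variable-length) whitespace run stop at the limit instead of reading past it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch (not PHP's T_* constants). */
enum token {
    TOK_EOF,
    TOK_WHITESPACE,
    TOK_COMMENT,
    TOK_UNTERMINATED_COMMENT,
    TOK_ANY_CHAR
};

/* cursor/limit play the roles of re2c's YYCURSOR/YYLIMIT:
 * cursor is the next byte to read, limit is one past the last valid byte. */
struct scanner {
    const unsigned char *cursor;
    const unsigned char *limit;
};

/* The guarantee a bounds-respecting YYFILL(n) must give before the scanner
 * looks n bytes ahead: at least n readable bytes remain. */
static int have_bytes(const struct scanner *s, size_t n)
{
    return (size_t)(s->limit - s->cursor) >= n;
}

static int is_space(unsigned char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

static enum token next_token(struct scanner *s)
{
    /* EOF is a position check, not a match on a '\0' sentinel,
     * so a literal NUL in the input is just another byte. */
    if (!have_bytes(s, 1))
        return TOK_EOF;

    /* Variable-length whitespace run: stop at the limit as well. */
    if (is_space(*s->cursor)) {
        while (have_bytes(s, 1) && is_space(*s->cursor))
            s->cursor++;
        return TOK_WHITESPACE;
    }

    /* Block comment: two bytes must be available to see the opener. */
    if (have_bytes(s, 2) && s->cursor[0] == '/' && s->cursor[1] == '*') {
        s->cursor += 2;
        while (have_bytes(s, 2)) {
            if (s->cursor[0] == '*' && s->cursor[1] == '/') {
                s->cursor += 2;
                return TOK_COMMENT;
            }
            s->cursor++;
        }
        /* Input ended before the closing marker: consume the rest and
         * report it, instead of reading past the limit. */
        s->cursor = s->limit;
        return TOK_UNTERMINATED_COMMENT;
    }

    /* Anything else, including '\0', is a single-byte token. */
    s->cursor++;
    return TOK_ANY_CHAR;
}

/* Scans buf[0..len) and stores up to max tokens (including the final
 * TOK_EOF) in out; returns the total number of tokens produced. */
static size_t scan_all(const unsigned char *buf, size_t len,
                       enum token *out, size_t max)
{
    struct scanner s;
    size_t count = 0;
    enum token t;

    assert(buf != NULL);
    s.cursor = buf;
    s.limit = buf + len;
    do {
        t = next_token(&s);
        if (count < max)
            out[count] = t;
        count++;
    } while (t != TOK_EOF);
    return count;
}
```

For example, scanning the 21 bytes of `"a\0b /* x\0y */ /* open"` yields three single-byte tokens (the middle one is the NUL), whitespace, a closed comment that contains a NUL, whitespace, an unterminated comment, and EOF. The trade-off is an explicit bounds check before each look-ahead, which is exactly what a sentinel lets a generated scanner skip.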