Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:18802 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 79856 invoked by uid 1010); 12 Sep 2005 14:54:31 -0000 Delivered-To: ezmlm-scan-internals@lists.php.net Delivered-To: ezmlm-internals@lists.php.net Received: (qmail 79841 invoked from network); 12 Sep 2005 14:54:31 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 12 Sep 2005 14:54:31 -0000 X-Host-Fingerprint: 80.74.107.235 mail.zend.com Linux 2.5 (sometimes 2.4) (4) Received: from ([80.74.107.235:48402] helo=mail.zend.com) by pb1.pair.com (ecelerity 2.0 beta r(6323M)) with SMTP id CE/01-27924-6A695234 for ; Mon, 12 Sep 2005 10:54:31 -0400 Received: (qmail 10600 invoked from network); 12 Sep 2005 14:54:25 -0000 Received: from localhost (HELO zeev-notebook.zend.com) (127.0.0.1) by localhost with SMTP; 12 Sep 2005 14:54:25 -0000 Message-ID: <5.1.0.14.2.20050912174356.051c7a70@localhost> X-Sender: zeev@localhost X-Mailer: QUALCOMM Windows Eudora Version 5.1 Date: Mon, 12 Sep 2005 17:54:25 +0300 To: Rasmus Lerdorf Cc: internals In-Reply-To: <432534E4.9020602@lerdorf.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed Subject: Re: [PHP-DEV] ref fix revisited From: zeev@zend.com (Zeev Suraski) References: <432534E4.9020602@lerdorf.com> At 10:57 12/09/2005, Rasmus Lerdorf wrote: >Guys, could we take a look at making the ref to temp var fix a bit >narrower? Currently we try to catch it at call-time. This means that >something like: > > current(explode(' ','a b')) > >as per bug #34468 doesn't work. Now, I think there is a secondary bug >here. I see no reason for current() to take a by-ref, so this >particular one could be easily fixed. Yep, we need to fix that one. > But there are many other cases >where a function legitimately takes a by-ref and doesn't necessarily >write to it or the write is a secondary action not required for the code >to work. Could we not catch this on the write instead of on the call? The problem is that there's no way to tell that element apart at that time. It's too late. As soon as we treat a read-only zval as if it's read/write (take a ** instead of a *), it's too late, since we can't really detect later on where it came from. >The memory problem happens on the write. Or perhaps better, an E_NOTICE >or E_STRICT on the call and an E_FATAL on the write. The current >E_FATAL on the call seems out of whack. I don't really agree that it's out of whack, since you are passing a piece of data by reference, which is an undefined behavior. I agree that it would have been nice if we could allow for this and only complain if the data is written to in the function (in the PHP spirit of 'just work!'), but I don't see how that would be possible. >Gallery, for example, broke in a rather subtle way in their >gallery_remote2.php script which meant the various client-side tools >like iphototogallery and others got a cryptic "no album at URL" error >message. I had to break out ethereal to track it down to a couple of >functions where read-only function args were marked as by-ref. So they >didn't actually have a memory corruption problem yet the E_FATAL killed >them. You'd have to agree that it is a bug on Gallery's part, though, right? If they're read only, they shouldn't have been marked as by-ref (yes, we screwed up by only introducing this error now after it worked for years, but it's still problematic to let it go on working and create possible corruption). >SquirrelMail has code like this all over the place: > > $value = strtolower(array_shift(split('/\w/',trim($value)))); > >Here array_shift() does of course change the arg, so that is a potential >problem. And yes, that's a dumb way to do this, but people write code >like this. In some of these array manipulation calls, which seems to >account for a number of the BC problems we are having, we could check >for a non-ref and behave slightly differently. In the case of >array_shift() we could return the first arg and throw a notice. Same >would go for reset(), end(), next(), prev() and probably a few others. We could probably provide a way for internal functions to denote that they're handling by-ref 'wannabe' arguments, so that we won't generate a fatal error for them. We'd still have to do it for userland functions (and any other internal function). I'm not sure if it's worth it..? Zeev