Newsgroups: php.internals
Date: Mon, 24 Apr 2006 14:36:58 -0700
From: andrei@gravitonic.com (Andrei Zmievski)
To: Andi Gutmans
Cc: Dmitry Stogov, PHP Internals
Subject: Re: [PHP-DEV] Re: Unicode conversion exceptions and memory leaks

So, no particular opinions on this, aside from Markus's? I hoped this
proposal would mollify both camps.

-Andrei

On Apr 19, 2006, at 2:32 PM, Andrei Zmievski wrote:

> I've had some time to think about this, and Derick and I also kicked
> around some ideas in a private conversation.
>
> The situation I am talking about really involves exceptional
> circumstances, such as an ISO-8859-1 string being treated as UTF-8,
> or some other condition that results in illegal sequences. This is
> very different from an unassigned character condition, which is
> handled by the SUBST, SKIP, etc. callbacks. I disagree with the notion
> that this is similar to the (int)"foo" example. There, we have
> well-defined semantics that say "strings not starting with a number
> get converted to 0". Treating ISO-8859-1 data as UTF-8 is simply
> invalid, bad behavior, and it should not be encouraged by silently
> ignoring the conversion error.
>
> Now, I understand that there is resistance to the use of exceptions in
> this case, and I see the point of those who are against them. My
> problem is this: if we do not throw exceptions, then all we are left
> with is a warning, which is not helpful if you want to determine
> programmatically whether there was a conversion error. Sure, you can
> check the return value of unicode_decode(), or maybe even fread() and
> such, but that does not help with casting, concatenation, and other
> similar operations. So we do need a mechanism for this, and it has to
> be a fairly flexible one, because libraries may want to do one thing
> on failure and the application itself another.
>
> The best Derick and I could come up with is a user-specified
> conversion error handler. It would be invoked only when the converter
> encounters an illegal sequence or another serious error. The existing
> subst, skip, etc. error modes would still apply. The error handler
> signature would be something like:
>
>   function my_handler($direction, $encoding, $string, $char_byte,
>                       $offset) { .. }
>
> where $direction is the direction of conversion (FROM_UNICODE or
> TO_UNICODE), $encoding is the name of the encoding in use during the
> attempted conversion, $string is the source string that the converter
> tried to process, $char_byte is either the failed Unicode character or
> the failed byte sequence (depending on direction), and $offset is the
> offset of that character/byte sequence in the source string. The user
> error handler is then free to silence the warning, throw an exception
> (throw new UnicodeConversionException($message, $direction,
> $char_byte, $offset)), or do something else. I have not yet decided
> whether it's a good idea to allow the user handler to continue the
> conversion. I'd rather the conversion always stopped.
>
> -Andrei
>
> On Apr 13, 2006, at 4:02 PM, Andi Gutmans wrote:
>
>> Yeah, but we can't only tailor to the default. If you cast "abc" to an
>> integer today, PHP will do the conversion (i.e. 0). I think we should
>> stick to that paradigm and provide users with validation methods if
>> they want to strictly validate...
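For illustration, the conversion error handler proposed in the quoted message can be approximated in userland PHP. This is a minimal sketch under several assumptions: the TO_UNICODE/FROM_UNICODE constants, the UnicodeConversionException class, and decode_utf8_with_handler() are hypothetical stand-ins defined here, not an existing PHP API; only decoding of UTF-8 input is simulated; $char_byte carries just the first offending byte; and, as preferred above, conversion always stops at the first illegal sequence.

```php
<?php
// Hypothetical direction constants named after the proposal (not a real PHP API).
const TO_UNICODE = 1;
const FROM_UNICODE = 2;

// Byte-level pattern (no /u flag) matching a run of well-formed UTF-8
// sequences; rejects overlongs, surrogates, and code points above U+10FFFF.
const UTF8_VALID_PREFIX = '/\A(?:[\x00-\x7F]'
    . '|[\xC2-\xDF][\x80-\xBF]'
    . '|\xE0[\xA0-\xBF][\x80-\xBF]'
    . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}'
    . '|\xED[\x80-\x9F][\x80-\xBF]'
    . '|\xF0[\x90-\xBF][\x80-\xBF]{2}'
    . '|[\xF1-\xF3][\x80-\xBF]{3}'
    . '|\xF4[\x80-\x8F][\x80-\xBF]{2})*+/';

// Hypothetical exception whose constructor mirrors the arguments in the proposal.
class UnicodeConversionException extends Exception
{
    public $direction;
    public $charByte;
    public $offset;

    public function __construct($message, $direction, $charByte, $offset)
    {
        parent::__construct($message);
        $this->direction = $direction;
        $this->charByte = $charByte;
        $this->offset = $offset;
    }
}

// Hypothetical converter: validates $bytes as UTF-8. On the first illegal
// sequence it calls $handler($direction, $encoding, $string, $char_byte, $offset)
// and stops. Well-formed input is returned unchanged.
function decode_utf8_with_handler(string $bytes, callable $handler): ?string
{
    preg_match(UTF8_VALID_PREFIX, $bytes, $m);
    $offset = strlen($m[0]); // length of the valid prefix = offset of the bad byte
    if ($offset === strlen($bytes)) {
        return $bytes;
    }
    $handler(TO_UNICODE, 'UTF-8', $bytes, $bytes[$offset], $offset);
    return null; // conversion stopped and the handler chose not to throw
}

// A handler that turns the conversion error into an exception.
function throwing_handler($direction, $encoding, $string, $charByte, $offset)
{
    throw new UnicodeConversionException(
        sprintf('Illegal %s byte 0x%02X at offset %d', $encoding, ord($charByte), $offset),
        $direction,
        $charByte,
        $offset
    );
}

// "caf\xE9" is "café" in ISO-8859-1, which is not valid UTF-8.
try {
    decode_utf8_with_handler("caf\xE9", 'throwing_handler');
} catch (UnicodeConversionException $e) {
    echo $e->getMessage(), "\n"; // Illegal UTF-8 byte 0xE9 at offset 3
}
```

Here the handler throws, which is what a library wanting strict behavior might do. A handler that merely records $offset and returns corresponds to silencing the warning; in that case the sketch returns null rather than continuing the conversion.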