Newsgroups: php.internals
From: andi@zend.com (Andi Gutmans)
Date: Mon, 24 Apr 2006 19:16:24 -0700
To: Andrei Zmievski
Cc: Dmitry Stogov, PHP Internals
Subject: Re: [PHP-DEV] Re: Unicode conversion exceptions and memory leaks
Message-ID: <7.0.1.0.2.20060424191401.04019250@zend.com>
References: <7.0.1.0.2.20060413154916.014b1d88@zend.com>
    <7.0.1.0.2.20060413160149.03eb5d00@zend.com>
    <409317664cec60ee0d2086846f050abf@gravitonic.com>

I'm wondering whether it's technically feasible that any places where
such a conversion could fail would be allowed to throw an exception
(i.e. internal functions, stream handlers, INI reader, etc...)

At 02:36 PM 4/24/2006, Andrei Zmievski wrote:
>So, no particular opinions on this, aside from Markus's? I hoped
>this proposal would mollify both camps..
>
>-Andrei
>
>On Apr 19, 2006, at 2:32 PM, Andrei Zmievski wrote:
>
>>I've had some time to think about this and Derick and I also kicked
>>around some ideas in a private conversation.
>>
>>The situation I am talking about is really about exceptional
>>circumstances, such as an ISO-8859-1 string being treated as a UTF-8
>>one or some other condition that results in illegal sequences. This
>>is very different from an unassigned character condition, which is
>>handled by SUBST, SKIP, etc. callbacks. I disagree with the notion
>>that this is similar to the (int)"foo" example. There, we have a well
>>defined semantics that say "strings not starting with a number get
>>converted to 0". Treating ISO-8859-1 data as UTF-8 is simply
>>invalid and bad behavior and should not be encouraged by silently
>>ignoring the conversion error.
>>
>>Now, I understand that there is resistance to the use of exceptions
>>in this case and I see the point of those who are against them. My
>>problem is this: if we do not throw exceptions, then all we are
>>left with is a warning, which is not helpful if you want to
>>determine in a programmatic fashion whether there was a conversion
>>error. Sure, you can check the return value of unicode_decode(), or
>>maybe even fread() and such, but it does not help with casting,
>>concatenation, and other similar operations. So, we do need a
>>mechanism for this and it has to be a fairly flexible one because
>>libraries may want to do one thing on failure, and application
>>itself -- another.
>>
>>The best Derick and I could come up with is a user-specified
>>conversion error handler. It would be invoked only when the
>>converter encounters an illegal sequence or other serious error.
>>The existing subst, skip, etc. error modes would still apply. The
>>error handler signature would be something like:
>>
>>     function my_handler($direction, $encoding, $string, $char_byte,
>>                         $offset) { .. }
>>
>>Where $direction is the direction of conversion (FROM_UNICODE or
>>TO_UNICODE), $encoding is the name of the encoding in use during
>>the attempted conversion, $string is the source string that the
>>converter tried to process, $char_byte is either the failed Unicode
>>character or byte sequence (depending on direction), and $offset is
>>the offset of that character/byte sequence in the source string.
>>The user error handler then is free to silence the warning, throw
>>an exception (throw new UnicodeConversionException($message,
>>$direction, $char_byte, $offset)), or do something else. I have not
>>yet decided whether it's a good idea to allow the user handler to
>>continue the conversion or not. I'd rather the conversion always stopped.
>>
>>-Andrei
>>
>>On Apr 13, 2006, at 4:02 PM, Andi Gutmans wrote:
>>
>>>Yeah but we can't only tailor to the default. If you cast "abc" to
>>>an integer today PHP will do the conversion (e.g. 0). I think we
>>>should stick to that paradigm and provide users with validation
>>>methods if they want to strictly validate...
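
For readers following the thread, here is a minimal sketch of how the quoted handler proposal might look from userland. It is an illustration under stated assumptions, not the engine's behavior: the TO_UNICODE and FROM_UNICODE constants and the UnicodeConversionException class are named in the thread but are not defined in released PHP, so the sketch defines its own, and decode_utf8_or_report() is a hypothetical stand-in for the converter that simply validates bytes as UTF-8 with preg_match() and calls the handler at the first illegal sequence.

```php
<?php
// Hypothetical direction constants; the thread names them, but this
// sketch has to define them itself.
const TO_UNICODE = 1;
const FROM_UNICODE = 2;

// Hypothetical exception type mirroring the one suggested in the thread.
class UnicodeConversionException extends Exception
{
    public $direction;
    public $charByte;
    public $offset;

    public function __construct($message, $direction, $charByte, $offset)
    {
        parent::__construct($message);
        $this->direction = $direction;
        $this->charByte = $charByte;
        $this->offset = $offset;
    }
}

// User handler with the proposed signature; this one chooses to throw.
function my_handler($direction, $encoding, $string, $char_byte, $offset)
{
    throw new UnicodeConversionException(
        "illegal sequence for $encoding at offset $offset",
        $direction,
        $char_byte,
        $offset
    );
}

// Hypothetical stand-in for the converter: find the longest well-formed
// UTF-8 prefix and report the first byte after it to the handler.
function decode_utf8_or_report($bytes, callable $handler)
{
    $wellFormed = '/^(?:[\x00-\x7F]'
        . '|[\xC2-\xDF][\x80-\xBF]'
        . '|\xE0[\xA0-\xBF][\x80-\xBF]'
        . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}'
        . '|\xED[\x80-\x9F][\x80-\xBF]'
        . '|\xF0[\x90-\xBF][\x80-\xBF]{2}'
        . '|[\xF1-\xF3][\x80-\xBF]{3}'
        . '|\xF4[\x80-\x8F][\x80-\xBF]{2})*+/';
    preg_match($wellFormed, $bytes, $m);
    $offset = strlen($m[0]);
    if ($offset === strlen($bytes)) {
        return $bytes; // no illegal sequence found
    }
    // Conversion always stops here; the handler decides what happens next.
    $handler(TO_UNICODE, 'UTF-8', $bytes, $bytes[$offset], $offset);
    return false;
}

// Usage: ISO-8859-1 bytes for "café" wrongly treated as UTF-8.
try {
    decode_utf8_or_report("caf\xE9", 'my_handler');
} catch (UnicodeConversionException $e) {
    echo "conversion failed at offset {$e->offset}\n";
}
```

A library could pass a handler that throws, as above, while an application could pass one that only logs and returns, which is the flexibility the proposal asks for. The sketch never resumes after the handler returns, matching the stated preference that conversion always stop.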