Newsgroups: php.internals
From: johannes@php.net (Johannes Schlüter)
To: Steph Fox
Cc: Andrei Zmievski, Antony Dovgal, internals@lists.php.net
Date: Sat, 24 May 2008 22:11:00 +0200
Subject: Re: [PHP-DEV] Unicode progress [Was: unicode.semantics adinfinitum]
Message-ID: <1211659860.11520.39.camel@goldfinger.johannes.nop>
In-Reply-To: <014201c8bdd5$8aa482a0$4401a8c0@foxbox>

Steph,

On Sat, 2008-05-24 at 20:37 +0100, Steph Fox wrote:
> Heya Johannes,
>
> > For some functions, taking binary strings is critical. With an
> > automatic conversion, in this case
> >     crc32(u"äöü")
> > and
> >     crc32(b"äöü")
> > would give completely different results depending on the runtime
> > encoding,
>
> Yes - but why should the user have to do the casting? Why can't the function
> itself cast to binary when it has an 'S' modifier? Like, during
> zend_parse_parameters() for example? Whatever happened to keeping PHP
> simple?

Because only the user knows the correct encoding. The function might
fall back to unicode.runtime_encoding, which can be wrong, and the
reason for the wrong result is then hard to track down.
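To make the crc32() point concrete, here is a small Python sketch (an illustration, not PHP 6 code). It assumes a hypothetical implicit_crc32() that behaves the way an automatic cast for an 'S' parameter would: unicode input is silently converted with whatever runtime encoding is configured. The same three characters hash differently under UTF-8, ISO-8859-1 and, assuming little-endian code units, a UTF-16 internal representation.

```python
import zlib

# Hypothetical model of a function whose binary ('S') parameter silently
# converts unicode input using the runtime encoding. Illustration only,
# not PHP's actual implementation.
def implicit_crc32(value, runtime_encoding):
    """Return the CRC-32 of value; str input is encoded with runtime_encoding."""
    if isinstance(value, str):
        # The implicit conversion: the function has to guess the encoding.
        value = value.encode(runtime_encoding)
    return zlib.crc32(value)

text = "\u00e4\u00f6\u00fc"  # "äöü"

# Same text, different guessed encodings, different input bytes.
as_utf8 = implicit_crc32(text, "utf-8")
as_latin1 = implicit_crc32(text, "iso-8859-1")

# Checksumming the UTF-16 code units directly (the "internal
# representation" case, assuming little-endian order).
as_utf16 = zlib.crc32(text.encode("utf-16-le"))

print(hex(as_utf8), hex(as_latin1), hex(as_utf16))
```

Which of these values is "right" depends entirely on what the caller meant, and that is exactly the information the function does not have.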
> > relying on an implicit conversion there is most likely a bug
> > (at least for apps written with PHP 6 in mind).
> >
> > Oh, and I might also argue that
> >     crc32(u"äöü")
> > should give the crc32 of the internal representation (UTF-16...) of the
> > string, which would then be a total WTF for the user.
>
> Nobody's asking to be able to cast it to unicode. I'm asking whether it's
> entirely necessary to force users to cast to binary all over the place, and
> a strict binary parameter spec looks like being one place where the cast
> could be done internally.

In this case there's no cast at all, just the simplest possible
implementation of crc32() on a unicode string.

> > The correct solution is to make safe use of the "S" modifier and not
> > use it too much.
> >
> > As binary casts are allowed in modern PHP versions, I don't see this as
> > an issue, though such a cast isn't necessarily the best thing to do: I'd
> > go with unicode_encode() to be sure about the encoding being used;
> > everything else is prone to fail at some point (some code changing
> > unicode.runtime_encoding for some random reason...)
>
> You're telling me an explicit cast to binary could fail internally but not
> externally? That doesn't make a lot of sense somehow.

Externally, the user is responsible for selecting the proper encoding;
internally, PHP has to guess.

johannes
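A second sketch, again in Python and again hypothetical, contrasts "internally PHP has to guess" with "externally the user selects the encoding". The settings dict stands in for the unicode.runtime_encoding INI setting, and crc32_explicit() plays the role of encoding with unicode_encode() before calling crc32(); neither name is a real PHP API. When unrelated code changes the runtime encoding, only the implicit result moves.

```python
import zlib

# Hypothetical stand-in for the unicode.runtime_encoding INI setting;
# any code in the application could change it.
settings = {"runtime_encoding": "utf-8"}

def crc32_implicit(text):
    # Internally the function can only guess, so it uses the current setting.
    return zlib.crc32(text.encode(settings["runtime_encoding"]))

def crc32_explicit(text, encoding):
    # The caller states the encoding, like encoding with unicode_encode() first.
    return zlib.crc32(text.encode(encoding))

text = "\u00e4\u00f6\u00fc"  # "äöü"

before_implicit = crc32_implicit(text)
before_explicit = crc32_explicit(text, "utf-8")

# Some unrelated code changes the runtime encoding "for some random reason".
settings["runtime_encoding"] = "iso-8859-1"

after_implicit = crc32_implicit(text)
after_explicit = crc32_explicit(text, "utf-8")

print(before_implicit == after_implicit)
print(before_explicit == after_explicit)
```

The explicit call keeps producing the same checksum no matter what the global setting says, which is the robustness argument for doing the conversion in user code.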