Newsgroups: php.internals
Message-ID: <465D33F3.7070407@zend.com>
Date: Wed, 30 May 2007 12:21:07 +0400
To: ceo@l-i-e.com
CC: php-dev
References: <465BF195.2080300@zend.com> <60449.216.230.84.67.1180484129.squirrel@www.l-i-e.com>
In-Reply-To: <60449.216.230.84.67.1180484129.squirrel@www.l-i-e.com>
Subject: Re: [PHP-DEV] bitwise operations and Unicode strings
From: antony@zend.com (Antony Dovgal)

On 30.05.2007 04:15, Richard Lynch wrote:
>> This code outputs "3" in native mode and "Fatal error: Unsupported
>> operand types" in Unicode mode.
>> I believe this is an inconsistency and it should be possible to use
>> Unicode strings there.
>
> Given that there are probably a bazillion PHP scripts written by
> newbies just like that, I'd have to say it's crucial for wide-spread
> Unicode adoption for it to "just work"

No, I'd say there might be like ten scripts on the planet relying on
bitwise operations with strings, since numeric strings behaviour is
different here.

>> There are several possible ways to implement it:
>> 1) the same as with native strings - apply the operator to each
>> element of the string separately;
>> 2) convert the string to binary (using say iso-8859-1) and then see
>> 1);
>
> How do you type-juggle Unicode strings now when they are used as
> (int), regardless of the bit-wise operator or not?
>
> Seems to me that you'd want the exact same operations as:
> <?php
> $a = '1';
> $a += '2';
> var_dump($a);
> ?>

Well, it already works in a different way with native strings, so we
can't change it.

> Doesn't seem like the bitwise operator is relevant, really...
>
>> We can also leave it as is (since it doesn't seem very useful) or even
>> drop the native strings support (it doesn't seem very useful to me
>> either).
>> Opinions?
>
> Dropping support for the bazillion scripts that expected PHP to
> type-juggle '2' into 2 and do math with them is probably a Bad Idea...

It doesn't cast '2' into 2 in this case.
'22' is not 22, it's '2' (chr(50)) and '2' (chr(50)).

> It's so bad, I must be missing something here, because I don't think
> you'd suggest it... :-)
>
> Ultimately, though, it seems like it should just do what PHP has
> always done with that code, because if it doesn't, here's what'll
> happen:
>
> ISP turns on Unicode support.
> Client scripts break.
> ISP reverts to PHP 5, or PHP 4 even, or turns off Unicode support.
>
> I am almost certain that I have code that does something not unlike:
>
> <?php
> $value = 0;
> //read checkboxes
> foreach($_GET['flag'] as $flag){
>   $value |= $flag;
> }
> //store $flag in DB as int
> ?>
>
> I'm not claiming it's the Best Code Ever, but it's not exactly
> Horrible either...
>
> Seems like with Unicode turned "on" I'd still expect this to "work"
> without throwing an (int) in there.
>

-- 
Wbr,
Antony Dovgal
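
The distinction drawn in this reply, between byte-wise bitwise operations on two strings and integer arithmetic on numeric strings, can be illustrated with a minimal sketch. It assumes native (binary) strings as they behave in released PHP versions and does not model the Unicode mode discussed in the thread. The variable names and the sample flag values are hypothetical and chosen only for illustration.

```php
<?php
// Both operands are strings: the operator is applied to each byte
// and the result is again a string.
$bothStrings = '1' | '2';      // chr(49 | 50) === chr(51)
var_dump($bothStrings);        // string(1) "3"

// '22' is handled as the two bytes chr(50) and chr(50), not as integer 22.
$byteWise = '22' | '11';       // chr(50 | 49) twice
var_dump($byteWise);           // string(2) "33"
$asIntegers = 22 | 11;         // 0b10110 | 0b01011
var_dump($asIntegers);         // int(31)

// Arithmetic operators always convert numeric strings to numbers.
$sum = '1';
$sum += '2';
var_dump($sum);                // int(3)

// One operand is an integer: the string operand is converted to an
// integer first, so a loop that starts from int 0 yields an integer.
$value = 0;
foreach (['1', '2', '4'] as $flag) {   // hypothetical checkbox values
    $value |= $flag;
}
var_dump($value);              // int(7)
```

Under these assumptions the checkbox loop quoted above yields an integer, because its left operand starts out as the integer 0. Only an expression whose operands are both strings takes the byte-wise path.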