Newsgroups: php.internals
Xref: news.php.net php.internals:73685
Date: Mon, 14 Apr 2014 12:12:16 +0100
From: daverandom@php.net (Chris Wright)
To: Marc Bennewitz
Cc: PHP Internals
Subject: Re: [PHP-DEV] Shifting bits of a binary string
In-Reply-To: <534A8121.6090205@marc-bennewitz.de>
References: <534A8121.6090205@marc-bennewitz.de>

Hi Marc

On 13 April 2014 13:20, Marc Bennewitz wrote:
> Hi List,
>
> I hope I'm on the right list but I can't find any other helpful.
>
> I have a binary string and I would like to work with bitwise operators.
> The only help I found was to convert it to an integer. That's ok but it
> results in some questions:
>
> - What if the binary data is more than 32/64 bits long?
> - Why converting binary data of form one into binary data of another form
> only to manipulate bits?
>
> So I simply tested what's going on if I operate on a string directly but on
> shifting I get the same wrong result every time.
> (Testscript below)
>
> On reading the manual the only note for strings are the following:
> (http://www.php.net/manual/en/language.operators.bitwise.php)
>> Be aware of data type conversions. If both the left-hand and right-hand
>> parameters are strings, the bitwise operator will operate on the characters'
>> ASCII values.
>
> Why such bit operators doesn't work with strings?
> Why there is not helpful information about in the manual.
> Why on operation something not working doesn't result in an error/notice but
> in a completely unexpected value?
>
> Greetings
> Marc
>
>
> Shift to the left:
> var_dump(decbin(ord(chr(1))));
> for ($i=0; $i<10; $i++) {
> var_dump(decbin(ord(chr(1) << $i)));
> }
>
> Output:
> string(1) "1"
> string(6) "110000"
> string(6) "110000"
> string(6) "110000"
> string(6) "110000"
> string(6) "110000"
> string(6) "110000"
> string(6) "110000"
> string(6) "110000"
> string(6) "110000"
> string(6) "110000"
>
> Shift to the right:
> var_dump(decbin(ord(chr(32))));
> for ($i=0; $i<10; $i++) {
> var_dump(decbin(ord(chr(32) >> $i)));
> }
>
> Output:
> string(1) "100000"
> string(6) "110000"
> string(6) "110000"
> string(6) "110000"
> string(6) "110000"
> string(6) "110000"
> string(6) "110000"
> string(6) "110000"
> string(6) "110000"
> string(6) "110000"
> string(6) "110000"

First, an explanation of why you see the results you show here:

At present, the shift operators act on long integers; when either operand
is not an integer, it is converted to one. In the case of strings, this
means they are passed through strtol() with an explicit base of 10, so in
your example above (and for any string that does not begin with a decimal
integer) the result of the conversion will be zero. The operation is then
performed with a left operand of zero, so the result is also zero.

When your code dumps this, you pass it through ord(), which (via a zpp
call) converts the integer to a string and then converts the first
character of that string back to an integer, resulting in 48, the ordinal
value of ASCII "0". You then pass it to decbin(), which returns the binary
representation of 48, which is the "110000" you see on every line.

The following gives a result that may be more like what you would expect:

$base = '1';
var_dump(decbin($base));
for ($i=0; $i<10; $i++) {
    var_dump(decbin($base << $i));
}

With regards to the actual issue, this is something I would also like to
see "fixed".
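As a rough illustration of the conversion chain described above, here is a
minimal sketch that spells out each implicit step with explicit casts. The
function name trace_shift() is made up for this example, and the (int) and
(string) casts only stand in for what the engine does implicitly; they are
used so the sketch does not depend on how a given PHP version treats a
non-numeric string operand.

```php
<?php
// Hypothetical helper: reproduces, step by step, what happens to
// decbin(ord($raw << $bits)) when $raw is a non-numeric binary string.
function trace_shift(string $raw, int $bits): string
{
    // Step 1: the string operand is converted to an integer.
    // A string with no leading decimal digits converts to 0.
    $asInt = (int) $raw;

    // Step 2: the shift is performed on that integer.
    $shifted = $asInt << $bits;

    // Step 3: ord() expects a string, so the integer becomes "0", "8", ...
    $asString = (string) $shifted;

    // Step 4: ord() returns the ordinal of the first character only.
    $ordinal = ord($asString);

    // Step 5: decbin() renders that ordinal in binary.
    return decbin($ordinal);
}

echo trace_shift(chr(1), 3), "\n";  // 110000 (ordinal 48, ASCII "0")
echo trace_shift(chr(32), 2), "\n"; // 110000 again, for the same reason
echo decbin('1' << 3), "\n";        // 1000, because '1' is a numeric string
```

The last line shows why the version using $base = '1' behaves as expected:
the operand is a decimal string, so the conversion yields 1 rather than 0.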
There is an issue with bitwise operations on strings, and that is that
they are not as cheap as they are with integers. People may expect bitwise
operations to be lightweight wrappers around very basic processor
instructions, and in the case of strings this is not true, because the
operation must be performed byte-by-byte in an O(n) manner. With shifts the
complexity increases further: additional branching is required because
bits must often be carried between bytes, in which case each byte must be
visited twice.

None of these issues actually prevents this from being possible though,
and while the use cases for this are few and far between, I think the
current behaviour is unexpected and not the sensible option.

*However* there is a very real BC issue here. Consider some code that
relies on the result of $_GET['mask'] << 2 or something similar. That is
something I can imagine someone somewhere has done, and it will break if
the behaviour is "fixed". Anywhere that input is collected, it is usually
presented as a string, and the current behaviour allows you to treat it as
an integer and get the result you expect.

I would argue that this person did it wrong in the first place and that
they should be paying attention to types if they are performing bitwise
operations. I would also be happy to break BC on this part of the
language, which I doubt is used very often. But at the end of the day what
really matters is what everyone thinks, not just what I think.

I would be surprised if this hasn't been discussed on the list before. I
know I've had a few discussions on the subject with various people
off-list over the last year or two.

I will try to throw a patch together at lunch to give the behaviour I
would expect, by means of special-case handling for a left operand of type
IS_STRING, but it is a BC break that would probably only be accepted into
5++, if at all.

Thanks, Chris
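For reference, the byte-by-byte approach with carries between bytes, as
discussed above, could look roughly like this in userland PHP. This is only
a sketch and not the patch mentioned above: the function name
shift_left_bytes() is hypothetical, the string is treated as a fixed-width
big-endian bit sequence, and bits shifted out of the first byte are simply
dropped. All of these are assumptions made for illustration.

```php
<?php
// Hypothetical userland sketch of a left shift on a binary string.
// Assumes big-endian bit order and a fixed width (overflow is discarded).
function shift_left_bytes(string $data, int $bits): string
{
    if ($bits < 0) {
        throw new InvalidArgumentException('Shift count must be non-negative');
    }

    $length    = strlen($data);
    $byteShift = intdiv($bits, 8); // whole bytes to skip
    $bitShift  = $bits % 8;        // remaining bits to shift within a byte
    $result    = '';

    for ($i = 0; $i < $length; $i++) {
        $src = $i + $byteShift;

        // Each output byte is built from two neighbouring source bytes:
        // the current one, and the next one that supplies the carried bits.
        $current = $src < $length ? ord($data[$src]) : 0;
        $next    = $src + 1 < $length ? ord($data[$src + 1]) : 0;

        $byte = (($current << $bitShift) | ($next >> (8 - $bitShift))) & 0xFF;
        $result .= chr($byte);
    }

    return $result;
}

echo bin2hex(shift_left_bytes("\x00\x80", 1)), "\n"; // 0100
echo bin2hex(shift_left_bytes("\x12\x34", 4)), "\n"; // 2340
```

Every output byte reads two input bytes, which is where the "visited
twice" cost comes from, and the loop over the whole string is the O(n)
part.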