Newsgroups: php.internals
From: php@marc-bennewitz.de (Marc Bennewitz)
To: Chris Wright
CC: PHP Internals
Date: Thu, 17 Apr 2014 11:06:41 +0200
Subject: Re: [PHP-DEV] Shifting bits of a binary string

On 15.04.2014 02:04, Chris Wright wrote:
> On 14 April 2014 19:52, Marc Bennewitz wrote:
>> - Bit shift left on strings:
>>   - act byte-by-byte in an O(n) manner
>>   - shifting out left bit(s)
>>   - append NULL bits to the right
>>   - preserve string length
>>
>> - Bit shift right on strings:
>>   - act byte-by-byte in an O(n) manner
>>   - shifting out right bit(s)
>>   - prepend NULL bits to the left (don't preserve the first bit)
>>   - preserve string length
>
> Agreed.
>
>> - Bit shifting on other types:
>>   - error/warning
>
> Probably agreed.
>
>> - If the number of bits to shift isn't an integer, then:
>>   - error
>
> Floats would also need to be permitted here, albeit converted to
> integers for the purposes of performing the operation.

How do you handle values like 3.5? Is it an error if a fractional part exists?

>
>>
>> Only my 2 cents
>> Marc
>>
>>
>> On 14.04.2014 13:12, Chris Wright wrote:
>>>
>>> Hi Marc
>>>
>>> On 13 April 2014 13:20, Marc Bennewitz wrote:
>>>>
>>>> Hi List,
>>>>
>>>> I hope I'm on the right list, but I couldn't find a more suitable one.
>>>>
>>>> I have a binary string and I would like to work with bitwise operators.
>>>> The only help I found was to convert it to an integer. That's OK, but it
>>>> raises some questions:
>>>>
>>>> - What if the binary data is more than 32/64 bits long?
>>>> - Why convert binary data from one form into another only to
>>>>   manipulate bits?
>>>>
>>>> So I simply tested what happens if I operate on a string directly, but
>>>> on shifting I get the same wrong result every time.
>>>> (Test script below)
>>>>
>>>> On reading the manual, the only note for strings is the following:
>>>> (http://www.php.net/manual/en/language.operators.bitwise.php)
>>>>>
>>>>> Be aware of data type conversions. If both the left-hand and right-hand
>>>>> parameters are strings, the bitwise operator will operate on the
>>>>> characters' ASCII values.
>>>>
>>>>
>>>> Why don't such bit operators work with strings?
>>>> Why is there no helpful information about this in the manual?
>>>> Why doesn't an operation that isn't working result in an error/notice
>>>> instead of a completely unexpected value?
>>>>
>>>> Greetings
>>>> Marc
>>>>
>>>>
>>>> Shift to the left:
>>>> var_dump(decbin(ord(chr(1))));
>>>> for ($i=0; $i<10; $i++) {
>>>>     var_dump(decbin(ord(chr(1) << $i)));
>>>> }
>>>>
>>>> Output:
>>>> string(1) "1"
>>>> string(6) "110000"
>>>> string(6) "110000"
>>>> string(6) "110000"
>>>> string(6) "110000"
>>>> string(6) "110000"
>>>> string(6) "110000"
>>>> string(6) "110000"
>>>> string(6) "110000"
>>>> string(6) "110000"
>>>> string(6) "110000"
>>>>
>>>> Shift to the right:
>>>> var_dump(decbin(ord(chr(32))));
>>>> for ($i=0; $i<10; $i++) {
>>>>     var_dump(decbin(ord(chr(32) >> $i)));
>>>> }
>>>>
>>>> Output:
>>>> string(6) "100000"
>>>> string(6) "110000"
>>>> string(6) "110000"
>>>> string(6) "110000"
>>>> string(6) "110000"
>>>> string(6) "110000"
>>>> string(6) "110000"
>>>> string(6) "110000"
>>>> string(6) "110000"
>>>> string(6) "110000"
>>>> string(6) "110000"
>>>
>>>
>>> First, an explanation of why you see the results you show here:
>>>
>>> At present, the shift operations act on long integers; when either
>>> operand is not an integer, it is converted to an integer. In the case of
>>> strings, this means they are passed through strtol() with an explicit
>>> base of 10, meaning that in your example above (and for any string
>>> that is not a decimal integer) the result of the conversion will be
>>> zero. The operation will then be performed with a left operand of
>>> zero, so the result will also be zero.
>>>
>>> When your code dumps this you pass it through ord(), which (via a zpp
>>> call) converts the integer to a string, and then converts the first
>>> character of this string back to an integer, resulting in 48, the
>>> ordinal value of ASCII "0". You then pass it to decbin(), which
>>> returns the binary representation of 48.
>>>
>>> The following gives a result that may be more like what you would expect:
>>>
>>> $base = '1';
>>> var_dump(decbin($base));
>>> for ($i=0; $i<10; $i++) {
>>>     var_dump(decbin($base << $i));
>>> }
>>>
>>> With regards to the actual issue, this is something I would also like
>>> to see "fixed".
>>>
>>> There is an issue with bitwise operations on strings, and that is that
>>> they are not as cheap as they are with integers - people may expect
>>> bitwise operations to be lightweight wrappers around very basic
>>> processor instructions, and in the case of strings this is not true
>>> because the operation must be performed byte-by-byte in an O(n) manner.
>>> With shifts the functional complexity further increases, as there is
>>> additional branching required because bits must often be carried between
>>> bytes, in which case each byte must be visited twice.
>>>
>>> None of these issues actually prevent this from being possible though,
>>> and while the use cases for this are few and far between I think
>>> the current behaviour is unexpected and not the sensible option.
>>>
>>> *However* there is a very real BC issue here. Consider some code that
>>> relies on the result of $_GET['mask'] << 2 or something similar -
>>> something that I can imagine someone somewhere has done, and which will
>>> break if the behaviour is "fixed". Anywhere that input is collected, it
>>> is usually presented as a string, and the current behaviour allows you to
>>> treat it as an integer and get the result you expect.
>>>
>>> I would argue that this person did it wrong in the first place and
>>> that they should be paying attention to types if they are performing
>>> bitwise operations. I would also be happy to break BC on this part of
>>> the language that I doubt is used very often. But at the end of the
>>> day what really matters is what everyone thinks, not just what I
>>> think. I would be surprised if this hasn't been previously discussed
>>> on the list - I know I've had a few discussions on the subject with
>>> various people off-list over the last year or two.
>>>
>>> I will try and throw a patch together at lunch to give the behaviour I
>>> would expect by means of special-case handling for a left operand of
>>> type IS_STRING, but it is a BC break that would probably only be
>>> accepted into 5++, if at all.
>>>
>>> Thanks, Chris
>>>
>>
>
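
For illustration, the string-shift semantics listed at the top of this thread (act byte-by-byte, shift bits out, fill with zero bits, preserve the string length) could be sketched in userland roughly as follows. This is only a sketch under assumptions: the function names str_shift_left() and str_shift_right() are hypothetical and not part of PHP, it is not the patch Chris mentions, and it uses scalar type declarations and intdiv(), so it assumes PHP 7 or later. Negative shift counts are rejected here, which is one possible choice and not something the thread settled.

```php
<?php
// Hypothetical helpers; names and signatures are illustrative assumptions.

// Shift a binary string left by $bits, filling with zero bits on the right.
function str_shift_left(string $bytes, int $bits): string
{
    if ($bits < 0) {
        throw new InvalidArgumentException('Shift count must be non-negative');
    }
    $len = strlen($bytes);
    $byteShift = intdiv($bits, 8);   // whole bytes to drop on the left
    $bitShift = $bits % 8;           // remaining bits carried between bytes
    if ($byteShift >= $len) {
        return str_repeat("\0", $len);   // everything shifted out
    }
    $bytes = substr($bytes, $byteShift) . str_repeat("\0", $byteShift);
    if ($bitShift === 0) {
        return $bytes;
    }
    $out = '';
    for ($i = 0; $i < $len; $i++) {
        $cur = ord($bytes[$i]);
        $next = $i + 1 < $len ? ord($bytes[$i + 1]) : 0;
        // High bits of the next byte are carried into the current one.
        $out .= chr((($cur << $bitShift) | ($next >> (8 - $bitShift))) & 0xFF);
    }
    return $out;
}

// Shift a binary string right by $bits, filling with zero bits on the left.
function str_shift_right(string $bytes, int $bits): string
{
    if ($bits < 0) {
        throw new InvalidArgumentException('Shift count must be non-negative');
    }
    $len = strlen($bytes);
    $byteShift = intdiv($bits, 8);
    $bitShift = $bits % 8;
    if ($byteShift >= $len) {
        return str_repeat("\0", $len);
    }
    $bytes = str_repeat("\0", $byteShift) . substr($bytes, 0, $len - $byteShift);
    if ($bitShift === 0) {
        return $bytes;
    }
    $out = '';
    for ($i = 0; $i < $len; $i++) {
        $cur = ord($bytes[$i]);
        $prev = $i > 0 ? ord($bytes[$i - 1]) : 0;
        // Low bits of the previous byte are carried into the current one.
        $out .= chr((($cur >> $bitShift) | ($prev << (8 - $bitShift))) & 0xFF);
    }
    return $out;
}

echo bin2hex(str_shift_left("\x12\x34", 4)), "\n";  // 2340
echo bin2hex(str_shift_right("\x20", 5)), "\n";     // 01
```

Each function does a single O(n) pass over the string for the sub-byte part, reading the neighbouring byte to pick up the carried bits, so the string can be arbitrarily longer than 32/64 bits.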