From: php@marc-bennewitz.de (Marc Bennewitz)
Date: Mon, 14 Apr 2014 20:52:27 +0200
To: Chris Wright
CC: PHP Internals
Subject: Re: [PHP-DEV] Shifting bits of a binary string
Newsgroups: php.internals
Message-ID: <534C2E6B.5020400@marc-bennewitz.de>
References: <534A8121.6090205@marc-bennewitz.de>

Hi Chris,

Thanks a lot for your detailed explanation!

To get this "fixed", I think the PHP community first needs to work out which of the current behaviors (not only this one) are bugs and which are features, and how to handle changes in each case. But that's another discussion. (I like the "use strict" idea.)

In the case of bit shifting I would expect the following, but this is only my point of view:

- Bit shifting on integers:
  - Same as current behavior, but don't preserve the first bit, or PHP needs an unsigned integer type
    - I would love it :)
- Bit shift left on strings:
  - act byte-by-byte in an O(n) manner
  - shifting out left bit(s)
  - append NULL bits to the right
  - preserve string length
- Bit shift right on strings:
  - act byte-by-byte in an O(n) manner
  - shifting out right bit(s)
  - prepend NULL bits to the left (don't preserve the first bit)
  - preserve string length
- Bit shifting on other types:
  - error/warning
- If the number of bits to shift isn't an integer, then:
  - error

Only my 2 cents
Marc

On 14.04.2014 13:12, Chris Wright wrote:
> Hi Marc
>
> On 13 April 2014 13:20, Marc Bennewitz wrote:
>> Hi List,
>>
>> I hope I'm on the right list but I can't find any other helpful.
>>
>> I have a binary string and I would like to work with bitwise operators.
>> The only help I found was to convert it to an integer. That's ok but it
>> results in some questions:
>>
>> - What if the binary data is more than 32/64 bits long?
>> - Why converting binary data of form one into binary data of another form
>> only to manipulate bits?
>>
>> So I simply tested what's going on if I operate on a string directly but on
>> shifting I get the same wrong result every time.
>> (Testscript below)
>>
>> On reading the manual the only note for strings are the following:
>> (http://www.php.net/manual/en/language.operators.bitwise.php)
>>> Be aware of data type conversions. If both the left-hand and right-hand
>>> parameters are strings, the bitwise operator will operate on the characters'
>>> ASCII values.
>>
>> Why such bit operators doesn't work with strings?
>> Why there is not helpful information about in the manual.
>> Why on operation something not working doesn't result in an error/notice but
>> in a completely unexpected value?
>>
>> Greetings
>> Marc
>>
>>
>> Shift to the left:
>> var_dump(decbin(ord(chr(1))));
>> for ($i=0; $i<10; $i++) {
>>     var_dump(decbin(ord(chr(1) << $i)));
>> }
>>
>> Output:
>> string(1) "1"
>> string(6) "110000"
>> string(6) "110000"
>> string(6) "110000"
>> string(6) "110000"
>> string(6) "110000"
>> string(6) "110000"
>> string(6) "110000"
>> string(6) "110000"
>> string(6) "110000"
>> string(6) "110000"
>>
>> Shift to the right:
>> var_dump(decbin(ord(chr(32))));
>> for ($i=0; $i<10; $i++) {
>>     var_dump(decbin(ord(chr(32) >> $i)));
>> }
>>
>> Output:
>> string(6) "100000"
>> string(6) "110000"
>> string(6) "110000"
>> string(6) "110000"
>> string(6) "110000"
>> string(6) "110000"
>> string(6) "110000"
>> string(6) "110000"
>> string(6) "110000"
>> string(6) "110000"
>> string(6) "110000"
>
> First, an explanation of why you see the results you show here:
>
> At present, the shift operations act on long integers; when either
> operand is not an integer, it is converted to an integer. In the case of
> strings, this means they are passed through strtol() with an explicit
> base of 10, meaning that in your example above (and for any string
> that is not a decimal integer) the result of the conversion will be
> zero. The operation will then be performed with a left operand of
> zero, so the result will also be zero.
>
> When your code dumps this you pass it through ord(), which (via a zpp
> call) converts the integer to a string, and then converts the first
> character of this string back to an integer, resulting in 48, the
> ordinal value of ASCII "0". You then pass it to decbin(), which
> returns the binary representation of 48.
>
> The following gives a result that may be more like what you would expect:
>
> $base = '1';
> var_dump(decbin($base));
> for ($i=0; $i<10; $i++) {
>     var_dump(decbin($base << $i));
> }
>
> With regards to the actual issue, this is something I would also like
> to see "fixed".
>
> There is an issue with bitwise operations on strings, and that is that
> they are not as cheap as they are with integers - people may expect
> bitwise operations to be lightweight wrappers around very basic
> processor instructions, and in the case of strings this is not true
> because the operation must be performed byte-by-byte in an O(n) manner.
> With shifts the functional complexity further increases, as there is
> additional branching required as often bits must be carried between
> bytes, in which case each byte must be visited twice.
>
> None of these issues actually prevent this from being possible though,
> and while the use cases for this are few and far between I think
> the current behaviour is unexpected and not the sensible option.
>
> *However* there is a very real BC issue here. Consider some code that
> relies on the result of $_GET['mask'] << 2 or something similar -
> something that I can imagine someone somewhere has done, and will break
> if the behaviour is "fixed". Anywhere that input is collected it is
> usually present as a string, and the current behaviour allows you to
> treat it as an integer and get the result you expect.
>
> I would argue that this person did it wrong in the first place and
> that they should be paying attention to types if they are performing
> bitwise operations. I would also be happy to break BC on this part of
> the language that I doubt is used very often. But at the end of the
> day what really matters is what everyone thinks, not just what I
> think. I would be surprised if this hasn't been previously discussed
> on the list - I know I've had a few discussions on the subject with
> various people off-list over the last year or two.
>
> I will try and throw a patch together at lunch to give the behaviour I
> would expect by means of special-case handling for a left operand of
> type IS_STRING, but it is a BC break that would probably only be
> accepted into 5++, if at all.
>
> Thanks, Chris
>
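
For illustration, the string semantics proposed at the top of this message (act byte-by-byte in O(n), shift bits out, fill with NULL bits, preserve string length) can be sketched in userland PHP. This is only a sketch under stated assumptions, not the patch mentioned in the quoted reply: the function names str_shift_left() and str_shift_right() are hypothetical and not part of PHP, the string is treated as one big-endian bit sequence (leftmost byte most significant), a negative shift count is rejected with an exception, and the code assumes PHP 7 or later for intdiv() and scalar type declarations.

```php
<?php
// Hypothetical helpers, NOT part of PHP: logical bit shifts on a binary
// string treated as one big-endian bit sequence. Assumes PHP >= 7.

function str_shift_left(string $bytes, int $bits): string
{
    if ($bits < 0) {
        throw new InvalidArgumentException('Shift count must be non-negative');
    }
    $len = strlen($bytes);
    $byteShift = intdiv($bits, 8); // whole bytes to move
    $bitShift = $bits % 8;         // remaining bits inside a byte
    if ($byteShift >= $len) {
        return str_repeat("\0", $len); // everything shifted out
    }
    // Whole-byte part: drop leading bytes, append NULL bytes on the right.
    $bytes = substr($bytes, $byteShift) . str_repeat("\0", $byteShift);
    if ($bitShift === 0) {
        return $bytes;
    }
    $out = '';
    for ($i = 0; $i < $len; $i++) {
        $cur = ord($bytes[$i]);
        // Bits carried in from the byte to the right (0 past the end).
        $next = $i + 1 < $len ? ord($bytes[$i + 1]) : 0;
        $out .= chr((($cur << $bitShift) | ($next >> (8 - $bitShift))) & 0xFF);
    }
    return $out;
}

function str_shift_right(string $bytes, int $bits): string
{
    if ($bits < 0) {
        throw new InvalidArgumentException('Shift count must be non-negative');
    }
    $len = strlen($bytes);
    $byteShift = intdiv($bits, 8);
    $bitShift = $bits % 8;
    if ($byteShift >= $len) {
        return str_repeat("\0", $len);
    }
    // Whole-byte part: prepend NULL bytes, drop trailing bytes.
    $bytes = str_repeat("\0", $byteShift) . substr($bytes, 0, $len - $byteShift);
    if ($bitShift === 0) {
        return $bytes;
    }
    $out = '';
    for ($i = 0; $i < $len; $i++) {
        $cur = ord($bytes[$i]);
        // Bits carried in from the byte to the left (0 before the start),
        // so the first bit is not preserved.
        $prev = $i > 0 ? ord($bytes[$i - 1]) : 0;
        $out .= chr((($prev << (8 - $bitShift)) | ($cur >> $bitShift)) & 0xFF);
    }
    return $out;
}

echo bin2hex(str_shift_left("\x00\x80", 1)), "\n";  // prints 0100
echo bin2hex(str_shift_right("\x80\x00", 9)), "\n"; // prints 0040
```

Each output byte combines bits from at most two neighbouring input bytes, which is the carry between bytes described in the quoted explanation. Here the neighbour is read directly, so one pass over the string is enough, at the cost of building a new string of the same length.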