Newsgroups: php.internals
From: daverandom@php.net (Chris Wright)
To: Marc Bennewitz
Cc: Chris Wright, PHP Internals
Date: Tue, 15 Apr 2014 01:04:04 +0100
In-Reply-To: <534C2E6B.5020400@marc-bennewitz.de>
Subject: Re: [PHP-DEV] Shifting bits of a binary string

On 14 April 2014 19:52, Marc Bennewitz wrote:
> - Bit shift left on strings:
>   - act byte-by-byte in an O(n) manner
>   - shifting out left bit(s)
>   - append NULL bits to the right
>   - preserve string length
>
> - Bit shift right on strings:
>   - act byte-by-byte in an O(n) manner
>   - shifting out right bit(s)
>   - prepend NULL bits to the left (don't preserve the first bit)
>   - preserve string length

Agreed.

> - Bit shifting on other types:
>   - error/warning

Probably agreed.

> - If the number of bits to shift isn't an integer, then:
>   - error

Floats would also need to be permitted here, albeit converted to
integers for the purposes of performing the operation.

>
> Only my 2 cents
> Marc
>
>
> On 14.04.2014 13:12, Chris Wright wrote:
>>
>> Hi Marc
>>
>> On 13 April 2014 13:20, Marc Bennewitz wrote:
>>>
>>> Hi List,
>>>
>>> I hope I'm on the right list, but I couldn't find a more suitable one.
>>>
>>> I have a binary string and I would like to work with bitwise operators.
>>> The only help I found was to convert it to an integer. That's ok but it
>>> raises some questions:
>>>
>>> - What if the binary data is more than 32/64 bits long?
>>> - Why convert binary data from one form into another only to
>>>   manipulate bits?
>>>
>>> So I simply tested what happens if I operate on a string directly, but
>>> on shifting I get the same wrong result every time.
>>> (Test script below)
>>>
>>> In the manual, the only note about strings is the following:
>>> (http://www.php.net/manual/en/language.operators.bitwise.php)
>>>>
>>>> Be aware of data type conversions.
>>>> If both the left-hand and right-hand parameters are strings, the
>>>> bitwise operator will operate on the characters' ASCII values.
>>>
>>>
>>> Why don't such bit operators work with strings?
>>> Why is there no helpful information about this in the manual?
>>> Why does an operation that doesn't work result in a completely
>>> unexpected value instead of an error/notice?
>>>
>>> Greetings
>>> Marc
>>>
>>>
>>> Shift to the left:
>>> var_dump(decbin(ord(chr(1))));
>>> for ($i=0; $i<10; $i++) {
>>>     var_dump(decbin(ord(chr(1) << $i)));
>>> }
>>>
>>> Output:
>>> string(1) "1"
>>> string(6) "110000"
>>> string(6) "110000"
>>> string(6) "110000"
>>> string(6) "110000"
>>> string(6) "110000"
>>> string(6) "110000"
>>> string(6) "110000"
>>> string(6) "110000"
>>> string(6) "110000"
>>> string(6) "110000"
>>>
>>> Shift to the right:
>>> var_dump(decbin(ord(chr(32))));
>>> for ($i=0; $i<10; $i++) {
>>>     var_dump(decbin(ord(chr(32) >> $i)));
>>> }
>>>
>>> Output:
>>> string(6) "100000"
>>> string(6) "110000"
>>> string(6) "110000"
>>> string(6) "110000"
>>> string(6) "110000"
>>> string(6) "110000"
>>> string(6) "110000"
>>> string(6) "110000"
>>> string(6) "110000"
>>> string(6) "110000"
>>> string(6) "110000"
>>
>>
>> First, an explanation of why you see the results you show here:
>>
>> At present, the shift operations act on long integers; when either
>> operand is not an integer, it is converted to one. In the case of
>> strings, this means they are passed through strtol() with an explicit
>> base of 10, meaning that in your example above (and for any string
>> that does not start with a decimal integer) the result of the
>> conversion will be zero. The operation will then be performed with a
>> left operand of zero, so the result will also be zero.
>>
>> When your code dumps this you pass it through ord(), which (via a zpp
>> call) converts the integer to a string, and then converts the first
>> character of this string back to an integer, resulting in 48, the
>> ordinal value of ASCII "0". You then pass it to decbin(), which
>> returns the binary representation of 48.
>>
>> The following gives a result that may be more like what you would expect:
>>
>> $base = '1';
>> var_dump(decbin($base));
>> for ($i=0; $i<10; $i++) {
>>     var_dump(decbin($base << $i));
>> }
>>
>> With regard to the actual issue, this is something I would also like
>> to see "fixed".
>>
>> There is an issue with bitwise operations on strings, and that is that
>> they are not as cheap as they are with integers - people may expect
>> bitwise operations to be lightweight wrappers around very basic
>> processor instructions, and in the case of strings this is not true
>> because the operation must be performed byte-by-byte in an O(n) manner.
>> With shifts the complexity increases further: additional branching is
>> required because bits must often be carried between bytes, in which
>> case each byte must be visited twice.
>>
>> None of these issues actually prevent this from being possible though,
>> and while the use cases for this are few and far between I think
>> the current behaviour is unexpected and not the sensible option.
>>
>> *However* there is a very real BC issue here. Consider some code that
>> relies on the result of $_GET['mask'] << 2 or something similar -
>> something that I can imagine someone somewhere has done, and that will
>> break if the behaviour is "fixed". Anywhere that input is collected, it
>> is usually present as a string, and the current behaviour allows you to
>> treat it as an integer and get the result you expect.
>>
>> I would argue that this person did it wrong in the first place and
>> that they should be paying attention to types if they are performing
>> bitwise operations. I would also be happy to break BC on this part of
>> the language that I doubt is used very often. But at the end of the
>> day what really matters is what everyone thinks, not just what I
>> think. I would be surprised if this hasn't been previously discussed
>> on the list - I know I've had a few discussions on the subject with
>> various people off-list over the last year or two.
>>
>> I will try and throw a patch together at lunch to give the behaviour I
>> would expect by means of special-case handling for a left operand of
>> type IS_STRING, but it is a BC break that would probably only be
>> accepted into 5++, if at all.
>>
>> Thanks, Chris
>>
>
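
For illustration only, here is a rough userland sketch of the string shift
semantics listed at the top of this message (shift bits out, fill with zero
bits, preserve the string length). It is not the engine patch mentioned
above. The function names str_shl() and str_shr() are hypothetical and not
part of PHP, and the sketch assumes the string is treated as one big-endian
bit sequence, which the thread does not specify.

```php
<?php
// Hypothetical helpers (not part of PHP): shift a binary string as a single
// big-endian bit sequence. Length is preserved and vacated bits are zero.

function str_shl(string $data, int $bits): string
{
    if ($bits < 0) {
        throw new InvalidArgumentException('Shift count must not be negative');
    }
    $len       = strlen($data);
    $byteShift = intdiv($bits, 8); // whole bytes shifted out on the left
    $bitShift  = $bits % 8;        // bits carried between neighbouring bytes
    $out       = '';
    for ($i = 0; $i < $len; $i++) {
        $src = $i + $byteShift;
        $hi  = $src < $len ? ord($data[$src]) : 0;         // byte moving into position $i
        $lo  = $src + 1 < $len ? ord($data[$src + 1]) : 0; // right neighbour supplying carry bits
        $out .= chr((($hi << $bitShift) | ($lo >> (8 - $bitShift))) & 0xFF);
    }
    return $out;
}

function str_shr(string $data, int $bits): string
{
    if ($bits < 0) {
        throw new InvalidArgumentException('Shift count must not be negative');
    }
    $len       = strlen($data);
    $byteShift = intdiv($bits, 8); // whole bytes shifted out on the right
    $bitShift  = $bits % 8;
    $out       = '';
    for ($i = 0; $i < $len; $i++) {
        $src = $i - $byteShift;
        $lo  = $src >= 0 ? ord($data[$src]) : 0;         // byte moving into position $i
        $hi  = $src - 1 >= 0 ? ord($data[$src - 1]) : 0; // left neighbour supplying carry bits
        $out .= chr((($lo >> $bitShift) | ($hi << (8 - $bitShift))) & 0xFF);
    }
    return $out;
}

echo bin2hex(str_shl("\x01", 3)), "\n"; // 08
echo bin2hex(str_shr("\x20", 5)), "\n"; // 01
```

Each source byte is read up to twice, once for its own position and once as
the carry for its neighbour, which is the extra cost over an integer shift
described in the quoted explanation.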