Newsgroups: php.internals
Xref: news.php.net php.internals:73713
In-Reply-To: <534F99A1.4070904@marc-bennewitz.de>
References: <534A8121.6090205@marc-bennewitz.de> <534C2E6B.5020400@marc-bennewitz.de> <534F99A1.4070904@marc-bennewitz.de>
Date: Thu, 17 Apr 2014 10:22:14 +0100
To: Marc Bennewitz
Cc: Chris Wright, PHP Internals
Subject: Re: [PHP-DEV] Shifting bits of a binary string
From: daverandom@php.net (Chris Wright)

On 17 April 2014 10:06, Marc Bennewitz wrote:
>
> On 15.04.2014 02:04, Chris Wright wrote:
>>
>> On 14 April 2014 19:52, Marc Bennewitz wrote:
>>>
>>> - Bit shift left on strings:
>>>   - act byte-by-byte in a O(n) manner
>>>   - shifting out left bit(s)
>>>   - append NULL bits to the right
>>>   - preserve string length
>>>
>>> - Bit shift right on strings:
>>>   - act byte-by-byte in a O(n) manner
>>>   - shifting out right bit(s)
>>>   - prepend NULL bits to the left (don't preserve the first bit)
>>>   - preserve string length
>>
>> Agreed.
>>
>>> - Bit shifting on other types:
>>>   - error/warning
>>
>> Probably agreed.
>>
>>> - If the number of bits to shift isn't an integer, then:
>>>   - error
>>
>> Floats would also need to be permitted here, albeit converted to
>> integers for the purposes of performing the operation.
>
> How do you handle values like 3.5? Is it an error if fractions exist?

This would be handled as it is currently, with a standard integer
conversion, i.e. truncated towards zero. It doesn't make sense to check
for fractions and error out IMO, as floating point precision errors
would likely cause someone somewhere some unexpected behaviour. In the
case of float(3.5) it's obvious this isn't "correct", but in the case
of float(3.005) the user probably did some calculation that they
expected to result in int(3), so emitting an error would be unexpected.
While a margin of error threshold could be defined, it's an extra
branch and a somewhat arbitrary limitation.

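As a rough illustration of that conversion, here is a minimal sketch. It
assumes an explicit (int) cast is a fair stand-in for the engine's
standard integer conversion; shift_count() is a hypothetical helper name
used only for this example, not an existing PHP function.

```php
<?php
// Hypothetical helper, not part of PHP: normalise a shift count with an
// explicit integer cast, which truncates towards zero.
function shift_count($count)
{
    return (int) $count;
}

var_dump(shift_count(3.5));      // int(3)
var_dump(shift_count(3.005));    // int(3)
var_dump(shift_count(-3.5));     // int(-3): towards zero, not rounded down
var_dump(1 << shift_count(3.5)); // int(8)
```

With this approach float(3.005) and float(3.5) both end up as a shift of
3 bits, and no error is raised in either case.
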
>
>>
>>>
>>> Only my 2 cents
>>> Marc
>>>
>>>
>>> On 14.04.2014 13:12, Chris Wright wrote:
>>>>
>>>> Hi Marc
>>>>
>>>> On 13 April 2014 13:20, Marc Bennewitz wrote:
>>>>>
>>>>> Hi List,
>>>>>
>>>>> I hope I'm on the right list, but I couldn't find a more suitable one.
>>>>>
>>>>> I have a binary string and I would like to work with bitwise
>>>>> operators. The only help I found was to convert it to an integer.
>>>>> That's ok but it raises some questions:
>>>>>
>>>>> - What if the binary data is more than 32/64 bits long?
>>>>> - Why convert binary data from one form into another form only to
>>>>>   manipulate bits?
>>>>>
>>>>> So I simply tested what happens if I operate on a string directly,
>>>>> but on shifting I get the same wrong result every time.
>>>>> (Test script below)
>>>>>
>>>>> In the manual, the only note about strings is the following:
>>>>> (http://www.php.net/manual/en/language.operators.bitwise.php)
>>>>>>
>>>>>> Be aware of data type conversions. If both the left-hand and
>>>>>> right-hand parameters are strings, the bitwise operator will
>>>>>> operate on the characters' ASCII values.
>>>>>
>>>>> Why don't such bit operators work with strings?
>>>>> Why is there no helpful information about this in the manual?
>>>>> Why does an operation that doesn't work result in a completely
>>>>> unexpected value rather than an error/notice?
>>>>>
>>>>> Greetings
>>>>> Marc
>>>>>
>>>>>
>>>>> Shift to the left:
>>>>> var_dump(decbin(ord(chr(1))));
>>>>> for ($i=0; $i<10; $i++) {
>>>>>     var_dump(decbin(ord(chr(1) << $i)));
>>>>> }
>>>>>
>>>>> Output:
>>>>> string(1) "1"
>>>>> string(6) "110000"
>>>>> string(6) "110000"
>>>>> string(6) "110000"
>>>>> string(6) "110000"
>>>>> string(6) "110000"
>>>>> string(6) "110000"
>>>>> string(6) "110000"
>>>>> string(6) "110000"
>>>>> string(6) "110000"
>>>>> string(6) "110000"
>>>>>
>>>>> Shift to the right:
>>>>> var_dump(decbin(ord(chr(32))));
>>>>> for ($i=0; $i<10; $i++) {
>>>>>     var_dump(decbin(ord(chr(32) >> $i)));
>>>>> }
>>>>>
>>>>> Output:
>>>>> string(6) "100000"
>>>>> string(6) "110000"
>>>>> string(6) "110000"
>>>>> string(6) "110000"
>>>>> string(6) "110000"
>>>>> string(6) "110000"
>>>>> string(6) "110000"
>>>>> string(6) "110000"
>>>>> string(6) "110000"
>>>>> string(6) "110000"
>>>>> string(6) "110000"
>>>>
>>>>
>>>> First, an explanation of why you see the results you show here:
>>>>
>>>> At present, the shift operations act on long integers; when either
>>>> operand is not an integer, it is converted to one. In the case of
>>>> strings, this means they are passed through strtol() with an
>>>> explicit base of 10, meaning that in your example above (and for
>>>> any string that is not a decimal integer) the result of the
>>>> conversion will be zero. The operation will then be performed with
>>>> a left operand of zero, so the result will also be zero.
>>>>
>>>> When your code dumps this you pass it through ord(), which (via a
>>>> zpp call) converts the integer to a string, and then converts the
>>>> first character of this string back to an integer, resulting in 48,
>>>> the ordinal value of ASCII "0". You then pass it to decbin(), which
>>>> returns the binary representation of 48.
>>>>
>>>> The following gives a result that may be more like what you would
>>>> expect:
>>>>
>>>> $base = '1';
>>>> var_dump(decbin($base));
>>>> for ($i=0; $i<10; $i++) {
>>>>     var_dump(decbin($base << $i));
>>>> }
>>>>
>>>> With regards to the actual issue, this is something I would also
>>>> like to see "fixed".
>>>>
>>>> There is an issue with bitwise operations on strings, and that is
>>>> that they are not as cheap as they are with integers - people may
>>>> expect bitwise operations to be lightweight wrappers around very
>>>> basic processor instructions, and in the case of strings this is
>>>> not true because the operation must be performed byte-by-byte in a
>>>> O(n) manner. With shifts the functional complexity further
>>>> increases, as additional branching is required because bits must
>>>> often be carried between bytes, in which case each byte must be
>>>> visited twice.
>>>>
>>>> None of these issues actually prevent this from being possible
>>>> though, and while the use cases for this are few and far between I
>>>> think the current behaviour is unexpected and not the sensible
>>>> option.
>>>>
>>>> *However* there is a very real BC issue here. Consider some code
>>>> that relies on the result of $_GET['mask'] << 2 or something
>>>> similar - something that I can imagine someone somewhere has done,
>>>> and that will break if the behaviour is "fixed". Anywhere that
>>>> input is collected it is usually presented as a string, and the
>>>> current behaviour allows you to treat it as an integer and get the
>>>> result you expect.
>>>>
>>>> I would argue that this person did it wrong in the first place and
>>>> that they should be paying attention to types if they are
>>>> performing bitwise operations. I would also be happy to break BC on
>>>> this part of the language that I doubt is used very often. But at
>>>> the end of the day what really matters is what everyone thinks, not
>>>> just what I think. I would be surprised if this hasn't been
>>>> previously discussed on the list - I know I've had a few
>>>> discussions on the subject with various people off-list over the
>>>> last year or two.
>>>>
>>>> I will try and throw a patch together at lunch to give the
>>>> behaviour I would expect by means of special-case handling for a
>>>> left operand of type IS_STRING, but it is a BC break that would
>>>> probably only be accepted into 5++, if at all.
>>>>
>>>> Thanks, Chris
>>>>
>>>
>>
>
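For reference, the string-shift semantics listed in the quoted text
(byte-by-byte, zero-fill, length preserved) could be sketched in userland
PHP roughly as follows. This is only an illustration under assumptions,
not the proposed patch: the names str_shift_left() and str_shift_right()
are hypothetical, the string is treated as one big-endian run of bits,
and a negative count is rejected by choice.

```php
<?php
// Hypothetical userland sketch; these functions are not part of PHP.

function str_shift_left($bytes, $count)
{
    $count = (int) $count;          // truncates towards zero
    if ($count < 0) {
        throw new InvalidArgumentException('Shift count must not be negative');
    }
    $len = strlen($bytes);
    $byteShift = $count >> 3;       // whole bytes to move
    $bitShift = $count & 7;         // remaining bits to move
    $out = str_repeat("\0", $len);  // zero-filled, same length
    for ($i = 0; $i + $byteShift < $len; $i++) {
        $src = $i + $byteShift;
        $value = ord($bytes[$src]) << $bitShift;
        if ($bitShift > 0 && $src + 1 < $len) {
            // Carry in the high bits of the next byte to the right.
            $value |= ord($bytes[$src + 1]) >> (8 - $bitShift);
        }
        $out[$i] = chr($value & 0xFF);
    }
    return $out;
}

function str_shift_right($bytes, $count)
{
    $count = (int) $count;
    if ($count < 0) {
        throw new InvalidArgumentException('Shift count must not be negative');
    }
    $len = strlen($bytes);
    $byteShift = $count >> 3;
    $bitShift = $count & 7;
    $out = str_repeat("\0", $len);
    for ($i = $len - 1; $i - $byteShift >= 0; $i--) {
        $src = $i - $byteShift;
        $value = ord($bytes[$src]) >> $bitShift;
        if ($bitShift > 0 && $src > 0) {
            // Carry in the low bits of the next byte to the left.
            $value |= ord($bytes[$src - 1]) << (8 - $bitShift);
        }
        $out[$i] = chr($value & 0xFF);
    }
    return $out;
}

var_dump(bin2hex(str_shift_left("\x00\x80", 1)));  // string(4) "0100"
var_dump(bin2hex(str_shift_right("\x01\x00", 1))); // string(4) "0080"
var_dump(bin2hex(str_shift_right("\x20", 5)));     // string(2) "01"
```

Each output byte reads at most two input bytes, which matches the
"visited twice" cost mentioned in the quoted text, and bits shifted past
either end are simply dropped.
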