Newsgroups: php.internals
Date: Thu, 27 Oct 2016 17:02:01 +0200
From: ben.coutu@zeyos.com (Benjamin Coutu)
To: PHP Internals, Dmitry Stogov
Cc: Nikita Popov, Xinchen Hui
Subject: Re: [PHP-DEV] Directly embed small strings in zvals
Message-ID: <20161027150201.8557A5FAAC@mx.zeyos.com>

Hi Andrea,

I have been thinking about this a bit more. Here are a few thoughts.

Considering the added complexity, the effort would only be worth it if we could come up with a solution that covers more cases. Strings of at most 6-7 characters don't really justify the implied overhead, because they are not ubiquitous enough.

One idea that springs to mind is a form of compression by limiting it to a subset of ASCII characters, perhaps those generally used by identifiers and number-like strings ([a-zA-Z0-9_.]). Those strings are the most common (array keys, JSON keys, file names, function names, numbers out of a DB, etc.). The more characters we can pack into a single datum, the less visible the overhead will be performance-wise.

Limiting ourselves to these 64 common characters ([a-zA-Z0-9_.]) means each character needs only 6 bits instead of 8, which would allow us to store floor(7 * 8 / 6) = 9 characters in the available 7 bytes, plus 1 byte (minus the pointer tag bit) for the length. Of course, unpacking those kinds of strings entails CPU and memory reallocation overhead. We can mitigate allocating and deallocating memory over and over again by using a stack-like buffer pool for unpacked small strings, with fixed bucket sizes of + <9 bytes> + \x00 for unpacking into.

I actually like your idea of using pointer tagging to distinguish between packed and regular strings, so that we can apply this to all zend_strings, not just zval strings.
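
To make the packing and tagging idea concrete, here is a minimal sketch in C. It is not existing Zend code: the names (packed_try_pack() and friends) and the exact bit layout are assumptions chosen for illustration. It assumes an ASCII platform and that real zend_string pointers are at least 2-byte aligned, so bit 0 is free to serve as the tag. At 6 bits per character, the 56 payload bits hold 9 characters.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of one 64-bit word (an assumption, not a Zend API):
 *   bit 0        tag (1 = packed string, 0 = ordinary aligned pointer)
 *   bits 1..4    length, 0..9
 *   bits 8..61   up to 9 characters, 6 bits each
 */
#define PACKED_MAX_LEN 9

/* The 64-character alphabet [a-zA-Z0-9_.]; the index is the 6-bit code. */
static const char PACKED_ALPHABET[] =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_.";

/* Map a character to its 6-bit code, or -1 if it is outside the alphabet. */
static int packed_code(char c)
{
    if (c >= 'a' && c <= 'z') return c - 'a';
    if (c >= 'A' && c <= 'Z') return 26 + (c - 'A');
    if (c >= '0' && c <= '9') return 52 + (c - '0');
    if (c == '_') return 62;
    if (c == '.') return 63;
    return -1;
}

/* Try to pack s[0..len) into *out; fails if too long or not eligible. */
static bool packed_try_pack(const char *s, size_t len, uint64_t *out)
{
    if (len > PACKED_MAX_LEN) return false;
    uint64_t word = 1u | ((uint64_t)len << 1);
    for (size_t i = 0; i < len; i++) {
        int code = packed_code(s[i]);
        if (code < 0) return false;
        word |= (uint64_t)code << (8 + 6 * i);
    }
    *out = word;
    return true;
}

/* The tag bit distinguishes a packed string from a regular pointer. */
static bool packed_is_packed(uint64_t word)
{
    return (word & 1u) != 0;
}

static size_t packed_len(uint64_t word)
{
    return (size_t)((word >> 1) & 0xF);
}

/* Unpack into buf, which must hold PACKED_MAX_LEN + 1 bytes. */
static size_t packed_unpack(uint64_t word, char *buf)
{
    size_t len = packed_len(word);
    for (size_t i = 0; i < len; i++) {
        buf[i] = PACKED_ALPHABET[(word >> (8 + 6 * i)) & 0x3F];
    }
    buf[len] = '\0';
    return len;
}
```

Because unused bits are always left at zero, each eligible string has exactly one packed representation, so two packed words are equal exactly when their strings are equal. That is the property the "implicitly interned" and "value equals hash key" arguments rely on.
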
I don't think it's a real issue that we'd be practically limiting this optimization to 64-bit systems (most systems that run PHP are 64-bit nowadays anyway). We can either simply deactivate it for 32-bit, or use it with the floor(3 * 8 / 6) = 4 characters available on 32-bit (we'd still have room for the pointer tag bit on 32-bit machines).

Now, I realize that this would be a massive undertaking (PHP 8!). Without proper abstraction we'd be converting strings all over the place. Therefore, one would have to build a separate zend_string abstraction layer for all common string functions (zend_strcat(), zend_strcmp(), zend_strlen(), zend_strcpy(), zend_strncat(), etc.) that expect zend_string parameters and can operate on both internal types of strings. And one would actually have to use them throughout the code base. Such an abstraction might be interesting and useful in and of itself, though.

Let's summarize the trade-off.

The negatives of packed strings are:
- Additional branching (hence occasional branch mispredictions) to distinguish the two types of strings
- Memory allocation for unpacking (can be mitigated by using a global pre-allocated stack-like buffer pool)
- Extra CPU cycles for decompression/unpacking (can be minimized with proper abstraction of zend_string functions)

The positives of packed strings are:
- No initial separate heap allocation
- One less indirection because no pointer has to be chased
- Implicitly interned, no need for extra interning (if we can guarantee that all eligible strings are converted to packed format before usage)
- Value equals hash key, no need to generate an extra hash key (that's a huge plus considering that array keys will very likely be eligible for packed strings)
- Smaller memory footprint because of compression
- Less CPU usage for comparison (also, no need to ever unpack if we can guarantee that all eligible strings are converted to packed format before usage)

Let me know if you (or anyone else) are interested in discussing this approach further.

Cheers,

Benjamin Coutu

========== Original ==========
From: Benjamin Coutu
To: Nikita Popov, Dmitry Stogov, Xinchen Hui <xinchen.h@zend.com>
Date: Tue, 13 Sep 2016 18:29:10 +0200
Subject: [PHP-DEV] Directly embed small strings in zvals

> Hello everyone,
>
> I was wondering if it would make sense to store small strings (length <= 7) directly inside the zval struct, thereby avoiding the need to separately allocate a zend_string, which would also avoid any costly indirection and refcounting for such strings.
>
> The idea would be to add a new struct ``struct { uint8_t len; char val[7]; } sval`` to the _zend_value union type in order to embed it directly into the zval struct, and use a type flag (zval.u1.v.type_flags) such as IS_SMALL_STRING to distinguish between a regular heap-allocated zend_string and the directly embedded compact representation.
>
> Small strings are quite common IMHO. In fact, quickly sampling my company's PHP code base, I found well over 50% of the strings to be of length <= 7. It would save a lot of memory allocations as well as pointer indirection, and could also bypass the refcounting logic. Also, comparing small strings for equality would become a trivial operation (just comparing two pre-aligned 64-bit integers), with no more need to keep small strings interned.
>
> Of course it would no longer be possible to also persistently store the hash value of a small string, though calculating the hash value for small strings is less costly anyway because fewer characters means fewer iterations, so that might not be an issue in practice.
>
> I don't see such an idea in https://wiki.php.net/php-7.1-ideas and I was wondering: Has anybody experimented with that approach yet?
> Is it worth discussing?
>
> Please let me know your thoughts,
>
> Ben
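
For reference, the embedded layout from the quoted message could be sketched roughly as follows. This is a standalone illustration, not actual Zend code: small_str, small_str_init() and small_str_equals() are hypothetical names, and the sketch assumes unused bytes are always zeroed so that comparing the whole 8-byte value is equivalent to comparing the strings.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the proposed `sval` union member. */
typedef struct {
    uint8_t len;     /* string length, 0..7 */
    char    val[7];  /* characters; unused bytes are kept at zero */
} small_str;

/* The whole point of the layout: it fits in one 64-bit word. */
_Static_assert(sizeof(small_str) == sizeof(uint64_t),
               "small_str must occupy exactly 8 bytes");

/* Store src[0..len) in dst; fails if the string does not fit. */
static bool small_str_init(small_str *dst, const char *src, size_t len)
{
    if (len > sizeof dst->val) return false;
    memset(dst, 0, sizeof *dst);   /* zero padding keeps equality canonical */
    dst->len = (uint8_t)len;
    memcpy(dst->val, src, len);
    return true;
}

/* Equality as a single 64-bit integer comparison. */
static bool small_str_equals(const small_str *a, const small_str *b)
{
    uint64_t wa, wb;
    memcpy(&wa, a, sizeof wa);     /* memcpy avoids aliasing problems */
    memcpy(&wb, b, sizeof wb);
    return wa == wb;
}
```

Strings longer than 7 bytes are rejected by small_str_init(), which is where a real implementation would fall back to a heap-allocated zend_string.
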