Newsgroups: php.internals
Xref: news.php.net php.internals:95991
Date: Tue, 13 Sep 2016 19:58:08 +0000
To: Andrea Faulds, "internals@lists.php.net"
References: <20160913163000.D68A15FA84@mx.zeyos.com>
Subject: Re: [PHP-DEV] Directly embed small strings in zvals
From: dmitry@zend.com (Dmitry Stogov)

Hi,

I was skeptical about this idea, but the PoC looks interesting and quite simple.

This might be too big a change for 7.*, if we can't completely hide the implementation details from extensions behind the existing macros...

I'm not sure if this will lead to a performance improvement.
On one hand, we won't need to read elements from the referenced zend_string structure (which should improve data locality and cache usage); on the other, we'll have to perform additional checks for ZSTR_IS_PACKED() (which may increase branch mispredictions and iCache misses).

Andrea, did you try to run some benchmarks? Real-life apps?

Thanks. Dmitry.

________________________________
From: Andrea Faulds
Sent: Tuesday, September 13, 2016 9:26:03 PM
To: internals@lists.php.net
Subject: Re: [PHP-DEV] Directly embed small strings in zvals

Hi Ben!

Benjamin Coutu wrote:
> I was wondering if it would make sense to store small strings (length <= 7) directly inside the zval struct, thereby avoiding the need to separately allocate a zend_string, which would also avoid the costly indirection and refcounting for such strings.
>
> The idea would be to add a new struct ``struct { uint8_t len; char val[7]; } sval`` to the _zend_value union type in order to embed it directly into the zval struct and use a type flag (zval.u1.v.type_flags) such as IS_SMALL_STRING to distinguish between a regular heap-allocated zend_string and the directly embedded compact representation.
>
> Small strings are quite common IMHO. In fact, quickly sampling my company's PHP code base, I found well over 50% of the strings to be of length <= 7. It would save a lot of memory allocations as well as pointer indirection, and could also bypass refcounting logic. Also, comparing small strings for equality would become a trivial operation (just comparing two pre-aligned 64-bit integers) - no more need to keep small strings interned.
>
> Of course it would no longer be possible to also persistently store the hash value of a small string, though calculating the hash value for small strings is less costly anyway because fewer characters mean fewer iterations, so that might not be an issue in practice.
>
> I don't see such an idea in https://wiki.php.net/php-7.1-ideas and I was wondering: Has anybody experimented with that approach yet? Is it worth discussing?

Funnily enough, I was thinking of trying to implement this recently. It's an interesting idea.

I've previously tried implementing this with a slightly different approach, whereby the string is embedded within a tagged zend_string pointer, rather than as a separate zval type. This would mean you only need to change the zend_string functions and the string macros, and you can use such strings anywhere a zend_string pointer is used, not just in zvals.

My quick-and-dirty implementation can be found here:

https://github.com/php/php-src/compare/master...TazeTSchnitzel:packedStrings

However, I don't think it benchmarked that well, and it broke some things (when you contain the string data inside the pointer itself, it breaks certain assumptions existing code makes). I don't know how performant such an approach could be, considering it adds an extra branch at the site of every index into a string (though some sort of caching, or compiler optimisations, might improve this). And I don't know quite how much stuff it breaks.

My inspiration was a blog post about how Apple's Objective-C packs strings inside pointers, where possible. I think it was this one:

https://www.mikeash.com/pyblog/friday-qa-2015-07-31-tagged-pointer-strings.html

My approach was less sophisticated than Apple's. Whereas Apple can store strings as 8-bit, 6-bit or 5-bit, I took the easier approach of just having 8-bit strings, for direct indexing. For a 64-bit pointer, my implementation let you have up to 6 bytes for character data, 1 byte for a null terminator, and 1 byte for containing the pointer tag bit.

Your suggested approach (packing the string inside the zval as another kind of zval) is also possible, but I haven't tried it.
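
As a rough illustration of the two layouts discussed above, here is a small C sketch. It is not code from either proof of concept: every name in it (packed_str, small_str, PACKED_TAG and so on) is hypothetical, the tag is assumed to live in the lowest bit of the word, and strings containing embedded NUL bytes are simply refused because the 6-byte form has no separate length field. Shifts are used instead of casting to char * so the sketch does not depend on byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All names below are hypothetical; none of them exist in php-src. */

#define PACKED_TAG     0x01u /* assumed: low bit set means "characters, not an address" */
#define PACKED_MAX_LEN 6     /* 8 bytes minus 1 tag byte minus 1 NUL terminator */

typedef uint64_t packed_str; /* stands in for a tagged zend_string pointer */

/* Tagged-pointer variant: returns 1 and stores the word if s fits, else 0. */
static int packed_str_try_pack(const char *s, size_t len, packed_str *out)
{
    if (len > PACKED_MAX_LEN || memchr(s, 0, len) != NULL) {
        return 0; /* too long, or an embedded NUL would hide the real length */
    }
    packed_str w = PACKED_TAG;
    for (size_t i = 0; i < len; i++) {
        w |= (packed_str)(unsigned char)s[i] << (8 * (i + 1));
    }
    *out = w; /* the top byte stays zero and acts as the terminator */
    return 1;
}

static int packed_str_is_packed(packed_str w)
{
    return (w & PACKED_TAG) != 0;
}

static char packed_str_at(packed_str w, size_t i)
{
    assert(i < PACKED_MAX_LEN);
    return (char)((w >> (8 * (i + 1))) & 0xFF);
}

static size_t packed_str_len(packed_str w)
{
    size_t n = 0;
    while (n < PACKED_MAX_LEN && packed_str_at(w, n) != '\0') {
        n++;
    }
    return n;
}

/* zval-embedded variant, using the struct from the quoted proposal. */
typedef struct { uint8_t len; char val[7]; } small_str;
_Static_assert(sizeof(small_str) == 8, "small_str must fill one 64-bit slot");

static int small_str_try_pack(const char *s, size_t len, small_str *out)
{
    if (len > sizeof out->val) {
        return 0;
    }
    memset(out, 0, sizeof *out); /* zeroed padding keeps whole-word compares valid */
    out->len = (uint8_t)len;
    memcpy(out->val, s, len);
    return 1;
}

/* Equality as a single 64-bit comparison, as the proposal suggests. */
static int small_str_equal(const small_str *a, const small_str *b)
{
    uint64_t x, y;
    memcpy(&x, a, sizeof x);
    memcpy(&y, b, sizeof y);
    return x == y;
}
```

With these definitions a 7-character string such as "toolong" fits the zval-embedded form but not the tagged-pointer form, which is the one-byte difference between the two approaches. The branch inside packed_str_is_packed() is the kind of extra check that every string access would need.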
Embedding the string in the zval would have the advantage of letting you use strings that are a byte longer, but you could only use it for strings in zvals, and it would possibly require more code changes overall (adding support for a new zval type everywhere, vs. only updating code making the wrong assumptions about zend_string pointers). Also, a potential problem is that possibly a lot of PHP functions deal with strings only as zend_strings, so PHP might end up converting back and forth between packed/small string zvals and zend_strings just to satisfy such functions. In particular, zend_parse_parameters might have to do that.

Your approach is less hacky and more portable, though. Notably, it works on 32-bit systems (whose pointers are too small for useful string packing, but which still give you 8 bytes to play with in the zval).

Regarding hash values, there's actually a simple solution to that, at least on 64-bit platforms. I changed the hash function to check the string length, and if it's small enough to be a packed string, use the packed string as the hash itself. (After all, they're both 64-bit numbers!) This costs nothing if the string is already a packed string, and if it isn't, the conversion is extremely simple. Of course, this comes at the cost of adding an extra branch to the hash function for non-packed strings.

Anyway, generally, I have no idea if either approach is all that good an idea or not. More research might be needed. It's certainly fun to discuss! :)

--
Andrea Faulds
https://ajf.me/

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
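
For readers following the hash-value remark in the message above, a minimal C sketch of that shortcut might look like the following. It is an illustration under stated assumptions, not the code from the linked branch: the names string_hash, pack_word and fallback_hash are hypothetical, the tag is assumed to be the lowest bit, and a plain times-33 loop stands in for whatever hash the engine normally uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names and layout; not taken from php-src. */
#define PACKED_TAG     0x01u /* assumed tag bit in the lowest byte */
#define PACKED_MAX_LEN 6     /* 6 character bytes + NUL + tag byte = 8 bytes */

/* Stand-in for the regular string hash: a simple times-33 loop. */
static uint64_t fallback_hash(const char *s, size_t len)
{
    uint64_t h = 5381;
    for (size_t i = 0; i < len; i++) {
        h = h * 33 + (unsigned char)s[i];
    }
    return h;
}

/* Build the 64-bit packed representation of a short string. */
static uint64_t pack_word(const char *s, size_t len)
{
    assert(len <= PACKED_MAX_LEN);
    uint64_t w = PACKED_TAG;
    for (size_t i = 0; i < len; i++) {
        w |= (uint64_t)(unsigned char)s[i] << (8 * (i + 1));
    }
    return w;
}

/* Short strings hash to their own packed word; longer strings pay one
 * extra length check and then use the regular hash. */
static uint64_t string_hash(const char *s, size_t len)
{
    if (len <= PACKED_MAX_LEN && memchr(s, 0, len) == NULL) {
        return pack_word(s, len);
    }
    return fallback_hash(s, len);
}
```

The length test at the top of string_hash() is the extra branch mentioned in the message. Distinct short strings always produce distinct packed words, so two of them can never collide with each other; how packed hashes interact with ordinary hashes of longer strings is left open in this sketch.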