Newsgroups: php.internals
To: PHP internals list
CC: Andrea Faulds, Nikita Popov, Xinchen Hui, "Anatol Belski (ab@php.net)", Zeev Suraski
Date: Mon, 27 Nov 2017 23:11:02 +0000
Subject: Packed Strings
From: dmitry@zend.com (Dmitry Stogov)

Hi,

I spent some time reviewing Andrea's old idea about packed strings.

https://github.com/hikari-no-yume/php-src/tree/packedStrings

The idea is simple. In every place where we use zend_string*, we may store the characters directly. We use the low byte to encode the packed-string marker and the string length, and we also need one byte for the trailing zero, so we can keep up to 2 characters on a 32-bit system and up to 6 characters on a 64-bit system without allocating additional memory.

The refreshed, dirty PoC implementation:

https://github.com/php/php-src/compare/master...dstogov:packedStrings2?expand=1

You may take a quick look at just the zend_string.h changes (the rest is almost monkey work).

I was able to run bench.php, and probably won't go any further.

Unfortunately, I ran into two serious problems:

1) The original implementation used packed strings themselves as their hash value. This led to a huge slowdown because of hash collisions (e.g. on bench.php hash1()). I switched to recalculating the hash on each use, but this negates the benefit of eliminating the allocation. Probably, we may use a cheaper hash function for packed strings...

2) PHP still uses char* in many places. When we take ZSTR_VAL() of a packed string stored in a local variable (or function argument), we may very easily get a dangling pointer (e.g. INI directives processed by OnUpdateString, internal function parameters received as char*, ...). Changing all these char* into zend_string* would help, but looks unrealistic for PHP-7.3.

So, I gave up for now.

I decided to share these results. Maybe someone will get related ideas.

Thanks. Dmitry.