Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:111694 Return-Path: Delivered-To: mailing list internals@lists.php.net Received: (qmail 54842 invoked from network); 26 Aug 2020 13:59:40 -0000 Received: from unknown (HELO php-smtp4.php.net) (45.112.84.5) by pb1.pair.com with SMTP; 26 Aug 2020 13:59:40 -0000 Received: from php-smtp4.php.net (localhost [127.0.0.1]) by php-smtp4.php.net (Postfix) with ESMTP id 4A7051804AA for ; Wed, 26 Aug 2020 06:02:50 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on php-smtp4.php.net X-Spam-Level: * X-Spam-Status: No, score=1.3 required=5.0 tests=BAYES_20,BODY_8BITS, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,HTML_MESSAGE, SPF_HELO_PASS,SPF_PASS autolearn=no autolearn_force=no version=3.4.2 X-Spam-Virus: No X-Envelope-From: Received: from mg02.mydataknox.com (mg02.mydataknox.com [185.62.73.50]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by php-smtp4.php.net (Postfix) with ESMTPS for ; Wed, 26 Aug 2020 06:02:49 -0700 (PDT) X-AuditID: c0a88e79-2a9ff70000004872-f1-5f465d785bc0 Received: from cp007.mydataknox.com (cp007.mydataknox.com [185.62.73.141]) (using TLS with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mg02.mydataknox.com (Mydataknox Mail Gateway) with SMTP id 66.5F.18546.87D564F5; Wed, 26 Aug 2020 15:02:48 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=voxdiversa.hr; s=default; h=Content-Type:MIME-Version:Date:Message-ID: Subject:From:To:Sender:Reply-To:Cc:Content-Transfer-Encoding:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=WBNodwyOlR+lneLdYEetb69AhC21OB5dX8C6YESzdnU=; b=Qt5H4v0dXcd31f8+GWGtsn4XBm lhPG8jbTsFrssx5b/5PzXKxd2j4UpGHJ0DZaJL7Wvj5l8JsiLtMxX4J6tzvxRCykVar/HNiCSBLdC gLUwnmJbUFIRmeU+mR6I/KUUJmK/plBHtYV8UJzPkP6mAaeqXIzGz0xqD7VYFlnHDdaZCKyINjtv2 nEdRJmW5KLi8kOicxdNTOkI+JY1+2JGaAdYV+tY1wQqopDpK+VtEW7HSLOu06eVKw3GIoBJaGl36D AyoxgoYIppEDTKfnp25cbvhfzab6SV//D5YOVqfN7oFEDtdgTW9MdIvYrt1HjmF2w6lrhrdAZC6jV xFVGrybA==; Received: from [82.214.97.81] (port=38850 helo=[10.0.0.201]) by cp007.mydataknox.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.93) (envelope-from ) id 1kAv4d-0004zR-T6 for internals@lists.php.net; Wed, 26 Aug 2020 15:02:47 +0200 To: PHP internals Message-ID: <08b959f5-7652-5e8a-562a-c2659facec8c@voxdiversa.hr> Date: Wed, 26 Aug 2020 15:02:44 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:80.0) Gecko/20100101 Thunderbird/80.0 MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="------------1A59B12C26E8F957F120522E" Content-Language: en-US X-OutGoing-Spam-Status: No, score=-1.0 X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - cp007.mydataknox.com X-AntiAbuse: Original Domain - lists.php.net X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - voxdiversa.hr X-Get-Message-Sender-Via: cp007.mydataknox.com: authenticated_id: dino.pejakovic@voxdiversa.hr X-Authenticated-Sender: cp007.mydataknox.com: dino.pejakovic@voxdiversa.hr X-Source: X-Source-Args: X-Source-Dir: X-From-Rewrite: unmodified, already matched X-Brightmail-Tracker: H4sIAAAAAAAAA1WTfUwbdRjH/V1LuTa99np09hkuMTSTqJvIiHPKkKBzRI2aaSIsU8SDni1Z X0hbDDUxYZ0IDCFrRhNej1eRV4tjkBYXiq0gLcaxZFvccDOtLEM2o4xsI7hl3nHt2P558vk9 z/f5/p57cj9cRB3Fk/ESs52xmmmjViIT+7LfrHuuvCC3MN11RvHS1N/jkhz0xuJ3Q+ID6JAs S8cYSz5jrM9nfyIz3GpdSyx1F5R7XA2iCjT39jEkxYF8AYavrWDHkAynyDEMwv3XxfxBTM5K oKbHnSBUBjFY6OgU8S1qcgfcCa8jniXk6+Dtdol5TiJfBc8J/4aGIHPAObuA8SwmnwJPx28b +S1kHkR7T8Y0Kgg1LXK9OC4iD8CRUCKfFpEauLTYjgnTpUJ0pDdR4ENw74fz4vjUZ6bbYpqd cJ11xvKZ8F9wKMYZUFtzI0Hgj6GW9cd83oLGWW8sL4MpNysRWAnBlitIYAVMdo7E8jtg/O6f kuNoa/NDUzdvTt380NQC58Ho4F8xfhHY0YhI4Cfh6FiLSGhNhd5K7aNpnvdAvXs1QeAUaKiN xOz3wr3WLklcM9p6BD2q4S1fgeFTRAdSDKAkkz49I83k0NF2+rDZUp5WbDGdRNzfMtVX7/Ci Sq8+gDAcBdBWHNNuIX5+P7eQUhRZdA4DbTMUWsuMjC2AtnOW0ZHBeZQsNlvMjFZNqM7uK6QI He34nLFa4rIncLFWQ8yvoQKK1NN25jDDlDLWeHUbjmuB0HzIXaCyMnqm/NMSo32zjOHSAAJc zpnfyuc0hK2UNtlK9EI9jF7GfdNslwhfGejh4h0+UhvTJGuI7bwpyTcYyswPPONvYhlNIhxp k4gsXibnnswD24werkJGHofl++lw5atGBJ4LLgzqw/MYsN4TIliK9iWAf+6iBNjmlUQ4d60T h0uBHhxWq4Nc+GcBB5Z1SqGu+0sZnBv+Vsb3ysAzViMHd/+MHJZ+mZXDT/MX5RCquswfOxUw 3lClhO7fm5TQuHBBCb7qGRWsng1RUHFqnYJ/B+5T0O68mQTHl1xq8PkG1cvcYjBuMW15+/nF 2OnNL0iuQDPf3C5isQZ1i+bHdidZHfSdbsocCl8+nW28ajuolqT079aHPtg96a3/dXVaeiOr ad+uCcfNP3Ipb3D66+XiVP9eU87Ta3ev7kkrKCsu0n2vDEcPrvu7dj62P2KcqzvvUVX3dT8z USt9V58dUad8YXpt8qN33rtdSbQ1TlQR+fnbMrVim4He9azIaqP/B1pqjKKWBAAA Subject: PDO fetch performance problems with many bind parameters From: dino.pejakovic@voxdiversa.hr (=?UTF-8?Q?Dino_Pejakovi=c4=87?=) --------------1A59B12C26E8F957F120522E Content-Type: multipart/alternative; boundary="------------E4D9AC3C93DB72AEAA49E019" --------------E4D9AC3C93DB72AEAA49E019 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Hi, First time messing around with PHP internal code and mailing lists, so sorry if I'm in the wrong place or doing something wrong in general. I recently noticed some weird performance issues while doing bulk inserts with prepared statements (single INSERT with a lot of VALUES) and using RETURNING clause to get back IDs and other columns. So I wrote a little benchmark to insert 8000 random rows (3 columns each) into a table and spent some time tracking down why it's slow. Suprisingly it seems that INSERT itself takes 100-200ms, but fetch/fetchAll returning id and one of the columns takes 2-3 seconds. I'm sending the benchmark script and postgres database schema used. After digging around PHP source code (pulled master branch), the problem seems to be in PDO calling param_hook with PDO_PARAM_EVT_FETCH_PRE and again PDO_PARAM_EVT_FETCH_POST  for each fetched row, which causes param_hook to be executed for each row x each param twice. In my little benchmark inserting 8000 rows with 3 columns and returning 2 columns for each row that means param_hook is called 8000x3x8000x2 = 384 000 000 times! So I took a look at pgsql_stmt_param_hook in ext/pdo_pgsql/pgsql_statement.c and it doesn't seem to do anything for PDO_PARAM_EVT_FETCH_PRE or PDO_PARAM_EVT_FETCH_POST. So if my understanding is correct, it's calling a function that does nothing meaningful 384 000 000 times, and this number grows exponentially with the number of rows and columns. I'm using postgres, but looking at ext/pdo_mysql code this seems to also be the case for mysql's drivers, didn't benchmark it though. As a test I commented out the lines dispatching those events in ext/pdo/pdo_stmt.c: |diff --git a/ext/pdo/pdo_stmt.c b/ext/pdo/pdo_stmt.c|| ||index 96f7574638..49703a7d68 100644|| ||--- a/ext/pdo/pdo_stmt.c|| ||+++ b/ext/pdo/pdo_stmt.c|| ||@@ -592,9 +592,9 @@ static int do_fetch_common(pdo_stmt_t *stmt, enum pdo_fetch_orientation ori, zen|| ||                return 0;|| ||        }|| |||| ||-       if (!dispatch_param_event(stmt, PDO_PARAM_EVT_FETCH_PRE)) {|| ||-               return 0;|| ||-       }|| ||+       // if (!dispatch_param_event(stmt, PDO_PARAM_EVT_FETCH_PRE)) {|| ||+       //      return 0;|| ||+       // }|| |||| ||        if (!stmt->methods->fetcher(stmt, ori, offset)) {|| ||                return 0;|| ||@@ -605,9 +605,9 @@ static int do_fetch_common(pdo_stmt_t *stmt, enum pdo_fetch_orientation ori, zen|| ||                return 0;|| ||        }|| |||| ||-       if (!dispatch_param_event(stmt, PDO_PARAM_EVT_FETCH_POST)) {|| ||-               return 0;|| ||-       }|| ||+       // if (!dispatch_param_event(stmt, PDO_PARAM_EVT_FETCH_POST)) {|| ||+       //      return 0;|| ||+       // }|| |||| ||        if (do_bind && stmt->bound_columns) {|| ||                /* update those bound column variables now */| | | After this change, fetching takes ~5ms and nothing seems broken, but the whole INSERT/RETURNING is now 10 times faster. Am I understanding this right? Could this be solved by letting each pdo_* driver set some kind of a flag (bitmask?) telling PDO which hooks it wants called from dispatch_param_event? For example for pdo_pgsql that flag would be |PDO_PARAM_EVT_FREE | PDO_PARAM_EVT_NORMALIZE | PDO_PARAM_EVT_ALLOC | PDO_PARAM_EVT_EXEC_PRE|, as other case statements seem empty. --------------E4D9AC3C93DB72AEAA49E019 Content-Type: text/html; charset=utf-8 Content-Transfer-Encoding: 8bit

Hi,


First time messing around with PHP internal code and mailing lists, so sorry if I'm in the wrong place or doing something wrong in general.


I recently noticed some weird performance issues while doing bulk inserts with prepared statements (single INSERT with a lot of VALUES) and using RETURNING clause to get back IDs and other columns.

So I wrote a little benchmark to insert 8000 random rows (3 columns each) into a table and spent some time tracking down why it's slow. Suprisingly it seems that INSERT itself takes 100-200ms, but fetch/fetchAll returning id and one of the columns takes 2-3 seconds.

I'm sending the benchmark script and postgres database schema used.


After digging around PHP source code (pulled master branch), the problem seems to be in PDO calling param_hook with PDO_PARAM_EVT_FETCH_PRE and again PDO_PARAM_EVT_FETCH_POST  for each fetched row, which causes param_hook to be executed for each row x each param twice. In my little benchmark inserting 8000 rows with 3 columns and returning 2 columns for each row that means param_hook is called 8000x3x8000x2 = 384 000 000 times! So I took a look at pgsql_stmt_param_hook in ext/pdo_pgsql/pgsql_statement.c and it doesn't seem to do anything for PDO_PARAM_EVT_FETCH_PRE or PDO_PARAM_EVT_FETCH_POST. So if my understanding is correct, it's calling a function that does nothing meaningful 384 000 000 times, and this number grows exponentially with the number of rows and columns.

I'm using postgres, but looking at ext/pdo_mysql code this seems to also be the case for mysql's drivers, didn't benchmark it though.


As a test I commented out the lines dispatching those events in ext/pdo/pdo_stmt.c:

diff --git a/ext/pdo/pdo_stmt.c b/ext/pdo/pdo_stmt.c
index 96f7574638..49703a7d68 100644
--- a/ext/pdo/pdo_stmt.c
+++ b/ext/pdo/pdo_stmt.c
@@ -592,9 +592,9 @@ static int do_fetch_common(pdo_stmt_t *stmt, enum pdo_fetch_orientation ori, zen
                return 0;
        }
 
-       if (!dispatch_param_event(stmt, PDO_PARAM_EVT_FETCH_PRE)) {
-               return 0;
-       }
+       // if (!dispatch_param_event(stmt, PDO_PARAM_EVT_FETCH_PRE)) {
+       //      return 0;
+       // }
 
        if (!stmt->methods->fetcher(stmt, ori, offset)) {
                return 0;
@@ -605,9 +605,9 @@ static int do_fetch_common(pdo_stmt_t *stmt, enum pdo_fetch_orientation ori, zen
                return 0;
        }
 
-       if (!dispatch_param_event(stmt, PDO_PARAM_EVT_FETCH_POST)) {
-               return 0;
-       }
+       // if (!dispatch_param_event(stmt, PDO_PARAM_EVT_FETCH_POST)) {
+       //      return 0;
+       // }
 
        if (do_bind && stmt->bound_columns) {
                /* update those bound column variables now */


After this change, fetching takes ~5ms and nothing seems broken, but the whole INSERT/RETURNING is now 10 times faster.


Am I understanding this right? Could this be solved by letting each pdo_* driver set some kind of a flag (bitmask?) telling PDO which hooks it wants called from dispatch_param_event? For example for pdo_pgsql that flag would be PDO_PARAM_EVT_FREE | PDO_PARAM_EVT_NORMALIZE | PDO_PARAM_EVT_ALLOC | PDO_PARAM_EVT_EXEC_PRE, as other case statements seem empty.

--------------E4D9AC3C93DB72AEAA49E019-- --------------1A59B12C26E8F957F120522E Content-Type: application/x-php; name="bench_pdo.php" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="bench_pdo.php" PD9waHAKCmRlY2xhcmUoc3RyaWN0X3R5cGVzPTEpOwoKZnVuY3Rpb24gZ2VuRGF0YShpbnQg JGNudCk6IGFycmF5IHsKCSRkYXRhID0gW107CgoJZm9yKCRpID0gMDsgJGkgPCAkY250OyAr KyRpKSB7CgkJJGRhdGFbXSA9IFsKCQkJJ3N0cjEnID0+IHVuaXFpZCgpLAoJCQknc3RyMicg PT4gdW5pcWlkKCksCgkJCSdudW0nID0+IHJhbmQoKSwKCQldOwoJfQoKCXJldHVybiAkZGF0 YTsKfQoKJGRhdGEgPSBnZW5EYXRhKDgwMDApOwoKJGRiID0gbmV3IFBETygncGdzcWw6aG9z dD1sb2NhbGhvc3Q7ZGJuYW1lPXRlc3QnLCAndGVzdCcsICd0ZXN0Jyk7CiRkYi0+c2V0QXR0 cmlidXRlKFBETzo6QVRUUl9FUlJNT0RFLCBQRE86OkVSUk1PREVfRVhDRVBUSU9OKTsKCiR2 YWxzID0gW107CmZvcmVhY2goJGRhdGEgYXMgJHJvdykgewoJJHZhbHNbXSA9ICRyb3dbJ3N0 cjEnXTsKCSR2YWxzW10gPSAkcm93WydzdHIyJ107CgkkdmFsc1tdID0gJHJvd1snbnVtJ107 Cn0KJHZhbFNxbCA9IFxpbXBsb2RlKCcsICcsIFxhcnJheV9maWxsKDAsIFxjb3VudCgkZGF0 YSksICcoPywgPywgPyknKSk7CiRzcWwgPSA8PDxTUUwKCUlOU0VSVCBJTlRPIHRoaW5naWVz IChzdHIxLCBzdHIyLCBudW0pCglWQUxVRVMgeyR2YWxTcWx9CglSRVRVUk5JTkcgaWQsIHN0 cjIKU1FMOwoKJHByZXBTdGFydCA9IFxtaWNyb3RpbWUodHJ1ZSk7CiRxdWVyeSA9ICRkYi0+ cHJlcGFyZSgkc3FsKTsKJGV4ZWNTdGFydCA9IFxtaWNyb3RpbWUodHJ1ZSk7CiRxdWVyeS0+ ZXhlY3V0ZSgkdmFscyk7CiRmZXRjaFN0YXJ0ID0gXG1pY3JvdGltZSh0cnVlKTsKJHJldCA9 ICRxdWVyeS0+ZmV0Y2hBbGwoKTsKZWNobyAnR290IHJvd3M6ICcuXGNvdW50KCRyZXQpLlBI UF9FT0w7CiRlbmQgPSBcbWljcm90aW1lKHRydWUpOwoKJHByZXBEdXJhdGlvbiA9IFxudW1i ZXJfZm9ybWF0KCRleGVjU3RhcnQgLSAkcHJlcFN0YXJ0LCAzKTsKJGV4ZWNEdXJhdGlvbiA9 IFxudW1iZXJfZm9ybWF0KCRmZXRjaFN0YXJ0IC0gJGV4ZWNTdGFydCwgMyk7CiRmZXRjaER1 cmF0aW9uID0gXG51bWJlcl9mb3JtYXQoJGVuZCAtICRmZXRjaFN0YXJ0LCAzKTsKCi8vIHZh cl9kdW1wKCRyZXQpOwoKZWNobyAiW1RpbWluZ3NdIHByZXA6IHskcHJlcER1cmF0aW9ufSwg ZXhlYzogeyRleGVjRHVyYXRpb259LCBmZXRjaDogeyRmZXRjaER1cmF0aW9ufSIuUEhQX0VP TDsK --------------1A59B12C26E8F957F120522E Content-Type: application/sql; name="test-schema.sql" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="test-schema.sql" Q1JFQVRFIFRBQkxFIHRoaW5naWVzICgKCWlkIFNFUklBTCBQUklNQVJZIEtFWSwKCXN0cjEg VEVYVCBOT1QgTlVMTCwKCXN0cjIgVEVYVCBOT1QgTlVMTCwKCW51bSBJTlRFR0VSIE5PVCBO VUxMCik7Cg== --------------1A59B12C26E8F957F120522E--