Date: Thu, 16 Feb 2023 14:37:19 +0100
To: internals@lists.php.net
Subject: Re: [PHP-DEV] [RFC] Working With Substrings
From: flexjoly@gmail.com (Lydia de Jongh)

Hi Derick, Thomas,

On Thu, 16 Feb 2023 at 08:57, Derick Rethans wrote:

> https://wiki.php.net/rfc/unicode_text_processing
>
> And yes, that won't be as fast as just calling strtoupper.
>
> cheers
> Derick

Looks great!!!
If I understand correctly, complex string manipulation inside an object
will be faster than copying variables around in memory, as Thomas kindly
explained in his post. It would also make PHP even more mature by
gaining more OOP.

On Wed, 15 Feb 2023 at 20:35, Thomas Hruska wrote:

> <......> Doing that operation one time is fast enough and not really a problem.
> Doing it 1,000,000 times in a loop is where we end up constantly copying
> memory around when we could potentially work on the same memory buffer
> the entire time. We still might end up using the same memory buffers
> over and over due to recycling them through the PHP memory pool, which
> means the buffers might get to sit in the L1 or L2 cache in the CPU, but
> it does leave some performance on the table because copying a buffer or
> portions of it repeatedly can be an unnecessary operation. Buffers that
> are larger than the CPU's cache line sizes are going to suffer the most
> because there will be constant requests to main memory for the
> information that the CPU needs to modify and will constantly flush the
> cache lines and stall out while waiting for more data to arrive. That's
> not exactly optimal/ideal. Modifying the same buffer inline will be
> more likely stay in the L1 and L2 cache lines and therefore be much
> closer to the CPU core, resulting in notably faster performance.
> Pointers in C are much faster than copying memory. The problem is
> exposing pointers to userland, especially in Internet-facing software.
> Pointers are notoriously unsafe - just look at the zillion buffer
> overflow vulnerabilities (CVEs) that are reported annually across all
> software products. Copy-on-write, by comparison, is a much safer
> operation at the cost of performance. However, pointers let us just
> point at a substring or general chunk of memory instead of copying it,
> which significantly reduces the overhead since pointers are simple
> integer values that contain a memory address.
> And those values are small enough to sit in CPU registers, which are
> blazing fast. CPUs only have a handful of registers though because each
> register dramatically increases the cost of the CPU die. So if we can
> just point at the memory we want to "extract" instead of actually
> copying the data into its own string object, we can potentially save a
> ton of CPU cycles, especially when working with data inside a loop.
>
> Overall, I think substrings offer the most obvious/apparent area for
> performance gains and probably have, implementation details aside, the
> least amount of friction. But maybe we should consider the larger
> ecosystem of string functions as well? Or should this just be a
> possible longer term idea that requires more thought and research and
> thus the scope should be limited and we put Lydia's idea under Future
> Scope in the RFC? Other thoughts/comments?
>
> Added as Open Issue 10 to the RFC. Thank you for your input.
>
> Thomas Hruska

Thanks for your kind and extensive explanation.
I know a little about memory allocation, but I am not sure what to
conclude from your explanation: would an object involve less copying
around or not?

This memory conversation brings up other old memories ☺... peek, poke,
assembly etc. 😍

Greetz, flexJoly (aka Lydia)