Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:119564
Return-Path: <flexjoly@gmail.com>
Delivered-To: mailing list internals@lists.php.net
Received: (qmail 40928 invoked from network); 16 Feb 2023 13:37:50 -0000
Received: from unknown (HELO php-smtp4.php.net) (45.112.84.5)
  by pb1.pair.com with SMTP; 16 Feb 2023 13:37:50 -0000
Received: from php-smtp4.php.net (localhost [127.0.0.1])
	by php-smtp4.php.net (Postfix) with ESMTP id 0F46918033A
	for <internals@lists.php.net>; Thu, 16 Feb 2023 05:37:49 -0800 (PST)
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on php-smtp4.php.net
X-Spam-Level: 
X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,HTML_MESSAGE,
	RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS,
	T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.2
X-Spam-ASN: AS15169 209.85.128.0/17
X-Spam-Virus: No
X-Envelope-From: <flexjoly@gmail.com>
Received: from mail-ed1-f44.google.com (mail-ed1-f44.google.com [209.85.208.44])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange ECDHE (P-256) server-signature RSA-PSS (2048 bits) server-digest SHA256)
	(No client certificate requested)
	by php-smtp4.php.net (Postfix) with ESMTPS
	for <internals@lists.php.net>; Thu, 16 Feb 2023 05:37:48 -0800 (PST)
Received: by mail-ed1-f44.google.com with SMTP id fi26so2828339edb.7
        for <internals@lists.php.net>; Thu, 16 Feb 2023 05:37:48 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20210112;
        h=to:subject:message-id:date:from:in-reply-to:references:mime-version
         :from:to:cc:subject:date:message-id:reply-to;
        bh=ALMStcoNhOA6VU9sH3eykWFM7IQx9m5lng8P5eFeI1E=;
        b=KwU+nvtFflgv3dDonC/NEt6wGmo+r3Xepd0XS9neOFFkOMhLtoS7zzOzKFv0ktFIV5
         HAmm+i/FIZb/yWzFixx55ORrL3X/v/BuWYCJP1loSYNg+DpGRLH9P/bp7F4ubfEESP/A
         ie8Ax4RhVsjWxoF1ClMAP0Yf5/BQi/AIqBCmajPuhuW8cvIMQkzBixLTxVoZ+rMLX+n5
         zMW7x9Ljqis1716p3NG1bhDWJnKtrIt+46UC/xeNqh3WzGsQqmpKkovbtAxa7cpNnSC6
         U7JuQKbTR+FMuQgBhf8Usb/dCZrLmsqenAa1zVNVHoRzOGPe+psRt0cczEHvjPoZ7cLo
         bgeQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=to:subject:message-id:date:from:in-reply-to:references:mime-version
         :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=ALMStcoNhOA6VU9sH3eykWFM7IQx9m5lng8P5eFeI1E=;
        b=hCAUFF8jv0rFbWPCfO9H6Cj4XtpZ/7/4AhWaDlOPfU3Emcy8/putxVBZ41Zs3loBq3
         F8w0jdv/CoKQ+ntzaMyBk/1nKHEVUMxZAolIfEl9YHS8Jp1BUw3VVe8cHvCrggsgC3Rb
         y5t/IS25Ua0sL42G06LnRA07zqq3pZ9J3otOmfnASQ7nKANZipcyb54gCbFQFOmnmCgD
         nWhiwhDsCGgE2O6E1a5a7tLlqPehI8GMLr2ZRNHcUBy534cfCP4Slh2cV10tNYmryF5A
         qGMb0vwjKbmieYNbaThF2VOSQSZvaM/waMuyo/N4LzXVqKVFSYbrxH7v6N74zZ2ruR+v
         k0EQ==
X-Gm-Message-State: AO0yUKX9gox+NhMdeibJleTj9WYqXtKsLVn56GX3ZKf/jv9Z0nVwOQAE
	/iMVqb61ZgvWf2yL1Nwi8gkYDqN943/yKgNe2uJ8opt7EgU=
X-Google-Smtp-Source: AK7set8qKGqFB8ZO9JtSPdyo1BAgY6GvLkbrlIY+9Hj+b6hhn5GNzs8NfKax4+xDd1X0Wem87oDR+OqGOBZeXfOxDpE=
X-Received: by 2002:a50:9986:0:b0:4a2:5b11:1a47 with SMTP id
 m6-20020a509986000000b004a25b111a47mr3037765edb.2.1676554666696; Thu, 16 Feb
 2023 05:37:46 -0800 (PST)
MIME-Version: 1.0
References: <e352423f-b740-07c9-2c4a-996112e17bbe@cubiclesoft.com>
 <92c4514f-70e3-75c9-7084-9e29641e25e7@gmail.com> <7e86a2d2-b971-592c-64e3-e86c13b5be80@cubiclesoft.com>
 <E963ED74-0404-4A5D-9811-8D1E662F764A@gmail.com> <84204896-F9CE-4186-8A72-573A0B46FC1D@gmail.com>
 <CAM9Wwz7Si98GDoJHaUKoJtOWt_UzzkjacohP4Z0XdRJsMnOPgg@mail.gmail.com> <68DBCD9C-849A-4840-9437-AE59F90A8B9C@php.net>
In-Reply-To: <68DBCD9C-849A-4840-9437-AE59F90A8B9C@php.net>
Date: Thu, 16 Feb 2023 14:37:19 +0100
Message-ID: <CAM9Wwz79raER=ovZ7V85rjf1ZSLAx3yqdBvDmHpccHtAULr6rQ@mail.gmail.com>
To: internals@lists.php.net
Content-Type: multipart/alternative; boundary="0000000000005ae6d605f4d14dc7"
Subject: Re: [PHP-DEV] [RFC] Working With Substrings
From: flexjoly@gmail.com (Lydia de Jongh)

--0000000000005ae6d605f4d14dc7
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Hi Derick, Thomas,


On Thu, 16 Feb 2023 at 08:57, Derick Rethans <derick@php.net> wrote:

>
>
> https://wiki.php.net/rfc/unicode_text_processing
>
> And yes, that won't be as fast as just calling strtoupper.
>
> cheers
> Derick
>

Looks great!!!

Complex string manipulation inside an object will be faster than copying
variables around in memory, as Thomas kindly explained in his post, if I
understand correctly.

And it would make PHP even more mature by embracing more OOP.



On Wed, 15 Feb 2023 at 20:35, Thomas Hruska <thruska@cubiclesoft.com> wrote:

> <......>

> Doing that operation one time is fast enough and not really a problem.
> Doing it 1,000,000 times in a loop is where we end up constantly copying
> memory around when we could potentially work on the same memory buffer
> the entire time.  We still might end up using the same memory buffers
> over and over due to recycling them through the PHP memory pool, which
> means the buffers might get to sit in the L1 or L2 cache in the CPU, but
> it does leave some performance on the table because copying a buffer or
> portions of it repeatedly can be an unnecessary operation.  Buffers that
> are larger than the CPU's cache line sizes are going to suffer the most
> because there will be constant requests to main memory for the
> information that the CPU needs to modify and will constantly flush the
> cache lines and stall out while waiting for more data to arrive.  That's
> not exactly optimal/ideal.  A buffer that is modified in place is
> more likely to stay in the L1 and L2 caches and therefore be much
> closer to the CPU core, resulting in notably faster performance.
> Pointers in C are much faster than copying memory.  The problem is
> exposing pointers to userland, especially in Internet-facing software.
> Pointers are notoriously unsafe - just look at the zillion buffer
> overflow vulnerabilities (CVEs) that are reported annually across all
> software products.  Copy-on-write, by comparison, is a much safer
> operation at the cost of performance.  However, pointers let us just
> point at a substring or general chunk of memory instead of copying it,
> which significantly reduces the overhead since pointers are simple
> integer values that contain a memory address.  And those values are
> small enough to sit in CPU registers, which are blazing fast.  CPUs only
> have a handful of registers though because each register dramatically
> increases the cost of the CPU die.  So if we can just point at the
> memory we want to "extract" instead of actually copying the data into
> its own string object, we can potentially save a ton of CPU cycles,
> especially when working with data inside a loop.
>
>
> Overall, I think substrings offer the most obvious/apparent area for
> performance gains and probably have, implementation details aside, the
> least amount of friction.  But maybe we should consider the larger
> ecosystem of string functions as well?  Or should this just be a
> possible longer term idea that requires more thought and research and
> thus the scope should be limited and we put Lydia's idea under Future
> Scope in the RFC?  Other thoughts/comments?
>
> Added as Open Issue 10 to the RFC.  Thank you for your input.
>
> Thomas Hruska
>
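
A minimal sketch of the copy-versus-point difference described above, written
in Python rather than PHP and not taken from the RFC. The helper names
copy_substring and view_substring are hypothetical; memoryview only stands in
for "a pointer plus a length" and says nothing about how PHP would implement
substrings.

```python
def copy_substring(buf, start, end):
    # Slicing allocates a new buffer and copies the selected bytes into it.
    return bytes(buf[start:end])


def view_substring(buf, start, end):
    # A memoryview slice records only an offset and a length into the
    # original buffer; the selected bytes are not copied.
    return memoryview(buf)[start:end]


if __name__ == "__main__":
    data = bytearray(b"The quick brown fox")

    copied = copy_substring(data, 4, 9)
    view = view_substring(data, 4, 9)

    # Change one byte in the original buffer (same length, no resize).
    data[4] = ord("Q")

    print(copied)       # the copy still holds the old bytes
    print(bytes(view))  # the view reflects the change in the original
```

The copy is independent of the original buffer, while the view stays tied to
it. That is where the speed comes from, and also why exposing such views to
userland needs care.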

Thanks for your kind and extensive explanation.
I know a little about memory allocation.

But I am not sure what to conclude from your explanation: would an
object involve less copying around or not?

This memory conversation brings up other old memories ☺... peek, poke,
assembly etc 😍

Greetz, flexJoly (aka Lydia)

--0000000000005ae6d605f4d14dc7--