Newsgroups: php.internals
hoohhkrdgtohhmpdhphhhprdhnvghtnecuvehluhhsthgvrhfuihiivgeptdenucfrrghr rghmpehmrghilhhfrhhomheprhhosgessghothhtlhgvugdrtghouggvshdpnhgspghrtg hpthhtoheptd X-ME-Proxy: Feedback-ID: ifab94697:Fastmail Received: by mailuser.nyi.internal (Postfix, from userid 501) id 9D0AC15A0092; Sat, 27 Jul 2024 10:31:31 -0400 (EDT) X-Mailer: MessagingEngine.com Webmail Interface User-Agent: Cyrus-JMAP/3.11.0-alpha0-582-g5a02f8850-fm-20240719.002-g5a02f885 Precedence: bulk list-help: list-post: List-Id: internals.lists.php.net x-ms-reactions: disallow MIME-Version: 1.0 Message-ID: In-Reply-To: <5155581a-3531-4af7-98d3-e637accac16e@gmx.de> References: <92c4514f-70e3-75c9-7084-9e29641e25e7@gmail.com> <7e86a2d2-b971-592c-64e3-e86c13b5be80@cubiclesoft.com> <5155581a-3531-4af7-98d3-e637accac16e@gmx.de> Date: Sat, 27 Jul 2024 16:31:09 +0200 To: "Christoph M. Becker" , "Rowan Tommins" , internals@lists.php.net, "Thomas Hruska" Subject: Re: [PHP-DEV] [RFC] Working With Substrings Content-Type: multipart/alternative; boundary=d558cd7714c84ed49a020f51ad73180f From: rob@bottled.codes ("Rob Landers") --d558cd7714c84ed49a020f51ad73180f Content-Type: text/plain;charset=utf-8 Content-Transfer-Encoding: quoted-printable On Sat, Jul 27, 2024, at 15:26, Christoph M. Becker wrote: > On 15.02.2023 at 06:18, Rowan Tommins wrote: >=20 > > On 15 February 2023 02:35:42 GMT, Thomas Hruska wrote: > > > >> On 2/14/2023 2:02 PM, Rowan Tommins wrote: > >> > >> I thought about that but didn't know how well it would be received = nor, perhaps more importantly, the direction it should take (i.e. a form= al Zend type in the engine, extending the existing zend_string type, a c= lass, some combination, or something else entirely). All of the more ad= vanced options I came up with would have required some code changes to t= he PHP source itself with a new data type being the most involved and pr= obably the most controversial. 
> > > > My instinct was that it could just be a built-in class, with an inte= rnal pointer to a zend_string that's completely invisible to userland. S= omething like how the SimpleXML and DOM objects just point into a libxml= parse result. > > > > Then to add to existing functions requires changing an argument type= from string to string|Buffer, rather than adding new arguments. > > > > No change to the type system needed, internally or externally, just = some code to unwrap the pointer. But perhaps I'm being naive and oversim= plifying, as I don't have a deep understanding of the engine. > > > >> I'm not entirely sure what the next step here should be. Should I = go research the above, or go back and develop/test and then propose some= thing concrete in an OO direction and gather feedback at that point, or = should we hash it out a bit more here on the list to get a more specific= direction to go in? > > > > Well, those were just my thoughts; maybe someone else will come alon= g shortly with a very different take. >=20 > I'm very late on this discussion, but I think it is an interesting > topic, and maybe , which I > had written long ago just to check some assumptions, can serve as POC. > It is certainly possible to have such a string buffer class without > having to patch the engine; it could even be made available as PECL > extension (first). >=20 > Note that this StringBuilder uses `smart_str`s[1] what might be a good > idea or not. But certainly you could use some other internal handling; > interoperability with `zend_string`s[2] requires to copy the char arra= ys > in most cases anyway, since these have a fixed length, and if these > copies are reduced to a minimum (i.e. the new class has enough > flexibility to work without casting to and from string), that should be > bearable. >=20 > Not sure if that would work for the "gd imageexportpixels() and > imageimportpixels()" RFC[3], but it might be worth investigating. 
>=20 > [1] > > [2] > > [3] >=20 > Cheers, > Christoph >=20 Huh, I am also very late and somewhat poignant, last weekend, I managed = to refactor all zend_strings to contain a char* instead of char[1] and t= he char* pointed to the memory just after the pointer. It increased zend= _string by a few bytes on a 64bit machine, but would allow for some nice= optimizations, such as zend_strings sharing memory (effectively removin= g the need for the current interned strings implementation). I ended up = ditching it because it would break literally every extension that does i= ts own allocations instead of calling zend_string_alloc|init() and it wa= s also hard to manage when copying strings, which also some core extensi= ons do instead of calling core zend_string_* functions. Needless to say,= "vanilla php" worked fine and all tests passed. I did submit a small part of my refactoring here: https://github.com/php= /php-src/pull/15054 but even something that simple didn't seem well rece= ived. So, I won't continue this approach. But, fwiw, I wouldn't advise changing zend_strings too much, many extens= ions appear to do one of two things: their own allocations and/or their = own copying and/or their own freeing. =E2=80=94 Rob --d558cd7714c84ed49a020f51ad73180f Content-Type: text/html;charset=utf-8 Content-Transfer-Encoding: quoted-printable
On Sat, Jul 27,= 2024, at 15:26, Christoph M. Becker wrote:
On 15.02.2023 at 06:18, Rowan Tommins w= rote:

> On 15 February 2023 02:35:42 GMT= , Thomas Hruska <thruska@c= ubiclesoft.com> wrote:
>
>> = On 2/14/2023 2:02 PM, Rowan Tommins wrote:
>>
>> I thought about that but didn't know how well it would = be received nor, perhaps more importantly, the direction it should take = (i.e. a formal Zend type in the engine, extending the existing zend_stri= ng type, a class, some combination, or something else entirely).  A= ll of the more advanced options I came up with would have required some = code changes to the PHP source itself with a new data type being the mos= t involved and probably the most controversial.
>
> My instinct was that it could just be a built-in class, wi= th an internal pointer to a zend_string that's completely invisible to u= serland. Something like how the SimpleXML and DOM objects just point int= o a libxml parse result.
>
> Then to a= dd to existing functions requires changing an argument type from string = to string|Buffer, rather than adding new arguments.
>
> No change to the type system needed, internally or ext= ernally, just some code to unwrap the pointer. But perhaps I'm being nai= ve and oversimplifying, as I don't have a deep understanding of the engi= ne.
>
>> I'm not entirely sure what= the next step here should be.  Should I go research the above, or = go back and develop/test and then propose something concrete in an OO di= rection and gather feedback at that point, or should we hash it out a bi= t more here on the list to get a more specific direction to go in?
>
> Well, those were just my thoughts; mayb= e someone else will come along shortly with a very different take.

I'm very late on this discussion, but I think it= is an interesting
had written long ago just to chec= k some assumptions, can serve as POC.
It is certainly poss= ible to have such a string buffer class without
having to = patch the engine; it could even be made available as PECL
= extension (first).

Note that this StringBui= lder uses `smart_str`s[1] what might be a good
idea or not= .  But certainly you could use some other internal handling;
interoperability with `zend_string`s[2] requires to copy the cha= r arrays
in most cases anyway, since these have a fixed le= ngth, and if these
copies are reduced to a minimum (i.e. t= he new class has enough
flexibility to work without castin= g to and from string), that should be
bearable.
<= div>
Not sure if that would work for the "gd imageexportpi= xels() and
imageimportpixels()" RFC[3], but it might be wo= rth investigating.

[1]
<https://www.phpinternalsbook.com/php7/internal_types/strin= gs/smart_str.html>
[2]

Cheers,
=
Christoph


Huh, I am also very late and somewhat poignant, last weekend, I ma= naged to refactor all zend_strings to contain a char* instead of char[1]= and the char* pointed to the memory just after the pointer. It increase= d zend_string by a few bytes on a 64bit machine, but would allow for som= e nice optimizations, such as zend_strings sharing memory (effectively r= emoving the need for the current interned strings implementation). I end= ed up ditching it because it would break literally every extension that = does its own allocations instead of calling zend_string_alloc|init() and= it was also hard to manage when copying strings, which also some core e= xtensions do instead of calling core zend_string_* functions. Needless t= o say, "vanilla php" worked fine and all tests passed.
I did submit a small part of my refactoring here: https://github.com/php/= php-src/pull/15054 but even something that simple didn't seem w= ell received. So, I won't continue this approach.

But, fwiw, I wouldn't advise changing zend_strings too much. Many extensions appear to do at least one of three things themselves: allocation, copying, and freeing.
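
To make the layout issue concrete, here is a minimal sketch in plain C. It is not php-src code: inline_str and ptr_str are simplified, hypothetical stand-ins for the current char[1] layout and the char* layout I tried, and the helper names are made up for illustration. It shows why a hand-rolled "copy header plus bytes" works with the inline layout but leaves a pointer layout aliasing the source string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for today's layout: the bytes live inline, right
 * after the header fields (the classic char[1] "struct hack"). */
typedef struct {
    unsigned int  refcount;
    unsigned long hash;
    size_t        len;
    char          val[1];
} inline_str;

/* Hypothetical refactored layout: the header carries a pointer, which
 * normally points at the bytes allocated directly behind the header. */
typedef struct {
    unsigned int  refcount;
    unsigned long hash;
    size_t        len;
    char         *val;
} ptr_str;

static inline_str *inline_str_init(const char *s, size_t len)
{
    assert(s != NULL);
    inline_str *str = malloc(offsetof(inline_str, val) + len + 1);
    if (str == NULL) return NULL;
    str->refcount = 1;
    str->hash = 0;
    str->len = len;
    memcpy(str->val, s, len);
    str->val[len] = '\0';
    return str;
}

static ptr_str *ptr_str_init(const char *s, size_t len)
{
    assert(s != NULL);
    ptr_str *str = malloc(sizeof(ptr_str) + len + 1);
    if (str == NULL) return NULL;
    str->refcount = 1;
    str->hash = 0;
    str->len = len;
    str->val = (char *)(str + 1);   /* bytes start just after the header */
    memcpy(str->val, s, len);
    str->val[len] = '\0';
    return str;
}

/* True if val points at the bytes stored behind this very header. */
static int ptr_str_is_self_contained(const ptr_str *s)
{
    return s->val == (const char *)(s + 1);
}

/* What an extension might do by hand: copy header and bytes in one go.
 * With the inline layout the result is a complete, independent string. */
static inline_str *inline_str_naive_copy(const inline_str *src)
{
    size_t size = offsetof(inline_str, val) + src->len + 1;
    inline_str *dst = malloc(size);
    if (dst == NULL) return NULL;
    memcpy(dst, src, size);
    dst->refcount = 1;
    return dst;
}

/* The same habit applied to the pointer layout: the bytes are copied,
 * but dst->val is copied too and still points into src's allocation. */
static ptr_str *ptr_str_naive_copy(const ptr_str *src)
{
    size_t size = sizeof(ptr_str) + src->len + 1;
    ptr_str *dst = malloc(size);
    if (dst == NULL) return NULL;
    memcpy(dst, src, size);
    dst->refcount = 1;
    return dst;
}
```

In this sketch, freeing the source after ptr_str_naive_copy() would leave the copy with a dangling val, which is the kind of breakage that code doing its own allocation or copying would run into.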

-- Rob