Date: Wed, 11 Sep 2024 14:43:01 -0600
Subject: Re: [PHP-DEV] Zephir, and other tangents
From: hamiegold@gmail.com (Hammed Ajao)
To: Mike Schinkel
Cc: "Rowan Tommins [IMSoP]", internals@lists.php.net


On Wed, Sep 11, 2024 at 1:13 PM Mike Schinkel <mike@newclarity.net> wrote:
Hi Rowan,

> On Sep 11, 2024, at 2:55 AM, Rowan Tommins [IMSoP] <imsop.php@rwec.co.uk> wrote:
> Perhaps you're unaware that classes in core already can, and do, provide operator overloading. GMP is the "poster child" for it, overloading a bunch of mathematical operators, but the mechanism it uses to do so is reasonably straightforward and available to any extension.

I was making an (evidently) uninformed assumption that it was non-trivial to add operator overloading at the C level. If it is easy, then my comments were moot.

That said, writing extensions in C and deploying them is non-trivial (compared to writing code in PHP), so there is that. ¯\_(ツ)_/¯

> I've never liked that approach, because it means users can't write polyfills, or even stub objects, that have these special behaviours. It feels weird for the language to define behaviour that isn't expressible in the language.

Understood. In _general_ I don't like it either, but I will use as an analogy a prior discussion regarding __toArray, and I quote[1]:

"For the "convertible to array" case, I think __toArray, or an interface specifying just that one method, would make more sense than combining it with the existing interfaces. I'm sceptical of that concept, though, because most objects could be converted to many different arrays in different circumstances, each of which should be given a different and descriptive name."

I am of course quoting you.

Similarly, operators could mean different things, e.g. it is possible to have different meanings of equal, and even different meanings of plus. Or worse, be applied in ways that are nonsensical to anybody but the developer who implements them (that would be the same kind of developer who names their variables after Game of Thrones characters).

That is why I am not a fan of operator overloading, just as you were not a fan of __toArray, which to me is less problematic than overloaded operators because it has a much smaller scope and is actually quite useful for a common set of use-cases regardless of the potential for confusion. But I digress.

> It also risks conflicting with a future language feature that overlaps, as happened with all native functions marked as accepting string automatically coercing nulls, but all userland ones rejecting it. Deprecating that difference has caused a lot of friction.

That is a little different in that it was a behavior that occurred in both core and userland, whereas only allowing operator overloading in core would mean there would be no userland differences that could conflict.

Whatever the case, if there are only two options, (1) no operator overloading and (2) userland operator overloading, I would far prefer the former.

> This is the tricky part for me: some of the things people want to do in extensions are explicitly the kinds of thing a shared host would not want them to, such as interface to system libraries, perform manual memory management, interact with other processes on the host.
>
> If WASM can provide some kind of sandbox, while still allowing a good portion of the features people actually want to write in extensions, I can imagine that being useful. But how exactly that would work I have no idea, so can't really comment further.

WebAssembly has a deny-by-default design so could be something to seriously consider for extensibility in PHP. Implementations start with a full sandbox[2] and only add what they need to avoid those kinds of concerns.

Also, all memory manipulations are sandboxed, though there are still potential vulnerabilities within the sandbox so the project that incorporates WASM needs to be careful. WASM written in C/C++ can have memory issues just like in regular C/C++, for example. One option would be to allow only AssemblyScript source for WASM. Another would be a config option that a web-host could set to only allow signed modules, but that admittedly would open another can of worms. But the memory issues cannot leak out of the module or affect other modules or the system, if implemented with total memory constraints.

That said, web hosts can't stop PHP developers from creating infinite loops so the memory issues with WASM don't feel like too much bigger of a concern given their sandboxed nature. I've copied numerous other links for reference: [4][5][6]


>>> The overall trend is to have only what's absolutely necessary in an extension.
>>
>> Not sure what you mean here.
>
> I mean, like Phalcon plans to, ship both a binary extension and a PHP library, putting only certain essential functionality in the extension. It's how MongoDB ships their PHP bindings, for instance - the extension provides low-level protocol support which is not intended for every day use; the library is then free to evolve the user-facing parts more freely.

Gotcha.

I think that actually supports what I was saying; people would gravitate to only doing in an extension what they cannot do in PHP itself, and over time if PHP itself improves there is reason to migrate more code to PHP.

But there can still be reasons to not allow some things in userland. Some things like __toArray.

-Mike

[1] https://www.mail-archive.com/internals@lists.php.net/msg100001.html
[2] https://thenewstack.io/how-webassembly-offers-secure-development-through-sandboxing/
[3] https://radu-matei.com/blog/practical-guide-to-wasm-memory/
[4] https://www.cs.cmu.edu/~csd-phd-blog/2023/provably-safe-sandboxing-wasm/
[5] https://chatgpt.com/share/b890aede-1c82-412a-89a9-deae99da506e
[6] https://www.assemblyscript.org/


Using WebAssembly (Wasm) for PHP doesn't make much sense. PHP already runs on its own virtual machine server-side, so adding another VM (Wasm) would just introduce unnecessary complexity and overhead. Additionally, which compiler backend would the Wasm runtime use: LLVM or Cranelift?

For extensions, Wasm would perform even worse than current implementations, no matter how it's integrated. Presently, I define `zif_handler` function pointers that operate on the current execution frame and return value, and the engine calls them when it detects that the function being called (`fbc`) is internal. This approach is as direct as it gets.

Suggesting AssemblyScript, especially in this context, seems illogical. Have you actually worked with WebAssembly and considered the performance implications, or is this based on theoretical knowledge?

Your point about operator overloading doesn't seem valid either. Consider the following:

```php
class X {
    public function plus(X $that) {}
    public function equals(X $that) {}
}
```

In this case, `plus` could represent any behavior, as could `equals`. If I wanted to, I could implement `plus` to perform what `equals` does and vice versa. Should we consider methods broken just because their names can be arbitrary?

PHP already distinguishes between comparison operators for objects:

```php
<?php
$obj1 = $obj2 = new stdClass;
assert($obj1 === $obj2); // compares object IDs (same instance)
assert($obj1 == $obj2);  // compares properties
$obj1 = new stdClass;    // a second, separate instance
assert($obj1 !== $obj2); // no longer the same instance
assert($obj1 == $obj2);  // still equal: same class, same (empty) properties
```

`===` compares object IDs, while `==` compares their properties. Beyond this, there's little reason to apply an operator to an object directly. Why would you need to call `$user1 + $user2` or similar operations on an object? What scenario would break if operator overloads were allowed?

However, consider a case where comparing just one property of an object (like a hash) is enough to determine equality. Wouldn't it be great if, without changing any of the calling code, the engine compared `$this->hash === $that->hash` when `$this == $that` is invoked, instead of all properties? Without operator overloading, I'd have to define an `equals` method and replace every `$obj == $x` call with `$obj->equals($x)`.
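
To make the difference concrete, here is a minimal sketch of the `equals` workaround described above. The `Document` class, its `metadata` property, and the sample values are hypothetical, made up purely for illustration; it assumes PHP 8.0+ for constructor promotion.

```php
<?php
// Hypothetical class for illustration only (assumes PHP 8.0+).
final class Document
{
    public function __construct(
        public string $hash,
        public array $metadata = [],
    ) {}

    // Equality decided by the hash alone, ignoring every other property.
    public function equals(Document $that): bool
    {
        return $this->hash === $that->hash;
    }
}

$a = new Document('abc123', ['loadedAt' => 1]);
$b = new Document('abc123', ['loadedAt' => 2]);

var_dump($a == $b);       // bool(false): == compares every property
var_dump($a->equals($b)); // bool(true): only the hash is compared
```

Every call site has to opt in to `equals()`; any code that still uses `==` silently keeps the property-by-property comparison.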

Moreover, operator overloading unlocks new possibilities for algorithm design. For example, you could define complex mathematical operations on custom objects, enabling you to express algorithms more concisely and naturally. Imagine implementing vector addition, matrix multiplication, or symbolic computation directly in PHP. Instead of verbose method calls like `$vec1->add($vec2)` or `$matrix1->multiply($matrix2)`, you could use simple and intuitive syntax like `$vec1 + $vec2` or `$matrix1 * $matrix2`. This is particularly useful for domain-specific algorithms where overloading enhances readability and performance.
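
As a rough sketch of the method-call style that overloading would replace, assume a hypothetical `Vec2` value object. The class and its `add`/`scale` methods are illustrative names, not an existing API, and PHP 8.0+ is assumed. The last few lines show that applying `+` to such objects is currently a `TypeError`.

```php
<?php
// Hypothetical value object for illustration only (assumes PHP 8.0+).
final class Vec2
{
    public function __construct(
        public float $x,
        public float $y,
    ) {}

    // Component-wise addition, returning a new vector.
    public function add(Vec2 $that): Vec2
    {
        return new Vec2($this->x + $that->x, $this->y + $that->y);
    }

    // Multiplication by a scalar, returning a new vector.
    public function scale(float $k): Vec2
    {
        return new Vec2($this->x * $k, $this->y * $k);
    }
}

$a = new Vec2(1.0, 2.0);
$b = new Vec2(3.0, 4.0);
$c = new Vec2(5.0, 6.0);

// (a + b) * 2 + c, spelled out with method calls.
$result = $a->add($b)->scale(2.0)->add($c);
var_dump($result->x, $result->y); // float(13) and float(18)

// Userland classes cannot overload `+` today, so PHP 8 throws here.
$threw = false;
try {
    $sum = $a + $b;
} catch (TypeError $e) {
    $threw = true;
}
var_dump($threw); // bool(true)
```

With userland overloading, the same expression could read `($a + $b) * 2 + $c`; how such operators would be declared is left open here.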

Operator overloading isn't just about convenience. It opens the door to more expressive algorithms, better readability, and less boilerplate code, all while maintaining backward compatibility with existing PHP behavior.

Cheers,
Hammed.