TL;DR - Yeah, PHP, but what if C++? Feel free to tell me I'm wrong and
should feel bad. THIS IS ONLY IDLE MUSINGS.
I was reading the arbitrary string interpolation thread (which I have mixed
feelings on, but am generally okay with), and it got me thinking
about motivations for it and other ways we might address that.
I spend most of my time in C++ these days and that's going to show in this
proposal, and the answer is probably "PHP isn't C++" and that's fine, but I
want you to read to the end, because XSS is perennially on my mind and this
might be one extra tool, maybe.
PHP internal classes have the ability to handle operator overloads, and one
use for overloads I quite like from C++ is streaming interfaces. Imagine
the following:
// Don't get hung up on the name, we're a long way from bikeshedding yet.
$foo = (new \ostringstream) << "Your query returned " << $result->count()
    << " rows. The first row has ID: " << $result->peekRow()['id'];
At each << operator, the RHS is "shifted" into the string builder, and the
object instance is returned. At the end, $foo is still that object, but
when it's echoed or cast to string it becomes the entire combined string.
As implementation details, we could keep the string as a list of segments
or materialize it completely. That could also be optimized to not materialize
if we're in an output context, since the intermediate complete string is
unnecessary. Don't worry about this for now though.
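For what it's worth, the segment-list idea can be approximated in today's userland PHP with a method standing in for the operator. This is only a sketch under assumptions: the class name SegmentStream and its append() and writeTo() methods are made up for illustration and are not part of the proposal.

```php
<?php

// Hypothetical userland stand-in for the proposed \ostringstream.
// append() plays the role of the << operator.
final class SegmentStream
{
    /** @var list<string> segments in insertion order, not yet joined */
    private array $segments = [];

    public function append(string|int|float|\Stringable $piece): static
    {
        $this->segments[] = (string) $piece;
        return $this; // returning the instance is what allows chaining
    }

    // Materialize only when a complete string is actually needed.
    public function __toString(): string
    {
        return implode('', $this->segments);
    }

    // In an output context the joined string is never built:
    // each segment is written to the handle on its own.
    public function writeTo($handle): void
    {
        foreach ($this->segments as $segment) {
            fwrite($handle, $segment);
        }
    }
}

$foo = (new SegmentStream())
    ->append('Your query returned ')
    ->append(3)
    ->append(' rows. The first row has ID: ')
    ->append(42);

echo $foo, "\n"; // Your query returned 3 rows. The first row has ID: 42
```

Calling $foo->writeTo(STDOUT) instead of echo would emit the same text segment by segment, which is the "don't materialize in an output context" optimization in miniature.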
That by itself is... curious as an option, but not terribly interesting as
we DO have proper interpolation and it works just fine, right?
The reason I'm bothering to introduce this is that we could also build
contextual awareness into this. During instantiation we could identify the
context like:
$forOutput = new \ostringstream\html << "You entered: " <<
    $_POST['textarea'];
$forURIs = new \ostringstream\uri << BASE_URI << '?';
foreach ($_GET as $k => $v) {
    $forURIs << $k << '=' << $v << '&';
}
These specializations could perform automatic sanitization during the
materialization phase; this could even be customizable:
$custom = new \ostringstream\user( landonize(...) );
We wouldn't be giving arbitrary operator overloading to the user, only
arbitrary sanitization.
Alternatively (or in addition), the point of materialization could be where
we make this decision:
echo $stream->html();
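A rough userland sketch of sanitizing at the point of materialization might look like the following. It rests on an assumption the proposal does not spell out, namely that the stream can tell trusted literals from untrusted data; here that is done with two hypothetical methods, literal() and value(). The class name ContextStream and the render() hook for a custom sanitizer are likewise invented for illustration.

```php
<?php

// Hypothetical sketch: literal() marks trusted text written by the developer,
// value() marks untrusted data. Neither name comes from the proposal.
final class ContextStream
{
    /** @var list<array{bool, string}> pairs of [trusted?, text] */
    private array $parts = [];

    public function literal(string $text): static
    {
        $this->parts[] = [true, $text];
        return $this;
    }

    public function value(string|int|float $data): static
    {
        $this->parts[] = [false, (string) $data];
        return $this;
    }

    // Escaping happens at materialization, so any callable can act as the
    // sanitizer and the same stream can be rendered for different contexts.
    public function render(callable $escape): string
    {
        $out = '';
        foreach ($this->parts as [$trusted, $text]) {
            $out .= $trusted ? $text : $escape($text);
        }
        return $out;
    }

    public function html(): string
    {
        return $this->render(
            static fn (string $s): string =>
                htmlspecialchars($s, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8')
        );
    }

    public function uri(): string
    {
        return $this->render('rawurlencode');
    }
}

$forOutput = (new ContextStream())
    ->literal('You entered: ')
    ->value('<script>alert(1)</script>');

echo $forOutput->html(), "\n";
// You entered: &lt;script&gt;alert(1)&lt;/script&gt;

$forURIs = (new ContextStream())->literal('https://example.com/?');
foreach (['q' => 'a b&c', 'page' => 2] as $k => $v) {
    $forURIs->value($k)->literal('=')->value($v)->literal('&');
}

echo $forURIs->uri(), "\n";
// https://example.com/?q=a%20b%26c&page=2&
```

Passing a different callable to render() is one way the "arbitrary sanitization, but not arbitrary overloading" idea could be exposed.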
I'd build this in userspace, but of course we don't have operator
overloading, so the API would be a somewhat uglier function call:
$stream->append("This feels ")->append(FEELING::Sad);
Maybe the right answer is to open the door on user-defined operator overloads,
but my flame retardant suit is in the shop and I don't really need to open
that mixed metaphor.
-Sara
What you're proposing here is:
- An overloadable operator on objects (via a new magic method or whatever) that takes one argument and returns a new instance of the same class, with the argument included in it along with whatever the object's existing data/context is.
- Using that for string manipulation.
If you spell the operator >>=, then point 1 is adding a monadic bind. This has my full support.
Using it for string manipulation is fine, although there's a bazillion other things we can do with it that I would also very much support and can be done in user space. Whether or not it makes sense for some of these operations to be done in C instead is up for debate. Once an arbitrary object can have a socket, that plus monads can push most stream operations to user space.
Building a "stream wrapper" like thing, or a filter, then becomes some mix of object composition and binding.
$s = new StripTagsStream(new ZlibCompress(new FileStream($fileName))) >>= $htmlString;
Which... feels kinda Java clunky. It would be better if we could chain out the wrapping levels. Which could potentially just be done in the implementation to allow a stream on the RHS to mean that.
public function __bind(Stream|string $s) {
    if ($s instanceof Stream) {
        return $s->wrapAround($this);
    }
    // Whatever this object does with a string.
}
$s = new FileStream($fileName) >>= new ZlibCompress() >>= new StripTagsStream() >>= $htmlString;
I'm... not sure which direction we'd want them to go in. Just spitballing. Some way to automate that pattern would likely be good.
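To make the wrapping pattern concrete, here is a minimal userland sketch with a bind() method standing in for the proposed >>= operator. Everything in it is an assumption for illustration: the Stream interface, the BaseStream helper, and the MemoryStream and UppercaseStream classes (swapped in for FileStream and ZlibCompress so the example runs without files or zlib). It also mutates the sink rather than returning fresh instances, which a real bind would presumably avoid.

```php
<?php

// Hypothetical names throughout; bind() stands in for the proposed >>= operator.
interface Stream
{
    public function bind(Stream|string $s): Stream;
    public function wrapAround(Stream $inner): Stream;
}

abstract class BaseStream implements Stream
{
    protected ?Stream $inner = null;

    public function bind(Stream|string $s): Stream
    {
        if ($s instanceof Stream) {
            return $s->wrapAround($this); // a stream on the RHS wraps the LHS
        }
        $this->write($s);
        return $this;
    }

    public function wrapAround(Stream $inner): Stream
    {
        $copy = clone $this;
        $copy->inner = $inner;
        return $copy;
    }

    abstract protected function write(string $data): void;
}

// Terminal stream: collects whatever reaches it.
final class MemoryStream extends BaseStream
{
    public string $buffer = '';

    protected function write(string $data): void
    {
        $this->buffer .= $data;
    }
}

// Filters: transform the data, then pass it on to the wrapped stream.
final class StripTagsStream extends BaseStream
{
    protected function write(string $data): void
    {
        $this->inner?->bind(strip_tags($data));
    }
}

final class UppercaseStream extends BaseStream
{
    protected function write(string $data): void
    {
        $this->inner?->bind(strtoupper($data));
    }
}

$sink = new MemoryStream();
$s = $sink
    ->bind(new UppercaseStream())
    ->bind(new StripTagsStream())
    ->bind('<b>hello</b> world');

echo $sink->buffer, "\n"; // HELLO WORLD
```

In this sketch the filter bound last runs first (tags are stripped, then the text is uppercased, then it lands in the sink), which is one possible answer to the direction question and not necessarily the right one.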
But yeah, a native bind operator has my support. :-)
--Larry Garfield
I don't have much to say on that besides that I feel it's a great idea,
and if that can be built with parametrized type streams (not limited to
strings only) then I'd be even more thrilled with such functionality.
Thanks for this idea and I hope it gets materialized soon.
Cheers,
Michał Marcin Brzuchalski
But why can't we have generic operator overloading in which case this could
be completely built by libraries in userland?
On Tue, Mar 22, 2022 at 5:38 AM Robert Landers landers.robert@gmail.com
wrote:
> But why can't we have generic operator overloading in which case this could
> be completely built by libraries in userland?
I mean... honestly, I feel like I come back around to this very quickly.
Generic overloading gives us much more at the end of the day and allows the
people using PHP day in and day out to make the actual decisions about what
any of these APIs should look like.
So while I said I wanted to avoid the firestorm suggesting userspace
overloading would bring, maybe that's the question to ask:
Who's just a hard-nope on userspace operator overloading? If your reasons
go beyond foot-gun (and that is a valid reason), could you share what those
reasons are?
-Sara
Hello,
I am a not-so-hard nope on userland operator overloading, because
it's magic. My day job is 20% writing code, 30% speaking with clients,
and 50% reading and using community code. Userland operators are not as
explicit as verbose method calls: you can't ctrl-space an operator
in any existing IDE, it's not obvious what they do when you read
code using them in most cases, and beyond that, in order to use them,
you have to read external documentation. Not that I don't read
external documentation, but 80% of the time, reading interfaces directly in
my IDE is enough to know what to do with an API. When an API uses
methods for doing stuff, the code is self-documenting; when people start
overriding operators, not so much (at least not as easy to find and
understand at first sight), especially if the code is segregated behind
interfaces and those interfaces don't declare the operators. (At least I
hope that, if userland operators land, interfaces will have a way
to declare the overloads.)
And in the end, operators are just sugar for method calls. I don't
dislike writing $c = $a->plus($b) instead of $c = $a + $b; I even find
that there's some kind of elegance in writing stuff explicitly.
There's no feature in the world that would be blocked by the lack of
operator overloading.
Moreover, the << >> C++-style stream operators are no more than a way to
implement the string builder pattern, if I understand it well. Then why
can't we simply use a StringBuilder class with methods such as
append(), prepend() and so on? It's much more explicit and much less
alien-like for most people.
Regards,
--
Pierre
> On 22/03/2022 at 16:14, Sara Golemon wrote:
>> Who's just a hard-nope on userspace operator overloading? If your reasons
>> go beyond foot-gun (and that is a valid reason), could you share what those
>> reasons are?
>
> I am a not-so-hard nope on userland operator overloading, because
> it's magic. My day job is 20% writing code, 30% speaking with clients,
> and 50% reading and using community code. Userland operators are not as
> explicit as verbose method calls: you can't ctrl-space an operator
> in any existing IDE, it's not obvious what they do when you read
> code using them in most cases, and beyond that, in order to use them,
> you have to read external documentation.
Yup. I generally file this all under the 'foot-gun' objection, because at
the end of the day if you've made an overload interface that makes your code
hard to read and reason about, then you have foot-gunned yourself. Not
dismissing it at all, because the struggle IS real.
> And in the end, operators are just sugar for method calls. I don't
> dislike writing $c = $a->plus($b) instead of $c = $a + $b; I even find
> that there's some kind of elegance in writing stuff explicitly.
> There's no feature in the world that would be blocked by the lack of
> operator overloading.
100% not going to argue with you on the accuracy of that statement. Of
course, something being sugar doesn't mean it's not sweet. ;)
-Sara
On 22.03.2022 at 16:14, Sara Golemon pollita@php.net wrote:
> So while I said I wanted to avoid the firestorm suggesting userspace
> overloading would bring, maybe that's the question to ask:
>
> Who's just a hard-nope on userspace operator overloading? If your reasons
> go beyond foot-gun (and that is a valid reason), could you share what those
> reasons are?
An obvious one could be complexity.
In the discussion about warnings in conjunction with type juggling, it was mentioned that this leads to increased complexity in the PHP core. While my knowledge of the engine is too superficial to really know, I'd assume that generic operator overloading could lead to quite some additional complexity and/or overhead.
But I'm sure other people know better than me what the real costs are.
Regards,
- Chris
On Tue, Mar 22, 2022 at 10:31 AM Christian Schneider cschneid@cschneid.com
wrote:
> On 22.03.2022 at 16:14, Sara Golemon pollita@php.net wrote:
>> So while I said I wanted to avoid the firestorm suggesting userspace
>> overloading would bring, maybe that's the question to ask:
>>
>> Who's just a hard-nope on userspace operator overloading? If your reasons
>> go beyond foot-gun (and that is a valid reason), could you share what those
>> reasons are?
>
> An obvious one could be complexity.
> In the discussion about warnings in conjunction with type juggling, it was
> mentioned that this leads to increased complexity in the PHP core. While my
> knowledge of the engine is too superficial to really know, I'd assume that
> generic operator overloading could lead to quite some additional complexity
> and/or overhead.
I'm not terribly worried about complexity since operator overloading
already exists in the engine. The only thing we don't have is a little
bit of glue to map these out to method calls.
This would be best served by actually writing up an implementation, which I
may do yet, but at the moment, I'm just gathering opinions.
-Sara
Are we talking about arbitrary user-space operator overloading? Because a very specific, non-arbitrary, well-designed operator overloading RFC was just rejected. I don't know that anything has changed that would lead to a different result for an even-broader (more foot-gunny) RFC.
Personally I'd settle for a bind operator, comparison operators, and some built-ins to use those for a new stream API bootstrap. That would already be huge, but the voters don't seem favorable. :-(
Side note: I was asked elsewhere if bind was like pipe, since they look alike. Yes, bind is simply a "fancy pipe", essentially, for some contextually-relevant definition of "fancy". We should still add the non-fancy pipe as well while we're at it, but that's a separate RFC. :-)
--Larry Garfield
On Tue, Mar 22, 2022 at 7:21 PM Larry Garfield larry@garfieldtech.com
wrote:
> Are we talking about arbitrary user-space operator overloading?
> Because a very specific, non-arbitrary, well-designed operator
> overloading RFC was just rejected.
Yup. This was pointed out to me in r11, I guess I must have missed that
vote when it happened. Not even hitting 50% is as telling as it needs to
be.
> Personally I'd settle for a bind operator, comparison operators,
> and some built-ins to use those for a new stream API bootstrap.
Yeah, streams need real love, and I hope someone with more investment in
the subject takes on the work.
-Sara
> I'd build this in userspace, but of course we don't have operator
> overloading, so the API would be a somewhat uglier function call:
It is currently possible to get operator overloads in userspace using FFI
(using my fork of lisachenko's z-engine to FFI into PHP calls). I made a
quick prototype implementation: https://github.com/iggyvolz/sstream (of
course any IDE or static analysis tool is going to yell at you for writing
object << string, but it works). Whether to use << or <<= or >> or >>=
is up for debate, but the general concept of "$x << $y calls $x->append($y)"
is there.