A little syntactic sugar on array_* function calls?

4 years ago by Hans Henrik Bergan

FWIW, this can be implemented in userland, and I bet someone has already made a
Composer package for it ^^

Hi,

I was wondering whether $array->map($somefunction) would be possible. I am
not a C programmer by any stretch, but reading ZEND_VM_HOT_OBJ_HANDLER(112, ...),
it seems to me it should be quite easy (famous last words) to find out if
the object is an array and, if so, to:

1. prepend the string array_ to the method name;

2. based on a small lookup table, move the "object" to the right place:
either the first argument or the second;

3. do a function call instead of a method call.

Regarding #2: by default, it's the first argument:

$array->flip() becomes array_flip($array)
$array->column($column_key, $index_key) becomes array_column($array, $column_key, $index_key)
$array->merge($array2, $array3) becomes array_merge($array, $array2, $array3)

There'd be a small list of methods/functions where it's the second, for
example:

$array->map($fn) becomes array_map($fn, $array)
$array->search($needle, $strict) becomes array_search($needle, $array, $strict)
$array->key_exists($key) becomes array_key_exists($key, $array)

(Are there even any others?)

For phase 1 we could skip the functions which take the first argument by
reference (walk, sort) and figure them out later. Hand-waving, yes, but never
let perfect stand in the way of good enough :)

Look how nicely this reads:

$array->map($fn)->filter($fn2)

Compared to array_filter(array_map($fn, $array), $fn2)
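Hans's remark that this can be done in userland can be made concrete with a small wrapper that applies the three steps above inside __call(). This is a sketch under stated assumptions, not an existing package: `ArrayWrapper` (the name Lynn uses later in the thread), the `ARRAY_SECOND` lookup table and `toArray()` are illustrative, the table lists only the methods named in this message, and PHP 8.0+ is assumed.

```php
<?php
// Illustrative userland version of the proposed desugaring (not an existing package).
final class ArrayWrapper
{
    // Step 2's "small lookup table": methods whose array goes in second position.
    private const ARRAY_SECOND = ['map', 'search', 'key_exists'];

    public function __construct(private array $items) {}

    public function __call(string $method, array $args): mixed
    {
        // Step 1: prepend "array_" to the method name.
        $function = 'array_' . $method;
        if (!function_exists($function)) {
            throw new BadMethodCallException("Call to undefined function $function()");
        }

        // Step 2: place the array where the function expects it.
        if (in_array($method, self::ARRAY_SECOND, true)) {
            array_splice($args, 1, 0, [$this->items]);
        } else {
            array_unshift($args, $this->items);
        }

        // Step 3: a plain function call instead of a method call.
        $result = $function(...$args);

        // Re-wrap array results so calls can be chained; return other results as-is.
        return is_array($result) ? new self($result) : $result;
    }

    public function toArray(): array
    {
        return $this->items;
    }
}

$double = fn(int $x): int => $x * 2;
$big = fn(int $x): bool => $x > 4;

$chained = (new ArrayWrapper([1, 2, 3, 4]))->map($double)->filter($big)->toArray();
$nested = array_filter(array_map($double, [1, 2, 3, 4]), $big);

var_dump($chained === $nested); // bool(true)
```

Because the name check is just function_exists("array_$method"), a user-defined array_foobar() would also become callable as ->foobar(). The cost, as Lynn points out later in the thread, is having to wrap and unwrap the array at every boundary.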

I see no BC concerns here because in all previous PHP versions this is a
fatal error. I do not see any syntax ambiguity either, but here I am
probably just naive. It could also be usable with user-defined functions.

What do you think?

Karoly Negyesi

4 years ago by someniatko

I was wondering whether $array->map($somefunction) would be possible.

There is already a Pipe Operator RFC, which would most probably suit your
needs (https://wiki.php.net/rfc/pipe-operator-v2). The code you want would
look like this:

$array |> array_map($somefunction).

Best wishes,
someniatko

4 years ago by someniatko

Sorry, I made a typo there. Combined with the proposed Partial
Function Application RFC
(https://wiki.php.net/rfc/partial_function_application), which, however,
is still under active discussion and will probably change, it would look
roughly like this:

$array |> array_map($somefunction, ?);
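At the time of this thread neither RFC had been accepted, but the shape of such a chain can be approximated in userland. In this sketch `pipe()` is a hypothetical helper, and each `?` placeholder is spelled out as an arrow-function parameter instead (PHP 8.0+ assumed).

```php
<?php
// Hypothetical helper: threads $value through each callable in turn,
// roughly what `$value |> f |> g` would do.
function pipe(mixed $value, callable ...$steps): mixed
{
    foreach ($steps as $step) {
        $value = $step($value);
    }
    return $value;
}

$fn1 = fn(int $x): int => $x * 10;
$fn2 = fn(int $x): bool => $x !== 20;

// Each arrow function stands in for a partial application such as
// array_map($fn1, ?) or array_filter(?, $fn2).
$result = pipe(
    [1, 2, 3],
    fn(array $a): array => array_map($fn1, $a),
    fn(array $a): array => array_filter($a, $fn2),
);

// The same helper is not limited to arrays; here it transforms a string.
$slug = pipe('  Hello World ', 'trim', 'strtolower', fn(string $s): string => str_replace(' ', '-', $s));
```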

Best wishes,
someniatko

4 years ago by Lynn

On Tue, May 25, 2021 at 11:31 AM Hans Henrik Bergan divinity76@gmail.com
wrote:

FWIW, this can be implemented in userland, and I bet someone has already made a
Composer package for it ^^

Not everyone is interested in doing $array = new ArrayWrapper($originalArray) and then breaking all array parameters.
There are a bunch of different packages, and on top of that people prefer
different ones. Some packages will have feature X and others feature Y,
which makes them even harder to use properly. The downside of having this
in PHP will obviously be that it's much less flexible than a userland
implementation. I'd be very happy to see it in PHP, whereas I won't even
bother looking for array wrappers in userland.
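Lynn's point about "breaking all array parameters" can be shown with ArrayObject standing in for any userland wrapper: a wrapper object is never accepted where an `array` type is declared, so it has to be unwrapped at each call site. `total()` is a hypothetical function used only for illustration.

```php
<?php
// A function with an `array` parameter, as in most existing code.
function total(array $numbers): int
{
    return array_sum($numbers);
}

$wrapped = new ArrayObject([1, 2, 3]);

$error = null;
try {
    total($wrapped); // an object is never coerced to `array`
} catch (TypeError $e) {
    $error = $e;
}

// The wrapper has to be unwrapped explicitly at every such boundary.
$sum = total($wrapped->getArrayCopy());
```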

There is Pipe Operator RFC existing already, which most probably would
suit your needs. The code you want will look like this:
https://wiki.php.net/rfc/pipe-operator-v2
$array |> array_map($somefunction).

The pipe operator feels like a poor solution while -> would do exactly
what people want.

--

To unsubscribe, visit: https://www.php.net/unsub.php

4 years ago by someniatko

The pipe operator feels like a poor solution while -> would do exactly what people want.

Could you elaborate? Adding method-like array functions, with only a few
predefined functions and only for arrays, looks very limited in scope,
while the pipe operator would allow applying any existing function,
internal or userland, to any type of variable, not just arrays, with the
same "fluent API" feel. It also clearly distinguishes between object method
calls and function application, which is a plus for clarity, IMO.

I am also not sure who the "people" you refer to are, because, well, I
am among the people using PHP daily, and I would personally prefer a
more generic solution, which the Pipe Operator currently is.

Best wishes,
someniatko

4 years ago by Karoly Negyesi

Thanks for your quick feedback everyone.

The pipe operator feels like a poor solution while -> would do exactly
what people want.

Could you elaborate? Adding method-like array access functions with
only few predefined functions, and only for arrays looks very limited

That is probably because it is very limited :) deliberately so.

The proposed syntax

$array |> array_map($fn1, ?) |> array_filter(?, $fn2)

When I compare to:

$array->map($fn1)->filter($fn2)

1. It's longer. Much longer.
2. It still requires knowing where the array goes. That's legacy which we
could sidestep with the arrow notation.
3. Admittedly, the pipe is much more powerful.

But why not both :) ?

Also, the pipe approach would require accepting two RFCs, although of course
mine would need to become an RFC too, and in the longer term this is not a problem.

scalar_objects is wonderful. It's a space rocket compared to my
wheelbarrow. If it flies, mine is obviously moot. However, a wheelbarrow
is much cheaper and quicker to construct than a spaceship :D

Once again, what I propose here is meant to be a simple, cheap-to-implement,
narrow quick fix. (Although users could also add their own
array_foobar($array, $arg1...) to get $array->foobar($arg1) as they need.)

Karoly Negyesi

4 years ago by someniatko

The proposed syntax

$array |> array_map($fn1, ?) |> array_filter(?, $fn2)

When I compare to:

$array->map($fn1)->filter($fn2)

1. It's longer. Much longer.
2. It still requires knowing where the array goes. That's legacy which we could sidestep with the arrow notation.
3. Admittedly, the pipe is much more powerful.

While argument No. 2 is completely valid, the first one is not so much.
If you remove the whitespace around |> and also alias these
functions to map and filter respectively (or if, for instance,
some future RFC moves them into a special PHP\Array namespace, which
would probably never happen, but one is allowed to dream), it could look
like this:

$array|>map($fn1, ?)|>filter(?, $fn2);
$array->map($fn1)->filter($fn2);

A bit longer (due to 2.), but not that much, actually.

Best wishes,
someniatko

4 years ago by Hendra Gunawan

Hello.

$array|>map($fn1, ?)|>filter(?, $fn2);
$array->map($fn1)->filter($fn2);

Whitespace removal is not a solution for code length problems.
You might have a new problem if you do it. "|" is very similar
to the lowercase "L" and uppercase "i".

It's just an extra 3 characters (", ?" or "?, "). For most people,
this is not a problem at all: people tend to write "one statement per line"
rather than "multi-statement lines". I myself usually write no more than
3 statements per line, and only if the line stays under 120 characters.

The real problem is that there is no consistency in "haystack vs needle"
position. There are RFCs to fix this (along with the naming convention
problem), but none of them has been successful.

The pipe operator feels like a poor solution while "->" would do
exactly what people want.

Not so poor if we:

- use "~>" as the pipe operator rather than "|>";
- redesign the API under a proper namespace and strictly place
the "haystack" as the first function argument.
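Hendra's second suggestion, a redesigned API that always takes the haystack first, might look roughly like this. A final class with static methods stands in for the "proper namespace" to keep the example in one file; `Arr` and its method names are hypothetical, and only a few functions are wrapped (PHP 8.0+ assumed for the union types).

```php
<?php
// Hypothetical haystack-first API: every method takes the array as its first argument.
final class Arr
{
    public static function map(array $array, callable $fn): array
    {
        return array_map($fn, $array);
    }

    public static function filter(array $array, callable $fn): array
    {
        return array_filter($array, $fn);
    }

    public static function search(array $array, mixed $needle, bool $strict = false): int|string|false
    {
        return array_search($needle, $array, $strict);
    }

    public static function keyExists(array $array, int|string $key): bool
    {
        return array_key_exists($key, $array);
    }
}

$words = ['apple', 'kiwi', 'banana'];
$lengths = Arr::map($words, 'strlen');                      // [5, 4, 6]
$long = Arr::filter($lengths, fn(int $n): bool => $n > 4);  // [0 => 5, 2 => 6]
$position = Arr::search($words, 'kiwi');                    // 1
```

With a fixed haystack-first order, a partial application would always put the placeholder in the same position, which addresses the "knowing where the array goes" objection raised earlier in the thread.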

Regards,
Hendra Gunawan.

4 years ago by txigreman@hotmail.com

Hi all,

It sounds like scalar objects by Nikita:
https://github.com/nikic/scalar_objects

Regards,
Iván Arias.

4 years ago by Hamza Ahmad

Hello,
I read about this extension some time ago but didn't know whether it had
been made public. If Nikita is reading this, I'd ask him to consider
proposing a modified version of this extension bundled with PHP.
In simple terms, he could hide the function that registers a class serving
as the prototype of a built-in type, and also provide scalar methods for
strings, ints, floats and arrays. Adding the handler-registering
functionality to the core should be a separate RFC.
Reading this thread for the first time, I had the following suggestions:

1. All array functions should be moved to the scalar array object, and the
"array_" prefix should be removed.
2. To maintain backward compatibility, all array_* functions would become
aliases of the scalar array's methods.
3. ArrayObject would also continue to exist for compatibility purposes, and
its methods would also be added to the scalar array.
4. Thus, array() or [] would return a scalar array object, and the following
syntax would become valid:
[1,2,3,4,5,6,7,8,9] -> reverse();
array(1 => 'a', 2 => 'b') -> flip();

If that happens, users will automatically be stopped from passing
non-array values to array functions, and the error will be caught
earlier.
Regards


From: Hendra Gunawan the.liquid.metal@gmail.com
Sent: Tuesday, May 25, 2021 10:58:46 PM
To: someniatko someniatko@gmail.com
Cc: Karoly Negyesi karoly@negyesi.net; Marco Pivetta ocramius@gmail.com;
Lynn kjarli@gmail.com; internals@lists.php.net internals@lists.php.net
Subject: Re: [PHP-DEV] A little syntactic sugar on array_* function calls?


4 years ago by Hossein Baghayi

On Wed, 26 May 2021 at 10:14, Hamza Ahmad office.hamzaahmad@gmail.com
wrote:

Thus, array() or [] will return scaler array object,

Hello,
This doesn't seem trivial to me.
I mean, should an array object be passed by value or by reference?
So far, arrays are passed by value by default, while objects are
effectively by-ref (they are passed around by handle).
If we are to have an array object, will it be an exception? Or should we
change its behaviour going forward?

To be clear, array() returns an array right now, which is passed by value
by default.
If it were changed to return an object, should that object still be passed
by value (an exceptional object), or should we change its behaviour going
forward and pass it by-ref?

Either way, it may have some quirks associated with it.
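The two behaviours Hossein contrasts can be seen side by side in current PHP, using ArrayObject as an example of an object that holds an array. An array object built into the language would have to pick one of these semantics; the function names here are illustrative.

```php
<?php
// Arrays are passed by value (copy-on-write): the callee modifies its own copy.
function appendToArray(array $items): void
{
    $items[] = 'added';
}

// Objects are passed by handle: the callee modifies the caller's object.
function appendToObject(ArrayObject $items): void
{
    $items[] = 'added';
}

$array = ['a'];
$object = new ArrayObject(['a']);

appendToArray($array);
appendToObject($object);

var_dump(count($array));  // int(1): the caller's array is unchanged
var_dump(count($object)); // int(2): the caller's object was changed
```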

4 years ago by Mike Schinkel

Hi all,

It sounds like scalar objects by Nikita: https://github.com/nikic/scalar_objects

Yes, but Nikita wrote this note about technical limitations at the bottom of the repo README:

Due to technical limitations, it is not possible to create mutable APIs for primitive types. Modifying $self within the methods is not possible (or rather, will have no effect, as you'd just be changing a copy).

Does that mean that the scope of Nikita's proof-of-concept could not modify $self, or that it is simply not possible to modify $self given limitations inherent in PHP?

Further, does that only apply to scalars, or might arrays possibly be different?

-Mike

4 years ago by Marco Pivetta

Hi all,

It sounds like scalar objects by Nikita:
https://github.com/nikic/scalar_objects

Yes, but Nikita wrote this note about technical limitations at the bottom
of the repo README:

Due to technical limitations, it is not possible to create mutable APIs
for primitive types. Modifying $self within the methods is not possible (or
rather, will have no effect, as you'd just be changing a copy).

Sounds like a big advantage?

Marco Pivetta

http://twitter.com/Ocramius

http://ocramius.github.com/

4 years ago by Hamza Ahmad

should array object be passed by value or by reference?
If we are to have array object, will it be exceptional?

By value. Array is a data type, and we are talking about making it
behave like an object. In JavaScript, strings and numbers are primitive
values, yet they have their respective properties and methods. Still,
when they are passed to a function or a method call, they are passed
by value, not by reference.

We should pass arrays by value because that lets a function or a method
modify its copy without changing the original array. Making it a regular
object would be a BC break: whenever a callable modified an array, it
would modify a variable outside its scope.

I am talking about attaching some methods and properties to the array
type (or, more broadly, to the string, int and float types). In your terms,
an exceptional array object that is passed by value, which would have
"key_first", "key_last", "keys", "values", "length", "type" (if PHP
later introduces typed arrays), and "is_list" as properties, and
"reverse", "flip", "map", "filter", "walk" and so on as methods. Such
methods would be performed on a value, not a variable. In other words,
[1,2,3,4,5,6,7,8,9,0]->print(); would work. This way, it does not
matter whether one operates on a variable or a value.

According to the implementation of Nikita's extension, each method of a
type is mapped to a function. To remove this limitation, what if array
were an internal array object?

To clarify my previous statement about making array_* functions aliases
of array->* methods, consider mysqli: it supports both object-oriented
and procedural interaction. So why not do the same with arrays?

Regards


4 years ago by Hendra Gunawan

Hello.

Yes, but Nikita wrote this note about technical limitations at the bottom of the repo README:

Due to technical limitations, it is not possible to create mutable APIs for
primitive types. Modifying $self within the methods is not possible (or
rather, will have no effect, as you'd just be changing a copy).

If it is solved, it will be a great accomplishment for PHP. But I think
scalar objects are not going anywhere in the near future. If you are not
convinced, please take a look at
https://github.com/nikic/scalar_objects/issues/20#issuecomment-569520181.

This makes me feel more strongly than before that the pipe operator is the
way to solve the object-style-for-scalars issue.

I hope that someone will take the initiative to fix the old,
inconsistent and confusing API.

Pipe operator+new API is a better solution than no solution at all.

4 years ago by Mike Schinkel

If it is solved, this is a great accomplishment for PHP. But I think
scalar object is not going anywhere in the near future. If you are not
convinced, please take a look
https://github.com/nikic/scalar_objects/issues/20#issuecomment-569520181.

Nikita's comment actually causes me more questions, not fewer.

Nikita says "We need to know that $a[$b][$c] is an array in order to determine that the call should be performed by-reference. However, we already need to convert $a, $a[$b] and $a[$b][$c] into references before we know about that."

How then are we able to do the following?:

$a[$b][$c][] = 1;

How also can we do this:

byref($a[$b][$c]);
function byref(&$x) {
$x[]= 2;
}

See https://3v4l.org/aPvTD
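For readers without the 3v4l link at hand, here is a self-contained version of both examples. The values of `$b` and `$c` are assumptions chosen for illustration; the snippet is not claimed to match the linked script.

```php
<?php
function byref(&$x)
{
    $x[] = 2;
}

$a = [];
$b = 'outer';
$c = 'inner';

// Write context: $a['outer'] and $a['outer']['inner'] are auto-vivified, then 1 is appended.
$a[$b][$c][] = 1;

// By-reference argument: $a[$b][$c] is fetched for writing and byref() appends 2.
byref($a[$b][$c]);

// $a['outer']['inner'] now holds [1, 2].
```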

I assume that in both my examples $a[$b][$c] would be considered an "lvalue"[1] and can be a target of assignment triggered by either the assignment operator or calling the function and passing to a by-ref parameter.

[1] https://en.wikipedia.org/wiki/Value_(computer_science)#Assignment:_l-values_and_r-values

So is there a reason that -> on an array could not trigger the same? Is Nikita saying that the performance cost of performing those calls by-reference does not matter because they are always being assigned, at least in the former case, but that doing so with array expressions would be problematic? (Ignoring that there is no code in the wild that currently uses the -> operator on arrays, or does that matter?)

I ask honestly to understand, and not as a rhetorical question.

Additionally, if updating an array variable is not a problem but updating an array expression is, then why not just limit the -> operator to work on expressions only for immutable methods and require variables for mutable methods? I would think it should be easy enough to throw an error for those specific "methods" that would be mutable, such as shift() and unshift(), if $a[$b][$c]->shift('foo') were called.

Or maybe just limit the -> operator to array variables entirely, and not support it on any array expressions, for consistency. There is already precedent in PHP for operators that work on variables and not on expressions: ++, --, and &.

IF we can get a thumbs up from Nikita that one of these would actually be possible then I think the next step should be to write up a list of proposed array methods that would be implemented to support the -> operator with arrays and put them in an RFC, and to flesh out any edge cases.

-Mike

4 years ago by Nikita Popov

On May 26, 2021, at 7:44 PM, Hendra Gunawan the.liquid.metal@gmail.com
wrote:

Hello.

Yes, but Nikita wrote this note about technical limitations at the
bottom of the repo README:

Due to technical limitations, it is not possible to create mutable APIs
for primitive types. Modifying $self within the methods is not possible (or
rather, will have no effect, as you'd just be changing a copy).

If it is solved, this is a great accomplishment for PHP. But I think
scalar object is not going anywhere in the near future. If you are not
convinced, please take a look
https://github.com/nikic/scalar_objects/issues/20#issuecomment-569520181.

Nikita's comment actually causes me more questions, not fewer.

Nikita says "We need to know that $a[$b][$c] is an array in order to
determine that the call should be performed by-reference. However, we
already need to convert $a, $a[$b] and $a[$b][$c] into references before we
know about that."

How then are we able to do the following?:

$a[$b][$c][] = 1;

In this case, we're clearly performing a write operation on the array. If
you want to know the technical details, the compiler will convert this into
a sequence of FETCH_DIM_W ops followed by ASSIGN_DIM. The "W" bit here is
for "write", which will perform all the necessary special handling, such as
copy-on-write separation and auto-vivification.

How also can we do this:

byref($a[$b][$c]);
function byref(&$x) {
$x[]= 2;
}

See https://3v4l.org/aPvTD

This is a more complex case. In this case the compiler doesn't know in
advance whether the argument is passed by value or by reference. What
happens here is:

1. INIT_FCALL determines that we're calling byref().
2. CHECK_FUNC_ARG for the first arg determines that this argument is passed
by-reference for this function.
3. FETCH_DIM_FUNC_ARG on the array will perform either a FETCH_DIM_R or a
FETCH_DIM_W operation, depending on what CHECK_FUNC_ARG determined.
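The deciding step in this sequence, checking whether the known callee takes its first argument by reference, has a rough userland analogue via Reflection. This only mirrors the decision, not the engine's fetch, and `firstArgIsByRef()` is a hypothetical helper. Note that it needs a function name up front, which is exactly what is unavailable for `$a[$b][$c]->test()` until `$a[$b][$c]` has been evaluated.

```php
<?php
function byref(&$x) { $x[] = 2; }
function byval($x) { return $x; }

// Userland analogue of CHECK_FUNC_ARG: once the callee is known,
// the passing mode of its first parameter can be looked up.
function firstArgIsByRef(string $function): bool
{
    $params = (new ReflectionFunction($function))->getParameters();
    return $params !== [] && $params[0]->isPassedByReference();
}

var_dump(firstArgIsByRef('byref'));     // bool(true)
var_dump(firstArgIsByRef('byval'));     // bool(false)
var_dump(firstArgIsByRef('sort'));      // bool(true): internal function taking &$array
var_dump(firstArgIsByRef('array_map')); // bool(false)
```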

I assume that in both my examples $a[$b][$c] would be considered an
"lvalue"[1] and can be a target of assignment triggered by either the
assignment operator or calling the function and passing to a by-ref
parameter.

[1] https://en.wikipedia.org/wiki/Value_(computer_science)#Assignment:_l-values_and_r-values

So is there a reason that -> on an array could not trigger the same? Is
Nikita saying that the performance of those calls performed by-reference
would not matter because they are always being assigned, at least in the
former case, but to do so with array expressions would be problematic?
(Ignoring there is no code in the wild that currently uses the -> operator,
or does that matter?)

Note that the byref($a[$b][$c]) case only works because we know which
function is being called at the time the argument is passed. If you have
$a[$b][$c]->test() we need to pass $a[$b][$c] by reference (FETCH_DIM_W) or
by value (FETCH_DIM_R) depending on whether $a[$b][$c]->test() accepts the
argument by-value or by-reference. But we can only know that once we have
already evaluated $a[$b][$c] and found out that it is indeed an array.

The only way around this is to always perform a for-write fetch of
$a[$b][$c], even though we don't know that the end result is going to be an
array. However, doing so would pessimize the performance of code operating
on objects. Consider $some_huge_shared_array[0]->foo(). If we fetch
$some_huge_shared_array for write, we'll be required to perform a full
duplication of the array in preparation for a possible future write. If it
turns out that $some_huge_shared_array[0] is actually an object, or that
$some_huge_shared_array[0] is an array and the performed operation is
by-value, then we have performed this copy unnecessarily.

I don't believe this is acceptable.
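The copy described here is PHP's copy-on-write separation. This sketch shows it on a large array: assigning it to a second variable shares the storage, and the first write through that variable forces a full duplicate, which is what an unconditional for-write fetch of `$some_huge_shared_array` would trigger. The reported byte counts depend on the PHP version and build, so only the qualitative difference should be read from them.

```php
<?php
$shared = range(1, 100000);

$before = memory_get_usage();
$copy = $shared;  // no duplication yet: both variables share one array
$afterAssign = memory_get_usage();

$copy[0] = 0;     // first write: $copy is separated into a full duplicate
$afterWrite = memory_get_usage();

printf("assign: %d bytes, write: %d bytes\n", $afterAssign - $before, $afterWrite - $afterAssign);
```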

I ask honestly to understand, and not as a rhetorical question.

Additionally, if the case of updating an array variable is not a problem
but updating an array expression is a problem then why not just limit the
-> operator to only work on expressions for immutable methods and require
variables for mutable methods? I would think should be easy enough to
throw an error for those specific "methods" that would be mutable, such as
shift() and unshift() if $a[$b][$c]->shift('foo') were called?

There are externalities associated even with the simple $x->foo() case,
though they are less severe. They primarily involve reduced ability to
analyze code in opcache.

In either case, this limitation does not seem reasonable to me from a
language design perspective. If $a->push($b) works, then $a[$k]->push($b)
can reasonably be expected to work as well.

Or maybe just completely limit using the -> operator on array variables.
Don't work on any array expressions for consistency. There is already
precedence in PHP for operators that work on variables and not on
expressions: ++, --, and &.

IF we can get a thumbs up from Nikita that one of these would actually be
possible then I think the next step should be to write up a list of
proposed array methods that would be implemented to support the -> operator
with arrays and put them in an RFC, and to flesh out any edge cases.

The only correct way to resolve this issue is to not support mutable
operations.

I don't think there's much need for mutable operations. sort() and
shuffle() would be best implemented by returning a new array instead.
array_push() is redundant with $array[]. array_shift() and array_unshift()
should never be used. array_pop() and array_splice() are the only sensible
mutable array methods that come to mind, and I daresay we can do without
them.
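Returning new arrays from sort() and shuffle(), as suggested here, can be sketched in userland, because the array parameter is a by-value copy that can be mutated locally and returned. `sorted()` and `shuffled()` are hypothetical names, and the last lines show the `$array[]` form that makes array_push() redundant.

```php
<?php
// Hypothetical non-mutating variants: take the array by value, return a new one.
function sorted(array $array, int $flags = SORT_REGULAR): array
{
    sort($array, $flags); // sorts the local copy only
    return $array;
}

function shuffled(array $array): array
{
    shuffle($array); // shuffles the local copy only
    return $array;
}

$original = [3, 1, 2];
$ascending = sorted($original);  // [1, 2, 3]
$mixed = shuffled($original);    // same values, random order

// array_push($grown, 4) is equivalent to:
$grown = $original;
$grown[] = 4;                    // [3, 1, 2, 4]
```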

Regards,
Nikita

4 years ago by Mark Randall

This is a more complex case. In this case the compiler doesn't know in
advance whether the argument is passed by value or by reference. What
happens here is:

I'm trying to wrap my head around this, but if a function arg can handle
this, does something internal to the engine preclude fetching in write
context after already fetching in read context, other than performance?

So could the initial fetch be performed with FETCH_DIM_R, handling the
object case and any other scalars, and then, if and only if the value is an
array and the operation is one that would traditionally be by-ref, the
previous lookup be repeated with FETCH_DIM_W?

Mark Randall

4 years ago by Mike Schinkel

Hi Nikita,

Thank you for taking the time to explain in detail.

One more question below.

-Mike


The only correct way to resolve this issue is to not support mutable operations.

I don't think I agree that this is the only correct way, but I respect your position of authority on the matter.

I don't think there's much need for mutable operations. sort() and shuffle() would be best implemented by returning a new array instead. array_push() is redundant with $array[]. array_shift() and array_unshift() should never be used.

Why do you say array_shift() and array_unshift() should never be used? When I wrote the questions above, the use case I was thinking about most was $a->unshift($value), as I use array_unshift() more than most of the other array functions.

Do you mean that these, if applied as "methods" to an array, should not be used mutably (meaning in-place is bad, but returning an array value that has been shifted would be okay), or do you have some other reason for believing that shifting an array is bad? Note that the reason I have used them in the past is when I needed to pass an array to a function written by someone else that expects the array to be ordered.

Also, what about very large arrays? I assume — which could be a bad assumption — that PHP internally can be more efficient about how it handles array_unshift() instead of just duplicating the large array so as to add an element at the beginning?

-Mike

4 years ago by Nikita Popov

Hi Nikita,

Thank you for taking the time to explain in detail.

One more question below.

-Mike

On May 26, 2021, at 7:44 PM, Hendra Gunawan the.liquid.metal@gmail.com
wrote:

Hello.

Yes, but Nikita wrote this note about technical limitations at the
bottom of the repo README:

Due to technical limitations, it is not possible to create mutable APIs
for primitive types. Modifying $self within the methods is not possible (or
rather, will have no effect, as you'd just be changing a copy).

If this were solved, it would be a great accomplishment for PHP. But I think
scalar objects are not going anywhere in the near future. If you are not
convinced, please take a look at:

https://github.com/nikic/scalar_objects/issues/20#issuecomment-569520181.

Nikita's comment actually causes me more questions, not fewer.

Nikita says "We need to know that $a[$b][$c] is an array in order to
determine that the call should be performed by-reference. However, we
already need to convert $a, $a[$b] and $a[$b][$c] into references before we
know about that."

How then are we able to do the following?:

$a[$b][$c][] = 1;

In this case, we're clearly performing a write operation on the array. If
you want to know the technical details, the compiler will convert this into
a sequence of FETCH_DIM_W ops followed by ASSIGN_DIM. The "W" bit here is
for "write", which will perform all the necessary special handling, such as
copy-on-write separation and auto-vivification.
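A small PHP illustration of the two observable effects of the write fetch described above: missing levels are auto-vivified, and a shared array is separated (copy-on-write) before it is modified. The snippet only shows behavior; it does not display the FETCH_DIM_W/ASSIGN_DIM opcodes themselves, and the variable values are arbitrary.

```php
<?php
$a = [];
$b = 'x';
$c = 'y';

// Write context: the missing $a['x'] and $a['x']['y'] are created as
// empty arrays on the way down, then 1 is appended.
$a[$b][$c][] = 1;

// Plain assignment shares the array storage (refcount 2) without copying.
$copy = $a;

// This write must first separate $a from $copy (copy-on-write),
// so $copy keeps the old contents.
$a[$b][$c][] = 2;

echo json_encode($a), "\n";    // {"x":{"y":[1,2]}}
echo json_encode($copy), "\n"; // {"x":{"y":[1]}}
```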

How also can we do this:

byref($a[$b][$c]);
function byref(&$x) {
    $x[] = 2;
}

See https://3v4l.org/aPvTD

This is a more complex case. In this case the compiler doesn't know in
advance whether the argument is passed by value or by reference. What
happens here is:

1. INIT_FCALL determines that we're calling byref().

2. CHECK_FUNC_ARG for the first arg determines that this argument is
passed by-reference for this function.

3. FETCH_DIM_FUNC_ARG on the array will perform either a FETCH_DIM_R
or a FETCH_DIM_W operation, depending on what CHECK_FUNC_ARG determined.
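Because the callee is known when the argument is passed, the engine can choose a read or a write fetch for the same expression. A minimal sketch contrasting the two; byval() is a hypothetical by-value counterpart added only for comparison, and the comments describe the visible effect rather than the exact opcodes emitted.

```php
<?php
// By-reference parameter: the argument expression is fetched for write,
// so the caller's nested array is modified in place.
function byref(&$x): void
{
    $x[] = 2;
}

// By-value parameter: the argument is only read; the callee appends
// to its own copy.
function byval($x): void
{
    $x[] = 2;
}

$b = 'k';
$c = 'j';
$a = ['k' => ['j' => [1]]];

byval($a[$b][$c]); // $a is unchanged
byref($a[$b][$c]); // $a['k']['j'] becomes [1, 2]

// A write fetch also auto-vivifies missing keys.
$fresh = [];
byref($fresh[$b][$c]); // $fresh becomes ['k' => ['j' => [2]]]

echo json_encode($a), "\n";     // {"k":{"j":[1,2]}}
echo json_encode($fresh), "\n"; // {"k":{"j":[2]}}
```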

I assume that in both my examples $a[$b][$c] would be considered an
"lvalue"[1] and can be a target of assignment triggered by either the
assignment operator or calling the function and passing to a by-ref
parameter.

[1]
https://en.wikipedia.org/wiki/Value_(computer_science)#Assignment:_l-values_and_r-values

So is there a reason that -> on an array could not trigger the same? Is
Nikita saying that the performance of calls performed by-reference would
not matter when they are always being assigned, at least in the former
case, but that doing so with array expressions would be problematic?
(Ignoring that there is no code in the wild that currently uses the ->
operator on arrays; or does that matter?)

Note that the byref($a[$b][$c]) case only works because we know which
function is being called at the time the argument is passed. If you have
$a[$b][$c]->test() we need to pass $a[$b][$c] by reference (FETCH_DIM_W) or
by value (FETCH_DIM_R) depending on whether $a[$b][$c]->test() accepts the
argument by-value or by-reference. But we can only know that once we have
already evaluated $a[$b][$c] and found out that it is indeed an array.

The only way around this is to always perform a for-write fetch of
$a[$b][$c], even though we don't know that the end result is going to be an
array. However, doing so would pessimize the performance of code operating
on objects. Consider $some_huge_shared_array[0]->foo(). If we fetch
$some_huge_shared_array for write, we'll be required to perform a full
duplication of the array in preparation for a possible future write. If it
turns out that $some_huge_shared_array[0] is actually an object, or that
$some_huge_shared_array[0] is an array and the performed operation is
by-value, then we have performed this copy unnecessarily.

I don't believe this is acceptable.
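The cost behind this objection is copy-on-write separation. A rough way to observe it in plain PHP, assuming memory_get_usage() is an adequate coarse measure and using an ordinary element write as a stand-in for a for-write fetch (this does not exercise the hypothetical array-method call itself; the array size is arbitrary):

```php
<?php
// A large array whose storage is shared by two variables.
$shared = range(1, 200000);
$alias = $shared;

// A read leaves the storage shared: nothing is duplicated.
$before = memory_get_usage();
$x = $alias[0];
$readCost = memory_get_usage() - $before;

// A write must separate $alias from $shared first, duplicating the
// whole array even though only one element changes.
$before = memory_get_usage();
$alias[0] = 0;
$writeCost = memory_get_usage() - $before;

printf("read: %d bytes, write: %d bytes\n", $readCost, $writeCost);
```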

I ask honestly to understand, and not as a rhetorical question.

Additionally, if updating an array variable is not a problem but updating
an array expression is, then why not allow the -> operator on array
expressions only for immutable methods, and require variables for mutable
methods? I would think it should be easy enough to throw an error for the
specific "methods" that would be mutable, such as shift() and unshift(), if
$a[$b][$c]->unshift('foo') were called?

There are externalities associated even with the simple $x->foo() case,
though they are less severe. They primarily involve reduced ability to
analyze code in opcache.

In either case, this limitation does not seem reasonable to me from a
language design perspective. If $a->push($b) works, then $a[$k]->push($b)
can reasonably be expected to work as well.

Or maybe just limit the -> operator to array variables entirely, and not
allow it on any array expressions, for consistency. There is already
precedent in PHP for operators that work on variables and not on
expressions: ++, --, and &.

If we can get a thumbs up from Nikita that one of these would actually be
possible, then I think the next step should be to write up a list of
proposed array methods that would support the -> operator on arrays, put
them in an RFC, and flesh out any edge cases.

The only correct way to resolve this issue is to not support mutable
operations.

I don't think I agree that this is the only correct way, but I respect
your position of authority on the matter.

I don't think there's much need for mutable operations. sort() and
shuffle() would be best implemented by returning a new array instead.
array_push() is redundant with $array[]. array_shift() and array_unshift()
should never be used.
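Since arrays are passed by value, an immutable sort is easy to sketch in userland. The helper name sorted() is hypothetical; the point is only that sort() on a by-value parameter leaves the caller's array intact, and that $array[] covers what array_push() does for a single value.

```php
<?php
// Hypothetical immutable variant of sort(): the parameter is a by-value
// copy, so sort() reorders only that copy, which is then returned.
function sorted(array $array): array
{
    sort($array);
    return $array;
}

$xs = [3, 1, 2];
$ys = sorted($xs);

// Appending with $array[] does the same job as array_push($xs, 4).
$xs[] = 4;

echo json_encode($xs), "\n"; // [3,1,2,4]
echo json_encode($ys), "\n"; // [1,2,3]
```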

Why do you say array_shift() and array_unshift() should never be used?
When I wrote the above questions the use-case I was thinking about most was
$a->unshift($value) as I use array_unshift() more than most of the other
array functions.

Do you mean that these, if applied as "methods" to an array, should not be
used mutably (meaning in-place is bad, but returning an array value that
has been shifted would be okay), or do you have some other reason you
believe that shifting an array is bad? Note the reason I have used them in
the past is when I need to pass an array to a function written by someone
else that expects the array to be ordered.

Also, what about very large arrays? I assume (which could be a bad
assumption) that PHP internally handles array_unshift() more efficiently
than by duplicating the whole array just to add an element at the
beginning?

Arrays only support efficient push/pop operations. Performing an
array_shift() or array_unshift() requires going through the whole array to
reindex all the keys, even though you're only adding/removing one element.
In other words, array_shift() and array_unshift() are O(n) operations, not
O(1) as one would intuitively expect. If you use shift/unshift as common
operations, you're better off using a different data-structure or
construction approach.
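Two possible alternatives along those lines, sketched under the assumption that the goal is a front-to-back list; the helper name prependAll() is hypothetical. The first appends (amortized O(1)) and reverses once instead of calling array_unshift() per item; the second uses SplDoublyLinkedList, whose shift() and unshift() work at the ends of a linked list without reindexing.

```php
<?php
// Equivalent to calling array_unshift($out, $item) for each item, but
// O(n) overall instead of O(n) per call.
function prependAll(iterable $items): array
{
    $out = [];
    foreach ($items as $item) {
        $out[] = $item;
    }
    return array_reverse($out);
}

// Reference result built the slow way.
$viaUnshift = [];
foreach (['a', 'b', 'c'] as $item) {
    array_unshift($viaUnshift, $item);
}
$viaAppend = prependAll(['a', 'b', 'c']);

// A different data structure: a doubly linked list with cheap operations
// at both ends.
$list = new SplDoublyLinkedList();
$list->push(2);
$list->unshift(1);
$first = $list->shift(); // 1
```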

Regards,
Nikita

4 years ago by Hendra Gunawan

Hello.

The only correct way to resolve this issue is to not support mutable operations.

Correct me if I'm wrong: scalar objects would hit the memory limit earlier
than the old API if applied to $some_huge_shared_array across several
method calls.

I don't think there's much need for mutable operations. sort() and shuffle() would be best implemented by returning a new array instead. array_push() is redundant with $array[]. array_shift() and array_unshift() should never be used. array_pop() and array_splice() are the only sensible mutable array methods that come to mind, and I daresay we can do without them.

Suppose we all agree with that. Will scalar objects preserve most of the
functionality of the old API? Some of those functions are very handy
because they keep us away from gory implementation details. In the array
case, we know that a PHP array is, in JS terms, a combination of an array
and a plain object. There is a trend in userland PHP libraries of simply
copying the JS array API while poorly preserving the existing functionality
of the old PHP API. Implicitly, the author of this thread wants this to
happen.
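One concrete example of the PHP-specific behavior a JS-style API could lose: because PHP arrays are ordered maps, array_filter() preserves keys (string or integer), whereas JS Array.prototype.filter() always returns a reindexed list. The sample data is arbitrary.

```php
<?php
// String keys survive filtering.
$scores = ['alice' => 3, 'bob' => 0, 'carol' => 5];
$nonZero = array_filter($scores); // ['alice' => 3, 'carol' => 5]

// Integer keys survive too, leaving a gap at key 1.
$list = [10, 0, 20];
$filtered = array_filter($list);      // [0 => 10, 2 => 20]
$reindexed = array_values($filtered); // [10, 20]

echo json_encode($filtered), "\n";  // {"0":10,"2":20}
echo json_encode($reindexed), "\n"; // [10,20]
```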

If I am not mistaken, scalar objects date back to before PHP 7.0. Is there
a reason scalar objects were never taken to the next phase, say an RFC?

Regards
Hendra Gunawan.

4 years ago by Aleksander Machniak

$array->map($fn1)->filter($fn2)

It's longer. Much longer.

It still requires knowing where the array goes. That's legacy which we
could sidestep with the arrow notation.

Admittedly, the pipe is much more powerful.

I agree. A unified object oriented interface to arrays (or strings for
that matter) is not a new topic on this list.

I guess you'd have to start with collecting all methods that would need
to be implemented. First stage could be all array_* functions, but I can
imagine others e.g. sorting functions to be included.

Once again, what I propose here wants to be a simple, cheap to implement,
narrow quickfix.

I'm afraid it's much more complicated than you think.

(Although users can add array_foobar($array, $arg1...) for
$array->foobar($arg1) as they need.)

I wouldn't go that far with this (possible BC break); maybe as future scope.

--
Aleksander Machniak
Kolab Groupware Developer [https://kolab.org]
Roundcube Webmail Developer [https://roundcube.net]

PGP: 19359DC1 # Blog: https://kolabian.wordpress.com

4 years ago by Marco Pivetta

Heyo,

Hi,

I was wondering whether $array->map($somefunction) would be possible. I am
not a C programmer by any stretch but reading ZEND_VM_HOT_OBJ_HANDLER(112
it seems to me it should be quite easy (famous last words) to find out if
the object is an array and, if so, then:

1. prepend the string array_ before the method name;

2. based on a small lookup table, move the "object" to the right place,
either the first argument or the second;

3. do a function call instead of a method call.

Regarding #2, by default it's the first:

$array->flip() becomes array_flip($array)
$array->column($column_key, $index_key) becomes array_column($array, $column_key, $index_key)
$array->merge($array2, $array3) becomes array_merge($array, $array2, $array3)

There'd be a small list of methods/functions where it's the second, for
example:

$array->map($fn) becomes array_map($fn, $array)
$array->search($needle, $strict) becomes array_search($needle, $array, $strict)
$array->key_exists($key) becomes array_key_exists($key, $array)

(Is there even any other?)

For phase 1 we could skip the functions that take the first argument by
reference (walk, sort) and figure them out later. Hand-waving, yes, but
never let perfect stand in the way of good enough :)

Look how nicely this reads:

$array->map($fn)->filter($fn2)

Compared to array_filter(array_map($fn, $array), $fn2)
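The three proposed steps can be approximated in userland, at the price of wrapping the array explicitly (calling ->map() on a real array is an Error in current PHP). A minimal sketch assuming PHP 8.0+, a hypothetical wrapper class Arr, and a lookup table containing only the three "array goes second" functions listed above; by-reference functions such as sort() and walk() are deliberately not handled, matching the phase 1 scope.

```php
<?php
// Hypothetical userland wrapper: forwards unknown method calls to the
// matching array_* function, placing the wrapped array by a lookup table.
final class Arr
{
    // Functions that take the array as their second argument.
    private const ARRAY_SECOND = ['map', 'search', 'key_exists'];

    public function __construct(private array $value) {}

    public function __call(string $name, array $args): mixed
    {
        $function = 'array_' . $name;                  // step 1: prefix
        if (in_array($name, self::ARRAY_SECOND, true)) {
            array_splice($args, 1, 0, [$this->value]); // step 2: second arg
        } else {
            array_unshift($args, $this->value);        // step 2: first arg
        }
        $result = $function(...$args);                 // step 3: function call
        // Re-wrap array results so calls can be chained.
        return is_array($result) ? new self($result) : $result;
    }

    public function toArray(): array
    {
        return $this->value;
    }
}

$result = (new Arr([1, 2, 3, 4]))
    ->map(fn($x) => $x * 2)
    ->filter(fn($x) => $x > 4)
    ->toArray(); // [2 => 6, 3 => 8]
```

Note that a magic __call like this is exactly the kind of implicit function selection and argument mapping that gets criticized later in the thread; it trades self-documenting method signatures for brevity.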

I see no BC concerns here because in all previous PHP versions this is a
fatal error. I do not see any syntax ambiguity either, but here I am
probably just naive. It could also be usable with user-defined functions.

What do you think?

Have you seen https://github.com/nikic/scalar_objects ?

Marco Pivetta

http://twitter.com/Ocramius

http://ocramius.github.com/

4 years ago by Sara Golemon

I was wondering whether $array->map($somefunction) would be possible. I am
not a C programmer by any stretch but reading ZEND_VM_HOT_OBJ_HANDLER(112
it seems to me it should be quite easy (famous last words) to find out if
the object is an array and, if so, then:

1. prepend the string array_ before the method name;

2. based on a small lookup table, move the "object" to the right place,
either the first argument or the second;

3. do a function call instead of a method call.

While I don't love the specifics of the proposal, I am 100% in favor of
allowing arrays to be used in an object-like fashion.

What I don't like about the specific proposal is that it's just a little
too magic in its function selection and argument mapping. There's also the
fact that it doesn't leave room to improve specifics about the
implementations of the methods. I'd much rather see an Array class
defined with specific methods declared on it. In many cases these will be
simple trampolines to an existing function, but it gives us
self-documenting stubs and room to wiggle out of poor decisions from the
1990s. Such a class would not be instantiable or inheritable, it would
just exist as a lightweight ValueObject for performing the method
invocations (we can make it internally instantiable using tricks like not
calling a private constructor).
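Userland cannot reproduce the engine creating such an instance implicitly, but the shape of the idea can be sketched. Everything here is hypothetical and assumes PHP 8.0+: the class name ArrayMethods, the static on() factory (standing in for the engine creating the receiver for a single call), and the choice of methods. The class is final with a private constructor, each method is an explicit trampoline with one consistent argument order, and results are plain arrays, so no object outlives the call.

```php
<?php
final class ArrayMethods
{
    // Not publicly instantiable; only on() can create an instance.
    private function __construct(private array $value) {}

    // Stand-in for the engine creating the temporary receiver.
    public static function on(array $value): self
    {
        return new self($value);
    }

    // Trampolines onto existing functions, with the receiver always
    // supplying the array argument.
    public function map(callable $fn): array
    {
        return array_map($fn, $this->value);
    }

    public function filter(callable $fn): array
    {
        return array_filter($this->value, $fn);
    }

    public function search(mixed $needle, bool $strict = false): int|string|false
    {
        return array_search($needle, $this->value, $strict);
    }
}

$doubled = ArrayMethods::on([1, 2, 3])->map(fn($x) => $x * 2); // [2, 4, 6]
$big = ArrayMethods::on($doubled)->filter(fn($x) => $x > 2);   // [1 => 4, 2 => 6]
$pos = ArrayMethods::on($doubled)->search(4, true);            // 1
```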

Then some hand-wavey details about maybe returning objects which have a
cast-to-array handler, mumble mumble, devil in the details... waving
hands...

-Sara

4 years ago by Mike Schinkel

What I don't like about the specific proposal is that it's just a little
too magic in its function selection and argument mapping. There's also the
fact that it doesn't leave room to improve specifics about the
implementations of the methods. I'd much rather see an Array class
defined with specific methods declared on it.

Wouldn't an Array class necessarily result in array-incompatible pass-by-reference semantics, which is the same issue userland faces when using ArrayObject as an array replacement?
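The semantic gap with ArrayObject can be made concrete. Strictly speaking, objects are passed by handle rather than by reference, but the visible effect is the aliasing described here: arrays behave like values, while two variables holding the same ArrayObject see each other's changes. The sample values are arbitrary.

```php
<?php
// Arrays: assignment behaves like a copy, so $arr is unaffected.
$arr = [1, 2];
$copy = $arr;
$copy[] = 3;

// ArrayObject: assignment copies the handle, so $alias and $obj are the
// same instance and both see the appended element.
$obj = new ArrayObject([1, 2]);
$alias = $obj;
$alias[] = 3;

echo json_encode($arr), "\n";                 // [1,2]
echo json_encode($obj->getArrayCopy()), "\n"; // [1,2,3]
```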

Hi all,

It sounds like scalar objects by Nikita: https://github.com/nikic/scalar_objects

Yes, but Nikita wrote this note about technical limitations at the bottom of the repo README:

Due to technical limitations, it is not possible to create mutable APIs for primitive types. Modifying $self within the methods is not possible (or rather, will have no effect, as you'd just be changing a copy).

Sounds like a big advantage?

Yes, it is a big advantage.

Except for when it is not.

-Mike

4 years ago by Sara Golemon

What I don't like about the specific proposal is that it's just a little
too magic in its function selection and argument mapping. There's also the
fact that it doesn't leave room to improve specifics about the
implementations of the methods. I'd much rather see an Array class
defined with specific methods declared on it.

Wouldn't an Array class necessarily result in array-incompatible
pass-by-reference semantics, which is the same issue userland faces when
using ArrayObject as an array replacement?

It would if the Array objects got returned. I'm instead picturing an
instance that magically comes into being solely for the duration of the
method call. Once the method returns, the object vanishes.

-Sara

4 years ago by Larry Garfield

What I don't like about the specific proposal is that it's just a little
too magic in its function selection and argument mapping. There's also the
fact that it doesn't leave room to improve specifics about the
implementations of the methods. I'd much rather see an Array class
defined with specific methods declared on it.

Wouldn't an Array class necessarily result in array-incompatible
pass-by-reference semantics, which is the same issue userland faces when
using ArrayObject as an array replacement?

It would if the Array objects got returned. I'm instead picturing an
instance that magically comes into being solely for the duration of the
method call. Once the method returns, the object vanishes.

-Sara

It sounds like you're describing something more akin to extension methods in C#, or the way trait impls work in Rust, or the way methods get defined in Go.

(All of which would be quite neat, but I don't know how they'd play nicely in PHP.)

--Larry Garfield
