Good day, everyone. I hope you're doing well.
I’d like to introduce a draft version of the RFC for the True Async
component.
https://wiki.php.net/rfc/true_async
I believe this version is not perfect and requires analysis. And I strongly
believe that things like this shouldn't be developed in isolation. So, if
you think any important (or even minor) aspects have been overlooked,
please bring them to our attention.
The draft status also reflects that the text still contains open doubts
about the implementation, as well as self-criticism. The main global issue
I see is the lack of "future experience" regarding how this API will be
used, which is another reason to bring it up for public discussion.
Wishing you all a great day, and thank you for your feedback!
FYI: once you introduce a draft RFC for discussion, the RFC should change status to "under discussion" per (4): https://wiki.php.net/rfc/howto
— Rob
FYI: once you introduce a draft RFC for discussion, the RFC should change
status to "under discussion" per (4):
It's done. Thank you.
Ed.
Good day, everyone. I hope you're doing well.
I’d like to introduce a draft version of the RFC for the True Async
component.
My reaction to this can be summed up as "this is huge!" By that I mean
multiple things...
First: PHP having native async support would be a huge step forward for
the language. It's really exciting to see how this proposal develops.
Second: it's clear you've put a huge amount of work into this, so a huge
thank you for that, and I hope it is rewarded.
Third: this is a huge proposal to digest. I wonder if there are ways it
can be split into smaller pieces, so that we don't overlook details in
one part because our focus is drawn to another. That might mean
releasing a partial implementation this year, and more features next
year; or it might just mean discussing and merging some core pieces
first, then immediately following up with a series of feature RFCs, all
targeting the same release.
Fourth: design decisions here will have a huge impact on the language
for years to come. We should spend plenty of time looking at experience
from elsewhere - other languages, and existing third-party async
implementations for PHP. This is closely related to the previous point,
since expanding the current RFC with comparisons for every decision
would make it impractically long.
Fifth: this is a huge amount of new code - GitHub says 24 thousand lines
of added code, although some of that is tests and documentation (which
is great to see included!) We need to make sure there are enough people
who understand the implementation to maintain that. Maybe we can try to
tempt some of the core contributors to existing third-party libraries to
spend some of their time on php-src instead.
I realise I haven't actually given any concrete feedback on the proposal
- I don't have any experience with other async implementations, and
don't fully understand the concepts involved, so don't feel qualified to
comment on the high-level design questions. I might have opinions on
smaller design details (random example: RESOLVE, CANCEL, and TIMEOUT
should be cases on an enum, not int constants) but see point 4: there's
just too much here to discuss in that level of detail, and there are
top-level decisions which should be our focus first.
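As a concrete illustration of the enum suggestion, here is a minimal sketch in plain PHP (the names below are purely illustrative, not taken from the RFC):

```php
// Hypothetical sketch: instead of int constants like Async\RESOLVE,
// Async\CANCEL, Async\TIMEOUT, the completion reason could be a native
// enum. The enum and function names here are invented for illustration.
enum CompletionReason
{
    case Resolve;
    case Cancel;
    case Timeout;
}

function onComplete(CompletionReason $reason): void
{
    // match is exhaustive over the enum cases, which the engine can
    // verify; with int constants any integer would be accepted.
    echo match ($reason) {
        CompletionReason::Resolve => "resolved\n",
        CompletionReason::Cancel  => "cancelled\n",
        CompletionReason::Timeout => "timed out\n",
    };
}
```

The win is type safety: a signature accepting `CompletionReason` cannot receive an arbitrary integer, and adding a case forces every `match` to be revisited.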
To re-iterate: this is really exciting, and thanks for getting it to
this stage!
--
Rowan Tommins
[IMSoP]
First: PHP having native async support would be a huge step forward for
the language. It's really exciting to see how this proposal develops.

Thank you for the kind words, it was awesome to read.
I wonder if there are ways it can be split into smaller pieces, so that
we don't overlook details in one part because our focus is drawn to another.
I can suggest the following workflow:
1. Approval of the core concept: changes affecting the language core.
2. Decision on the low-level API: Async\wait + Resume + microtask. Should
it be exposed to PHP developers or not? (I don't have a definitive answer.)
This is a crucial point that impacts 30-40% of the code. If the decision is
made to hide this API, the code will need to be adjusted.
Next, the RFC can be split into two parts:
- Low-level: Basic PHP primitive functions + C API
- High-level: Future, await, Channel, and maybe Pool.
So the process would be:
3. Approval of the Low-level RFC
4. Approval of the High-level RFC. Step 4 depends on Step 3 in terms of
implementation but is almost independent in terms of semantics. This
means it can be discussed separately and more freely.
Additionally, the Low-level API can be released independently, allowing
PHP extensions to adopt concurrency earlier.
As for function names, I really hope for your support in this matter
because it's far from trivial.
Thanks, Ed.
Third: this is a huge proposal to digest. I wonder if there are ways it
can be split into smaller pieces, so that we don't overlook details in
one part because our focus is drawn to another.
I second this, and as a long time user of amphp, go, and C#, I’d be coming into it with a specific mindset.
My only thing so far is that it appears the scheduler cannot be replaced; at least, easily. I don’t know if we would do so over on FrankenPHP, but it would be interesting to replace the scheduler with something that utilized go-routines for true multi-threading. Whether that works or not, is a whole different can of worms.
I’m compiling a deeper review, but that speaks more to the implementation than the spec.
— Rob
but it would be interesting to replace the scheduler with something that
utilized go-routines for true multi-threading. Whether that works or not,
is a whole different can of worms.

— Rob
If the question is whether it is possible to interact with a PHP thread
from another thread by sending an event to the Reactor, the answer is yes,
it is possible. Moreover, from the PHP-land side, this could be a Channel.
If the question is deeper — replacing the Scheduler with a Scheduler in
another language or from a different ecosystem — then it is more likely
possible than not, considering that the module itself is separated from the
rest of the implementation.
If you know a situation where this would be useful, then why not.
For example, in cases of integration with a web server, we can just send a
message through a channel from "server-thread" to "php-thread", and in a
microtask written in C, for example, create Fibers to handle the request.
This approach is used in Swoole.
And this solution should be even slightly faster than in Swoole because the
interaction will occur through memory copying within a single process.
If memory copying is to be avoided, then the web server can be integrated
directly into the Reactor, making the web server itself run as a microtask.
Since the memory will be allocated immediately in the correct thread, there
won’t even be a need to copy it, which in some cases might provide a
performance boost. Or maybe not...
Ed.
I’d like to introduce a draft version of the RFC for the True Async component.
Hey Edmond:
I find this feature quite exciting! I've got some feedback so far, though most of it is for clarification or potential optimizations:
A PHP developer SHOULD NOT make any assumptions about the order in which Fibers will be executed, as this order may change or be too complex to predict.
There should be a defined ordering (or at least, some guarantees). Being able to understand what things run in what order can help with understanding a complex system. Even if it is just a vague notion (user tasks are processed before events, or vice versa), it would still give developers more confidence in the code they write. You actually mention a bit of the order later (microtasks happen before fibers/events), so this sentence maybe doesn't make complete sense.
Personally, I feel as though an async task should run as though it were a function call until it hits a suspension. This is mostly an optimization though (C# does this), but it could potentially reduce overhead of queueing a function that may never suspend (which you mention as a potential problem much later on):
Async\run(function() {
    $fiber = Async\async(function() {
        sleep(1); // this gets enqueued now
        return "Fiber completed!";
    });

    // Execution is paused until the fiber completes
    $result = Async\await($fiber); // immediately enter $fiber without queuing

    echo $result . "\n";
    echo "Done!\n";
});
Until it is activated, PHP code behaves as before: calls to blocking functions will block the execution thread and will not switch the Fiber context. Thus, code written without the Scheduler component will function exactly the same way, without side effects. This ensures backward compatibility.
I'm not sure I understand this. Won't php code behave exactly the same as it did before once enabling the scheduler? Will libraries written before this feature existed suddenly behave differently? Do we need to worry about the color of functions because it changes the behavior?
True Async prohibits initializing the Scheduler twice.
How will a library take advantage of this feature if it cannot be certain the scheduler is running or not? Do I need to write a library for async and another version for non-async? Or do all the async functions with this feature work without the scheduler running, or do they throw a catchable error?
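One hedged sketch of how a library might cope with this, assuming the extension exposed some introspection function (the name `Async\isSchedulerActive()` is invented here to illustrate the question, it is not in the RFC):

```php
// Hypothetical: a library entry point that works with or without the
// scheduler. Async\isSchedulerActive() is an invented name used only
// to illustrate the question being asked above.
function fetchAll(array $urls): array
{
    if (function_exists('Async\\isSchedulerActive')
        && \Async\isSchedulerActive()) {
        // Concurrent path: one fiber per URL.
        $fibers = [];
        foreach ($urls as $url) {
            $fibers[] = \Async\async(fn() => file_get_contents($url));
        }
        return array_map(\Async\await(...), $fibers);
    }

    // Fallback: plain sequential blocking I/O.
    return array_map('file_get_contents', $urls);
}
```

Without some such capability check, a library either has to document "scheduler required" or ship two code paths, which is exactly the question being raised.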
This is crucial because the process may handle an OS signal that imposes a time limit on execution (for example, as Windows does).
Will this change the way os signals are handled then? Will it break compatibility if a library uses pcntl traps and I'm using true async traps too? Note there are several different ways (timeout) signals are handled in PHP -- so if (per-chance) the scheduler could always be running, maybe we can unify the way signals are handled in php.
Code that uses Resume cannot rely on when exactly the Fiber will resume execution.
What if it never resumes at all? Will it call a finally block if it is try/catched or will execution just be abandoned? Is there some way to ensure cleanup of resources? It should probably mention this case and how abandoning execution works.
If an exception is thrown inside a fiber and not handled, it will stop the Scheduler and be thrown at the point where Async\launchScheduler() is called.
The RFC doesn't mention the stack trace. Will it throw away any information about the inner exception?
The Graceful Shutdown mode can also be triggered using the function:
What will calling exit or die do?
A concurrent runtime allows handling requests using Fibers, where each Fiber can process its own request. In this case, storing request-associated data in global variables is no longer an option.
Why is this the case? Furthermore, if it inherits from the fiber that started its current fiber, won't using Resume/Notifier potentially cause problems when used manually? There are examples throughout the RFC using global variables in closures; so do these examples not actually work? Will sharing instances of objects in scope of the functions break things? For example:
Async\run($obj->method1(...));
Async\run($obj->method2(...));
This is technically sharing global variables (well, global to that scope -- global is just a scope after all) -- so what happens here? Would it make sense to delegate this fiber-local storage to user-land libraries instead?
Objects of the Future class are high-level patterns for handling deferred results.
By this point we have covered FiberHandle, Resume, and Contexts. Now we have Futures? Can we simplify this to just Futures? Why do we need all these different ways to handle execution?
A channel is a primitive for message exchange between Fibers.

Why is there an isEmpty and isNotEmpty function? Wouldn't !$channel->isEmpty() suffice?
It's also not clear what the value of most of these functions is. For example:
if ($chan->isFull()) {
doSomething(); // suspends at some point inside? We may not know when we write the code.
// chan is no longer full, or maybe it is -- who knows, but the original assumption entering this branch is no longer true.
...
}
Whether a channel is full or not is not really important, and if you rely on that information, this is usually an architectural smell (at least in other languages). Same thing with empty or writable, or many others of these functions. You basically just write to a channel and eventually (or not, which is a bug and causes a deadlock) something will read it. The entire point is to use channels to decouple async code, but most of the functions here allow for code to become strongly coupled.
As for the single producer method, I am not sure why you would use this. I can see some upside for the built-in constraints (potentially in a dev-mode environment) but in a production system, single-producer bottlenecks are a real thing that can cause serious performance issues. This is usually something you explicitly want to avoid.
In addition to the send/receive methods, which suspend the execution of a Fiber, the channel also provides non-blocking methods: trySend, tryReceive, and auxiliary explicit blocking methods: waitUntilWritable and waitUntilReadable.

It isn't clear what happens when trySend fails. Is this an error or does it do nothing?

Thinking through it, there may be cases where trySend is valid, but more often than not, it is probably an antipattern. I cannot think of a valid reason for tryReceive and its usage is most likely guaranteed to cause a deadlock in real code. For true multi-threaded applications, it makes more sense, but not for single-threaded concurrency like this.
In other words, the following code is likely to be more robust, and not depend on execution order (which we are told at the beginning not to do):
Async\run(function() {
    $channel = new Async\Channel();

    $reader = Async\async(function() use ($channel) {
        while (($data = $channel->read()) !== NULL) {
            echo "receive: $data\n";
        }
    });

    for ($i = 0; $i < 4; $i++) {
        echo "send: event data $i\n";
        $channel->send("event data $i");
    }

    $reader->cancel(); // clean up our reader
    // or
    $channel->close(); // will receive `NULL` I believe?
});
A trySend is still useful when you want to send a message but don't want to block if the channel is full. However, this is going to largely depend on how long it has been since the developer last suspended the current fiber, and nothing else -- thus it is probably an antipattern since it totally depends on the literal structure of the code, not the structure of the program -- if that makes sense.
This means that trapSignal is not intended for “regular code” and should not be used “anywhere”.
Can you expand on what this means in the RFC? Why expose it if it shouldn't be used?
I didn't go into the low level api details yet -- this email is already pretty long. But I would suggest maybe thinking about how to unify Notifiers/Resume/FiberHandle/Future into a single thing. These things are pretty similar to one another (from a developer's standpoint) -- a way to continue execution, and they all offer a slightly different api.
I also noticed that you seem to be relying heavily on the current implementation to define behavior. Ideally, the RFC should define behavior, and the implementation should implement that behavior as described in the RFC. In other words, the RFC is used as a reference point as to whether something is a bug or an enhancement in the future. There has been more than one occasion where the list looked back at an old RFC to try to determine the original intent, to work out whether something is working as intended or is a bug. RFCs are also used to write documentation, so the more detailed the RFC, the better the documentation will be for new users of PHP.
— Rob
There should be a defined ordering (or at least, some guarantees).
The execution order, which is part of the contract, is as follows:
- Microtasks are executed first.
- Then I/O events and OS signals are processed.
- Then timer events are executed.
- Only after that are fibers scheduled for execution.
In the current implementation, fibers are stored in a queue without
priorities (this is not a random choice). During one cycle period, only
one fiber is taken from the queue.
This results in the following code (I've removed unnecessary details):
do {
    execute_microtasks_handler();

    has_handles = execute_callbacks_handler(
        circular_buffer_is_not_empty(&ASYNC_G(deferred_resumes))
    );

    execute_microtasks_handler();

    bool was_executed = execute_next_fiber_handler();

    if (UNEXPECTED(
        false == has_handles
        && false == was_executed
        && zend_hash_num_elements(&ASYNC_G(fibers_state)) > 0
        && circular_buffer_is_empty(&ASYNC_G(deferred_resumes))
        && circular_buffer_is_empty(&ASYNC_G(microtasks))
        && resolve_deadlocks()
    )) {
        break;
    }
} while (zend_hash_num_elements(&ASYNC_G(fibers_state)) > 0
    || circular_buffer_is_not_empty(&ASYNC_G(microtasks))
    || reactor_loop_alive_fn()
);
If we go into details, it is also noticeable that microtasks are executed
twice - before and after event processing - because an event handler might
enqueue a microtask, and the loop ensures that this code executes as early
as possible.
The contract for the execution order of microtasks and events is important
because it must be considered when developing event handlers. The
concurrent iterator relies on this rule.
However, making assumptions about when a fiber will be executed is not
part of the contract, if only because this algorithm can be changed at any
moment.
// Execution is paused until the fiber completes
$result = Async\await($fiber); // immediately enter $fiber without queuing
So is it possible to change the execution order and optimize context
switches? Yes, there are ways to do this. However, it would require
modifying the Fiber code, possibly in a significant way (I haven't explored
this aspect in depth).
But… let's consider whether this would be a good idea.
We have a web server. A single thread is handling five requests. They all
compete with each other because this is a typical application interacting
with MySQL.
In each Fiber, you send a query and wait for the result as quickly as
possible.
In what case should we create a new coroutine within a request handler?
The answer: usually, we do this when we want to run something in the
background while continuing to process the request and return a response as
soon as possible.
In this paradigm, it is beneficial to execute coroutines in the order they
were enqueued.
For other scenarios, it might be a better approach for a child coroutine
to execute immediately. In that case, these scenarios should be considered,
and it may be worth introducing specific semantics for such cases.
Won't php code behave exactly the same as it did before once enabling the
scheduler?
Suppose we have a sleep() function. Normally, it calls php_sleep((unsigned int)num). The php_sleep function blocks the execution of the thread.

But we need to add an alternative path:

if (IN_ASYNC_CONTEXT) {
    async_wait_timeout((unsigned int) num * 1000, NULL);
    RETURN_LONG(0);
}
The IN_ASYNC_CONTEXT condition consists of two points:
- The current execution context is inside a Fiber.
- The Scheduler is active.
What’s the difference?
If the Scheduler is not active, calling sleep()
will block the entire
Thread because, without an event loop, it simply cannot correctly handle
concurrency.
However, if the Scheduler is active, the code will set up handlers and
return control to the "main loop", which will pick the next Fiber from the
queue, and so on.
This means that without a Scheduler and Reactor, concurrent execution is
impossible (without additional effort).
From the perspective of a PHP developer, if they are working with
AMPHP/Swoole, nothing changes, because the code inside the if condition
will never execute in their case.
Does this change the execution order inside a Fiber? No.
If you had code working with RabbitMQ sockets, and you copied this code
into a Fiber, then enabled concurrency, it would work exactly the same
way. If the code used blocking sockets, the Fiber would yield control
to the Scheduler. And if two such Fibers are running, they will start
working with RabbitMQ sequentially. Of course, each Fiber should use a
different socket.
The same applies to CURL. Do you have an existing module that sends
requests to a service using CURL in a synchronous style? Just copy the
code into a coroutine.
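A hedged sketch of that "just copy the code" claim, against the proposed API (Async\async is from the RFC draft; everything inside the closure is ordinary synchronous PHP):

```php
// Ordinary synchronous curl code, unchanged, wrapped in a coroutine.
// Under the proposal, the blocking I/O inside would transparently
// yield to the Scheduler once it is active; without the Scheduler the
// same closure simply blocks, as it does today. The URL is a
// placeholder.
$task = Async\async(function () {
    $ch = curl_init('https://example.com/api/status');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($ch); // blocking call, written in sync style
    curl_close($ch);
    return $body;
});
```

The point being made is that no rewrite to promises or callbacks is needed; the sync code is the async code.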
This means almost 98% transparency. Why almost? Because there might be
nuances in helper functions and internal states. There may also be
differences in OS state management or the file system, which could affect
the final result.
How will a library take advantage of this feature if it cannot be certain the scheduler is running or not? Do I need to write a library for async and another version for non-async? Or do all the async functions with this feature work without the scheduler running, or do they throw a catchable error?

This means that the launchScheduler() function should be called only once during the entire lifecycle of the application. If an error occurs and is not handled, the application should terminate. This is not a technical limitation but rather a logical constraint.

If launchScheduler() were replaced with a CLI option, such as php --enable-scheduler, where the Scheduler is implicitly activated, then it would be like the *last line of code*: it must exist only once.
Will this change the way os signals are handled then? Will it break compatibility if a library uses pcntl traps and I'm using true async traps too? Note there are several different ways (timeout) signals are handled in PHP -- so if (per-chance) the scheduler could always be running, maybe we can unify the way signals are handled in php.

Regarding this phrase in the RFC: it refers to the window close event in Windows, which provides a few seconds before the process is forcibly terminated.
There are signals intended for application termination, such as SIGBREAK
or CTRL-C, which should typically be handled in only one place in the
application. Developers are often tempted to insert signal handlers in
multiple locations, making the code dependent on the environment. But more
importantly, this should not happen at all.
True Async explicitly defines a Flow for emergency or unexpected
application termination. Attempting to disrupt this Flow by adding a
custom termination signal handler introduces ambiguity.
There should be only one termination handler. And at the end of its
execution, it must call gracefulShutdown.
As for pcntl, this will need to be tested.
What if it never resumes at all?
If a Fiber is never resumed, it means the application has completely
crashed with no way to recover :)
The RFC has two sections dedicated to this issue:
Cancellation Operation + Graceful Shutdown.
If the application terminates due to an unhandled exception, all Fibers
must be executed.
Any Fiber can be canceled at any time, and there is no need to
use explicit
Cancellation, which I personally find an inconvenient pattern.
The RFC doesn’t mention the stack trace. Will it throw away any information
about the inner exception?
This is literally "exception transfer". The stack trace will be exactly
the same as if the exception were thrown at the call site.
To be honest, I haven’t had enough time to thoroughly test this. Let's try
it:
<?php

Async\async(function() {
    echo "async function 1\n";

    Async\async(function() {
        echo "2\n";
        throw new Error("Error");
    });
});

echo "start\n";

try {
    Async\launchScheduler();
} catch (\Throwable $exception) {
    print_r($exception);
}

echo "end\n";
?>
Error Object
(
    [message:protected] => Error
    [string:Error:private] =>
    [code:protected] => 0
    [file:protected] => async.php
    [line:protected] => 8
    [trace:Error:private] => Array
        (
            [0] => Array
                (
                    [function] => {closure:{closure:async.php:3}:6}
                    [args] => Array
                        (
                        )
                )
            [1] => Array
                (
                    [file] => async.php
                    [line] => 14
                    [function] => Async\launchScheduler
                    [args] => Array
                        (
                        )
                )
        )
    [previous:Error:private] =>
)
Seems perfectly correct.
What will calling exit or die do?
I completely forgot about them! Well, of course, Swoole overrides them. This needs to be added to the TODO.
Why is this the case?
For example, consider a long-running application where a service is a
class that remains in memory continuously. The web server receives an
HTTP request and starts a Fiber for each request. Each request has its
own User Session ID.
You want to call a service function, but you don’t want to pass the Session
ID every time, because there are also 5-10 other request-related
variables. However, you cannot simply store the Session ID in a class
property, because context switching is unpredictable. At one moment,
you're handling Request #1, and a second later, you're already
processing Request #2.
When a Fiber creates another Fiber, it copies a reference to the context
object, which has minimal performance impact while maintaining
execution environment
consistency.
*Closure variables work as expected*: they are pure closures with no
modifications.
I didn’t mean that True Async breaks anything at the language level. The
issue is logical:
You cannot use a global variable in two Fibers, modify it, read it,
and expect its state to remain consistent.
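A hedged sketch of the pattern being described, using an invented Async\Context::current() accessor with get()/set() methods (the actual context API in the RFC may well differ):

```php
// Hypothetical sketch: per-fiber request state kept in the inherited
// context instead of a global or a class property. The context
// accessor and its get()/set() methods are invented names used only
// for illustration.
function handleRequest(string $sessionId): void
{
    \Async\Context::current()->set('session_id', $sessionId);

    // Deep inside the service layer, no parameter threading needed:
    $id = \Async\Context::current()->get('session_id');
    echo "processing session $id\n";
}

// Each request fiber sees only its own value, even though the service
// objects themselves stay shared in memory.
\Async\async(fn() => handleRequest('sess-1'));
\Async\async(fn() => handleRequest('sess-2'));
```

Because a child fiber copies a reference to its parent's context, the session id stays consistent across any coroutines the handler spawns, without being visible to other requests.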
By this point we have covered FiberHandle, Resume, and Contexts. Now we
have Futures? Can we simplify this to just Futures? Why do we need all
these different ways to handle execution?
Futures and Notifiers are two different patterns.
- A Future changes its state only once.
- A Notifier generates one or more events.
- Internally, Future uses Notifier.
In the RFC, I mention that these are essentially two APIs:
- High-level API
- Low-level API
One of the open questions is whether both APIs should remain in PHP-land.
The low-level API allows for close interaction with the event loop,
which might be useful if someone wants to write a service in PHP that
requires this level of control.
Additionally, this API helps minimize Fiber context switches, since its
callbacks execute without switching.
This is both an advantage and a disadvantage.
It's also not clear what the value of most of these functions is. For example:
Your comment made me think, especially in the context of anti-patterns.
And I agree that it's better to remove unnecessary methods than to let
programmers shoot themselves in the foot.
As for the single producer method, I am not sure why you would use this.
Yes, in other languages there are no explicit restrictions. If the single
producer approach is indeed rarely used, then it's not such an important
feature to include. However, I lack certainty on whether it's truly a rare
case. On the other hand, these functions are inexpensive to implement and
do not affect performance. However, they have another drawback: they
increase the number of behavioral variants in a single class, which seems a
more significant disadvantage than the frequency of use.
It isn't clear what happens when trySend fails. Is this an error or does nothing?
Yes, this is a documentation oversight. I'll add it to the TODO.
Thinking through it, there may be cases where trySend is valid,
Code using tryReceive could be useful in cases where a channel is used to
implement a pool. Suppose you need to retrieve an object from the pool, but
if it's not available, you’d prefer to do something else (like throw an
exception) rather than block the fiber.
Overall, though, you’re right — it’s an antipattern. It’s better to
implement the pool as an explicit class and reserve channels for their
classic use.
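For completeness, here is a hedged sketch of the pool idea mentioned above, against the draft Channel API (the assumption that tryReceive returns null when the channel is empty is mine; the RFC has not fixed that behaviour yet):

```php
// Sketch: an object pool backed by a channel of idle connections.
// tryReceive() returning null on an empty channel is an assumption.
final class ConnectionPool
{
    public function __construct(private \Async\Channel $idle) {}

    public function acquire(): object
    {
        $conn = $this->idle->tryReceive();
        if ($conn === null) {
            // Prefer failing fast over suspending the fiber.
            throw new \RuntimeException('pool exhausted');
        }
        return $conn;
    }

    public function release(object $conn): void
    {
        $this->idle->trySend($conn);
    }
}
```

As noted above, an explicit pool class like this is probably the better home for that logic than sprinkling tryReceive calls through application code.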
Can you expand on what this means in the RFC? Why expose it if it shouldn't
be used?
I answered a similar question above.
I also noticed that you seem to be relying heavily on the current
implementation to define
behavior.
I love an iterative approach: prototype => RFC => prototype => RFC.
Thank you for the excellent remarks and analysis!
Ed.
Hi,
Any Fiber can be canceled at any time, and there is no need to use explicit Cancellation, which I personally find an inconvenient pattern.
As a heavy user of both amphp and go, cancellations (contexts in go) are absolutely needed, as a fiber may spawn further background fibers in order to execute some operation; just cancelling that specific fiber will not cancel the spawned fibers, unless a bunch of boilerplate try-catch blocks are added to propagate CancellationExceptions.
A nicer API should use only explicit cancellation objects, as this pattern of preemptive implicit cancellations (i.e. a fiber may be cancelled at any point via cancel()) is super dangerous IMO, as it can lead to all sorts of nasty behaviour: what if we cancel execution of a fiber in the middle of a critical section (i.e. between a lock() and an unlock() of a file or a database? What if unlocking() in the catch (CancelledException) block requires spawning a new fiber as part of the interaction with the database?).
Consider also the huge amount of CancelledException blocks that would have to be added to handle state cleanup in case of premature implicit cancellations, as opposed to explicit cancellations that only throw when we ask them to: there’s a reason why golang, amphp & others use explicit cancellations.
Another thing I’m not happy with is how, unless the scheduler is launched, all code executes in blocking mode: this seems like a super bad idea, as it will hold back the ecosystem again, and create a split in the project similar to JIT (i.e. a separate “execution mode” with its own bugs, that get fixed slowly because few people are using it, and few people are using it because of its bugs).
The main reason given in the RFC (Code written without using the Scheduler should not experience any side effects) makes no sense, because legacy code not spawning fibers will not experience concurrency side effects anyway, regardless of whether the scheduler is started or not.
A thing I would love to see, on the other hand, is for Context to become a “provider” for superglobals such as $_REQUEST, $_POST, $_GET, and all globals in general (and perhaps all other global state such as static properties): this would make it very easy to turn e.g. php-fpm into a fully asynchronous application server, where each request is started in the same thread (or in N threads in an M:N, M>N execution model) but its global state is entirely isolated between fibers.
Regards,
Daniil Gentili - Senior software engineer
Portfolio: https://daniil.it
Telegram: https://t.me/danogentili
As a heavy user of both amphp and go, cancellations (contexts in go) are
absolutely needed, as a fiber may spawn further background fibers in order
to execute some operation; just cancelling that specific fiber will not
cancel the spawned fibers unless a bunch of boilerplate try-catch blocks
are added to propagate CancellationExceptions.
I didn't mean that Cancellation isn't needed at all. I meant that canceling
a Fiber is sufficient in most scenarios and leads to clean, understandable
code.
Other languages have child coroutines (Swoole supports them too), but I'm
not sure if that's the right approach.
I like context.WithCancel from Go, but it can essentially be implemented
directly in PHP land since all the necessary tools are available.
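To make that concrete, here is a minimal userland sketch of a Go-style WithCancel tree in plain PHP; every name here is hypothetical and nothing below is part of the RFC:

```php
// A cancellation token that propagates to its children, similar in
// spirit to Go's context.WithCancel.
final class CancelToken
{
    private bool $cancelled = false;
    /** @var CancelToken[] */
    private array $children = [];

    public function withCancel(): CancelToken
    {
        $child = new CancelToken();
        $this->children[] = $child;
        return $child;
    }

    public function cancel(): void
    {
        if ($this->cancelled) {
            return;
        }
        $this->cancelled = true;
        // Cancellation flows down the tree to every derived token.
        foreach ($this->children as $child) {
            $child->cancel();
        }
    }

    public function isCancelled(): bool
    {
        return $this->cancelled;
    }
}
```

A coroutine would poll isCancelled() at its suspension points; the point is that the propagation tree itself needs no engine support.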
A nicer API should use only explicit cancellation objects, as this pattern
of preemptive implicit cancellations
The exception mechanism is the standard way to alter the execution flow in
PHP. If a programmer writes code with lock and unlock outside of a
try-finally block but calls functions between these methods, they are
potentially creating a bad solution—at the very least because someone else
might later introduce an exception in one of those functions. This is a
classic case for languages with exceptions.
So far, I haven't found a better way to ensure the logical consistency and
integrity of the execution flow. Maybe someone has a suggestion?
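For clarity, the discipline described above is the ordinary exception-safe pattern; a CancellationException simply unwinds through it like any other exception ($mutex and doCriticalWork() are placeholders, not RFC API):

```php
$mutex->lock();
try {
    // The body may suspend the fiber and be cancelled at any
    // suspension point; the exception unwinds through finally.
    doCriticalWork();
} finally {
    // Runs whether the body completed, threw, or was cancelled.
    $mutex->unlock();
}
```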
The main reason given in the RFC
The main reason is that PHP has been around for many years and didn’t just
appear yesterday.
If you have an idea on how to start the Scheduler implicitly, let's implement it. So far, I have a few ideas:
- Using an option in php.ini (downside: if PHP is used for multiple projects).
- Using a CLI option; so far, I like this the most.
A thing I would love to see, on the other hand, is for Context to become a
“provider”
It's hard for me to evaluate this idea. Intuitively, it doesn't seem ideal.
In general, I'm not very fond of $_GET/$_POST. But on the other hand, why
not? This needs some consideration.
allow to very easily to turn i.e. php-fpm into a fully asynchronous application server,
where each request is started in the same thread (or in N threads in an M-N M>N execution model) but its global state is entirely isolated between fibers.
I haven’t thought about this possibility. But wouldn’t this break the FCGI contract?
Thanks! Ed.
As a heavy user of both amphp and go, cancellations (contexts in go) are absolutely needed, as a fiber may spawn further background fibers in order to execute some operation; just cancelling that specific fiber will not cancel the spawned fibers, unless a bunch of boilerplate try-catch blocks are added to propagate CancellationExceptions.
I didn't mean that Cancellation isn't needed at all. I meant that canceling a Fiber is sufficient in most scenarios and leads to clean, understandable code.
Other languages have child coroutines (Swoole supports them too), but I'm not sure if that's the right approach.
I like context.WithCancel from Go, but it can essentially be implemented directly in PHP land since all the necessary tools are available.
Note, this is precisely the problem: implementing cancellation propagation to child fibers in userland PHP requires writing a bunch of boilerplate try-catch blocks to propagate CancellationExceptions to child FutureHandle::cancel() calls (spawning multiple fibers to execute subtasks concurrently during an async method call is pretty common, and the current implicit cancellation mode requires writing a bunch of try-catch blocks to propagate cancellation, instead of just passing a cancellation object, or a flag to inherit the cancellation of the current fiber when spawning a new one).
A nicer API should use only explicit cancellation objects, as this pattern of preemptive implicit cancellations
The exception mechanism is the standard way to alter the execution flow in PHP. If a programmer writes code with lock and unlock outside of a try-finally block but calls functions between these methods, they are potentially creating a bad solution—at the very least because someone else might later introduce an exception in one of those functions. This is a classic case for languages with exceptions.
Note the explicit use case I listed is that of an unlock() in a finally block that requires spawning a new fiber in order to execute the actual unlock() RPC call: this is explicitly in contrast with the RFC, which specifies that
ATTENTION: A programmer must never attempt to create a new fiber while handling a CancellationException, as this behavior may trigger an exception during Graceful Shutdown mode.
While this is somewhat understandable in the context of graceful shutdown, it still means that unlocking in a finally block (the only way of properly handling cancellations with the current model) isn’t always possible.
So far, I haven't found a better way to ensure the logical consistency and integrity of the execution flow. Maybe someone has a suggestion?
The main reason given in the RFC
The main reason is that PHP has been around for many years and didn’t just appear yesterday.
If you have an idea on how to start the Scheduler implicitly, let's implement it. So far, I have a few ideas:
- Using an option in php.ini (downside: if PHP is used for multiple projects).
- Using a CLI option; so far, I like this the most.
I would really prefer it to be always enabled, no fallback at all, because as I said, it will make absolutely no difference to legacy, non-async projects that do not use fibers, but it will avoid a split ecosystem scenario.
A thing I would love to see, on the other hand, is for Context to become a “provider”
It's hard for me to evaluate this idea. Intuitively, it doesn't seem ideal. In general, I'm not very fond of $_GET/$_POST. But on the other hand, why not? This needs some consideration.
allow to very easily to turn i.e. php-fpm into a fully asynchronous application server,
where each request is started in the same thread (or in N threads in an M-N M>N
execution model) but its global state is entirely isolated between fibers.
I haven’t thought about this possibility. But wouldn’t this break the FCGI contract?
I see no reason why it should break the contract, if implemented by isolating the global state of each fiber, it can be treated as a mere implementation detail of the (eventually new) SAPI.
Regards,
Daniil Gentili
—
Daniil Gentili - Senior software engineer
Portfolio: https://daniil.it
Telegram: https://t.me/danogentili
I like context.WithCancel from Go, but it can essentially be implemented directly in PHP land since all the necessary tools are available.
Note, this is precisely the problem: implementing cancellation propagation to child fibers in userland PHP requires writing a bunch of boilerplate try-catch blocks to propagate CancellationExceptions to child FutureHandle::cancel() calls (spawning multiple fibers to execute subtasks concurrently during an async method call is pretty common, and the current implicit cancellation mode requires writing a bunch of try-catch blocks to propagate cancellation, instead of just passing a cancellation object, or a flag to inherit the cancellation of the current fiber when spawning a new one).
Catching CancellationException is only necessary if there is some defer code.
If there isn't, then there's no need to catch it. Try-catch blocks are not mandatory.
We can create a Cancellation object, pass it via use or as a parameter to all child fibers, and check it in await(). This is the most explicit approach. In this case, try-catch would only be needed if we want to clean up some resources. Otherwise, we can omit it.
According to the RFC, if a fiber does not catch CancellationException, it will be handled by the Scheduler. Therefore, catching this exception is not strictly necessary.
If this solution also seems too verbose, there is another one that can be implemented without modifying this RFC. For example, implementing a cancellation operation for a Context. All coroutines associated with this context would be canceled. From an implementation perspective, this is essentially iterating over all coroutines and checking which context they belong to.
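As a sketch of the explicit variant described above (async\run appears elsewhere in the thread; the Cancellation class and the await() signature are illustrative assumptions):

```php
// Hypothetical explicit-cancellation flow: the token is handed down by
// hand, so try-catch is only needed where there is cleanup to perform.
$cancellation = new Cancellation();

$futures = [];
foreach ($tasks as $task) {
    // Every child fiber receives the same token explicitly.
    $futures[] = async\run(fn() => $task->execute($cancellation));
}

// Assumed behaviour: if the token fires, await() throws
// CancellationException here, and children observing the token
// stop at their next suspension point.
$results = await($futures, $cancellation);
```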
Note the explicit use case I listed is that of an unlock() in a finally block that requires spawning a new fiber in order to execute the actual unlock() RPC call: this is explicitly in contrast with the RFC, which specifies that
So, if I understand correctly, the code in question looks like this:
try {
    lock();
    ...
} finally {
    unlock();
}

function unlock() {
    async\run();
}
If I got it right, then the following happens:
The code inside try {} allocates resources.
The code inside finally {} also allocates resources.
So, what do we get? We're trying to terminate the execution of a fiber, and instead, it creates a new one. It seems like there's a logical error here.
Instead of creating a new fiber, it would be better to use microtasks.
I would really prefer it to be always enabled, no fallback at all, because as I said, it will make absolutely no difference to legacy, non-async projects that do not use fibers, but it will avoid a split ecosystem scenario.
I'm not arguing at all that avoiding the call to this function is a good solution. I’m on your side. The only question is how to achieve this technically.
Could you describe an example of "ecosystem split" in the context of this function? What exactly is the danger?
I see no reason why it should break the contract, if implemented by isolating the global state of each fiber, it can be treated as a mere implementation detail of the (eventually new) SAPI.
So, I can take NGINX and FCGI, and without changing the FCGI interface itself, but modifying its internal implementation, get a working application. Yes, but... that means all global variables, including static ones, would need to be tied to the context. It's not that it can't be done, but what about memory consumption?
I'm afraid that if the code wasn't designed for a LongRunning APP, it's unlikely to handle this task correctly.
--
Ed.
Lock/Unlock issue
It seems that this is actually about a database query that puts the Fiber into a waiting state, specifically query("UNLOCK").
In that case, everything should work correctly.
Although there are some dangerous edge cases. The database might be under
high load, causing the query("UNLOCK") request to wait for too long,
leading to a timeout. This would trigger another exception, which could
then be interpreted as a complete failure.
Putting a Fiber into a waiting state inside a finally block does not contradict the shutdown mode. However, the programmer must be careful inside the finally section, because if a second exception occurs, it means the code cannot properly complete execution.
--
Ed.
Hi Edmond,
Thanks for sharing the huge amount of work that went into this!
I would really prefer it to be always enabled, no fallback at all, because as I said, it will make absolutely no difference to legacy, non-async projects that do not use fibers, but it will avoid a split ecosystem scenario.
I'm not arguing at all that avoiding the call to this function is a good solution. I’m on your side. The only question is how to achieve this technically.
Could you describe an example of "ecosystem split" in the context of this function? What exactly is the danger?
Not sure it's an answer to this question but in Symfony's HttpClient, we
have an amphp-based implementation that's working both outside and inside
an event loop:
- inside means amphp's scheduler already started, and then each request is scheduled thanks to amphp's http client
- outside means Symfony's code is going to trigger amphp's event loop internally.
The target DX is that when outside any event loop, we're still able to
leverage fibers to provide concurrency, for requests only, and when inside
an event loop, requests run concurrently to any other things that the loop
monitors.
Is that something that could be achieved with your proposal?
If not, maybe that's the split we're wondering about?
Nicolas
Hi, Nicolas.
Hi Edmond,
The target DX is that when outside any event loop, we're still able to
leverage fibers to provide concurrency, for requests only, and when inside
an event loop, requests run concurrently to any other things that the loop
monitors.
Is that something that could be achieved with your proposal?
If not, maybe that's the split we're wondering about?
This RFC leads to PHP operating in two modes:
- Blocking mode: the Event Loop needs to be implemented manually; AMPHP works. This is how PHP currently operates.
- Concurrent mode: code runs in coroutines; the Event Loop works under the hood. AMPHP does not work.
If we try to imagine a way to keep PHP in a single mode, it would likely
require implementing coroutines separately from Fiber and leaving Fiber as
legacy.
This solution has both advantages and disadvantages.
Advantages:
- Switching can be optimized considering the new architecture.
- The Event Loop will start automatically when needed.
- Code using Fiber will work as before, and most likely, AMPHP will be able to create an event loop in user-land.
Disadvantages:
- More work is required.
- There is a risk of ending up with a Frankenstein-like result. :)
A relative advantage of the current implementation is that it changes only
about 100-500 lines in the PHP core (probably even fewer, since part of the
changes are in extensions like CURL and Socket).
The downside is that it cannot change the rules that were previously
established.
--
Ed.
Note the explicit use case I listed is that of an unlock() in a finally block that requires spawning a new fiber in order to execute the actual unlock() RPC call: this is explicitly in contrast with the RFC, which specifies that
ATTENTION: A programmer must never attempt to create a new fiber while handling a CancellationException, as this behavior may trigger an exception during Graceful Shutdown mode.
I think you are right. This restriction increases complexity without
providing significant benefits. I will remove this condition from the RFC
entirely and simply state that the programmer should handle such situations
carefully.
Thank you!
Good day, everyone. I hope you're doing well.
I’d like to introduce a draft version of the RFC for the True Async component.
https://wiki.php.net/rfc/true_async
I believe this version is not perfect and requires analysis. And I strongly believe that things like this shouldn't be developed in isolation. So, if you think any important (or even minor) aspects have been overlooked, please bring them to attention.
The draft status also highlights the fact that it includes doubts about the implementation and criticism. The main global issue I see is the lack of "future experience" regarding how this API will be used—another reason to bring it up for public discussion.
Wishing you all a great day, and thank you for your feedback!
I finally managed to read through enough of the RFC to say something intelligent. :-)
First off, as others have said, thank you for a thorough and detailed proposal. It's clear you've thought through a lot of details. I also especially like that it's transparent for most IO operations, which is mandatory for adoption. It's clear to me that async in PHP will never be more than niche until there is a built-in dev-facing API that is easy to use on its own without any 3rd party libraries.
Unfortunately, at this point I cannot support this proposal, because I disagree with the fundamental design primitives.
Let's look at the core design primitives:
- A series of free-standing functions.
- That only work if the scheduler is active.
- The scheduler being active is a run-once global flag.
- So code that uses those functions is only useful based on a global state not present in that function.
- And a host of other seemingly low-level objects that have a myriad of methods on them that do, um, stuff.
- Oh, and a lot of static methods, too, instead of free-standing functions.
The number of ways for this to go wrong and confuse the heck out of a developer is disturbingly high.
In the Low-Level API section, the RFC notes:
I came to the conclusion that, in the long run, sacrificing flexibility in favor of code safety is a reasonable trade-off.
I completely agree with this statement! And feel the RFC doesn't go even remotely far enough in that direction.
In particular, I commend to your attention this post about a Python async library that very deliberately works at a much higher level of abstraction, and is therefore vastly safer:
https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/
I won't repeat the post, but suffice to say I agree with it almost entirely. (I dislike the name "nursery," but otherwise...) That is the direction we should be looking at for PHP, from the get-go.
PHP doesn't have Python-style context managers (though I would like them), so a PHP version of that might look something like this (just spitballing):
async $context {
// $context is an object of AsyncContext, and can be passed around as such.
// It is the only way to spawn anything async, or interact with the async controls.
// If a function doesn't take an AsyncContext param, it cannot control async. This is good.
$context->run(some_function(...));
$result = $context->run(function(AsyncContext $ctx) use ($someObj) {
// This queues a thunk to run at the end of the closest async {} block.
$ctx->defer($someObj->shutdown(...));
});
}
catch (SomeException $e) {
// Exception thrown by one of the fibers.
}
// This is an unwrapped value.
print $result;
Naturally there would be more to the API, but I'm just showing the basics.
Importantly:
- There is no global modal (schedulerStarted) to think about.
- When the async {} block ends, you know with 100% certainty that there are no dangling background tasks.
- It's explicitly obvious what functions are going to try and mess with the async context, and therefore cannot be called except within an async context.
- An application can have sync portions and async portions very easily, without worrying about which "mode" it's in at a given time.
It also means that writing a number of the utilities mentioned in the RFC do not require any engine code. Eg:
function parallel_map(iterable $it, Closure $fn) {
$result = [];
async $ctx {
foreach ($it as $k => $v) {
$result[$k] = $ctx->run($fn($v));
}
}
return $result;
}
Now I know that's safe to call anywhere, whether I'm current in an active async mode or not.
I'm not convinced that sticking arbitrary key/value pairs into the Context object is wise; that's global state by another name. But if we must, the above would handle all the inheritance and override stuff quite naturally. Possibly with:
async $ctx from $parentCtx {
// ...
}
Similarly, the two different modes for channels strike me as quite unnecessary. I also would tend to favor how Rust does channels (via a library, I don't think it's a built-in): have separate variables for the in-side and out-side. Again, just spitballing:
[$in, $out] = Channel::create($buffer_size);
$in->send($val);
$out->receive($val);
(Give or take variations of those methods.)
Now you don't need to worry about fibers owning things. You just have a ChannelIn object and a ChannelOut object, and can pass either one to as many or as few functions as you want. And those functions could be spawning new fibers if you'd like, or not. (There's likely some complications here I'm not thinking of, but I've not dug into it in depth yet.) You can now close either side, or just let the objects go out of scope.
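A rough userland sketch of that endpoint split (class names are spitballed to match the suggestion; a real implementation would suspend on an empty or full buffer instead of using a bare queue):

```php
// Producers only ever see ChannelIn; consumers only ever see
// ChannelOut. Either side can be passed to any number of functions.
final class ChannelIn
{
    public function __construct(private \SplQueue $queue) {}

    public function send(mixed $value): void
    {
        $this->queue->enqueue($value);
    }
}

final class ChannelOut
{
    public function __construct(private \SplQueue $queue) {}

    public function receive(): mixed
    {
        // A real channel would suspend the fiber while empty.
        return $this->queue->dequeue();
    }
}

function channel(): array
{
    $queue = new \SplQueue();
    return [new ChannelIn($queue), new ChannelOut($queue)];
}

[$in, $out] = channel();
```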
In short, I am fully in favor of better async logic in PHP. I am very against an API that even allows me to do something stupid or deadlock-creating, or that relies on hidden global state. That would be worse than the status quo, and there are better models than what is shown here that offer much stronger "correct by construction" guarantees.
--Larry Garfield
Hi there,
I would also like to highlight some interesting ideas that I find useful to consider.
Recently Bend programming language has been released, and it incorporates a
completely different view on the conception of "code", in the definition of
"what it is" and "how it should be interpreted".
While we interpret it as a sequence of instructions, the proper way of seeing it is as a graph of instructions. On every step we reduce that graph by running the code of the nodes the current node depends on.
Therefore, basically everything could be parallelized without the need for fancy management of threads and other low-level things.
For example, having this code:
$foo = foo();
$bar = bar();
$baz = $foo + $bar;
If it was run in Bend, it would be interpreted so that foo() and bar()
functions are executed in parallel, and $baz = $foo + $bar is executed
afterwards, since this computation depends on the other two.
The key, most excellently beautiful feature
here is that all async management is under the hood, exposing nothing for
the developers to be bothered with.
That being said, I also want to mention that Bend has a primitive for concurrent loops. Actually, they used another solution, different from loops, since loops are sequential by their essence (iterating one by one).
They introduced a concurrent alternative for loops with "bend" keyword,
allowing data structures to be traversed in parallel.
I think this is actually "the right way" of doing parallel processing in
general and async programming in particular, and this is greatly to be
considered for having at least some principles applied in PHP.
What I think it could be.
async function baz(): int {
$foo = foo();
$bar = bar();
return $foo + $bar;
}
// value is returned just like from any other ordinary function
$val = baz();
Function above could run foo() in one fiber, and bar() in another, both of
them being awaited at the return statement (at the first statement where
the value is actually used / referenced, if we put it more generally) so
that actual values could be taken.
In other words, async function is not promise-based as in other languages
that suffer from red blue function problem, but rather it is function with
coroutine flow of execution, so that foo() is executed as the first
coroutine, and when it blocks, then bar() is executed until it also blocks.
Then, at plus operator being evaluated, $foo is awaited, and $bar is
awaited, since they are necessary parts for + operation to complete.
Best regards
Hi there,
I would also like to highlight some interesting ideas that I find being useful to consider.
Recently Bend programming language has been released, and it incorporates a completely different view on the conception of "code", in the definition of "what it is" and "how it should be interpreted".
While we interpret it as a sequence of instructions, the proper way of seeing it is the graph of instructions. On every step we reduce that graph, by running the code of the nodes current node depends on.
Therefore, basically everything could paralleled w/o the need to have fancy management of threads and other low-level things.
For example, having this code:
$foo = foo();
$bar = bar();
$baz = $foo + $bar;
If it was run in Bend, it would be interpreted so that foo() and bar() functions are executed in parallel, and $baz = $foo + $bar is executed afterwards, since this computation depends on the other two.
The key, most excellently beautiful feature here is that all async management is under the hood, exposing nothing for the developers to be bothered with.
That being said, I also want to mention that Bend has a primitive for concurrent loops. Actually, they used another solution, different from loops, since loops are sequential by their essence (iterating one by one). They introduced a concurrent alternative for loops with the "bend" keyword, allowing data structures to be traversed in parallel.
I think this is actually "the right way" of doing parallel processing in general and async programming in particular, and this is greatly to be considered for having at least some principles applied in PHP.
What I think it could be.
async function baz(): int {
$foo = foo();
$bar = bar();
return $foo + $bar;
}
// value is returned just like from any other ordinary function
$val = baz();
Function above could run foo() in one fiber, and bar() in another, both of them being awaited at the return statement (at the first statement where the value is actually used / referenced, if we put it more generally) so that actual values could be taken.
In other words, async function is not promise-based as in other languages that suffer from red blue function problem, but rather it is function with coroutine flow of execution, so that foo() is executed as the first coroutine, and when it blocks, then bar() is executed until it also blocks. Then, at plus operator being evaluated, $foo is awaited, and $bar is awaited, since they are necessary parts for + operation to complete.
Best regards
Huh. Reminds me of SSA, which can identify independent computations like that. It’s used by Go and many other compiled languages, but not in the same way this Bend language does it. So, that’s interesting.
I don’t know if PHP could implement SSA (maybe opcache could), but with how dynamic PHP is, I’m not sure it would be helpful.
An interesting application nonetheless, thanks for sharing!
— Rob
Hello, Eugene!
What I think it could be.
async function baz(): int {
$foo = foo();
$bar = bar();
return $foo + $bar;
}
// value is returned just like from any other ordinary function
$val = baz();
If we have code like $x + $y, and in one block it follows rule 1 while in
another block it follows rule 2, this increases the complexity of the
language. The worst part is that the same operators exhibit DIFFERENT
behavior in different contexts. This violates semantic integrity. (A
similar issue occurred in C++ with operator overloading, where a
theoretically elegant solution turned out to be terrible in practice).
If you want to achieve a clean syntax for concurrency in PHP, I would
suggest considering pipes in the long run.
For example:
|> $users = getUsers() ||| $orders = getOrders()
|> mergeByColumn($users, $orders, 'orders')
--
Ed.
Hi there,
I would also like to highlight some interesting ideas that I find being useful to consider.
Recently Bend programming language has been released, and it incorporates a completely different view on the conception of "code", in the definition of "what it is" and "how it should be interpreted".
While we interpret it as a sequence of instructions, the proper way of seeing it is the graph of instructions. On every step we reduce that graph, by running the code of the nodes current node depends on.
I've always kind of liked this model.
PHP doesn't have Python-style context managers (though I would like them)
So would I, I've actually thought about it a lot...
But more importantly, this highlights something important about that Python library: it is built on top of a native async/await system which is baked into the language (note the example uses "async with", not normal "with").
That reinforces my earlier feeling that this RFC is trying to do far too much at once - it's not just about "low-level vs high-level", there's multiple whole features here:
- asynchronous versions of native functions, and presumably a C API for writing those in extensions
- facilities for writing coroutines (async/await, but not as keywords)
- deferrable "microtasks"
- event/signal handling functionality
- communication between threads/fibers via Channels
- a facility for launching coroutines concurrently (as Python demonstrates, this can be separate from how the coroutines themselves are written)
- maybe more that I've overlooked while trying to digest the RFC
Having all of those would be amazing, but every one of them deserves its own discussion, and several can be left to userland or as future scope in an initial implementation.
Rowan Tommins
[IMSoP]
Good day, Larry.
First off, as others have said, thank you for a thorough and detailed
proposal.
Thanks!
- A series of free-standing functions.
- That only work if the scheduler is active.
- The scheduler being active is a run-once global flag.
- So code that uses those functions is only useful based on a global state not present in that function.
- And a host of other seemingly low-level objects that have a myriad of methods on them that do, um, stuff.
- Oh, and a lot of static methods, too, instead of free-standing functions.
Suppose these shortcomings don’t exist, and we have implemented the boldest
scenario imaginable. We introduce Structured Concurrency, remove low-level
elements, and possibly even get rid of Future. Of course, there are no
functions like startScheduler or anything like that.
- In this case, how should PHP handle Fiber and all the behavior associated with it? Should Fiber be declared deprecated and removed from the language? What should the flow be?
- What should be done with I/O functions? Should they remain blocking, with a separate API provided as an extension?
- Would it be possible to convince the maintainers of XDEBUG and other extensions to rewrite their code to support the new model? (If you're reading this question now, please share your opinion.)
- If transparent concurrency is introduced for I/O in point 2, what should be done with Revolt + AMPHP? This would break their code. Should an additional function or option be introduced to switch PHP into "legacy mode"?
I share your feelings on many points, but I would like to see some
real-world alternative.
I commend to your attention this post about a Python async library
Structured concurrency is a great thing. However, I’d like to avoid
changing the language syntax and make something closer to Go’s semantics.
I’ll think about it and add this idea to my TODO.
async $context {
// $context is an object of AsyncContext, and can be passed around as such.
// It is the only way to spawn anything async, or interact with the async controls.
// If a function doesn't take an AsyncContext param, it cannot control async. This is good.
This is a very elegant solution. Theoretically.
However, in practice, if you require explicitly passing the context to all
functions, it leads to the following consequences:
- The semantics of all functions increase by one additional parameter (signature bloat).
- If an asynchronous call needs to be added to a function, and other functions depend on it, then the semantics of all dependent functions must be changed as well.
In strict languages, a hybrid model is often used; in Go, for example, the context is passed explicitly as a synchronization object, but only when necessary.
In this example, there is another aspect: the fact that async execution is
explicitly limited to a specific scope. This is essentially the same as
startScheduler, and it is one of the options I was considering.
Of course, startScheduler can be replaced with a construction like async(function() { ... }).
This means that async execution is only active within the closure, and
coroutines can only be created inside that closure.
This is one of the semantic solutions that allows removing startScheduler,
but at the implementation level, it is exactly the same.
What do you think about this?
I'm not convinced that sticking arbitrary key/value pairs into the
Context object is wise;
Why not?
that's global state by another name
Static variables inside a function are also global state. Are you against
static variables?
But if we must, the above would handle all the inheritance and override
stuff quite naturally. Possibly with:
How will a context with open string keys help preserve service data that the service doesn't want to expose to anyone? The Key() solution is essentially the same as Symbol in JS, which is used for the same purpose.
Of course, we could add a coroutine static $var construct to the language syntax. But it's all just syntactic sugar that would require more code to support.
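To illustrate the idea, here is a minimal userland sketch (all class and method names are hypothetical, not part of the RFC) of how an opaque key object, analogous to a JS Symbol, keeps a service's context entries private:

```php
<?php
// Hypothetical sketch: an opaque key object acts like a JS Symbol.
// Only code holding the exact key instance can address this entry,
// so string-key collisions are impossible by construction.
final class ContextKey {}

final class Context {
    private \SplObjectStorage $values;

    public function __construct() {
        $this->values = new \SplObjectStorage();
    }

    public function set(ContextKey $key, mixed $value): void {
        $this->values[$key] = $value;
    }

    public function get(ContextKey $key): mixed {
        // SplObjectStorage implements offsetExists, so ?? is safe here.
        return $this->values[$key] ?? null;
    }
}

// A service keeps its key private; nobody else can read or clobber
// the slot, because an identical key object can never be forged.
$key = new ContextKey();
$ctx = new Context();
$ctx->set($key, 'service-local data');
```

Code that only knows the Context, but not the key, simply cannot reach the entry; that is the whole point of the key-object approach.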
[$in, $out] = Channel::create($buffer_size);
These semantics require the programmer to remember that two variables actually point to the same object. If a function has multiple channels, this makes the code quite verbose. Additionally, such channels are inconvenient to store in lists because their structure becomes more complex.
I would suggest a slightly different solution:

$in = new Channel()->getProducer();
async myFunction($in->getConsumer());

These semantics do not restrict the programmer in usage patterns while still allowing interaction with the channel through a well-defined contract.
Thanks for the great examples, and a special thanks for the article.
I also like the definition of context.
Ed
Good day, Larry.
First off, as others have said, thank you for a thorough and detailed proposal.
Thanks!
- A series of free-standing functions.
- That only work if the scheduler is active.
- The scheduler being active is a run-once global flag.
- So code that uses those functions is only useful based on a global state not present in that function.
- And a host of other seemingly low-level objects that have a myriad of methods on them that do, um, stuff.
- Oh, and a lot of static methods, too, instead of free-standing functions.
Suppose these shortcomings don’t exist, and we have implemented the boldest scenario imaginable. We introduce Structured Concurrency, remove low-level elements, and possibly even get rid of Future. Of course, there are no functions like startScheduler or anything like that.
- In this case, how should PHP handle Fiber and all the behavior associated with it? Should Fiber be declared deprecated and removed from the language? What should the flow be?
I'm not sure yet. I was quite hesitant about Fibers when they went in because they were so low-level, but the authors were confident they were a sufficient foundation for a user-space toolchain that could be iterated on quickly and that everyone could use. That clearly didn't pan out as intended (Revolt exists, but usage of it is still rare), so here we are with a half-finished API.
Thinking aloud, perhaps we could cause new Fiber to create an automatic async block? Or we deprecate it and discourage its use. Something to think through, certainly.
- What should be done with I/O functions? Should they remain
blocking, with a separate API provided as an extension?
The fact that IO functions become transparently async when appropriate is the best part of the current RFC. Please keep that. :-)
- Would it be possible to convince the maintainers of XDEBUG and
other extensions to rewrite their code to support the new model? ( If
you're reading this question now, please share your opinion. )
I cannot speak for Derick.
- If transparent concurrency is introduced for I/O in point 2, what should be done with Revolt + AMPHP? This would break their code. Should an additional function or option be introduced to switch PHP into "legacy mode"?
Also an excellent question, to which I do not yet have an answer. (See previous point about Fibers being half-complete.) I would want to involve Aaron, Christian, and Ces-Jan before trying to make any suggestions here.
Structured concurrency is a great thing. However, I’d like to avoid
changing the language syntax and make something closer to Go’s
semantics. I’ll think about it and add this idea to my TODO.
Well, as noted in the article, structured concurrency done right means not having unstructured concurrency. Having Go-style async and then building a structured nursery system on top of it means you cannot have any of the guarantees of the structured approach, because the other one is still poking out the side and leaking. We're already stuck with mutable-by-default, global variables, and other things that prevent us from making helpful assumptions. Please, let's try to avoid that for async. We don't need more gotos.
async $context {
// $context is an object of AsyncContext, and can be passed around as such.
// It is the only way to spawn anything async, or interact with the async controls.
// If a function doesn't take an AsyncContext param, it cannot control async. This is good.
This is a very elegant solution. Theoretically.
However, in practice, if you require explicitly passing the context to
all functions, it leads to the following consequences:
- The semantics of all functions increase by one additional parameter
(Signature bloat).
No, just those functions/objects that necessarily involve running async control commands. Most wouldn't. They would just silently context switch when they hit an IO operation (which as noted above is transparently supported, which is what makes this work) and otherwise behave the same.
But if something does actively need to do async stuff, it should have a context to work within. It's the same discussion as:
A: "Pass/inject a DB connection to a class that needs it, don't just call a global db() function."
B: "But then I have to pass it to all these places explicitly!"
A: "That's a sign your SQL is too scattered around the code base. Fix that first and your problem goes away."
Explicit flow control is how you avoid bugs. It's also self-documenting, as it's patently obvious what code expects to run in an async context and which doesn't care.
- If an asynchronous call needs to be added to a function, and other
functions depend on it, then the semantics of all dependent functions
must be changed as well.
This is no different than DI of any other service. I have restructured code to handle temporary contexts before. (My AttributeUtils and Serde libraries.) The result was... much better code than I had before. I'm glad I made those refactors.
In this example, there is another aspect: the fact that async execution is explicitly limited to a specific scope. This is essentially the same as startScheduler, and it is one of the options I was considering.
Of course, startScheduler can be replaced with a construction like async(function() { ... }).
This means that async execution is only active within the closure, and coroutines can only be created inside that closure.
This is one of the semantic solutions that allows removing startScheduler, but at the implementation level, it is exactly the same.
What do you think about this?
That looks mostly like the async block syntax I proposed, spelled differently. The main difference is that the body of the wrapped function would need to explicitly use
any variables from scope that it wanted, rather than getting them implicitly. Whether that's good or bad is probably subjective.
But it would allow for a syntax like this for the context, which is quite similar to how database transactions are often done:
$val = async(function(AsyncContext $ctx) use ($stuff, $fn) {
$result = [];
foreach ($stuff as $item) {
$result[] = $ctx->run($fn);
}
// We block/wait here until all subtasks are complete, then the async() call returns this value.
return $result;
});
And of course in both cases you could use a pre-defined callable instead of inlining one. At this point I think it's mostly a stylistic difference, function vs block.
I'm not convinced that sticking arbitrary key/value pairs into the Context object is wise;
Why not?
that's global state by another name
Static variables inside a function are also global state. Are you
against static variables?
Vocally, in fact. :-)
But if we must, the above would handle all the inheritance and override stuff quite naturally. Possibly with:
How will a context with open string keys help preserve service data that the service doesn't want to expose to anyone? The Key() solution is essentially the same as Symbol in JS, which is used for the same purpose. Of course, we could add a coroutine static $var construct to the language syntax. But it's all just syntactic sugar that would require more code to support.
I cannot speak to JS Symbols as I haven't used them. I am just vehemently opposed to globals, no matter how many layers they're wrapped in. :-) Most uses could be replaced by proper DI or partial application.
[$in, $out] = Channel::create($buffer_size);
These semantics require the programmer to remember that two variables actually point to the same object. If a function has multiple channels, this makes the code quite verbose. Additionally, such channels are inconvenient to store in lists because their structure becomes more complex.
I would suggest a slightly different solution:

$in = new Channel()->getProducer();
async myFunction($in->getConsumer());

These semantics do not restrict the programmer in usage patterns while still allowing interaction with the channel through a well-defined contract.
I'd go slightly differently if you wanted to go that route:
$ch = new Channel($buffer_size);
$in = $ch->producer();
$out = $ch->consumer();
// You do most interaction with $in and $out.
I could probably work with that as well.
(Or even just $ch->inPipe and $ch->outPipe, now that we have nice property support.)
But the overall point, I think, is avoiding implicit modal logic. If my code doesn't need to care if it's in an async world, it doesn't care. If it does, then I need an explicit async world to work within, rather than relying on one implicitly existing, I hope. And I shouldn't have to think about "who owns this end of this channel". I just have an in and out hose I stick stuff into and pull out from, kthxbye.
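A userland sketch of that shape (all names hypothetical; the RFC's eventual Channel API may differ) could look like:

```php
<?php
// Hypothetical sketch: one Channel, two narrow views. A producer can
// only send, a consumer can only receive; neither needs to know who
// "owns" the other end of the hose.
final class Channel {
    private array $buffer = [];

    public function __construct(private readonly int $capacity = 0) {}

    public function producer(): ChannelProducer { return new ChannelProducer($this); }
    public function consumer(): ChannelConsumer { return new ChannelConsumer($this); }

    public function push(mixed $v): void { $this->buffer[] = $v; }
    public function pop(): mixed { return array_shift($this->buffer); }
}

final class ChannelProducer {
    public function __construct(private readonly Channel $ch) {}
    public function send(mixed $v): void { $this->ch->push($v); }
}

final class ChannelConsumer {
    public function __construct(private readonly Channel $ch) {}
    public function receive(): mixed { return $this->ch->pop(); }
}

$ch = new Channel(16);
$in = $ch->producer();
$out = $ch->consumer();
$in->send('hello');
```

A real implementation would suspend the coroutine when the buffer is full or empty; this sketch only shows the contract split between the two views.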
Thanks for the great examples, and a special thanks for the article.
I also like the definition of context.
Ed
--
Larry Garfield
Thinking aloud, perhaps we could cause new Fiber to create an automatic async block?
The main issue with Fibers is their switching logic:
If you create Fiber A and call another Fiber B inside it, Fiber B can only
return to the Fiber that created it, not just anywhere. However, the
Scheduler requires entirely different behavior.
This creates a conflict with the Scheduler. Moreover, it can even break the
Scheduler if it operates based on Fibers. That's why all these strange
solutions in the RFC are just workarounds to somehow bypass this problem.
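The constraint is easy to demonstrate with today's API (PHP >= 8.1): Fiber::suspend() hands control back to whichever caller last started or resumed the fiber, never anywhere else, which is exactly what a scheduler cannot work with:

```php
<?php
// Fiber::suspend() always returns control to the caller that last
// started/resumed the fiber. Fiber B cannot yield "past" Fiber A to
// some scheduler at the top; control is strictly caller/callee.
$inner = new Fiber(function (): void {
    Fiber::suspend('from inner');
});

$outer = new Fiber(function () use ($inner): string {
    // When $inner suspends, control returns HERE (inside $outer),
    // not to the top-level code that started $outer.
    $value = $inner->start();
    return "outer saw: $value";
});

$outer->start(); // $outer never suspends, so it runs to completion
```

After this runs, $inner is still suspended and $outer has already terminated with "outer saw: from inner": the suspension could only surface one level up, at the creator.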
But it seems we've already found an alternative solution.
I cannot speak for Derick.
Of course, I just mean that he probably won't be happy about it :)
No, just those functions/objects that necessarily involve running async control commands. Most wouldn't.
They would just silently context switch when they hit an IO operation (which as noted above is transparently supported, which is what makes this work) and otherwise behave the same.
So it's something more like Go or Python.
$val = async(function(AsyncContext $ctx) use ($stuff, $fn) {
    $result = [];
    foreach ($stuff as $item) {
        $result[] = $ctx->run($fn);
    }
    // We block/wait here until all subtasks are complete, then the async() call returns this value.
    return $result;
});
Do I understand correctly that at the point $val =
async(function(AsyncContext $ctx) use ($stuff, $fn) execution stops until
everything inside is completed?
If so, let me introduce a second semantic option (for now, I'll remove the
context and focus only on the function).
$url1 = 'https://domain1.com/';
$url2 = 'https://domain2.com/';
$url_handle = fn(string $url) => file_get_contents($url);
$res = Async\start(function() use ($url1, $url2, $url_handle) {
$res1 = Async\run($url_handle, $url1);
$res2 = Async\run($url_handle, $url2);
Async\run(fn() => sleep(5));
// some logic here
return $merged_result;
});
What's Happening Here:
- After calling $res = Async\start(), the code waits until the entire block completes.
- Inside Async\start, the code waits for all nested coroutines to finish.
- If a coroutine has other nested coroutines, the same rule applies.
Rules Inside an Asynchronous Block:
- I/O functions do not block coroutines within the block.
- Creating a new Fiber is not allowed — an exception will be thrown: you cannot use Fiber.
- Unhandled exceptions will be thrown at the point of $res = Async\start().
Coroutine Cancellation Rules:
Canceling a coroutine cancels it and all its child coroutines (this cannot
be bypassed unless the coroutine is created in a different context).
How does this option sound to you?
Essentially, this is Kotlin, but it should also resemble Python. However,
unlike Kotlin, there are no special language constructs here—code blocks
naturally serve that role. Of course, syntactic sugar can be added later
for better readability.
And if you like this, I have good news: there are no implementation issues
at this level.
In terms of semantic elegance, the only thing that bothers me is that return
behavior is slightly altered — meaning the actual "return" won’t happen
until all child functions complete. This isn’t very good, and Kotlin’s
style would fit better here.
But on the other hand — can we live with this?
I cannot speak to JS Symbols as I haven't used them.
I am just vehemently opposed to globals, no matter how many layers they're wrapped in. :-) Most uses could be replaced by proper DI or partial application.
You won’t be able to use DI because you have only one service (instance of
class) for the entire application, not a separate service for each
coroutine. This service is shared across the application and can be called
from any coroutine. As a result, the service needs memory slots to store or
retrieve data. DI is a mechanism used once during service initialization,
not every time a method is called.
The only question is whether to use open text keys in the context, which is
unsafe and can lead to collisions, or to use a unique key-object that is
known only to the one who created it. (If PHP introduces object constants,
this syntax would also look elegant.)
There is, of course, another approach: making Context any arbitrary object
defined by the user. But this solution has a known downside — lack of a
standard interface.
(Or even just $ch->inPipe and $ch->outPipe, now that we have nice
property support.)
Just a brilliant idea. :)
Have a good day!
Ed.
Essentially, this is Kotlin, but it should also resemble Python.
However, unlike Kotlin, there are no special language constructs
here—code blocks naturally serve that role. Of course, syntactic sugar
can be added later for better readability.
To pick up on this point: PHP doesn't have any generalised notion of
"code blocks", only Closures, and those have a "weight" which is more
fundamental than syntax: creating the Closure object, copying or
referencing captured variables, creating a new execution stack frame,
and arranging for parameters to be passed in and a return value passed out.
Perhaps more importantly, there's a reason most languages don't
represent flow control purely in terms of functions and objects: it's
generally far simpler to define "this is the semantics of a while loop"
and implement it in the compiler or VM, than "these building blocks are
sufficient that any kind of loop can be built in userland without
explicit compiler support".
Defining new syntax would encourage us to define a minimum top-level
behaviour, such as "inside an async{} block, these things are possible,
and these things are guaranteed to be true". Then we simply make that
true by having the compiler inject whatever actions it needs before,
during, and after that block. Any additional keywords, functions, or
objects, are then ways for the user to vary or make use of that flow,
rather than ways to define the flow itself.
This is roughly what happened with Closures themselves in PHP: first,
decide that "$foo = function(){};" will be valid syntax, and define
Closure as the type of $foo; then over time, add additional behaviour to
the Closure class, the ability to add __invoke() hooks on other classes, etc
Regards,
--
Rowan Tommins
[IMSoP]
This is roughly what happened with Closures themselves in PHP: first,
decide that "$foo = function(){};" will be valid syntax, and define
Closure as the type of $foo; then over time, add additional behaviour
to the Closure class, the ability to add __invoke() hooks on other
classes, etc
Sorry to double-post, but Generators are probably a better example: you
can write "$foo = yield $bar;" and there are well-defined semantics; on
the outside of the function, we represent the state as a Generator
object, and make it implement Iterator to explain how foreach() works;
but on the inside of the function, it's pure magic: $bar is passed into
an invisible channel, an invisible continuation is created, and when
it's resumed another invisible channel passes out a value for $foo.
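For concreteness, the two invisible channels are observable in today's PHP: the value to the right of yield comes out via current(), and the value for the left side goes back in via send():

```php
<?php
// "$foo = yield $bar": $bar travels out through one invisible channel,
// and whatever the caller send()s comes back in as $foo.
function echoer(): Generator {
    $received = yield 'first out';  // 'first out' goes to the caller
    yield "got: $received";         // $received is whatever was sent back
}

$gen = echoer();
$out = $gen->current();       // value to the right of the first yield
$reply = $gen->send('hello'); // resumes the function; $received === 'hello',
                              // and send() returns the next yielded value
```

All the continuation machinery stays hidden inside the engine; the user only ever sees the Generator object on the outside and plain yield expressions on the inside.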
--
Rowan Tommins
[IMSoP]
Defining new syntax would encourage us to define a minimum top-level
behaviour, such as "inside an async{} block, these things are possible,
and these things are guaranteed to be true"
True. This is precisely the main reason not to change the syntax. The
issue is not even about how many changes need to be made in the code, but
rather about how many agreements need to be considered.
Ed.
Defining new syntax would encourage us to define a minimum top-level behaviour, such as "inside an async{} block, these things are possible, and these things are guaranteed to be true"
True. This is precisely the main reason not to change the syntax. The issue is not even about how many changes need to be made in the code, but rather about how many agreements need to be considered.
Quite the opposite: with a function-and-object approach everything needs
a name, an API, and a way of being described in relation to how the
language already works. In a syntax-and-semantics approach, we only need
to describe the things people actually need.
The generator implementation doesn't have a name or API for where the
value on the right of a "yield" goes, or where the value on its left
comes from; we just describe the behaviour: values passed to yield
somehow end up in the calling scope's Generator object, and values
passed to that object somehow end up back at the yield statement. We
don't have to define the API for a GeneratorContext object, and the
semantics of what happens when users pass it around and store it in
different scopes.
In the same way, do we actually need to design what an "async context"
looks like to the user? Do we actually want the user to be able to have
access to two (nested) async contexts at once, and choose which one to
spawn a task into? Or would we prefer, at least in the minimum
implementation, to say "when you spawn a task, it spawns in the current
async context, and if there is no current async context, an error is
thrown"?
--
Rowan Tommins
[IMSoP]
In a syntax-and-semantics approach, we only need to describe the things
people actually need.
There is no doubt that syntax provides the programmer with a clear tool for
expressing intent.
In the same way, do we actually need to design what an "async context"
looks like to the user?
Its implementation is more about deciding which paradigms we want to
support.
If we want to support global services that require local state within a
coroutine, then they need a context. If there are no global "impure"
services (i.e., those maintaining state within a coroutine), then a context
may not be necessary. The first paradigm is not applicable to pure
multitasking—almost all programming languages (as far as I know) have
abandoned it in favor of ownership/memory passing. However, in PHP, it is
popular.
For example, PHP has functions for working with HTTP. One of them writes
the last received headers into a "global" variable, and another function
allows retrieving them. This is where a context is needed. Or, for
instance, when a request is made inside a coroutine, the service that
handles socket interactions under the hood must:
- Retrieve a socket from the connection pool.
- Place the socket in the coroutine’s context for as long as it is
needed.
However, this same scenario could be implemented more elegantly if PHP code
explicitly used an object like "Connection" or "Transaction" and retrieved
it from the pool. In that case, a context would not be needed.
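A minimal sketch of that alternative (Connection and ConnectionPool are hypothetical names for illustration, not RFC API): the caller explicitly owns the connection for the duration of the work, so nothing needs to be stashed in a per-coroutine context.

```php
<?php
// Hypothetical sketch: instead of hiding a socket in the coroutine's
// context, the pool hands out an explicit Connection object that the
// calling code owns and returns when done.
final class Connection {
    public function __construct(public readonly int $id) {}
}

final class ConnectionPool {
    /** @var Connection[] */
    private array $idle = [];
    private int $nextId = 1;

    public function acquire(): Connection {
        // Reuse an idle connection if available, otherwise open a new one.
        return array_pop($this->idle) ?? new Connection($this->nextId++);
    }

    public function release(Connection $conn): void {
        $this->idle[] = $conn;
    }
}

$pool = new ConnectionPool();
$a = $pool->acquire();  // explicit ownership: no context lookup needed
$pool->release($a);
$b = $pool->acquire();  // the released connection is reused
```

Because the connection travels through the call chain as a value, the "which coroutine holds which socket" question never arises.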
Thus, the only question is: do we need to maintain state between
function/method calls within a coroutine?
Do we actually want the user to be able to have access to two (nested)
async contexts at once, and choose which one to spawn a task into?
If we discard the Go model, where the programmer decides what to do and
which foot to shoot themselves in, and instead use parent-child coroutines,
then such a function breaks this rule. This means it should not exist, as
its presence increases system complexity. However, in the parent-child
model, there is a case where a coroutine needs to be created in a different
context.
For example:
- A request to reset a password arrives at the server.
- The API creates a coroutine in a separate context from the request to send an email.
- The API returns a 201 response.
In this case, a special API is needed to accomplish this. The downside of
any strict semantics is the presence of exceptional cases. However, such
cases should be rare. If they are not, then the parent-child model is not
suitable.
To resolve this issue, we need to know the opinions of framework
maintainers. They should say either: Yes, this approach will reduce the
amount of code, or No, it will increase the codebase, or We don't care,
do as you like :)
For example, PHP has functions for working with HTTP. One of them
writes the last received headers into a "global" variable, and another
function allows retrieving them. This is where a context is needed.
OK, let's dig into this case: what is the actual problem, and what does
an async design need to provide so that it can be solved.
As far as I know, all current SAPIs follow one of two patterns:
- The traditional "shared nothing" approach: each request is launched in a new process or thread, and all global state is isolated to that request.
- The explicit injection approach: the request and response are represented as objects, and the user must pass those objects around to where they are needed.
Notably, 2 can be emulated on top of 1, but not vice versa, and this is
exactly what a lot of modern applications and frameworks do: they take
the SAPI's global state, and wrap it in injected objects (e.g. PSR-7
ServerRequestInterface and ServerResponseInterface).
Code written that way will work fine on a SAPI that spawns a fiber for
each request, so there's no problem for us to solve there.
At the other extreme are frameworks and applications that access the
global state directly throughout - making heavy use of superglobal,
global, and static variables; directly outputting using echo/print, etc.
Those will break in a fiber-based SAPI, but as far as I can see, there's
nothing the async design can do to fix that.
In the middle, there are some applications we might be able to help:
they rely on global state, but wrap it in global functions or static
methods which could be replaced with some magic from the async
implementation.
So our problem statement is:
- given a function that takes no request-specific input, and is expected to return request-specific state (e.g. function get_query_string_param(string $name): ?string)
- and, given a SAPI that spawns a fiber for each request
- how do we adjust the implementation of the function, without changing its signature?
Things we don't need to define:
- how the SAPI works
- how the data is structured inside the function
Non-solutions:
- refactoring the application to pass around a Context object - if we're
willing to do that, we can just pass around a PSR-7 RequestInterface
instead, and the problem goes away
Minimal solution:
- a way to get an integer or string, which the function can use to
partition its data
Usage example:
function get_query_string_param(string $name): ?string {
    global $request_data; // in a shared-nothing SAPI, this is per-request; but in a fiber-based one, it's shared between requests
    $request_data_partition = $request_data[ Fiber::getCurrent()->getId() ]; // this line makes the function work under concurrent SAPIs
    return $request_data_partition['query_string'][$name]; // this line is basically unchanged from the original application
}
Limitation:
- if the SAPI spawns a fiber for the request, but that fiber then spawns
child fibers, the function won't find the right partition
Minimal solution:
- track and expose the "parent" of each fiber
Usage example:
function get_query_string_param(string $name): ?string {
    global $request_data;
    // Traverse until we find the ID we've stored data against in our request bootstrapping code
    $fiber = Fiber::getCurrent();
    while (!isset($request_data[ $fiber->getId() ])) {
        $fiber = $fiber->getParent();
    }
    $request_data_partition = $request_data[ $fiber->getId() ];
    return $request_data_partition['query_string'][$name];
}
Obviously, this isn't the only solution, but it is sufficient for this
problem.
As a first pass, it saves us bikeshedding exactly what methods an
Async\Context class should have, because that whole class can be added
later, or just implemented in userland.
If we strip down the solution initially, we can concentrate on the
fundamental design - things like "Fibers have parents", and what that
implies for how they're started and used.
--
Rowan Tommins
[IMSoP]
For example, PHP has functions for working with HTTP. One of them writes the last received headers into a "global" variable, and another function allows retrieving them. This is where a context is needed.
OK, let's dig into this case: what is the actual problem, and what does an async design need to provide so that it can be solved.
As far as I know, all current SAPIs follow one of two patterns:
- The traditional "shared nothing" approach: each request is launched in a new process or thread, and all global state is isolated to that request.
- The explicit injection approach: the request and response are represented as objects, and the user must pass those objects around to where they are needed.
Notably, 2 can be emulated on top of 1, but not vice versa, and this is exactly what a lot of modern applications and frameworks do: they take the SAPI's global state, and wrap it in injected objects (e.g. PSR-7 ServerRequestInterface and ServerResponseInterface).
Code written that way will work fine on a SAPI that spawns a fiber for each request, so there's no problem for us to solve there.
At the other extreme are frameworks and applications that access the global state directly throughout - making heavy use of superglobal, global, and static variables; directly outputting using echo/print, etc. Those will break in a fiber-based SAPI, but as far as I can see, there's nothing the async design can do to fix that.
In the middle, there are some applications we might be able to help: they rely on global state, but wrap it in global functions or static methods which could be replaced with some magic from the async implementation.
I think this might be an invalid assumption. A SAPI is written in C (or at least, using the C APIs) and thus can do just about anything. If it wanted to, it could swap out the global state when switching fibers. This isn't impossible, nor all that hard to do. If I were writing this feature in an existing SAPI, this is probably exactly what I would do to maintain maximal compatibility.
So, at a minimum, I would guess the engine needs to provide hooks that the SAPI can use to provide request contexts to the global state (such as a "(before|after)FiberSwitch" function or something called around the fiber switch).
That being said, I'm unsure if an existing SAPI would send multiple requests to the same thread/process already handling a request. This would be a large undertaking and require those hooks to know which request output is coming from so it can direct it to the right socket.
Remember, fibers are still running in a single thread/process. They are not threading and running concurrently. They are taking turns in the same thread. Sharing memory between fibers is therefore relatively easy and uncomplicated. Amphp has a fiber-local memory (this context, basically), and I have never had a use for it, even once, in the last five years.
If fibers were to allow true concurrency, we would need many more primitives. At the minimum we would need mutexes to prevent race conditions in critical sections. With current fibers, you don't need to worry about that (usually), because there is never more than one fiber running at any given time. That being said, I have had to use amphp mutexes and semaphores to ensure that there is some kind of synchronization -- a real life example is a custom database driver I maintain that needs to ensure exactly one fiber is writing a query to the database at a time (since this is non-blocking).
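That turn-taking is easy to see with plain fibers and manual resumes (no scheduler involved):

```php
<?php
// Cooperative, not preemptive: each fiber runs uninterrupted until it
// voluntarily suspends, so the two writers can only interleave at the
// explicit suspension point, never mid-statement.
$log = [];
$writer = function (string $name) use (&$log): void {
    $log[] = "$name: begin write";
    Fiber::suspend(); // the only place where another fiber can run
    $log[] = "$name: end write";
};

$f1 = new Fiber($writer);
$f2 = new Fiber($writer);
$f1->start('A');   // runs until the suspend
$f2->start('B');   // runs until the suspend
$f1->resume();     // finishes A's turn
$f2->resume();     // finishes B's turn
```

Each begin/end pair could only be split at Fiber::suspend(); everything between two suspension points is effectively an atomic critical section, which is why a mutex is only needed when a fiber must hold exclusivity across its own suspension points (as in the database-driver case above).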
— Rob
A SAPI is written in C (or at least, using
the C api's) and thus can do just about anything. If it wanted to, it
could swap out
the global state when switching fibers.
Probably, it's possible. However, if I'm not mistaken, $_GET and $_POST are
implemented as regular PHP arrays, so if they need to be adapted, they
should be replaced with proxy objects.
So, at a minimum, I would guess the engine needs to provide hooks that
the SAPI can use to provide request contexts to the global state
Thus, the cost of coroutine switching increases. Coroutines can switch
between each other multiple times during a single SQL query. If there are
10-20 such queries, the total number of switches can reach hundreds. Using
the proxy pattern is the most common practice in this case.
If fibers were to allow true concurrency, we would need many more
primitives.
You mean true parallelism. If that happens, all existing PHP frameworks,
libraries, and C extensions would have to be rewritten, sometimes almost
from scratch. But it would most likely be a different language.
Ed.
No, just those functions/objects that necessarily involve running async control commands. Most wouldn't.
They would just silently context switch when they hit an IO operation (which as noted above is transparently supported, which is what makes this work) and otherwise behave the same.
So it's something more like Go or Python.
$val = async(function(AsyncContext $ctx) use ($stuff, $fn) {
    $result = [];
    foreach ($stuff as $item) {
        $result[] = $ctx->run($fn);
    }
    // We block/wait here until all subtasks are complete, then the async() call returns this value.
    return $result;
});
Do I understand correctly that at the point $val = async(function(AsyncContext $ctx) use ($stuff, $fn) execution stops until everything inside is completed?
Correct. By the time $val is populated, all fibers/coroutines/tasks started inside that block have completed and closed, guaranteed. If an exception was thrown or something else went wrong, then by the time the exception escapes the async{} block, all fibers inside it are done and closed, guaranteed. (If there's another async {} block further up the stack somewhere, there may still be other background fibers running, but anything created inside that block is guaranteed done.)
If so, let me introduce a second semantic option (for now, I'll remove the context and focus only on the function).

$url1 = 'https://domain1.com/';
$url2 = 'https://domain2.com/';
$url_handle = fn(string $url) => file_get_contents($url);

$res = Async\start(function() use ($url1, $url2, $url_handle) {
    $res1 = Async\run($url_handle, $url1);
    $res2 = Async\run($url_handle, $url2);
    Async\run(fn() => sleep(5));
    // some logic here
    return $merged_result;
});
What's happening here:
- After calling $res = Async\start(), the code waits until the entire block completes.
- Inside Async\start, the code waits for all nested coroutines to finish.
- If a coroutine has other nested coroutines, the same rule applies.

Rules inside an asynchronous block:
- I/O functions do not block coroutines within the block.
- Creating a new Fiber is not allowed; an exception will be thrown: you cannot use Fiber.
- Unhandled exceptions will be thrown at the point of $res = Async\start().

Coroutine cancellation rules:
- Canceling a coroutine cancels it and all its child coroutines (this cannot be bypassed unless the coroutine is created in a different context).

How does this option sound to you?
We can quibble on the details and spelling, but I think the overall logic is sound. One key question, if we disallow explicitly creating Fibers inside an async block, can a Fiber be created outside of it and not block async, or would that also be excluded? Viz, this is illegal:
async {
$f = new Fiber(some_func(...));
}
But would this also be illegal?
$f = new Fiber(some_func(...));
$f->start();
async {
do_stuff();
}
Essentially, this is Kotlin, but it should also resemble Python.
However, unlike Kotlin, there are no special language constructs
here—code blocks naturally serve that role. Of course, syntactic sugar
can be added later for better readability.
My brief foray into Kotlin in a previous job didn't get as far as coroutines, so I will take your word for it. From a very cursory glance at the documentation, I think runBlocking {} is approximately what I am describing, yes. I don't know whether the various other block types are necessary.
And if you like this, I have good news: there are no implementation issues at this level.

In terms of semantic elegance, the only thing that bothers me is that return behavior is slightly altered — meaning the actual "return" won’t happen until all child functions complete. This isn’t very good, and Kotlin’s style would fit better here.
I'm not sure I follow. The main guarantee we want is that "once you pass this }, all fibers/coroutines have ended, count on it." Do you mean something like this?
async $ctx {
$ctx->run(foo(...));
$ctx->run(bar(...));
// This return statement blocks until foo() and bar() complete.
return "all done";
}
That doesn't seem any weirder than return and finally{} blocks. :-) (Note that we can and should consider if async {} makes sense to have its own catch and finally blocks built in.)
But on the other hand — can we live with this?
This seems far closer to something I'd support than the current RFC, yes.
I cannot speak to JS Symbols as I haven't used them.
I am just vehemently opposed to globals, no matter how many layers they're wrapped in. :-) Most uses could be replaced by proper DI or partial application.

You won’t be able to use DI because you have only one service
(instance of class) for the entire application, not a separate service
for each coroutine. This service is shared across the application and
can be called from any coroutine. As a result, the service needs memory
slots to store or retrieve data. DI is a mechanism used once during
service initialization, not every time a method is called.
Not true. DI doesn't imply singleton objects. Most good DI containers default to singleton objects, as they should, but for example Laravel's container does not. You have to opt-in to singleton behavior. (I think that's a terrible design, but it's still DI.)
DI just means "a scope gets the stuff it needs given to it, it never asks for it." How that stuff is passed in is, deliberately, undefined. A DI container is but one way.
In Crell/Serde, I actually use "runner objects" a lot. I have an example here:
https://presentations.garfieldtech.com/slides-serialization/longhornphp2023/#/7/4/3
That is still dependency injection, because ThingRunner is still taking all of its dependencies via the constructor. And being readonly, it's still immutable-friendly.
That's the sort of thing I'm thinking of here for the async context. To spitball again:
class ClientManager {
    public function __construct(private string $base) {}

    public function client(AsyncContext $ctx) {
        return new HttpClient($this->base, $ctx);
    }
}

class HttpClient {
    public function __construct(private string $base, private AsyncContext $ctx) {}

    public function get(string $path) {
        $this->ctx->defer(fn() => print "Read $path\n");
        return $this->ctx->run(fn() => file_get_contents($this->base . $path));
    }
}
$manager = $container->get(ClientManager::class);
async $ctx {
$client = $manager->client($ctx);
$client->get('/foo');
$client->get('/bar');
}
// We don't get here until all file_get_contents() calls are complete.
// The deferred functions all get called right here.
// There is no async happening anymore.
print "Done";
I'm pretty sure the return values are all messed up there, but hopefully you get the idea. Now HttpClient has a fully injected context that controls what async scope it's working in. The same class can be used in a bunch of different async blocks, each with their own context. You can even mock AsyncContext for testing purposes just like any other constructor argument. And not a global function or variable in sight! :-)
--Larry Garfield
One key question, if we disallow explicitly creating Fibers inside an
async block,
can a Fiber be created outside of it and not block async, or would that
also be excluded? Viz, this is illegal:
Creating a Fiber
outside of an asynchronous block is allowed; this
ensures backward compatibility.
According to the logic integrity rule, an asynchronous block cannot be
created inside a Fiber. This is a correct statement.
However, if the asynchronous block blocks execution, then it does not
matter whether a Fiber was created or not, because it will not be possible
to switch it in any way.
So, the answer to your question is: yes, such code is legal, but the Fiber
will not be usable for switching.
In other words, Fiber and an asynchronous block are mutually exclusive.
Only one of them can be used at a time: either Fiber + Revolt or an
asynchronous block.
Of course, this is not an elegant solution, as it adds one more rule to the
language, making it more complex. However, from a legacy perspective, it
seems like a minimal scar.
(To all: please leave your opinion if you are reading this.)
// This return statement blocks until foo() and bar() complete.
Yes, that's correct. That's exactly what I mean.
Of course, under the hood, return will execute immediately if the coroutine
is not waiting for anything. However, the Scheduler will store its result
and pause it until the child coroutines finish their work.
In essence, this follows the parent-child coroutine pattern, where they are
always linked. The downside is that it requires more code inside the
implementation, and some people might accuse us of a paternalistic
approach. :)
should consider if async {} makes sense to have its own catch and finally
blocks built in.)
We can use the approach from the RFC to catch exceptions from child
coroutines: explicit waiting, which creates a handover point for exceptions.
Alternatively, a separate handler like Context::catch() could be
introduced, which can be defined at the beginning of the coroutine.
Or both approaches could be supported. There's definitely something to
think about here.
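As a rough illustration of the two options (the spawn/await spelling is borrowed from the spitballing earlier in this thread, and Context::catch() is only a proposed name, none of this is settled API):

```php
// Option 1: explicit waiting is the handover point for exceptions.
async {
    $task = spawn fetchData();
    try {
        $result = await $task; // a child's exception surfaces here
    } catch (\Throwable $e) {
        // handle it at the wait point
    }
}

// Option 2: a context-level handler declared at the start of the block.
async {
    Context::catch(function (\Throwable $e) {
        // called for any unhandled exception from a child coroutine
    });
    spawn fetchData(); // no explicit wait needed for error handling
}
```

Option 1 keeps error flow local and explicit; option 2 covers fire-and-forget children that are never explicitly awaited.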
That is still dependency injection, because ThingRunner is still taking
all of its dependencies via the constructor. And being readonly, it's
still immutable-friendly.
Yeah, so basically, you're creating the service again and again for each
coroutine if the coroutine needs to use it. This is a good solution in the
context of multitasking, but it loses in terms of performance and memory,
as well as complexity and code size, because it requires more factory
classes.
The main advantage of LongRunning is initializing once and using it
multiple times. On the other hand, this approach explicitly manages memory,
ensuring that all objects are created within the coroutine's context rather
than in the global context.
Ah, now I see how much you dislike global state! :)
However, in a scenario where a web server handles many similar requests,
"global state" might not necessarily win in terms of speed but rather due
to the simplicity of implementation and the overall maintenance cost of the
code. (I know that in programming, there is an entire camp of immutability
advocates who preach that their approach is the key remedy for errors.)
I would support both paradigms, especially since it doesn’t cost much.
A coroutine will own its internal context anyway, and this context will be
carried along with it, even across threads. How to use this context is up
to the programmer to decide. But at the same time, I will try to make the
pattern you described fit seamlessly into this logic.
Ed.
Of course, this is not an elegant solution, as it adds one more rule to the language, making it more complex. However, from a legacy perspective, it seems like a minimal scar.
(to All: Please leave your opinion if you are reading this )
Larry’s approach seems like a horrible idea to me: it increases complexity, prevents easy migration of existing code to an asynchronous model and is incredibly verbose for no good reason.
The arguments mentioned in https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/ are not good arguments at all, as they essentially propose explicitly reducing concurrency (by allowing it only within async blocks) or making it harder to use by forcing users to pass around contexts (which is even worse than function colouring https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/).
This (supposedly) reduces issues with resource contention/race conditions: sure, if you don’t use concurrency or severely limit it, you will have less issues with race conditions, but that’s not an argument in favour of nurseries, that’s an argument against concurrency.
Race conditions and deadlocks are possible either way when using concurrency, and the way to avoid them is to introduce synchronisation primitives (locks, mutexes similar to the ones in https://github.com/amphp/sync/, or lockfree solutions like actors, which I am a heavy user of), not bloating signatures by forcing users to pass around contexts, reducing concurrency and completely disallowing global state.
Golang is the perfect example of a language that does colourless, (mostly) contextless concurrency without the need for coloured (async/await keywords) functions and other complications.
Race conditions and deadlocks are avoided, like in any concurrent model, by using appropriate synchronisation primitives, and by communicating with channels (actor model) instead of sharing memory, where appropriate.
Side note, I very much like the current approach of implicit cancellations, because they even remove the need to pass contexts to make use of cancellations, like in golang or amphp (though the RFC could use some further work regarding cancellation inheritance between fibers, but that’s a minor issue).
Yeah, so basically, you're creating the service again and again for each coroutine if the coroutine needs to use it. This is a good solution in the context of multitasking, but it loses in terms of performance and memory, as well as complexity and code size, because it requires more factory classes.
^ this
Regarding backwards compatibility (especially with revolt), since I also briefly considered submitting an async RFC and thought about it a bit, I can suggest exposing an event loop interface like https://github.com/revoltphp/event-loop/blob/main/src/EventLoop.php, which would allow userland event loop implementations to simply switch to using the native event loop as backend (this’ll be especially simple to do for revolt, which is the main user of fibers, since the current implementation is clearly inspired by revolt’s event loop).
Essentially, the only thing that’s needed for backwards-compatibility in most cases is an API that can be used to register onWritable, onReadable callbacks for streams and a way to register delayed (delay) tasks, to completely remove the need to invoke stream_select.
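Sketching what such a minimal surface could look like (the interface and method names below are hypothetical, loosely modeled on revoltphp/event-loop; only the shape matters):

```php
// Hypothetical minimal native event-loop API that userland loops
// (e.g. Revolt) could delegate to instead of calling stream_select().
interface NativeEventLoop
{
    /** Invoke $callback whenever $stream becomes readable; returns a callback id. */
    public function onReadable(mixed $stream, callable $callback): string;

    /** Invoke $callback whenever $stream becomes writable; returns a callback id. */
    public function onWritable(mixed $stream, callable $callback): string;

    /** Invoke $callback once after $seconds have elapsed; returns a callback id. */
    public function delay(float $seconds, callable $callback): string;

    /** Cancel a previously registered callback by its id. */
    public function cancel(string $callbackId): void;
}
```

With only these four operations backed by the native scheduler, an existing userland loop could swap its stream_select() backend for the engine's without changing its public API.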
I’d recommend chatting with Aaron to further discuss backwards compatibility and the overall RFC: I’ve already pinged him, he’ll chime in once he has more time to read the RFC.
To Edmond, as someone who submitted RFCs before: stand your ground, try not to listen too much to what people propose in this list, especially if it’s regarding radical changes like Larry's; avoid bloating the RFC with proposals that you do not really agree with.
Regards,
Daniil Gentili
—
Daniil Gentili - Senior software engineer
Portfolio: https://daniil.it
Telegram: https://t.me/danogentili
Hello, Daniil.
Essentially, the only thing that’s needed for backwards-compatibility in
most cases is an API that can be used to register onWritable,
onReadable callbacks for streams and a way to register delayed (delay)
tasks, to completely remove the need to invoke stream_select.
Thank you for this point. It seems I was mistaken in thinking that there is
a Scheduler inside Revolt. Of course, if we're only talking about the
EventLoop, maintaining compatibility won't be an issue at all.
I’d recommend chatting with Aaron to further discuss backwards
compatibility and the overall RFC: I’ve already pinged him, he’ll chime in
once he has more time to read the RFC.
That would be really cool.
To Edmond, as someone who submitted RFCs before: stand your ground, try
not to listen too much to what people propose in this list,
especially if it’s regarding radical changes like Larry's; avoid bloating
the RFC with proposals that you do not really agree with.
Actually, I agree in many ways. In programming, there's an eternal struggle
between abstraction and implementation,
between strict rules and flexibility, between paternalism where the
language makes decisions for you and freedom.
Each of these traits is beneficial in certain scenarios. The most important
thing is to understand whether it will be beneficial for PHP scenarios.
This is the main goal of this RFC stage. That's why I would really like to
hear the voices of those who create PHP's code infrastructure. I mean,
Symfony, Laravel, etc.
Thanks!
Ed.
One key question, if we disallow explicitly creating Fibers inside an async block, can a Fiber be created outside of it and not block async, or would that also be excluded? Viz, this is illegal:

Creating a Fiber outside of an asynchronous block is allowed; this ensures backward compatibility.

According to the logic integrity rule, an asynchronous block cannot be created inside a Fiber. This is a correct statement.

However, if the asynchronous block blocks execution, then it does not matter whether a Fiber was created or not, because it will not be possible to switch it in any way.

So, the answer to your question is: yes, such code is legal, but the Fiber will not be usable for switching.

In other words, Fiber and an asynchronous block are mutually exclusive. Only one of them can be used at a time: either Fiber + Revolt or an asynchronous block.

Of course, this is not an elegant solution, as it adds one more rule to the language, making it more complex. However, from a legacy perspective, it seems like a minimal scar.

(To all: please leave your opinion if you are reading this.)
This seems like a reasonable approach to me, given the current state. At any given time, you can have "manual" or "automatic" handling in use, but one has to completely finish before you can start using the other. Whether we should remove the "manual" access in the future becomes a question for the future.
// This return statement blocks until foo() and bar() complete.
Yes, that's correct. That's exactly what I mean.
Of course, under the hood, return will execute immediately if the coroutine is not waiting for anything. However, the Scheduler will store its result and pause it until the child coroutines finish their work.

In essence, this follows the parent-child coroutine pattern, where they are always linked. The downside is that it requires more code inside the implementation, and some people might accuse us of a paternalistic approach. :)
See, what you call "paternalistic" I say is "basic good usability." Affordances are part of the design of everything. Good design means making doing the right thing easy and the wrong thing hard, preferably impossible. (E.g., why 120v and 220v outlets have incompatible plugs, to use the classic example.) I am a strong supporter of correct by construction / make invalid states unrepresentable / type-driven development, or whatever it's called this week.
And history has demonstrated that humans simply cannot be trusted to manually handle synchronization safely, just like they cannot be trusted to manually handle memory safely. :-) (That's why green threads et al exist.)
That is still dependency injection, because ThingRunner is still taking all of its dependencies via the constructor. And being readonly, it's still immutable-friendly.
Yeah, so basically, you're creating the service again and again for
each coroutine if the coroutine needs to use it. This is a good
solution in the context of multitasking, but it loses in terms of
performance and memory, as well as complexity and code size, because it
requires more factory classes.
Not necessarily. It depends on what all you're doing when creating those objects. It can be quite fast. Plus, if you want a simpler approach, just pass the context directly:
async $ctx {
$ctx->run($httpClient->runAsync($ctx, $url));
}
It's just a parameter to pass. How you pass it is up to you.
It is literally the same argument for "pass the DB connection into the constructor, don't call a static method to get it" or "pass in the current user object to the method, don't call a global function to get it." These are decades-old discussions with known solved problems, which all boil down to "pass things explicitly."
To quote someone on FP: "The benefit of functional programming is it makes data flow explicit. The downside is it sometimes painfully explicit."
I am far happier with explicit that is occasionally annoyingly so, and building tools and syntax to reduce that annoyance, than having implicit data just floating around in the ether around me and praying it's what I expect it to be.
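The decades-old version of that argument, for contrast (plain PHP; Registry here is a stand-in for any global accessor, not a real library):

```php
// Explicit: the dependency is visible in the signature, and a test
// can pass in a mock PDO without touching any global state.
final class ReportService
{
    public function __construct(private \PDO $db) {}
}

// Implicit: the dependency floats in from the ether; a test must
// mutate global state to substitute it, and the coupling is hidden.
final class ReportServiceGlobal
{
    public function run(): void
    {
        $db = Registry::get('db'); // hidden coupling to a global
    }
}
```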
The main advantage of LongRunning is initializing once and using it
multiple times. On the other hand, this approach explicitly manages
memory, ensuring that all objects are created within the coroutine's
context rather than in the global context.
As above, in simpler cases you can just make the context a boring old function parameter, in which case the perf overhead is unmeasurable.
Ah, now I see how much you dislike global state! :)
It is the root of all evil.
However, in a scenario where a web server handles many similar
requests, "global state" might not necessarily win in terms of speed
but rather due to the simplicity of implementation and the overall
maintenance cost of the code. (I know that in programming, there is an
entire camp of immutability advocates who preach that their approach is
the key remedy for errors.)

I would support both paradigms, especially since it doesn’t cost much.
Depends on the cost you mean. If you have "system with strong guarantees" and "system with no guarantees" interacting, then you have a system with no guarantees. Plus the cost of devs having to think about two different APIs, one of which is unit testable and one of which isn't, or at least not easily.
Do you have a concrete example of where the inconvenience of explicit context is sufficiently high to warrant an implicit global and all the impacts that has?
--Larry Garfield
See, what you call "paternalistic" I say is "basic good usability."
Affordances are part of the design of everything. Good design means
making doing the right thing easy and the wrong thing hard.
If we worry about "intuitive usability", we should ban caching, finite
state machines, and of course, concurrency.
Parallelism? Not just ban it, but burn those who use it at the stake of the
inquisition! :)
In this context, the child-parent model has a flaw that directly
contradicts intuitive usage.
Let me remind you of the main rule:
Default behavior: All child coroutines are canceled if the parent is
canceled.
Now, imagine a case where we need to create a coroutine not tied to the
parent.
To do this, we have to define a separate function or syntax.
Such a coroutine is created to perform an action that must be completed,
even if the parent coroutines are not fully executed.
Typically, this is a critical action, like logging or sending a
notification.
This leads to an issue:
- Ordinary actions use a function that the programmer always remembers.
- Important actions require a separate function, which the programmer might
forget.
This is the dark side of any strict design when exceptions exist (and they
almost always do).
And the problem is bigger than it seems because:
- The parent coroutine is created in Function A.
- The child coroutine is created in Function B.
- These functions are in different modules, written by different
developers.
Developer A implements a unique algorithm that cancels coroutine execution.
This algorithm is logical and correct in the context of A.
Developer B simply forgets that execution might be interrupted.
And boom! We've just introduced a bug that will send the entire dev team on
a wild goose chase.
This is why the Go model (without parent-child links) is different:
It makes chaining coroutines harder.
But if you don’t need chains, it’s simpler.
And whether you need chains or not is a separate question.
Possible scenarios in PHP

Scenario 1
We need to generate a report, where data must be collected from multiple
services.
- We create one coroutine per service.
- Wait for all of them to finish.
- Generate the report.
Parent-child model is ideal:
If the parent coroutine is canceled, the child coroutines are
meaningless as well.
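Scenario 1 in the spitballed syntax from earlier in the thread (spawn, async {}, and the fetch/generate functions are all hypothetical placeholders):

```php
async {
    // Fan out one coroutine per service.
    $sales  = spawn fetchSalesData();
    $users  = spawn fetchUserData();
    $orders = spawn fetchOrderData();

    // We only reach this point once all three children have finished,
    // or the whole block was cancelled, cancelling the children with it.
    generateReport($sales, $users, $orders);
}
```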
Scenario 2
Web server. The API receives a request to create a certificate. The algorithm:
- Check if we can do it, then create a DB record stating that the user has a certificate.
- Send a Job – notify other users who need to know about this event.
- Return the certificate URL (a link with an ID).

Key requirements:
- Heavy operations (longer than 2-3 seconds) should be performed in a Job-Worker pool to keep the server responsive.
- Notifications are sent as a separate Job in a separate coroutine, which:
  - Can retry sending twice if needed.
  - Implements a fallback mechanism.
  - Is NOT linked to the request coroutine.
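Scenario 2 in the same illustrative style (spawn and async unbound are placeholder spellings from this thread, not proposed final syntax):

```php
async {
    // Tied to the request: if the client disconnects and the request
    // coroutine is cancelled, this work should be cancelled too.
    $record = spawn createCertificateRecord($userId);

    // NOT tied to the request: the notification job must survive
    // cancellation of the request coroutine, retry, and fall back.
    async unbound {
        spawn sendNotificationsWithRetry($record, maxRetries: 2);
    }

    return certificateUrl($record);
}
```

The bug class described above is exactly the second block: if the developer forgets to mark it unbound, cancellation of the request silently kills the notification.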
Which scenario is more likely for PHP?
To quote someone on FP: "The benefit of functional programming is it
makes data flow explicit. The downside is it sometimes painfully explicit."
If there is a nesting of 10 functions where parameters are passed
explicitly, then the number of parameters in the top function will be equal
to the sum of the parameters of all other functions, and the overall code
coupling will be 100%. Parameters can be grouped into objects (structures),
thus reducing this problem. However, creating additional objects leads to
the temptation to shove a parameter into the first available object because
thinking about composition is a difficult task. This means that such an
approach either violates SOLID or increases design complexity. But usually,
the worst-case scenario happens: developers happily violate both SOLID and
design. :)
I think these principles are more suitable for areas where design planning
takes up 30-50% of the total development time and where such a time
distribution is rational in relation to the project's success. At the same
time, the initial requirements change extremely rarely. PHP operates under
completely different conditions: "it was needed yesterday" :)
As above, in simpler cases you can just make the context a boring old
function parameter,
What if a service wants to store specific data in the context?
As for directly passing the context into a function, the coroutine already
owns the context, and it can be retrieved from it. This is a consequence of
PHP having an abstraction that C/Rust lacks, allowing it to handle part of
the dirty work on behalf of the programmer. It's the same as when you use
$this when calling a method.
Do you have a concrete example of where the inconvenience of explicit
context is sufficiently high to warrant an implicit global and all the
impacts that has?
The refactoring issue. There are five levels of nesting. At the fifth
level, someone called an asynchronous function and created a context.
Thirty days later, someone wanted to call an asynchronous function at the
first level of nesting. And suddenly, it turns out that the context needs
to be explicitly passed. And that's where the fun begins. :)
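A sketch of that refactoring pain (all functions hypothetical): once async is needed at the top of the chain, the context has to be threaded through every intermediate signature.

```php
// Before: only the deepest level is async-aware.
function level5(AsyncContext $ctx): void { $ctx->run(task(...)); }

// After someone needs async at level 1, every intermediate function
// grows a parameter it does not itself use:
function level1(AsyncContext $ctx): void { level2($ctx); }
function level2(AsyncContext $ctx): void { level3($ctx); }
function level3(AsyncContext $ctx): void { level4($ctx); }
function level4(AsyncContext $ctx): void { level5($ctx); }
```

This is the "function colouring" effect by another name: the signature change propagates upward through code that has no interest in async.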
Ed.
Now, imagine a case where we need to create a coroutine not tied to the parent.
To do this, we have to define a separate function or syntax.

Such a coroutine is created to perform an action that must be completed, even if the parent coroutines are not fully executed.
Typically, this is a critical action, like logging or sending a notification.

This leads to an issue:
- Ordinary actions use a function that the programmer always remembers.
- Important actions require a separate function, which the programmer might forget.
Let's assume we want to support this scenario; we could:
a) Throw away all automatic resource management, and make it the user's
responsibility to arrange for additional fibers to be cancelled when
their "parent" is cancelled
b) Create unmanaged fibers by default, but provide a simple mechanism to
"attach" to a child/parent
c) Provide automatic cleanup by default, but a simple mechanism to
"disown" a child/parent (similar to Unix processes)
d) Provide two separate-but-equal primitives for spawning coroutines,
"run as child", and "run as top-level"
Option (a) feels rather unappealing; it also implies that no "parent"
relationship is available for things like context data.
I think you agree that top-level fibers would be the less common case,
so (b) seems awkward as well.
Option (c) might look like this:
async {
$child = asyncRun foo();
$bgTask = asyncRun bar();
$bgTask->detach();
}
// foo() guaranteed to be completed or cancelled, bar() continuing as an
independent fiber
Or maybe the detach would be inside bar(), e.g.
Fiber::getCurrent()->detach()
Option (d) might look like this:
async {
$child = asyncChild foo();
$bgTask = asyncDetached bar();
}
// foo() guaranteed to be completed or cancelled, bar() continuing as an
independent fiber
(all names and syntax picked for fast illustration, not an exact proposal)
--
Rowan Tommins
[IMSoP]
Let's assume we want to support this scenario; we could:
Thank you, that's an accurate summary. I would focus on two options:
- Creating child coroutines by default, but allowing unbound ones to exist.
- Explicitly creating child coroutines.
And in the RFC, I would leave the choice open to all participants.
In terms of syntax, it might look something like this (just thinking out
loud):
async {
async child {
}
}
or
async {
async unbound {
}
}
The pros and cons were described earlier and will be moved to a separate
RFC.
It is literally the same argument for "pass the DB connection into the constructor, don't call a static method to get it" or "pass in the current user object to the method, don't call a global function to get it." These are decades-old discussions with known solved problems, which all boil down to "pass things explicitly."
I think the counterargument to this is that you wouldn't inject a service that implemented a while loop, or if statement. I'm not even sure what mocking a control flow primitive would mean.
Similarly, we don't pass around objects representing the "try context" so that we can call "throw" as a method on them. I'm not aware of anybody complaining that they can't mock the throw statement as a consequence, or wanting to work with multiple "try contexts" at once and choose which one to throw into.
A lexically scoped async{} statement feels like it could work similarly: the language primitive for "run this code in a new fiber" (and I think it should be a primitive, not a function or method) would look up the stack for an open async{} block, and that would be the "nursery" of the new fiber. [You may not like that name, but it's a lot less ambiguous than "context", which is being used for at least two different things in this discussion.]
Arguably this is even needed to be "correct by construction" - if the user can pass around nurseries, they can create a child fiber that outlives its parent, or extend the lifetime of one nursery by storing a reference to it in a fiber owned by a different nursery. If all they can do is spawn a fiber in the currently active nursery, the child's lifetime is guaranteed to be no longer than its parent, and that lifetime is defined rigidly in the source code.
Rowan Tommins
[IMSoP]
It is literally the same argument for "pass the DB connection into the constructor, don't call a static method to get it" or "pass in the current user object to the method, don't call a global function to get it." These are decades-old discussions with known solved problems, which all boil down to "pass things explicitly."
I think the counterargument to this is that you wouldn't inject a service that implemented a while loop, or if statement. I'm not even sure what mocking a control flow primitive would mean.

Similarly, we don't pass around objects representing the "try context" so that we can call "throw" as a method on them. I'm not aware of anybody complaining that they can't mock the throw statement as a consequence, or wanting to work with multiple "try contexts" at once and choose which one to throw into.

A lexically scoped async{} statement feels like it could work similarly: the language primitive for "run this code in a new fiber" (and I think it should be a primitive, not a function or method) would look up the stack for an open async{} block, and that would be the "nursery" of the new fiber. [You may not like that name, but it's a lot less ambiguous than "context", which is being used for at least two different things in this discussion.]

Arguably this is even needed to be "correct by construction" - if the user can pass around nurseries, they can create a child fiber that outlives its parent, or extend the lifetime of one nursery by storing a reference to it in a fiber owned by a different nursery. If all they can do is spawn a fiber in the currently active nursery, the child's lifetime is guaranteed to be no longer than its parent, and that lifetime is defined rigidly in the source code.

Rowan Tommins
[IMSoP]
Since I think better in code, if using try-catch as a model, that would lead to something like:
function foo(int $x): int {
    // if foo() is called inside an async block, this is non-blocking.
    // if it's called outside an async block, it's blocking.
    syslog(__FUNCTION__);
    return 1;
}

function bar(int $x): int {
    return $x + 1; // Just a boring function like always.
}

function baz(int $x): int {
    // Because this is called here, baz() MUST only be called from
    // inside a nested async block. Doing otherwise causes a fatal at runtime.
    spawn foo($x);
}

async { // Starts a nursery
    $res1 = spawn foo(5); // Spawns new Fiber that runs foo().
    $res2 = spawn bar(3); // A second fiber.
    $res3 = spawn baz(3); // A third fiber.
    // merge results somehow
    return $combinedResult;
} // We block here until everything spawned inside this async block finishes.

spawn bar(3); // This is called outside of an async {} block, so it just crashes the program (like an uncaught exception).
Is that what you're suggesting? If so, I'd have to think it through a bit more to see what guarantees that does[n't] provide. It might work. (I deliberately used spawn instead of "await" to avoid the mental association with JS async/await.) My biggest issue is that this is starting to feel like colored functions, even if partially transparent.
Another point worth mentioning: I get the impression that there are two very different mental models of when/why one would use async that are floating around in this thread, which lead to two different sets of conclusions.
- Async in the small: Like the reporting example, "fan out" a set of tasks, and bring them back together quickly before continuing in an otherwise mostly sync PHP-FPM process. All the data is still part of one user request, so we still have "shared nothing."
- Async in the large: A long running server like Node.js, ReactPHP, etc. Multiplexing several user requests into one OS process via async on the IO points. Basically the entire application has a giant async {} wrapped around it.
Neither of these is a bad use case, and they're not mutually exclusive, but they do lead to different priorities. I freely admit my bias is towards Type 1, while it sounds like Edmond is coming from a Type 2 perspective.
Not a criticism, just flagging it as something that we should be aware of.
--Larry Garfield
Is that what you're suggesting? If so, I'd have to think it through a bit more to see what guarantees that does[n't] provide. It might work. (I deliberately used spawn instead of "await" to avoid the mental association with JS async/await.) My biggest issue is that this is starting to feel like colored functions, even if partially transparent.
Yes, that's pretty much what was in my head. I freely admit I haven't
thought through the implications either.
My biggest issue is that this is starting to feel like colored functions, even if partially transparent.
I think it's significantly less like coloured functions than passing
around a nursery object. You could almost take this:
async function foo(int $bar, string $baz) {
    spawn something_else();
}

spawn foo(42, 'hello');

As sugar for this:

function foo(int $bar, string $baz, AsyncNursery $__nursery) {
    $__nursery->spawn( something_else(...) );
}

$__nursery->spawn( fn($n) => foo(42, 'hello', $n) );
However you spell it, you've had to change the function's signature in
order to use async facilities in its body.
If the body can say "get current nursery", it can be called even if its
immediate caller has no knowledge of async code, as long as we have
some reasonable definition of "current".
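To make the contrast concrete, here is a sketch of what that could look like. Everything here is hypothetical: the spawn/await/async{} syntax is the idea under discussion, not part of any RFC, and http_get() is an invented helper.

```php
<?php
// Hypothetical syntax: "spawn" walks up the dynamic scope to find the
// nearest open async{} block and registers the new fiber with it, so
// no nursery parameter ever appears in a signature.
function fetch_all(array $urls): array
{
    $handles = [];
    foreach ($urls as $url) {
        // The enclosing async{} block is implicit; fetch_all() needs
        // no knowledge of which nursery it is running under.
        $handles[] = spawn http_get($url);
    }
    return array_map(fn($h) => await $h, $handles);
}

async {
    $bodies = fetch_all(['https://example.com/a', 'https://example.com/b']);
} // joins every fiber spawned (directly or indirectly) inside the block
```

The point of the sketch is that fetch_all()'s signature is identical to a synchronous version; only the dynamically enclosing async{} block changes.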
--
Rowan Tommins
[IMSoP]
The uncoloring of functions in PHP is probably one of the most annoying aspects of fibers, IMHO. It's hard to explain unless you've been using them awhile. But, with colored functions, the caller has control over when the result is waited on -- it could be now, it could be in a totally different part of the program, or not at all. With fibers, the author of the function you are calling has control over when the result is waited on (and they don't have control over anything they call). This can create unpredictable issues where one part of the code was written assuming it had exclusive access to a property/variable, but someone else later changed one of the functions being called into an async function, making that assumption no longer true.
With colored functions, the person making changes also has to update all the places where it is called and can validate any assumptions are still going to be true; uncolored functions means they almost never do this. This results in more work for people implementing async, but more correct programs overall.
But back to the awaiting on results. Say I want to read 10 files:
for ($i = 0; $i < 10; $i++) $results[] = file_get_contents($file[$i]);
Right now, we have to read each file, one at a time, because this is synchronous. Even with this RFC and being in a fiber, the overall execution might be non-blocking, but the code still reads one file after another sequentially. Fibers do not change this.
With this RFC (in its original form), we will be able to change it so that we can run it asynchronously though and choose when to wait:
for($i = 0; $i < 10; $i++) $results[] = async\async(fn($f) => file_get_contents($f), $file[$i]);
// convert $results into futures somehow -- though actually doesn't look like it is possible.
$results = async\awaitAll($results);
In that example, we are deliberately starting to read all 10 files at the same time. If we had colored functions (aka, async/await) then changing file_get_contents to async would mean you have to change everywhere it is called too. That means I would see that file_get_contents is synchronous and be able to optimize it without having to even understand the reasoning (in most cases). I was a user of C# when this happened to C#, and it was a pain... So, at least with PHP fibers, this won't be AS painful, but you still have to do some work to take full advantage of them.
I kind of like the idea of a nursery for async, as we could then update the return type of file_get_contents to something like string|false|future<string|false>. In non-async code, everything behaves as normal, but inside a nursery it returns a future that can be awaited however you want and is fully non-blocking. In other words, simply returning a future is enough for the engine to realize it should spawn a fiber (similar to how using yield works with generators).
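A sketch of that idea, purely hypothetical: neither the future<> notation, the dual-mode behaviour, nor the await keyword exists in the RFC.

```php
<?php
// Outside any nursery: behaves exactly as today.
$data = file_get_contents('config.json');   // string|false, blocking

// Inside a nursery (hypothetical syntax): the same call returns a
// future immediately, and the engine spawns a fiber behind the scenes.
async {
    $f1 = file_get_contents('a.txt');  // future<string|false>
    $f2 = file_get_contents('b.txt');  // both reads now run concurrently
    $a = await $f1;
    $b = await $f2;
}
```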
In any case, I believe that a nursery requires the use of colored functions. That may be good or bad, but IMHO makes it much more useful and easier to write correct and fast code.
— Rob
In my opinion, colored functions are the worst thing that could happen
to PHP.
https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function
Describes quite expressively what's wrong with this approach.
As a result, you will make everything async.
Want a repository? It will be all async.
Want a logger? Also async.
Need to cache something? Make it async.
This is going to be a ton of changes, where currently sync (blue)
functions will have to become async (red) ones.
The way amphp went is the right way. They had this problem of red/blue
functions long ago, until Fibers came into place.
What they used until the third version was generator-based coroutines:
instead of returning the actual object, you spoil the function's
signature and return a generator that will eventually produce that
object (in other words, a "Promise").
This is just annoying, and IMO should not be considered.
In my opinion, colored functions is the worst thing that could happen to PHP.
My point in the email is that this happens anyway. With colored functions, you /always/ decide how to handle async. Which, as you mentioned, can be annoying. With uncolored functions, you /never/ get to decide unless you wrap it in a specific form (async\run or async\async, in this RFC), which ironically colors the function. I can't think of any way around it. My biggest issue with this RFC is that it results in multiple colors: FiberHandle, Future, and Resume.
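To illustrate with the function names used earlier in this thread (treat the exact API and return types as illustrative, not as the RFC's final shape):

```php
<?php
// The callee is an ordinary, unchanged function...
$work = fn() => file_get_contents('a.txt');

// ...but the call site that wants concurrency has to wrap it, and from
// then on it is handling a FiberHandle/Future rather than a string:
$handle = async\async($work);            // the wrapped, "colored" call site
[$result] = async\awaitAll([$handle]);   // explicit join
```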
— Rob
wrap it in a specific form (async\run or async\async, in this RFC), which
ironically colors the function.
It doesn't color the function.
The function is unchanged.
Any existing function in userland do not have to be changed in any way.
Its calls do not have to be rewritten into await, and all that stuff.
The same applies to all built-in functions like file_get_contents,
which already return the needed data, rather than a promise object
from which we could fetch the data.
you //never// get to decide unless you wrap it in a specific form (async\run or async\async, in this RFC), which ironically colors the function.
You're wrong; it does not color the function. Spawning a new fiber does not make the code running it async: it is always async, regardless of whether you use async() or not. When not using async(), it's simply the only async execution flow running.
I can't think of any way around it. My biggest issue with this RFC is that it results in multiple colors: FiberHandle, Future, and Resume.
You misunderstand again: colors are not objects or classes. Colors are special and annoying keywords (red, blue, await) that must always be added in order to call functions of the same color.
The FiberHandle is literally just a handle associated with the execution flow spawned with async(), it is not in any way associated with the spawned function, nor it is required in any way to invoke the spawned function.
Adding colors to functions makes async unnecessarily complex to use, for no good reason at all. (No, forcing developers to explicitly know whether a function is async is not a good enough reason.) When I write code I want to get things done: when I want to parallelize execution of something I use go/async(), and when I don't, I couldn't care less about what the function does inside. I especially do not want to jump through hoops to use it, or to use other common patterns like functional composition.
Again, take a look at how nicely golang handles concurrency with colorless functions: php fibers weren't the first to do it.
Regards,
Daniil Gentili.
Again, take a look at how nicely golang handles concurrency with colorless functions: php fibers weren't the first to do it.
Also, a colored-functions approach for PHP would make a future thread-based concurrency approach completely non-viable, because it would require marking ALL functions (not just IO-bound functions, but CPU-bound ones as well) as async, and forcing the use of await for ALL function calls, just to be able to sometimes run some functions in parallel (in separate threads).
Colored functions completely preclude a possible future thread-based implementation of concurrency.
Regards,
Daniil Gentili.
Colored functions completely preclude a possible future thread-based
implementation of concurrency.
I can assure you that colored functions are neither part of this RFC
nor any future ones from me. And it's not so much my decision as the
language itself and the existing codebase that dictate the
implementation.
Moreover, we already have extensive experience with Swoole, where
developers’ reactions are well known. And Swoole is essentially PHP +
coroutines.
I can recall the maintainer’s words from memory: it turned out that
developers don’t want to use special functions instead of standard ones.
I can confirm this from my own experience. For example, adapting a
library for RabbitMQ required changing only 10-20 lines, and overall
it worked right away, even without tests.
The true strength of any language is not in its syntax. The main strength
is its infrastructure.
Even if a language is poor in terms of development quality, if it has a
massive infrastructure, people will use it.
Ed.
I will just put this picture here
--
Iliya Miroslavov Iliev
i.miroslavov@gmail.com
In my opinion, colored functions is the worst thing that could happen to PHP.
+++++ on this, the discussion on this RFC is veering in a very annoying direction for absolutely no good reason.
Golang has shown and proven that we do not need colored functions to make use of (extremely simple to use) concurrency.
Please do not cripple the adoption of async PHP by making it colored and by adding absolutely useless async blocks, forcing everyone to rewrite their codebases for no good reason. With the current colorless, fiber-based approach, the only thing needed to adapt existing codebases for async is the use of channels and a few appropriately placed synchronization primitives (I know this from experience, having migrated a large and complex codebase to be fully async).
The only thing that's truly needed in this RFC is a set of synchronization primitives like in golang, and a way to parent/unparent fibers in order to inherit cancellations (as previously mentioned in this list), not contexts, async blocks and colored functions.
Any issue around $_GET/etc superglobals (i.e. to handle each incoming request in a separate fiber) should be solved at the SAPI level with a separate RFC, not by introducing contexts and async blocks and making concurrency harder to use.
I like and use immutability, but it has its limits: it should not be used everywhere, and it should not be forced upon everyone just because someone is a strong proponent of it.
Regards,
Daniil Gentili.
The only thing that's truly needed in this RFC is a set of synchronization primitives like in golang, and a way to parent/unparent fibers in order to inherit cancellations (as previously mentioned in this list), not contexts, async blocks and colored functions.
The async block as I'm picturing it has nothing to do with function colouring; it's about the outermost function in an async stack being able to say "make sure the scheduler is started" and "block here until all child fibers are either concluded, detached, or cancelled".
It's roughly equivalent to calling the RFC's Async\launchScheduler() more than once, but I imagine the later calls would not actually start a new scheduler, just track a group of fibers.
If we're building this into the language, we're not limited to expressing things with functions and objects, and a block syntax makes it trivial for the compiler to detect a mismatched start and end.
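The semantics being described could be annotated roughly like this (Async\launchScheduler is from the RFC draft; the async{}/spawn syntax itself is hypothetical):

```php
<?php
async {            // 1. ensure the scheduler is started
                   //    (a no-op if one is already running, like a
                   //    repeated Async\launchScheduler() call)
                   // 2. begin tracking a group of child fibers
    spawn task_a();    // task_a()/task_b() are illustrative names
    spawn task_b();
}                  // 3. block until every fiber in the group has
                   //    concluded, detached, or been cancelled;
                   //    a missing closing brace is a compile error
```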
Regards,
Rowan Tommins
[IMSoP]
I don't want to get involved, I'm just giving an opinion. Suppose my
wife wants me to get a present for someone, wrapped (strictly) either
in "green with orange ribbon" or "blue with red ribbon", but I have to
buy the paper and ribbon separately from different shops. If I buy
orange ribbon from the ribbon shop because it opened early while the
other one was still closed, and then the shop that sells the wrapping
paper tells me they only sell blue, what should I do... kill myself or
go back and try to change it?
--
Iliya Miroslavov Iliev
i.miroslavov@gmail.com
The async block as I'm picturing it has nothing to do with function colouring, it's about the outermost function in an async stack being able to say "make sure the scheduler is started" and "block here until all child fibers are either concluded, detached, or cancelled".
There's no need for such a construct: the awaitAll function does precisely what you describe, without introducing the concept of a child fiber, or an async block that severely limits concurrency.
There is absolutely nothing wrong with the concept of a fiber without a parent, or a fiber that throws an exception (or a cancellation exception) out of the event loop.
A panic in a golang fiber surfaces out of the event loop (unless it is caught with a recover), just like an uncaught exception in a fiber surfaces out of the event loop. It makes no sense to severely limit concurrency with an async block just to handle the edge case of an uncaught exception (which can be handled anyway with an event-loop exception handler).
In general, I really don't like the concept of an async block the way it is presented here, because it implies that concurrency is something bad that must be limited and controlled, or else bad stuff will happen. In reality, a fiber throwing an exception (without anyone await()ing on the fiber handle, thus throwing out of the event loop) is not the end of the world, and can be handled by other means, without limiting concurrency.
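Sketching the alternative with the function names used earlier in this thread (the API details, and the fetch_user()/flush_metrics() helpers, are illustrative):

```php
<?php
// Explicit join exactly where the caller wants one; no enclosing
// async {} block required:
$handles = [
    async\async(fn() => fetch_user(1)),
    async\async(fn() => fetch_user(2)),
];
$users = async\awaitAll($handles);

// A fire-and-forget background fiber is also legitimate; an uncaught
// exception inside it surfaces to an event-loop-level exception
// handler rather than requiring a parent fiber to collect it.
async\async(fn() => flush_metrics());
```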
Regards,
Daniil Gentili.
As far as I can tell, the entire reason we are talking about this is because adding the event loop changes the behavior of existing code. So we cannot "just turn it on".
I haven't seen an explanation of why this is the case, but that's how we got to this point. We need some way to "opt in" to turning on the event loop.
— Rob
As far as I can tell, the entire reason we are talking about this is because adding the event loop changes the behavior of existing code. So we cannot "just turn it on".
I haven't seen an explanation of why this is the case, but that's how we got to this point. We need some way to "opt in" to turning on the event loop.
This also seems like a very bad idea: there is no reason for the language to hide concurrency behind an INI setting or, even worse, a compilation flag.
Existing code may not all be free from races, but the choice should be up to the user, not the hoster or whoever provides the php distribution.
Enabling concurrency by default will allow gradual addition of fiber/threadsafety of codebases, as developers will know that concurrency is a (very easy to use) option, and will hopefully want to prepare their codebases for it, and after that happens, it will be even easier for users to use it.
(And actually, this is already the case: fibers were added in 8.1, so limiting the userland scheduler makes no sense now that (thankfully!!) the cat is out of the bag.)
Regards,
Daniil Gentili.
In other words, no one's forcing anyone to use async PHP: just because the language will provide a spawn/go keyword to spawn a new fiber, no one's forcing you to use it; you can keep using everything in a single thread, a single fiber, no spawning.
Crippling async PHP with async blocks just because some libraries aren't ready for concurrency now, means crippling the future of async php.
Regards,
Daniil Gentili.
Crippling async PHP with async blocks just because some libraries aren't
ready for concurrency now, means crippling the future of async php.
How can calling a single function have such a destructive impact on the
future of PHP?
Yes, you have to write 10-20 more characters than usual one time.
Yes, it’s a hack. Every programming language with history has hacks.
Hacks evoke negative emotions. But life does too :)
All that’s needed is to change a few lines of code in index.php to
initialize the application in an asynchronous context. It’s the same as
launching Swoole: you need a few lines of code to initialize the web
server.
Ed
Crippling async PHP with async blocks just because some libraries aren't ready for concurrency now, means crippling the future of async php.
How can calling a single function have such a destructive impact on the future of PHP?
Very simple: to make an analogy, it's like saying PHP should have an io {} block that makes sure all file resources opened within it (even internally, 10 stack levels deep into 3 libraries, whose instances are all used after the io {} block) are closed when exiting.
The async {} block is a footgun that tries to meddle with what must be an internal implementation detail of the libraries you're using.
Even if they were optional, their presence in the language could lead library developers to reduce concurrency in order to allow calls from async blocks, (i.e. don't spawn any background fiber in a method call because it might be called from an async {} block) which is what I meant by crippling async PHP.
If the async {} block were to ignore referenced spawned fiber handles, it would still be just as bad: sometimes one really just needs to spawn a background fiber to do a one-off background task, without caring about the result.
That is, the spawned logic may also contain a catch (\Throwable) block with error handling, making collection of references into an array to awaitAll in __destruct (just because someone might invoke the code from an async {} block!) pointless and an overcomplication.
Amphp's approach of an event loop exception handler is, I believe, the perfect uncaught exception handling solution.
(Also note that amphp also provides an escape hatch even for the exception handler: a Future::ignore() method that prevents uncaught and non-awaited exceptions from bubbling out into the exception handler).
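The event-loop exception handler Daniil describes can be sketched without any dependencies using plain fibers. MiniLoop and all of its method names are hypothetical stand-ins, not Revolt's or amphp's actual implementation; the point is only to show exceptions escaping any fiber being routed to a single handler instead of requiring a structured block around every spawn site:

```php
// Dependency-free sketch of the "event loop exception handler" idea.
// MiniLoop is a hypothetical toy loop, not Revolt's implementation.
final class MiniLoop
{
    /** @var Fiber[] */
    private array $fibers = [];
    /** @var callable|null */
    private $errorHandler = null;

    public function setErrorHandler(callable $handler): void
    {
        $this->errorHandler = $handler;
    }

    public function spawn(callable $task): void
    {
        $this->fibers[] = new Fiber($task);
    }

    public function run(): void
    {
        while ($fiber = array_shift($this->fibers)) {
            try {
                $fiber->isStarted() ? $fiber->resume() : $fiber->start();
                if ($fiber->isSuspended()) {
                    $this->fibers[] = $fiber; // still running: re-queue it
                }
            } catch (\Throwable $e) {
                // An exception escaped a fiber: hand it to the loop-level
                // handler instead of tearing down the whole loop.
                if ($this->errorHandler !== null) {
                    ($this->errorHandler)($e);
                } else {
                    throw $e;
                }
            }
        }
    }
}

$handled = [];
$loop = new MiniLoop();
$loop->setErrorHandler(function (\Throwable $e) use (&$handled): void {
    $handled[] = $e->getMessage();
});
$loop->spawn(function (): void {
    throw new RuntimeException("background failure");
});
$loop->run(); // the exception ends up in $handled, not on the caller
```

A fire-and-forget task that fails simply reports to the handler; no enclosing async {} block has to exist for the program to stay alive.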
Regards,
Daniil Gentili.
As far as I can tell, the entire reason we are talking about this is because adding the event loop changes the behavior of existing code. So we cannot "just turn it on".
I haven't seen an explanation of why this is the case, but that's how we got to this point. We need some way to "opt in" to turning on the event loop.
This also seems like a very bad idea: there is no reason for the
language to hide concurrency behind an INI setting or, even worse, a
compilation flag.
This is beyond a strawman to the point of being a straw-pile. Literally no one has suggested "hiding concurrency behind an INI flag or compilation flag." Please confine your comments to those that have some basis in reality and in this thread.
--Larry Garfield
This also seems like a very bad idea: there is no reason for the
language to hide concurrency behind an INI setting or, even worse, a compilation flag.
This is not because someone wants it that way. This situation is solely due
to the fact that the Scheduler contradicts Fiber.
- The Scheduler expects to switch contexts as it sees fit.
- Fiber expects context switching to occur only between the Fiber-parent
and its child.
Of course, the switching mechanism can be modified, and the logic of the
main context can also be changed. The problem is that, at a logical level,
these two approaches are mutually exclusive.
For example, the Swow project introduced a separate coroutine library
(libcat) and abandoned Fiber. But we cannot do the same.
Ed.
Hi Edmond,
This situation is solely due to the fact that the Scheduler contradicts
Fiber.
- The Scheduler expects to switch contexts as it sees fit.
- Fiber expects context switching to occur only between the
Fiber-parent and its child.

Can you please share a bit more detail on how the Scheduler is
implemented, to make sure that I understand why this contradiction exists?
Also with some examples, if possible.
Reading the RFC initially, I thought that the Scheduler is using fibers for
everything that runs. And that the Scheduler is the direct parent of all
the fibers that are started using it.
I understood that those fibers need to be special ones that suspend with a
"Promise-like" object and resume when that is resolved.
You mean that when one of the fibers started by the Scheduler starts
other fibers, it would usually await for them to finish, and that is a
blocking operation that also blocks the Scheduler?
In that sense, any long running blocking operation is not compatible with
the Scheduler...
If you can please explain a bit more with some more details and examples,
it would be great.
Thanks!
--
Alex
Good day, Alex.
Can you please share a bit more details on how the Scheduler is
implemented, to make sure that I understand why this contradiction exists?
Also with some examples, if possible.
$fiber1 = new Fiber(function () {
    echo "Fiber 1 starts\n";
    $fiber2 = new Fiber(function () {
        echo "Fiber 2 starts\n";
        Fiber::suspend(); // Suspend the inner fiber; control returns to Fiber 1
        echo "Fiber 2 resumes\n";
    });
    $fiber2->start();  // runs Fiber 2 until its suspend()
    $fiber2->resume(); // hand control back to the inner fiber
});
$fiber1->start();
Yes, of course, let's try to look at this in more detail.
Here is the classic code demonstrating how Fiber works. Fiber1 creates
Fiber2. When Fiber2 yields control, execution returns to Fiber1.
Now, let's try to do the same thing with Fiber3. Inside Fiber2, we create
Fiber3. Everything will work perfectly—Fiber3 will return control to Fiber2,
and Fiber2 will return it to Fiber1—this forms a hierarchy.
Now, imagine that we want to turn Fiber1 into a Scheduler while following
these rules.
To achieve this, we need to ensure that all Fiber instances are created
from the Scheduler, so that control can always be properly returned.
class Scheduler {
    private array $queue = [];

    public function add(callable $task): void {
        $this->queue[] = new Fiber($task);
    }

    public function run(): void {
        while (!empty($this->queue)) {
            $fiber = array_shift($this->queue);
            if (!$fiber->isStarted()) {
                $fiber->start($this); // pass the Scheduler to the task
            } elseif ($fiber->isSuspended()) {
                $fiber->resume();
            }
        }
    }

    public function yield(): void {
        $fiber = Fiber::getCurrent();
        if ($fiber) {
            $this->queue[] = $fiber; // re-queue, then hand control back
            Fiber::suspend();
        }
    }
}
$scheduler = new Scheduler();
$scheduler->add(function (Scheduler $scheduler) {
echo "Task 1 - Step 1\n";
$scheduler->yield();
echo "Task 1 - Step 2\n";
});
$scheduler->add(function (Scheduler $scheduler) {
echo "Task 2 - Step 1\n";
$scheduler->yield();
echo "Task 2 - Step 2\n";
});
$scheduler->run();
So, to successfully switch between Fibers:
- A Fiber must return control to the Scheduler.
- The Scheduler selects the next Fiber from the queue and switches to it.
- That Fiber then returns control back to the Scheduler again.
This algorithm has one drawback: it requires two context switches instead
of one. We could switch FiberX to FiberY directly.
Breaking the contract not only disrupts the code in this RFC but also
affects Revolt's functionality. However, in the case of Revolt, you can
say: "If you use this library, follow the library's contracts and do not
use Fiber directly."
But PHP is not just a library, it's a language that must remain consistent
and cohesive.
Reading the RFC initially, I thought that the Scheduler is using fibers
for everything that runs.
Exactly.
You mean that when one of the fibers started by the Scheduler starts
other fibers, it would usually await for them to finish, and that
is a blocking operation that also blocks the Scheduler?
When a Fiber from the Scheduler decides to create another Fiber and
then tries to call blocking functions inside it, control can no longer
return to the Scheduler from those functions.
Of course, it would be possible to track the state and disable the
concurrency mode flag when the user manually creates a Fiber. But… this
wouldn't lead to anything good. Not only would it complicate the code, but
it would also result in a mess with different behavior inside and outside
of Fiber.
This is even worse than calling startScheduler.
The hierarchical switching rule is a design flaw that happened
because a low-level
component was introduced into the language as part of the implementation
of a higher-level component. However, the high-level component is in
User-land, while the low-level component is in PHP core.
It's the same as implementing $this in OOP but requiring it to be
explicitly passed in every method. This would lead to inconsistent behavior.
So, this situation needs to be resolved one way or another.
--
Ed
Hi Ed,
If I remember correctly, the original implementation of Fibers was built in such a way that extensions could create their own fiber types that were distinct from fibers but reused the context switch code.
From the original RFC:
An extension may still optionally provide their own custom fiber implementation, but an internal API would allow the extension to use the fiber implementation provided by PHP.
Maybe we could create a different version of fibers ("managed fibers", maybe?) distinct from the current implementation, with the idea of deprecating the old ones in PHP 10? Then, at least, the scheduler could always be running. If you are using existing code that uses fibers, you can't mix it with the new fibers, but it will "just work" (since the scheduler will never pick up those old fibers).
Something to think about.
— Rob
Maybe, we could create a different version of fibers ("managed fibers",
maybe?) distinct from the current implementation, with the idea to
deprecate them in PHP 10?
Then, at least, the scheduler could always be running. If you are using
existing code that
uses fibers, you can't use the new fibers but it will "just work" if you
aren't using the new fibers (since the scheduler will never pick up those
fibers).
Yes, that can be done. It would be good to maintain compatibility with
Xdebug, but that needs to be investigated.
During our discussion, everything seems to be converging on the idea that
the changes introduced by the RFC into Fiber would be better moved to a
separate class. This would reduce confusion between the old and new
solutions. That way, developers wouldn't wonder why Fiber and coroutines
behave differently—they are simply different classes.
The new Coroutine class could have a different interface with new logic.
This sounds like an excellent solution.
The interface could look like this:
- suspend (or another clear name) – a method that explicitly hands over execution to the Scheduler.
- defer – a handler that is called when the coroutine completes.
- cancel – a method to cancel the coroutine.
- context – a property that stores the execution context.
- parent (public property or getParent() method) – returns the parent coroutine.
(Just an example for now.)
The Scheduler would be activated automatically when a coroutine is
created. If the index.php script reaches the end, the interpreter would
wait for the Scheduler to finish its work under the hood.
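For concreteness, the list above might translate into a class shape like this. This is purely illustrative: the member names mirror the bullet points, the bodies are stubs, and none of it is the RFC's actual API:

```php
// Hypothetical shape for the proposed Coroutine class. Every member name
// is taken from the list above; bodies are stubs, not an implementation.
final class Coroutine
{
    public readonly ?Coroutine $parent;   // parent coroutine, if any
    public readonly ArrayObject $context; // execution-context storage

    /** Explicitly hand execution over to the Scheduler. */
    public static function suspend(): void {}

    /** Register a handler invoked when the coroutine completes. */
    public function defer(callable $onComplete): void {}

    /** Request cancellation of this coroutine. */
    public function cancel(): void {}
}
```

Keeping this separate from Fiber means the old parent-child switching contract and the new scheduler-driven contract never have to coexist in one class.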
Do you like this approach?
Ed.
When a Fiber from the Scheduler decides to create another Fiber and
then tries to call blocking functions inside it, control can no longer
return to the Scheduler from those functions.

Of course, it would be possible to track the state and disable the
concurrency mode flag when the user manually creates a Fiber. But… this
wouldn't lead to anything good. Not only would it complicate the code, but
it would also result in a mess with different behavior inside and outside
of Fiber.
Thank you for explaining the problem space.
Now let's see what solutions we can find.
First of all, I think it would be better for the language to assume the
Scheduler is always running and not have to be manually started.
An idea that I have for now:
Have a different method Fiber::suspendToScheduler(Resume $resume) that
would return control to the Scheduler. And this one would be used by
all internal functions that do blocking operations, and maybe also
userland ones if they need to. Of course, the name could be better, like
Fiber::await.
Maybe that is what we need: to be able to return control both to the parent
fiber for custom logic that might be needed, and to the Scheduler so that
the language would be concurrent.
As for userland event loops, like Revolt, I am not so sure they fit with
the new language level async model.
But I can see how they could implement a different event loop that would
run only one "loop" iteration, schedule a deferred callback, and pass
control to the Scheduler (which would return control in the next
iteration to perform one more loop, and so on).
--
Alex
Have a different method Fiber::suspendToScheduler(Resume $resume) that
would return the control to the Scheduler.
That's exactly how it works. The RFC includes the method Async\wait()
(Fiber::await() is nice), which hands control over to the Scheduler.
At the PHP core level, there is an equivalent method used by all blocking
functions. In other words, Fiber::suspend is not needed; instead, the
Scheduler API is used.
The only question is backward compatibility. If, for example, it is agreed
that the necessary changes will be made in Revolt when this feature is
released and we do not support the old behavior, then there is no problem.
Maybe that is what we need: to be able to return control both to the
parent fiber for custom logic that might be needed, and to the Scheduler so
that the language would be concurrent.
100% yes.
As for userland event loops, like Revolt, I am not so sure they fit with
the new language level async model.
Revolt can be adapted to this RFC by modifying the Driver module. I
actually reviewed its code again today to assess the complexity of this
change. It looks like it shouldn’t be difficult at all.
The only problem arises with the code that has already been written and is
publicly available. I know that the AMPHP stack is in use, so we need a
flow that ensures a smooth transition.
As I understand it, you believe that it’s better to introduce more radical
changes and not be afraid of breaking old code. In that case, there are no
questions at all.
One person observes three people and is curious about what they are
doing, because it looks strange.
The first person digs a hole.
The second person fills the hole back in.
The third person waters the filled hole.
He asks them: Why are you doing this?
They say: There is a fourth person who plants the seeds, but he is sick
right now.
On Sat, Mar 8, 2025 at 3:41 PM Daniil Gentili daniil.gentili@gmail.com
wrote:
The async block as I'm picturing it has nothing to do with function colouring, it's about the outermost function in an async stack being able to say "make sure the scheduler is started" and "block here until all child fibers are either concluded, detached, or cancelled".

There's no need for such a construct, as the awaitAll function does precisely what you describe, without the need to introduce the concept of a child fiber and the excessive limitation of an async block that severely limits concurrency.

There is absolutely nothing wrong with the concept of a fiber without a parent, or a fiber that throws an exception (or a cancellation exception) out of the event loop.

A panic in a Go goroutine surfaces out of the event loop (unless it is caught with a recover), just like an uncaught exception in a fiber surfaces out of the event loop: it makes no sense to severely limit concurrency with an async block just to handle the edge case of an uncaught exception (which can be handled anyway with an event loop exception handler).

In general, I really don't like the concept of an async block the way it is presented here, because it implies that concurrency is something bad that must be limited and controlled, or else bad stuff will happen, when in reality, a fiber throwing an exception (without anyone await()ing on the fiber handle, thus throwing out of the event loop) is not the end of the world, and can be handled by other means, without limiting concurrency.

Regards,
Daniil Gentili.
--
Iliya Miroslavov Iliev
i.miroslavov@gmail.com
The async block as I'm picturing it has nothing to do with function colouring, it's about the outermost function in an async stack being able to say "make sure the scheduler is started" and "block here until all child fibers are either concluded, detached, or cancelled".
There's no need for such a construct, as the awaitAll function does precisely what you describe, without the need to introduce the concept of a child fiber and the excessive limitation of an async block that severely limits concurrency.
No, it's not quite that either. The scenario I have in mind is a web / network server spawning a fiber for each request, and wanting to know when everything related to that request is finished, so that it can manage resources.
If we think of memory management as an analogy, awaitAll would be equivalent to keeping track of all your memory pointers, and making sure to pass them all to free before the end of the request. The construct we're discussing is like a garbage collection checkpoint, that ensures all memory allocated within that request has been freed, even if it wasn't tracked anywhere.
Written in ugly functions rather than concise and fail-safe syntax, it's something like:
$managedScope = new ManagedScope;
$previousScope = set_managed_scope( $managedScope );
spawn handle_request(); // inside here any number of fibers might be spawned
$managedScope->awaitAllChildFibers(); // we don't have the list of fibers here, so we can't use a plain awaitAll
set_managed_scope( $previousScope );
unset($managedScope);
It's certainly worth discussing whether this should be mandatory, default with an easy opt-out, or an equal-footing alternative to go-style unmanaged coroutines. But the idea of automatically cleaning up resources at the end of a task (e.g. an incoming request) is not new, and nor is arranging tasks in a tree structure.
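For what it's worth, the checkpoint can be sketched today with plain fibers and no new syntax. ManagedScope, spawn, and awaitAllChildFibers are hypothetical names mirroring the pseudo-code above, not a proposed API:

```php
// Toy version of the checkpoint: fibers spawned while a scope is current
// are registered with it, so the scope can later drain them all without
// the caller holding an explicit list. All names are hypothetical.
final class ManagedScope
{
    /** @var Fiber[] */
    public array $fibers = [];
    public static ?ManagedScope $current = null;

    public static function spawn(callable $task): void
    {
        $fiber = new Fiber($task);
        if (self::$current !== null) {
            self::$current->fibers[] = $fiber; // register with the scope
        }
        $fiber->start();
    }

    public function awaitAllChildFibers(): void
    {
        // Keep resuming tracked fibers until every one has terminated.
        while ($this->fibers = array_filter(
            $this->fibers,
            fn (Fiber $f) => !$f->isTerminated(),
        )) {
            foreach ($this->fibers as $f) {
                if ($f->isSuspended()) {
                    $f->resume();
                }
            }
        }
    }
}

$scope = new ManagedScope();
$previous = ManagedScope::$current;
ManagedScope::$current = $scope;

ManagedScope::spawn(function (): void {
    Fiber::suspend();          // a "background" fiber the handler forgot
    echo "cleanup ran\n";
});

$scope->awaitAllChildFibers(); // the checkpoint: drains forgotten fibers
ManagedScope::$current = $previous;
```

The scope, not the caller, holds the list of spawned fibers, which is exactly the difference from a plain awaitAll over handles the caller collected itself.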
I would also note that the concept of parent and child fibers is also useful for other proposed features, such as cascading cancellations, and having environment-variable style inherited context data. None of those is essential, but unless there are major implementation concerns, they seem like useful features to offer the user.
Rowan Tommins
[IMSoP]
The async block as I'm picturing it has nothing to do with function colouring, it's about the outermost function in an async stack being able to say "make sure the scheduler is started" and "block here until all child fibers are either concluded, detached, or cancelled".
There's no need for such a construct, as the awaitAll function does precisely what you describe, without the need to introduce the concept of a child fiber and the excessive limitation of an async block that severely limits concurrency.
No, it's not quite that either. The scenario I have in mind is a web / network server spawning a fiber for each request, and wanting to know when everything related to that request is finished, so that it can manage resources.
If we think of memory management as an analogy, awaitAll would be equivalent to keeping track of all your memory pointers, and making sure to pass them all to free before the end of the request. The construct we're discussing is like a garbage collection checkpoint, that ensures all memory allocated within that request has been freed, even if it wasn't tracked anywhere.
Written in ugly functions rather than concise and fail-safe syntax, it's something like:

$managedScope = new ManagedScope;
$previousScope = set_managed_scope( $managedScope );
spawn handle_request(); // inside here any number of fibers might be spawned
$managedScope->awaitAllChildFibers(); // we don't have the list of fibers here, so we can't use a plain awaitAll
set_managed_scope( $previousScope );
unset($managedScope);

It's certainly worth discussing whether this should be mandatory, default with an easy opt-out, or an equal-footing alternative to go-style unmanaged coroutines. But the idea of automatically cleaning up resources at the end of a task (e.g. an incoming request) is not new, and nor is arranging tasks in a tree structure.
I still strongly disagree with the concept of this construct.
When we spawn an async function, we care only about its result: all side effects (including spawned fibers, i.e. to handle & cache incoming events from sockets, etc.) should not interest us, and eventual cleanup should be handled transparently by the library we are invoking (i.e. very simply by running awaitAll in a __destruct, according to the library's overall lifetime and logic).
I don't think the language should offer a construct that essentially makes sure that an async function or method may not spawn background fibers.
This makes no sense: in a way, it's offering a tool to meddle with the internal implementation details of any async library, one that can prevent libraries from using any background fibers.
To make an analogy, it's like saying PHP should have an io {} block that makes sure all file resources opened within it (even internally, 10 stack levels deep into 3 libraries, whose instances are all used after the io {} block) are closed when exiting.
Libraries can and should handle cleanup of running fibers by themselves, on their own terms, without externally imposed limitations.
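Daniil's alternative can be sketched with plain fibers. SocketCache and its cleanup policy are hypothetical; a real library would drain real I/O work rather than a toy fiber:

```php
// Toy library object that owns its background fiber and drains it in
// __destruct, on the library's own terms, with no enclosing wait_all /
// async {} block involved. All names are hypothetical.
final class SocketCache
{
    public static bool $flushed = false;
    private ?Fiber $background = null;

    public function __construct()
    {
        // A one-off background task the caller never sees.
        $this->background = new Fiber(function (): void {
            Fiber::suspend();      // pretend: waiting for a socket event
            self::$flushed = true; // the deferred cleanup work
        });
        $this->background->start(); // runs until the suspend
    }

    public function __destruct()
    {
        // The library decides when its background fiber is drained.
        if ($this->background?->isSuspended()) {
            $this->background->resume();
        }
    }
}

$cache = new SocketCache();
unset($cache); // the destructor drains the background fiber
```

The background fiber's lifetime is tied to the object that created it, not to whatever block the caller happened to wrap around the call.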
I would also note that the concept of parent and child fibers is also useful for other proposed features, such as cascading cancellations, and having environment-variable style inherited context data.
Yes, parenting does make sense for some usecases (indeed I already previously proposed parenting just for cancellations), just not to offer a footgun that explicitly limits concurrency.
Regards,
Daniil Gentili.
To make an analogy, it's like saying PHP should have an io {} block, that makes sure all file resources opened within (even internally, 10 stack levels deep into 3 libraries, whose instances are all used after the io {} block) are closed when exiting.
Traditional PHP offers exactly this: the SAPI lifecycle tracks all file handles opened within a request, and closes them cleanly before reusing the thread or process for another request. Essentially what I'm proposing is a way to implement the same isolation in userland, by marking a checkpoint in the code.
As I've said repeatedly, it doesn't necessarily need to be a mandatory restriction, it can be a feature to help users write code without having to worry about accidentally leaving a background fiber running.
Rowan Tommins
[IMSoP]
To make an analogy, it's like saying PHP should have an io {} block, that makes sure all file resources opened within (even internally, 10 stack levels deep into 3 libraries, whose instances are all used after the io {} block) are closed when exiting.
Traditional PHP offers exactly this: the SAPI lifecycle tracks all file handles opened within a request, and closes them cleanly before reusing the thread or process for another request. Essentially what I'm proposing is a way to implement the same isolation in userland, by marking a checkpoint in the code.
Exposing this in userland offers an extremely dangerous footgun that will severely limit concurrency.
As I've said repeatedly, it doesn't necessarily need to be a mandatory restriction, it can be a feature to help users write code without having to worry about accidentally leaving a background fiber running.
Even if its use is optional, its presence in the language could lead library developers to reduce concurrency in order to allow calls from async blocks (i.e. don't spawn any background fiber in a method call because it might be called from an async {} block), which is what I meant by crippling async PHP.
Libraries can and should handle cleanup of running fibers by themselves, on their own terms, without externally imposed limitations.
It makes absolutely no sense, especially for a SAPI, to force all background fibers to stop after a request is finished.
It would force users to stop and restart all running fibers on each request, which is precisely the main argument for the use of worker mode: reducing overhead by keeping caches primed, sockets open and background loops running.
PHP itself explicitly offers an escape hatch around the "io {} block" of current SAPIs, in the form of persistent resources (and again, this is all for performance reasons).
Even ignoring performance considerations, as I said many times, offering this tool to userland is a major footgun that will either backfire spectacularly (breaking existing and new async libraries by endlessly awaiting upon background fibers when exiting an async {} block haphazardly used by a newbie, or even worse force library developers to reduce concurrency, killing async PHP just because users can use async {} blocks), or simply not get used at all (because the main SAPI usecase listed explicitly does NOT need purity).
Regards,
Daniil Gentili.
Hi,
You seem to be posting a reply to nearly every other email on this thread. I'd recommend you have another read through our mailing list rules: https://github.com/php/php-src/blob/master/docs/mailinglist-rules.md
cheers
Derick
Offering this tool to userland is a major footgun that will either backfire spectacularly (breaking existing and new async libraries by endlessly awaiting upon background fibers when exiting an async {} block haphazardly used by a newbie, or even worse force library developers to reduce concurrency, killing async PHP just because users can use async {} blocks), or simply not get used at all (because the main SAPI usecase listed explicitly does NOT need purity).
Some extra points:
- The naming of "async {}" is also very misleading, as it does the opposite of making things async; if anything it should be called "wait_all {}".
- Again, what are we waiting for? A fiber spawned by a library we called 10 levels deep in the stack, that exits only when the container object is destroyed (outside of the wait_all block, thus causing an endless hang)? No one should care about, or be able to control, what must remain an internal implementation detail of invoked libraries; adding a wait_all block will only break stuff.
- If we did want to wait for all fibers spawned by a method call, nothing is preventing the caller from returning an array of futures for the spawned fibers, which we can then await.
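As a sketch of that alternative (hypothetical code: fetchAll() and the URL list are invented; spawn, getFuture() and awaitAll() follow the names floated in the current RFC draft):

```php
// A library method that spawns fibers but returns their futures,
// so the caller decides when (and whether) to await them;
// no enclosing wait_all block required.
function fetchAll(array $urls): array
{
    $futures = [];
    foreach ($urls as $url) {
        // spawn per the RFC draft; getFuture() per the current proposal
        $futures[] = (spawn file_get_contents($url))->getFuture();
    }
    return $futures;
}

// The caller awaits explicitly:
$results = Async\awaitAll(fetchAll(['https://example.com/a', 'https://example.com/b']));
```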
The wait_all block is EXPLICITLY DESIGNED to meddle with the internals of async libraries, because the only feature it offers (that isn't already offered by awaitAll) is one that controls internal implementation details of libraries invoked within the block.
Libraries can full well handle cleanup of fibers in __destruct by themselves, without a wait_all block forcing them to reduce concurrency whenever the caller pleases.
It is, imo, a MAJOR FOOTGUN, and should not be even considered for implementation.
Regards,
Daniil Gentili.
The wait_all block is EXPLICITLY DESIGNED to meddle with the internals of
async libraries,
How exactly does it interfere with the implementation of asynchronous
libraries?
Especially considering that these libraries operate at the User-land level?
It’s a contract. No more. No less.
Libraries can full well handle cleanup of fibers in __destruct by
themselves, without a wait_all block forcing them to reduce concurrency
whenever the caller pleases.
Fiber is a final class, so there can be no destructors here. Even if you
create a "Coroutine" class and allow defining a destructor, the result will
be overly verbose code. I and many other developers have tested this.
And the creators of AMPHP did not take this approach. Go doesn’t have it
either. This is not a coincidence.
It is, imo, a MAJOR FOOTGUN, and should not be even considered for
implementation.
Why exactly is this a FOOTGUN?
- Does this block lead to new violations of language integrity?
- Does this block increase the likelihood of errors?
A FOOTGUN is something that significantly breaks the language and pushes
developers toward writing bad code. This is a rather serious flaw.
The wait_all block is EXPLICITLY DESIGNED to meddle with the internals of async libraries,
How exactly does it interfere with the implementation of asynchronous libraries?
Especially considering that these libraries operate at the User-land level? It’s a contract. No more. No less.
When you have a construct that forces all code within it to terminate all running fibers before the block can exit.
If any library invoked within a wait_all block suddenly decides to spawn a long-running fiber that is not stopped when exiting the block, but for example later, when the library itself decides to, the wait_all block will not exit, essentially forcing the library user or developer to mess with the internals and forcefully terminate the background fiber.
The choice should never be up to the caller, and the presence of the wait_all block gives any caller the option to break the internal logic of libraries.
I can give you several examples where such logic is used in Amphp libraries, and it will break if they are invoked within an async block.
Libraries can full well handle cleanup of fibers in __destruct by themselves, without a wait_all block forcing them to reduce concurrency whenever the caller pleases.
Fiber is a final class, so there can be no destructors here. Even if you create a "Coroutine" class and allow defining a destructor, the result will be overly verbose code. I and many other developers have tested this.
You misunderstand: this is about storing the FiberHandles of spawned fibers and awaiting them in the __destruct of an object (the same object that spawned them in a method), in order to make sure all spawned fibers are awaited and all unhandled exceptions are handled somewhere (in absence of an event loop error handler).
Also see my discussion about ignoring referenced futures: https://externals.io/message/126537#126661
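A rough illustration of the pattern being described (all names here are hypothetical: FiberHandle and await() from earlier drafts, Mailer and send() invented for the example):

```php
// The object stores the handles of the fibers it spawns, and its
// destructor awaits them, so spawned work is always joined and
// unhandled exceptions surface somewhere deterministic.
class Mailer
{
    /** @var FiberHandle[] handles of fibers spawned by this object */
    private array $handles = [];

    public function sendAsync(string $message): void
    {
        // Keep the handle instead of letting the fiber run unreferenced
        $this->handles[] = spawn $this->send($message);
    }

    public function __destruct()
    {
        // Cleanup on the library's own terms, not the caller's
        foreach ($this->handles as $handle) {
            $handle->await();
        }
    }

    private function send(string $message): void { /* ... */ }
}
```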
It is, imo, a MAJOR FOOTGUN, and should not be even considered for implementation.
Why exactly is this a FOOTGUN?
- Does this block lead to new violations of language integrity?
- Does this block increase the likelihood of errors?
- Yes, because it gives users tools to mess with the internal behavior of userland libraries
- Yes, because (especially given how it's named) accidental usage will break existing and new async libraries by endlessly awaiting upon background fibers when exiting an async {} block haphazardly used by a newbie when calling most async libraries, or even worse force library developers to reduce concurrency, killing async PHP just because users can use async {} blocks.
A FOOTGUN is something that significantly breaks the language and pushes developers toward writing bad code. This is a rather serious flaw.
Indeed, this is precisely the case.
As the maintainer of Psalm, among others, I fully understand the benefits of purity and immutability: however, this keyword is a toy exercise in purity, with no real usecases (all real usecases being already covered by awaitAll), which cannot work in the real world in current codebases and will break real-world applications if used, with consequences on the ecosystem.
I don't know what else to say on the topic, I feel like I've made myself clear on the matter: if you still feel like it's a good idea and it should be added to the RFC as a separate poll, I can only hope that the majority will see the danger of adding such a useless keyword and vote against on that specific matter.
Regards,
Daniil Gentili.
I can give you several examples where such logic is used in Amphp
libraries, and it will break if they are invoked within an async block.
Got it, it looks like I misunderstood the post due to my focus. So,
essentially, you're talking not so much about wait_all itself, but rather
about the parent-child vs. free model.
This question is what concerns me the most right now.
If you have real examples of how this can cause problems, I would really
appreciate it if you could share them. Code is the best criterion of truth.
You misunderstand:
Yes, I misunderstood. It would be interesting to see the code with the
destructor to analyze this approach better.
Let me summarize the current state for today:
1. I am abandoning startScheduler and the idea of preserving backward compatibility with await_all or anything else in that category. The scheduler will be initialized implicitly, and this does not concern user-land. Consequently, the spawn function() code will work everywhere and always.
2. I will not base the implementation on Fiber (perhaps only on the low-level part). Instead of Fiber, there will be a separate class. There will be no changes to Fiber at all. This decision follows the principle of Win32 COM/DCOM: old interfaces should never be changed. If an old interface needs modification, it should be given a new name. This should have been done from the start.
3. I am abandoning low-level objects in PHP-land (FiberHandle, SocketHandle etc). Over time, no one has voted for them, which means they are unnecessary. There might be a low-level interface for compatibility with Revolt.
4. It might be worth restricting microtasks in PHP-land and keeping them only for C code. This would simplify the interface, but we need to ensure that it doesn’t cause any issues.
The remaining question on the agenda: deciding which model to choose —
parent-child or the Go-style model.
Thanks
Ed
As noted, I am in broad agreement with the previously linked article on "playpens" (even if I hate that name), that the "go style model" is too analogous to goto statements.
Basically, this is asking "so do we use gotos or for loops?" For which the answer is, I hope obviously, for loops.
Offering both, frankly, undermines the whole point of having structured, predictable concurrency. The entire goal of that is to be able to know if there's some stray fiber running off in the background somewhere still doing who knows what, manipulating shared data, keeping references to objects, and other nefarious things. With a nursery, you don't have that problem... but only if you remove goto. A language with both a for loop and an arbitrary goto statement gets basically no systemic benefit from having the for loop, because neither developers nor compilers get any guarantees of what will or won't happen.
Especially when, as demonstrated, the "this can run in the background and I don't care about the result" use case can be solved more elegantly with nested blocks and channels, and in a way that, in practice, would probably get subsumed into DI Containers eventually so most devs don't have to worry about it.
Of interesting note along similar lines would be Rust, and... PHP.
Rust's whole thing is memory safety. The language simply will not let you write memory-unsafe code, even if it means the code is a bit more verbose as a result. In exchange for the borrow checker, you get enough memory guarantees to write extremely safe parallel code. However, the designers acknowledge that occasionally you do need to turn off the checker and do something manually... in very edge-y cases in very small blocks set off with the keyword "unsafe". Viz, "I know what I'm doing is stupid, but trust me." The discouragement of doing so is built into the language, and tooling, and culture.
PHP... has a goto operator. It was added late, kind of as a joke, but it's there. However, it is not a full goto. It can only jump within the current function, and only "up" control structures. It's basically a named break. While it only rarely has value, it's not all that harmful unless you do something really dumb with it. And then it's only harmful within the scope of the function that uses it. And, very very rarely, there's some micro-optimization to be had. (cf. this classic: https://github.com/igorw/retry/issues/3). But PHP has survived quite well for 30 years without an arbitrary goto statement.
So if we start from a playpen-like, structured concurrency assumption, which (as demonstrated) gives us much more robust code that is easier to follow and still covers nearly all use cases, there's two questions to answer:
- Is there still a need for an "unsafe {}" block or in-function goto equivalent?
- If so, what would that look like?
I am not convinced of 1 yet, honestly. But if it really is needed, we should be targeting the least-uncontrolled option possible to allow for those edge cases. A quick-n-easy "I'mma violate the structured concurrency guarantees, k?" undermines the entire purpose of structured concurrency.
During our discussion, everything seems to be converging on the idea that the changes introduced by the RFC into Fiber would be better moved to a separate class. This would reduce confusion between the old and new solutions. That way, developers wouldn't wonder why Fiber and coroutines behave differently—they are simply different classes.
The new Coroutine class could have a different interface with new logic. This sounds like an excellent solution. The interface could look like this:
• suspend (or another clear name) – a method that explicitly hands over execution to the Scheduler.
• defer – a handler that is called when the coroutine completes.
• cancel – a method to cancel the coroutine.
• context – a property that stores the execution context.
• parent (public property or getParent() method) – returns the parent coroutine.
(Just an example for now.)
The Scheduler would be activated automatically when a coroutine is created. If the index.php script reaches the end, the interpreter would wait for the Scheduler to finish its work under the hood.
Do you like this approach?
That API is essentially what I was calling "AsyncContext" before. I am flexible on the name, as long as it is descriptive and gives the user the right mental model. :-) (I'm not sure if Coroutine would be the right name either, since in what I was describing it's the spawn command that starts a coroutine; the overall async scope is the container for several coroutines.)
But perhaps that is a sufficient "escape hatch"? Spitballing again:
async $nursery { // Formerly AsyncContext
    // Runs at the end of this nursery scope
    $nursery->defer($fn);

    // This creates and starts a coroutine, in this scope.
    $future = $nursery->spawn($fn);

    // A short-hand for "spawn this coroutine, in whatever the nearest async nursery scope is."
    // aka, an alias for the above line, but doesn't require passing $nursery around.
    $future = spawn $fn;

    // If you want.
    $future->cancel();

    // See below.
    $nursery->spawn(stuff(...));
} // This blocks until escape() finishes, too, because it was bound to this scope.

function stuff() {
    async $inner {
        // This is bound to the $inner scope; $inner cannot end
        // until this is complete. This is by design.
        spawn $fn;

        // This spawns a new coroutine on the parent scope, if any.
        // If there isn't one, $inner->parent is null so it falls back
        // to the current scope.
        // One could technically climb the entire tree to the top-most
        // scope and spawn a coroutine there. It would be a bit annoying to do,
        // but, as noted, that's a good thing, because you shouldn't be doing that 99.9% of the time!
        // Channels are better 99.9% of the time.
        ($inner->parent ?? $inner)->spawn(escape(...));
    }
}
I'm not sure I fully like the above. I don't know if it makes the guarantees too weak still. But it does offer a limited, partial escape hatch, so may be an acceptable compromise.
It would be valuable to take this idea (or whatever we end up with) to experts in other languages with better async models than JS, and maybe a few academics, to let them poke obvious-to-them holes in it.
Edmund, does that make any sense to you?
--Larry Garfield
As noted, I am in broad agreement with the previously linked article on
"playpens" (even if I hate that name), that the "go style model" is too
analogous to goto statements.
The syntax and logic you describe are very close to Kotlin's implementation.
I would say that Kotlin is probably the best example of structured
concurrency organization, which is closest to PHP in terms of abstraction
level.
One downside I see in Kotlin's syntax is its complexity.
However, what stands out is the CoroutineScope concept. Instead of
linking coroutines through Parent-Child relationships, Kotlin binds them
to execution contexts. At the same time, the GlobalScope context is
accessible everywhere.
https://kotlinlang.org/docs/coroutines-and-channels.html#structured-concurrency
So, it is not the coroutine that maintains the hierarchy, but the Scope
object, which essentially aligns with the RFC proposal: contexts can be
hierarchically linked.
This model sounds promising because it relieves coroutines from the
responsibility of waiting for their child coroutines. It is not the
coroutine that should wait, but the Scope that should "wait."
spawn function {
    echo "c1\n";
    spawn function {
        echo "c2\n";
    };
    echo "c1 end\n";
};
There is no reason to keep coroutine1 in memory if it does not need
coroutine2. However, the Scope will remain in memory until all
associated coroutines are completed. This memory model is entirely fair.
Let's consider a scenario with libraries. A library may want to run
coroutines under its own control. This means that the library wants to
execute coroutines within its own Scope. For example:
class Logger {
    private CoroutineScope $scope;

    public function __construct() {
        $this->scope = new CoroutineScope();
    }

    public function log(mixed $data) {
        // Adding another coroutine to our personal Scope
        $this->scope->spawn($this->handle_log(...), $data);
    }

    public function __destruct()
    {
        // We can explicitly cancel all coroutines in the destructor
        // if we find it appropriate
        $this->scope->cancel();
    }
}
Default Behavior
By default, the context is always inherited when calling spawn, so there is
no need to pass it explicitly. The expression spawn function {} is
essentially equivalent to currentScope->spawn.
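In other words (currentScope() is written here as a hypothetical accessor for the inherited scope), the two spellings below would behave identically:

```php
// Implicit: spawned in the current (inherited) scope
spawn function () {
    doWork();
};

// Explicit: the same operation, with the scope named
currentScope()->spawn(function () {
    doWork();
});
```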
The behavior of an HTTP server in a long-running process would look like
this:
function receiveLoop()
{
    while (true) {
        // Simulating waiting for an incoming connection
        $connection = waitForIncomingConnection();

        // Creating a new Scope for handling the request
        $requestScope = new CoroutineScope();

        // Processing the request inside its own scope
        $requestScope->spawn(function () use ($connection) {
            handleRequest($connection);
        });
    }
}
Scope allows the use of the "task group" and "await all" patterns
without additional syntax, making it convenient.
$scope = new CoroutineScope();

$scope->spawn(function () {
    echo "Task 1 started\n";
    sleep(1);
    echo "Task 1 finished\n";
});

$scope->spawn(function () {
    echo "Task 2 started\n";
    sleep(2);
    echo "Task 2 finished\n";
});

// Wait for all tasks to complete
$scope->awaitAll();
What is the advantage of using Scope instead of parent-child
relationships in coroutines?
If a programmer never uses Scope, then the behavior is literally the
same as in Go. This means that the programmer does not need structured
relationships, and it does not matter when a particular coroutine completes.
At the same time, code at a higher level can control the execution of
coroutines created at lower levels. The known downside of this approach
is that if lower-level code needs its coroutines to run independently, it
must explicitly define this behavior.
According to analysis, this model is effective in most modern languages,
especially those designed for business logic.
Who Will Use Coroutines?
I believe that the primary consumers of coroutines are libraries and
frameworks, which will provide developers with services and logic to solve
tasks.
If a library refuses to consider that its coroutine may be canceled by
the user's code, or how it should be canceled, it means the library is
neglecting its responsibility to provide a proper contract.
The default contract must give the user the power to terminate all
coroutines launched within a given context because only the user of the
library knows when this needs to be done. If a library has a different
opinion, then it is obligated to explicitly implement this behavior.
This means that libraries and services bear greater responsibility than
their users. But isn’t that already the case?
The language simply will not let you write memory-unsafe code
And this is more of an anti-example than an example.
But this analogy, like any other, cannot be used as an argument for or
against. Memory safety is not the same as launching a coroutine in
GlobalScope.
Where is the danger here? That the coroutine does something? But why is
that inherently bad?
It only becomes a problem when a coroutine accidentally captures memory
from the current context via use(), leaving an object in memory that logically
should have been destroyed when the request Scope was destroyed.
However, you cannot force a programmer to avoid writing such code.
Neither nurseries nor structured concurrency will take away this
possibility.
If a coroutine does not capture incorrect objects and does not wait on
the wrong $channel, then it should not be considered an issue.
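A minimal sketch of the capture problem described above (spawn per the RFC draft; Request, handle() and doSomethingSlow() are invented for illustration):

```php
// The closure captures $request via use(); as long as the background
// coroutine is alive, $request cannot be garbage-collected, even after
// the request Scope that logically owned it has been destroyed.
function handle(Request $request): void
{
    spawn function () use ($request) {
        doSomethingSlow($request);
    };
}
```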
I'm not sure if Coroutine would be the right name either
I'm not an expert in choosing good names, so I rely on others' opinions. If
the term coroutine feels overused, we can look for something else. But
what?
($inner->parent ?? $inner)->spawn(escape(...));
But the meaning of this code raises a question: why am I placing my
coroutine in the parent's context if my own context is already inherited
from the parent?
Or do I want my coroutine to be destroyed only with the parent's context,
but not with the current one? But then, how do I know that I should
place the coroutine in the parent, rather than in the parent’s parent?
It makes an assumption about something it cannot possibly know.
I would suggest explicitly specifying which Scope we want:
function stuff() {
    async $inner {
        // While the request is active
        ($inner->find('requestScope') ?? $inner)->spawn(escape(...));
        // Or while the server is running
        ($inner->find('serverScope') ?? $inner)->spawn(escape(...));
    }
}
Edmund, does that make any sense to you?
If there are expert-level people who have spent years working on language
syntax while also having a deep understanding of asynchrony, and they are
willing to help us, I would say that this is not just *reasonable* — it is more like a necessary step that absolutely must be taken.
However, not for the current RFC, but rather for a draft of the next one,
which will focus on a much narrower topic. So 100% yes.
In addition to expert input, I would also like to create a database of
real-world use cases from code and examples. This would allow us to use code
as an argument against purely logical reasoning.
Ed
Let me summarize the current state for today:
I am abandoning startScheduler and the idea of preserving backward compatibility with await_all or anything else in that category. The scheduler will be initialized implicitly, and this does not concern user-land. Consequently, the spawn function() code will work everywhere and always.
Very glad to hear this, this is the correct approach for concurrency, one that will not break all existing libraries and give them the freedom to handle their own resource cleanup.
I’ve also seen your latest email about kotlin-like contexts, and they also make more sense than an await_all block (which can only cause deadlocks): note how a kotlin coroutine context may only be cancelled (cancelling all inner coroutines with CancelledExceptions), never awaited.
I can give you several examples where such logic is used in Amphp libraries, and it will break if they are invoked within an async block.
Got it, it looks like I misunderstood the post due to my focus. So, essentially, you're talking not so much about wait_all itself, but rather about the parent-child vs. free model.
This question is what concerns me the most right now.
If you have real examples of how this can cause problems, I would really appreciate it if you could share them. Code is the best criterion of truth.
Sure:
- https://github.com/amphp/websocket/blob/2.x/src/PeriodicHeartbeatQueue.php
- https://github.com/amphp/redis/blob/2.x/src/Sync/RedisMutex.php
This is the main example where it is most evident that background fibers are needed: logic which requires periodic background pings to be sent in order to keep a connection alive, a mutex held or something similar.
Constructing a PeriodicHeartbeatQueue inside of a wait_all block invoking someClass::connect(), storing it in a property and destroying it outside of the block in someClass::close() or someClass::__destruct(), would cause a deadlock (EventLoop::repeat doesn’t technically spawn a fiber immediately, it spawns one every $interval, but it behaves as though a single background fiber were spawned with a sleep($interval), so essentially it’s a standalone thread of execution, collected only on __destruct).
https://github.com/danog/MadelineProto/tree/v8/src/Loop/Connection contains multiple examples of tasks of the same kind in my own library (ping loops to keep connections alive, read loops to handle updates (which contain vital information needed to keep the client running correctly) in the background, etc...), all started on __construct when initialising the library, and stopped in __destruct when they are not needed anymore.
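The shape of that pattern, reduced to a sketch (proposed spawn syntax; Connection, ping() and the 10-second interval are invented; Async\sleep() stands in for a non-blocking sleep):

```php
// A keep-alive loop: started when the connection is opened, and stopped
// only when the owning object decides, not when some caller's block exits.
class Connection
{
    private bool $closed = false;

    public function __construct()
    {
        // A background fiber that outlives any single method call
        spawn function () {
            while (!$this->closed) {
                $this->ping();   // keep the connection alive
                Async\sleep(10); // hypothetical non-blocking sleep
            }
        };
    }

    public function close(): void
    {
        $this->closed = true; // the loop ends on the library's own terms
    }

    private function ping(): void { /* ... */ }
}

// If a wait_all block enclosed `new Connection()` but not `close()`,
// exiting the block would wait forever on the ping loop: a deadlock.
```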
A coroutine context/scope a-la kotlin is fine, but it should absolutely not have anything to await all coroutines in the scope, or else it can cause deadlocks with the very common logic listed above.
Regards,
Daniil Gentili.
Sure:
Yeah, this is a Watcher, a periodic function that is called to clean up or
check something. Yes, it’s a very specific pattern. And of course, the
Watcher belongs to the service. If the service is destroyed, the Watcher
should also be stopped.
In the context of this RFC, it's better to use the Async\Interval class.
contains multiple examples of tasks of the same kind in my own library
(ping loops to keep connections alive,
read loops to handle updates (which contain vital information needed to
keep the client running correctly) in the background, etc...), all started
on
__construct when initialising the library, and stopped in __destruct when
they are not needed anymore.
Thank you, I will read it.
--
Ed.
Yeah, this is a Watcher, a periodic function that is called to clean up or check something. Yes, it’s a very specific pattern. And of course, the Watcher belongs to the service. If the service is destroyed, the Watcher should also be stopped.
It’s an Async\Interval, but it behaves entirely like a background fiber (and it can be implemented using a background fiber as well): what I mean is, it can be treated the same way as a background fiber, because it’s a background task that can be spawned by the library in any method. If wait_all was used during construction but not during destruction, it would cause a deadlock (because it would wait for an uncontrolled background task when exiting the block, according to the proposed functionality of wait_all).
Regards,
Daniil Gentili.
Even its use is optional, its presence in the language could lead
library developers to reduce concurrency in order to allow calls from
async blocks, (i.e. don't spawn any background fiber in a method call
because it might be called from an async {} block) which is what I
meant by crippling async PHP.
I think you've misunderstood what I meant by optional. I meant that
putting the fiber into the managed context would be optional at the
point where the fiber was spawned.
A library wouldn't need to "avoid spawning background fibers", it would
simply have the choice between "spawn a fiber that is expected to finish
within the current managed scope, if any", and "spawn a fiber that I
promise to manage myself, and please ignore anyone trying to manage it
for me".
There have been various suggestions of exactly what that could look
like, e.g. in https://externals.io/message/126537#126625 and
https://externals.io/message/126537#126630
The naming of "async {}" is also very misleading, as it does the
opposite of making things async, if anything it should be called
"wait_all {}"
Yes, "async{}" is a bit of a generic placeholder name; I think Larry was
the first to use it in an illustration, and we've been discussing
exactly what it might mean. As we pin down more precise suggestions, we
can probably come up with clearer names for them.
The tone of your recent e-mails suggests you believe someone is forcing
this precise keyword into the language, right now, and you urgently need
to stop it before it's too late. That's not where we are at all, we're
trying to work out if some such facility would be useful, and what it
might look like.
It sounds like you think:
- The language absolutely needs a "spawn detached" operation, i.e. a way of starting a new fiber which is queued in the global scheduler, but has no automatic relationship to its parent.
- If the language offered both "spawn managed" and "spawn detached", the "detached" mode would be overwhelmingly more common (i.e. users and library authors would want to manage the lifecycle of their coroutines manually), so the "spawn managed" mode isn't worth implementing.
Would that be a fair summary of your opinion?
--
Rowan Tommins
[IMSoP]
I think you've misunderstood what I meant by optional. I meant that putting the fiber into the managed context would be optional at the point where the fiber was spawned.
It sounds like you think:
- The language absolutely needs a "spawn detached" operation, i.e. a way of starting a new fiber which is queued in the global scheduler, but has no automatic relationship to its parent.
- If the language offered both "spawn managed" and "spawn detached", the "detached" mode would be overwhelmingly more common (i.e. users and library authors would want to manage the lifecycle of their coroutines manually), so the "spawn managed" mode isn't worth implementing.
Would that be a fair summary of your opinion?
Indeed, yes! That would be a complete summary of my opinion.
If the user could choose whether to add fibers to the managed context or not, that would be more acceptable IMO.
Then again, see point 2; plus, even an optional managed fiber context still introduces a certain degree of "magicness" and non-obvious/implicit behavior at the initiative of the caller, which can be avoided by simply explicitly returning and awaiting any spawned fibers.
Regards,
Daniil Gentili.
for ($i = 0; $i < 10; $i++) $results[] = async\async(fn($f) => file_get_contents($f), $file[$i]);
// convert $results into futures somehow -- though actually doesn't look like it is possible.
$results = async\awaitAll($results);
Future can be obtained via getFuture(), according to the current RFC.
async\async(fn($f) => file_get_contents($f), $file[$i])->getFuture();
And this semantics can be simplified to:
async file_get_contents($file[$i]);
or
spawn file_get_contents($file[$i]);
From this perspective, I like that any function can be called with
spawn/async without worrying about its internals or modifying its code. The
pros and cons of this approach are well known.
Yes, that is much much nicer! It feels familiar to Go:
go file_get_contents($file[$i])
And yes, I realize that would be a fun error in go, but you get the gist.
— Rob
Yes, that's the point that we don't bother client code with any of the
async stuff 🙂
If we want to create "async space" for functions to have switching on IO, only then do we call them like this, and then await them all with awaitAll 🙂.
Mar 8, 2025 9:29:15 AM Edmond Dantes edmond.ht@gmail.com:
Loving this.
One might even consider using the go keyword along with async/spawn, to more easily associate the operation with Go's goroutines...
Regards,
Daniil Gentili.
Neither of these is a bad use case, and they're not mutually exclusive,
but they do lead to different priorities.
I freely admit my bias is towards Type 1, while it sounds like Edmond is
coming from a Type 2 perspective.
Exactly. A coroutine-based server is what I work with, so this aspect has a
greater influence on the RFC. However, both cases need to be considered.
Right now, background services are handled with Go. If PHP gets solid
concurrency tools, convenient process management, and execution tracking,
the situation might shift in a different direction—because a unified
codebase is almost always more beneficial.
Of course, this is not an elegant solution, as it adds one more rule to the language, making it more complex. However, from a legacy perspective, it seems like a minimal scar.
(to All: Please leave your opinion if you are reading this )
Larry’s approach seems like a horrible idea to me: it increases complexity, prevents easy migration of existing code to an asynchronous model and is incredibly verbose for no good reason.
The arguments mentioned in https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/ are not good arguments at all, as they essentially propose explicitly reducing concurrency (by allowing it only within async blocks) or making it harder to use by forcing users to pass around contexts (which is even worse than function colouring https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/).
This (supposedly) reduces issues with resource contention/race conditions: sure, if you don't use concurrency or severely limit it, you will have fewer issues with race conditions, but that's not an argument in favour of nurseries, that's an argument against concurrency.
Race conditions and deadlocks are possible either way when using concurrency, and the way to avoid them is to introduce synchronisation primitives (locks, mutexes similar to the ones in https://github.com/amphp/sync/, or lockfree solutions like actors, which I am a heavy user of), not bloating signatures by forcing users to pass around contexts, reducing concurrency and completely disallowing global state.
Golang is the perfect example of a language that does colourless, (mostly) contextless concurrency without the need for coloured (async/await keywords) functions and other complications.
Race conditions and deadlocks are avoided, like in any concurrent model, by using appropriate synchronisation primitives, and by communicating with channels (actor model) instead of sharing memory, where appropriate.
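For illustration, here is roughly what the primitive-based approach looks like in userland PHP today with amphp/sync. This is a sketch based on its v2 API (LocalMutex::acquire() suspending the fiber and returning a releasable Lock); treat the exact signatures as assumptions to verify against the package docs:

```php
<?php
// Sketch: shared state guarded by a mutex, while concurrency itself stays
// unrestricted. Assumes amphp/sync v2 installed via Composer.
require 'vendor/autoload.php';

use Amp\Sync\LocalMutex;

$mutex = new LocalMutex();
$counter = 0;

function increment(LocalMutex $mutex, int &$counter): void
{
    // acquire() suspends the current fiber until the lock is free
    $lock = $mutex->acquire();
    try {
        $counter++; // critical section: one fiber at a time
    } finally {
        $lock->release();
    }
}
```

The guarantee comes from the lock around the shared state, not from restricting where coroutines may be started.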
Side note, I very much like the current approach of implicit cancellations, because they even remove the need to pass contexts to make use of cancellations, like in golang or amphp (though the RFC could use some further work regarding cancellation inheritance between fibers, but that’s a minor issue).
Yeah, so basically, you're creating the service again and again for each coroutine if the coroutine needs to use it. This is a good solution in the context of multitasking, but it loses in terms of performance and memory, as well as complexity and code size, because it requires more factory classes.
^ this
Regarding backwards compatibility (especially with revolt), since I also briefly considered submitting an async RFC and thought about it a bit, I can suggest exposing an event loop interface like https://github.com/revoltphp/event-loop/blob/main/src/EventLoop.php, which would allow userland event loop implementations to simply switch to using the native event loop as a backend (this'll be especially simple to do for revolt, the main user of fibers, since the current implementation is clearly inspired by revolt's event loop).
Essentially, the only thing that’s needed for backwards-compatibility in most cases is an API that can be used to register onWritable, onReadable callbacks for streams and a way to register delayed (delay) tasks, to completely remove the need to invoke stream_select.
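Concretely, the surface that needs covering is small. Revolt's existing API (the calls below are from revolt/event-loop v1) already has the right shape, and a native loop could slot in as the driver underneath; how that driver would plug in is, of course, hypothetical:

```php
<?php
// Readiness and timer callbacks instead of a manual stream_select() loop.
// If PHP shipped a native event loop, only Revolt's internal driver would
// need to change; user code like this would stay the same.
require 'vendor/autoload.php';

use Revolt\EventLoop;

$socket = stream_socket_client('tcp://example.com:80');
stream_set_blocking($socket, false);

EventLoop::onReadable($socket, function (string $callbackId, $stream): void {
    $data = fread($stream, 8192);
    if ($data === '' || $data === false) {
        EventLoop::cancel($callbackId); // stop watching a closed stream
        fclose($stream);
    }
});

// A delayed task; no stream_select() timeout juggling needed.
EventLoop::delay(5.0, fn () => print("5 seconds elapsed\n"));

EventLoop::run();
```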
I’d recommend chatting with Aaron to further discuss backwards compatibility and the overall RFC: I’ve already pinged him, he’ll chime in once he has more time to read the RFC.
To Edmond, as someone who submitted RFCs before: stand your ground, try not to listen too much to what people propose in this list, especially if it’s regarding radical changes like Larry's; avoid bloating the RFC with proposals that you do not really agree with.
Regards,
Daniil Gentili
—
Daniil Gentili - Senior software engineer
Portfolio: https://daniil.it
Telegram: https://t.me/danogentili
Hi,
https://wiki.php.net/rfc/true_async
I believe this version is not perfect and requires analysis. And I
strongly believe that things like this shouldn't be developed in isolation.
So, if you think any important (or even minor) aspects have been
overlooked, please bring them to attention.
I thought about this quite a bit and I think we should first try to clarify
the primary design that we want to go for. What I mean is whether we would
like to ever support true concurrency (threads) in it. If we think it
would be worth it (even though it wouldn't be initially supported), then
we should take it into account from the beginning and add restrictions
to prevent race conditions. That means it should probably disallow global
(e.g. global $var;) variables, or at least make them context specific, as
well as disallowing object sharing. I think PHP users should not deal with
synchronization primitives. Basically what I want to say is that
multithreading should not be just something mentioned in the future scope
but the whole design should be done in a way that will make sure everything
will work fine for users. Ideally also having some simplified
implementation that will verify it.
I also agree that the scope is currently too big. It should be reduced to
the absolute minimum and just show what's possible. It's great to have a
proof of concept for that but the initial proposal should be mainly about
the design and introducing the core components.
Regards,
Jakub
Hello, Jakub.
I thought about this quite a bit and I think we should first try to
clarify the primary design that we want to go for.
What I mean is whether we would like to ever support a true concurrency
(threads) in it.
If we think it would be worth it (even though it wouldn't be initially
supported), then we should take it into account from the beginning and
add restrictions to prevent race conditions.
If you mean multitasking, i.e., executing coroutines in different OS
threads, then this feature is far beyond the scope of this RFC and would
require significant changes to the PHP core (Memory manager first).
And even if we imagine that such changes are made, eliminating data races
without breaking the language is, to put it mildly, a questionable task
from the current perspective.
Although this RFC raises the question of whether a concurrent version
without multitasking is worth implementing at all, my opinion is positive.
For PHP, this could be sufficient as a language primarily used in the
context of asynchronous I/O, whereas a multitasking version may never
happen.
I will likely pose this as a direct question in the final part of this
RFC. Thanks!
Ed.
Hi,
I thought about this quite a bit and I think we should first try to
clarify the primary design that we want to go for.
What I mean is whether we would like to ever support a true concurrency
(threads) in it.
If we think it would be worth it (even though it wouldn't be initially
supported), then we should take it into account from the beginning and
add restrictions to prevent race conditions.

If you mean multitasking, i.e., executing coroutines in different OS
threads, then this feature is far beyond the scope of this RFC and would
require significant changes to the PHP core (Memory manager first).
You might want to look at the parallel extension, as it's already dealing
with that and mostly works - of course, combining it with coroutines will
certainly complicate things, but the point is that memory is not shared.
And even if we imagine that such changes are made, eliminating data races
without breaking the language is, to put it mildly, a questionable task
from the current perspective.
That's exactly what I meant: making sure that there won't be any data
races - which means using only channel communication (more below).
Although this RFC raises the question of whether a concurrent version
without multitasking is worth implementing at all, my opinion is positive.
For PHP, this could be sufficient as a language primarily used in the
context of asynchronous I/O, whereas a multitasking version may never
happen.
I didn't really mean to introduce it as part of this RFC. What I meant is
to design the API so there is still the possibility to add it in the future
without risking various race conditions in the code. It means primarily
putting in place certain restrictions that will prevent them, like limiting
access to globals and to passing anything by reference (including objects)
to the running tasks.
Regards
Jakub
You might want to look to parallel extension as it's already dealing
with that and mostly works - of course combination with coroutines will
certainly complicate it but the point is that memory is not shared.
Do you mean this extension: https://www.php.net/manual/en/book.parallel.php?
Yes, I studied it before starting the development of True Async. My very
first goal was to enable, if not coroutine execution across threads, then
at least interaction, because the same thing is already possible with
Swoole.
However, parallel does not provide multitasking and will not be able to in
the future. The best it can offer is interaction via Channel between two
different threads, which will be useful for job processing and built-in web
servers.
And here's the frustrating part. It turns out that parallel has to copy PHP
bytecode for correct execution in another thread. This means that not only
does the memory manager need to be replaced with a multi-threaded version,
but the virtual machine itself must also be refactored.
For PHP to work correctly in multiple threads with context switching, it
will be necessary to find a way to rewrite all the code that reads/writes
global variables. (Where TLS macros are used, this shouldn't be too
difficult. But is that the case everywhere?) This applies to all
extensions, both built-in and third-party.
Such a language update would create an "extension vacuum." When a new
version is released, many extensions will become unavailable due to the
need to adapt to the new multitasking model.
I didn't really mean to introduce it as part of this RFC.
What I meant is to design the API so there is still the possibility to add
it in the future without risking various race conditions in the code.
It means primarily to put certain restrictions that will prevent it, like
limited access to globals and passing anything by
reference (including objects) to the running tasks.
Primitives like Context are unlikely to be the main issue for
multitasking. The main problem will be the code that has been developed for
many years with single-threaded execution in mind. This is another factor
that raises doubts about the rationale for introducing real multitasking in
PHP.
If we are talking about a model similar to Python’s, the current RFC
already works with it, as a separate thread is used on Windows to wait for
processes and send events to the PHP thread. Therefore, integrating this
RFC with parallel is not an issue.
It would be great to solve the bytecode problem in a way that allows it to
be freely executed across different threads. This would enable running any
closure as a coroutine in another OS thread and interacting through a
channel. If we are talking about this functionality, it does not block
concurrency or the current RFC. This capability should be considered as an
additional feature that can be implemented later without modifying the
existing primitives.
While working on this RFC, I also considered finding a way to create
SharedObject instances that could be passed between threads. However, I
ultimately concluded that this solution would require changes to the memory
manager, so these objects were not included in the final document.
Ed.
You might want to look at the parallel extension as it's already dealing
with that and mostly works - of course combination with coroutines will
certainly complicate it but the point is that memory is not shared.

Do you mean this extension: https://www.php.net/manual/en/book.parallel.php?
Yes, I studied it before starting the development of True Async. My
very first goal was to enable, if not coroutine execution across
threads, then at least interaction, because the same thing is already
possible with Swoole.
snip
Such a language update would create an "extension vacuum." When a new
version is released, many extensions will become unavailable due to the
need to adapt to the new multitasking model.
It would necessitate a major version release, certainly.
I didn't really mean to introduce it as part of this RFC.
What I meant is to design the API so there is still possibility to add it in the future without risking various race condition in the code.
It means primarily to put certain restrictions that will prevent it like limited access to global and passing anything by
reference (including objects) to the running tasks.

Primitives like Context are unlikely to be the main issue for
multitasking. The main problem will be the code that has been developed
for many years with single-threaded execution in mind. This is another
factor that raises doubts about the rationale for introducing real
multitasking in PHP.
I think the point is more that the concurrency primitives that are introduced (async block, async() function, whatever it is) should be designed in such a way that PHP could introduce multiple parallel threads in the future to run multiple async blocks simultaneously... without any impact on the user code. To reuse my earlier example:
function parallel_map(iterable $it, Closure $fn) {
    $result = [];
    async $ctx {
        foreach ($it as $k => $v) {
            $result[$k] = $ctx->run($fn($v));
        }
    }
    return $result;
}
Whether each run() invocation is handled by one thread switching between them or 3 threads switching between them is not something the above code should care about. Which means designing that API in such a way that I... don't need to care. Which probably means something like "no inter-fiber communication other than channels", as in Go. And thinking through what it means for the context object if it does have some kind of global property bag. (This is one reason I don't want one.) And it means there's no way to control threads directly from user-space. You just get async blocks, and that's it. This is an area that Go got pretty solidly right, and is worth emulating.
The implications on C code of adding true-thread support in the future is a separate question; the async API should be built such that it can be a separate future question.
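To make the "channels only" rule concrete, here is a sketch in the RFC-style spawn syntax with a hypothetical Channel class (neither the syntax nor the class is final; both are assumptions for illustration):

```php
// Hypothetical syntax and Channel API: tasks share nothing directly;
// results flow back over a channel. Whether the scheduler runs these on
// one thread or several, the code reads the same.
async {
    $channel = new Channel();

    spawn worker(1, $channel);
    spawn worker(2, $channel);

    // Collect both results; arrival order depends on scheduling.
    $first  = $channel->receive();
    $second = $channel->receive();
}

function worker(int $id, Channel $out): void {
    $out->send("result from worker $id");
}
```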
--Larry Garfield
Hello all.
A few thoughts aloud about the emerging picture.
Entry point into the asynchronous context
Most likely, it should be implemented as a separate function (I haven't
come up with a good name yet), with a unique name to ensure its behavior
does not overlap with other operators. It has a unique property: it waits
for the full completion of the event loop and the Scheduler.
Inside the asynchronous context, Fiber is prohibited, and conversely,
inside a Fiber, the asynchronous context is prohibited.

The async operator

The async (or spawn?) operator can be used as a shorthand for spawning
a coroutine:
function my($param) {}
// Operator used as a function call
async my(1);
or
spawn my(1);
// Operator used as a closure
async {
code
};
// Since it's a closure, the `use` statement can be used without restrictions
async use($var) {
code
};
// Returns a coroutine class instance
$x = async use($var) {
code
};
The await operator

The await operator can be added to async, allowing explicit suspension
of execution to wait for a result:
$x = await async use($var) {
code
};
Context Manipulations
I didn't like functions like overrideContext. They allow changing the
context multiple times at any point in a function, which can lead to errors
that are difficult to debug. This is a really bad approach. It is much
better to declare the context at the time of coroutine invocation. With
syntax, it might look like this:
async in $context use() {}
async in $context myFun()
async in $context->with("key", value) myFun()
or
spawn in $context ...
Thread
The syntax spawn in/async in can be used not only in standard cases.
$coro = async in new ThreadContext() use($channel) {
    while (true) {}
};
// This expression is also valid
$coro = async in $threadPool->borrowContext() use($channel) {
    while (true) {}
};
It is worth noting that $threadPool itself may be provided by an extension
and not be a part of PHP.
Unrelated Coroutines
An additional way is needed to create a coroutine that is not bound to a
parent.
It's worth considering how to make this as clear and convenient as possible.
Maybe as a keyword:
async unbound ...
Of course, an async child modifier can be used. This is the inverse
implementation, but I think it will not be used often. Making unbound a
separate method is not very appealing at the moment because a programmer
might forget to call it. They could forget the word unbound, and even
more so a whole method.
Context Operations
await $context; // Waits for all coroutines in the context
$context->cancel(); // Cancels everything within the context
Flow
I want to thank all the participants in the discussion.
Thank you for your ideas, questions, and examples; a week ago, answering
this question would have been impossible.
If we add exception handling and graceful shutdown, and remove the new
syntax by replacing it with an equivalent of 2-3 functions, we will get a
fairly cohesive RFC that describes the high-level part without unnecessary
details. Channels and even Future can be excluded from this RFC —
everything except the coroutine class and context. Microtasks, of course,
will remain.
As a result, the RFC will be clean and compact, focusing solely on how
coroutines and context work.
Channels, Future, and iterators can be moved to a separate RFC dedicated
specifically to primitives.
Finally, after reviewing the high-level RFCs, we can return to the
implementation — in other words, top-down. Given that the approximate
structure of the lower level is already clear, discussing abstractions will
remain practical and grounded.
Just to clarify, I'm not planning to end the current discussion; these are
just intermediate thoughts.
Ed.
Hello all.
A few thoughts aloud about the emerging picture.
Entry point into the asynchronous context
Most likely, it should be implemented as a separate function (I haven't
come up with a good name yet), with a unique name to ensure its
behavior does not overlap with other operators. It has a unique
property: it waits for the full completion of the event loop and the
Scheduler.

Inside the asynchronous context, Fiber is prohibited, and conversely,
inside a Fiber, the asynchronous context is prohibited.
Yes.
The async operator

The async (or spawn?) operator can be used as a shorthand for spawning
a coroutine:
This is incorrect. "Create an async bounded context playpen" (what I called "async" in my example) and "start a fiber/thread/task" (what I called "spawn") are two separate operations, and must remain so.
create space for async stuff {
start async task a();
start async task b();
}
However those get spelled, they're necessarily separate things. If any creation of a new async task also creates a new async context, then we no longer have the ability to run multiple tasks in parallel in the same context. Which is, as I understand it, kinda the point.
I also don't believe that an async bounded context necessarily needs to be a function, as doing so introduces a lot of extra complexity for the user when they need to manually "use" things. (Though perhaps sometimes we can have a shorthand for that; that comes later.)
I am also still very much against allowing tasks to "detach". If a thread is allowed to escape its bounded context, then I can no longer rely on that context being bounded. It removes the very guarantee that we're trying to provide. There are better ways to handle "throwing off a long-running background task." (See below.)
Edmond, correct me if I'm wrong here, but in practice, the only places that it makes sense to switch fibers are:
- At an otherwise-blocking IO call.
- In a very long running CPU task, where the task is easily broken up into logical pieces so that we can interleave it with shorter tasks in the same process. This is only really necessary when running a shared single process for multiple requests.
And in this proposal, IO operations auto-switch between blocking and thread-sharing as appropriate.
To be more concrete, let's consider specific use cases that should be addressed:
- Multiplexing IO, within an otherwise sync context like PHP-FPM
I predict that, in the near term, this will be the most common usage pattern. (Long term, who knows.) This one is easily solvable; it's basically par_map() and variations therein.
// Creates a context in which async is allowed to happen. IO operations auto-switch.
async $ctx = new AsyncContext() {
    $val1 = spawn task1();
    $val2 = spawn task2();
    // Do stuff with those values.
}
// We are absolutely certain nothing started in that block is still running.
(I'm still unclear if $val1 and $val2 should be values or a Future object. Possibly the latter.)
- Shared-process async server
This is the ReactPHP/Swoole space. This... honestly gets kind of easy.
Wrap the entire application in an async {} block. Boom. All IO is now async.
<?php
async {
    while (true) {
        $request = spawn listen_for_request();
        spawn handle_request($request);
    }
}
Importantly, since IO is the primary switch point, and IO automatically deals with thread switching, my DB-query-heavy Repository object doesn't care if I'm doing this or not. If each $handler (controller, whatever) is written 100% sync, with lots of IO... it still works fine.
- Set-and-forget background job
This is the logger example, but probably also queue tasks, etc. This is where the request for detaching comes from. I would argue detaching is both the wrong approach, and an unnecessary one. Because you can send data to fibers from OTHER contexts... via channels.
So rather than this:
spawn detach log('message'); // Who the hell knows when this will complete, or if it ever does.
We have this:
async {
    $logger = new AsyncLogger();
    $logChannel = $logger->inputChannel();
    spawn handler($logChannel);
}

function handler($logChannel) {
    async {
        while (true) {
            $request = spawn listen_for_request();
            spawn handle_request($request, $logChannel);
        } // An exception could get us to here.
    }
}

function handle_request($request, $logChannel) {
    $logChannel->send($request->url());
    // Do other complex stuff with the request.
}
This is probably not the ideal way to structure it in practice, but it should get the point across. The background logger fiber already exists in the parent async playpen. That's OK! We can send messages to it via a channel. It can keep running after the inner async block ends. The logger fiber doesn't need to be attached, because it was already attached to a parent playpen anyway!
This means passing either a channel-enabled logger instance around (probably better for BC; this should be easy to do behind PSR-3) or the sending channel itself. I'm sure someone will object that is too much work. However, it is no more, or less, work than passing a PSR-3 logger to services today. And in practice "your DI container handles that, stop worrying" is a common and effective answer.
An async-aware DI Container could have an Async-aware PSR-3 logger it passes to various services like any other boring PSR-3 instance. That logger forwards the message across a channel to a waiting parent-playpen-bound fiber, where it just enters the rotation of other fibers getting run.
Services don't need to be modified at all. We don't need to have dangling fibers. And for smaller, more contained cases, eh, Go has shown that "just pass the channel around and move on with life" can be an effective approach. The only caveat is you can't pass a channel-based logger to a scope that will be called outside of an async playpen... But that would be the case anyway, so it's not really an issue.
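As a rough sketch of that async-aware PSR-3 logger: the Channel class and its send() method below are hypothetical RFC primitives, while the PSR-3 side is the real psr/log v3 interface:

```php
<?php
use Psr\Log\AbstractLogger;

// Looks like any other boring PSR-3 logger to the services that receive it,
// but forwards each record over a channel to a fiber bound to the parent
// async playpen, which performs the actual (possibly blocking) IO.
final class ChannelLogger extends AbstractLogger
{
    public function __construct(private Channel $channel) {}

    public function log($level, string|\Stringable $message, array $context = []): void
    {
        // send() is assumed non-blocking (or buffering); the consumer
        // fiber drains the channel and writes to the real backend.
        $this->channel->send([$level, (string) $message, $context]);
    }
}
```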
There's still the context question, as well as whether spawn is a method on a context object or a keyword, but I think this gets us to 80% of what the original RFC tries to provide, with 20% of the mental overhead.
--Larry Garfield
This is incorrect. "Create an async bounded context playpen" (what I
called "async" in my example) and "start a fiber/thread/task" (what I
called "spawn") are two separate operations, and must remain so.
So, you use async to denote the context and spawn to create a coroutine.
Regarding the context, it seems there's some confusion with this term.
Let's try to separate it somehow.
For coroutines to work, a Scheduler must be started. There can be only
one Scheduler per OS thread. That means creating a new async task does
not create a new Scheduler.
Apparently, async {} in the examples above is the entry point for the
Scheduler.
This is probably not the ideal way to structure it in practice, but it
should get the point across.
Sounds like a perfect solution.
However, the initialization order raises some doubts: it seems that all
required coroutines must be created in advance. Will this be convenient?
What if a service doesn’t want to initialize a coroutine immediately? What
if it’s not loaded into memory right away? Lazy load.
For example, we have a Logger service, which usually starts a coroutine
for log flushing. Or even multiple coroutines (e.g., a timer as well). But
the service itself might not be initialized and could start only on first
use.
Should we forbid this practice?
If you want to be a service, should you always initialize yourself
upfront?
Wait a minute. This resembles how an OS works. At level 0, the operating
system runs, while user-level code interacts with it via interrupts.
It's almost the same as opening a channel in the ROOT context and sending
a message through the channel from some child context. Instead of sending
a message directly to the Logger, we could send it to the service manager
through a channel.
Since the channel was opened in the ROOT context, all operations would
also execute in the ROOT context. And if the LOGGER was not initialized,
it would be initialized from the ROOT context.
Possible drawbacks:
- It's unclear how complex this would be to implement.
- If messages are sent via a channel, the logger won't be able to
fetch additional data from the request environment. All data must be
explicitly passed, or the entire context must be thrown into the
channel.
Needs more thought.
But in any case, the idea with the channel is good. It can cover many
scenarios.
Everything else is correct, I don’t have much to add.
Ed.
For coroutines to work, a Scheduler must be started. There can be only
one Scheduler per OS thread. That means creating a new async task does
not create a new Scheduler.

Apparently, async {} in the examples above is the entry point for the
Scheduler.
I've been pondering this, and I think talking about "starting" or
"initialising" the Scheduler is slightly misleading, because it implies
that the Scheduler is something that "happens over there".
It sounds like we'd be writing this:
// No scheduler running, this is probably an error
Async\runOnScheduler( something(...) );
Async\startScheduler();
// Great, now it's running...
Async\runOnScheduler( something(...) );
// If we can start it, we can stop it I guess?
Async\stopScheduler();
But that's not what we're talking about. As the RFC says:
Once the Scheduler is activated, it will take control of the
Null-Fiber context, and execution within it will pause until all Fibers,
all microtasks, and all event loop events have been processed.
The actual flow in the RFC is like this:
// This is queued somewhere special, ready for a scheduler to pick it up later
Async\enqueueForScheduler( something(...) );
// Only now does anything actually run
Async\runSchedulerUntilQueueEmpty();
// At this point, the scheduler isn't running any more
// If we add to the queue now, it won't run unless we run another scheduler
Async\enqueueForScheduler( something(...) );
Pondering this, I think one of the things we've been missing is what
Unix[-like] systems call "process 0". I'm not an expert, so may get
details wrong, but my understanding is that if you had a single-tasking
OS, and used it to bootstrap a Unix[-like] system, it would look
something like this:
- You would replace the currently running single process with the new
kernel / scheduler process
- That scheduler would always start with exactly one process in the
queue, traditionally called "init"
- The scheduler would hand control to process 0 (because it's the only
thing in the queue), and that process would be responsible for starting
all the other processes in the system: TTYs and login prompts, network
daemons, etc
I think the same thing applies to scheduling coroutines: we want the
Scheduler to take over the "null fiber", but in order to be useful, it
needs something in its queue. So I propose we have a similar "coroutine
zero" [name for illustration only]:
// No scheduler running, this is an error
Async\runOnScheduler( something(...) );
Async\runScheduler(
    coroutine_zero: something(...)
);
// At this point, the scheduler isn't running any more
It's then the responsibility of "coroutine 0", here the function
"something", to schedule what's actually wanted, like a network
listener, or a worker pool reading from a queue, etc.
At that point, the relationship to a block syntax perhaps becomes clearer:
async {
    spawn start_network_listener();
}
is roughly (ignoring the difference between a code block and a closure)
sugar for:
Async\runScheduler(
    coroutine_zero: function() {
        spawn start_network_listener();
    }
);
That leaves the question of whether it would ever make sense to nest
those blocks (indirectly, e.g. something() itself contains an async{}
block, or calls something else which does).
I guess in our analogy, nested blocks could be like running Containers
within the currently running OS: they don't actually start a new
Scheduler, but they mark a namespace of related coroutines, that can be
treated specially in some way.
Alternatively, it could simply be an error, like trying to run the
kernel as a userland program.
--
Rowan Tommins
[IMSoP]
I think the same thing applies to scheduling coroutines: we want the
Scheduler to take over the "null fiber",
Yes, you have quite accurately described a possible implementation.
When a programmer loads the initial index.php, its code is already running
inside a coroutine.
We can call it the main coroutine or the root coroutine.
When the index.php script reaches its last instruction, the coroutine
finishes, execution is handed over to the Scheduler, and then everything
proceeds as usual.
Accordingly, if the Scheduler has more coroutines in the queue, reaching
the last line of index.php does not mean the script terminates. Instead, it
continues executing the queue until... there is nothing left to execute.
At that point, the relationship to a block syntax perhaps becomes clearer:
Thanks to the extensive discussion, I realized that the implementation with
startScheduler raises too many questions, and it's better to sacrifice a
bit of backward compatibility for the sake of language elegance.
After all, Fiber is unlikely to be used by ordinary programmers.
Edmond,
The language barrier is a problem (it's on me; I cannot explain this
properly), so I will keep it simple. Having "await" makes it sync, not
async. In hardware we use interrupts, but we have to do it grandma
style... The main loop checks variables set by the interrupts, which is
async. So you have a main loop that checks a variable, but that variable
is set from another part of the processor cycle that has nothing to do
with the main loop (it is not fire-and-forget style; it happens in real
time). Basically you can have a standard int main() function that is
sync, because you can delay in it (yep, sleep(0)), and while it is
blocked, an event interrupts a function that works on another register,
independent of the main function. More details of this would probably
not be interesting, so I will stop. If you want to make async PHP with
multiple processes, you have to check semaphored variables to make it work.
--
Iliya Miroslavov Iliev
i.miroslavov@gmail.com
Edmond,
....
If you want to make async PHP with multiple processes you have to check
variables semaphored to make it work.
Hello, Iliya.
Thank you for your feedback. I'm not sure if I fully understood the entire
context. But.
At the moment, I have no intention of adding multitasking to PHP in the
same way it works in Go.
Therefore, code will not require synchronization. The current RFC proposes
adding only asynchronous execution. That means each thread will have its
own event loop, its own memory, and its own coroutines.
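To illustrate the difference from Go: in this model, two coroutines in the same thread interleave only at suspension points, so shared state needs no locking (hypothetical spawn syntax, for illustration only):

```php
// Hypothetical sketch. Both coroutines run in ONE thread, on one
// event loop; a context switch can only happen at a suspension
// point (I/O, await), never in the middle of $counter++.
$counter = 0;

spawn function () use (&$counter) { $counter++; };
spawn function () use (&$counter) { $counter++; };

// No mutex or semaphore is needed, unlike parallel goroutines in Go.
// Another OS thread would have its own event loop and its own memory,
// so there is no cross-thread state to synchronize.
```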
P.S. I also know Russian and a bit of asm.
Ed.
For coroutines to work, a Scheduler must be started. There can be only
one Scheduler per OS thread. That means creating a new async task does
not create a new Scheduler.
Apparently, async {} in the examples above is the entry point for the
Scheduler.
I've been pondering this, and I think talking about "starting" or
"initialising" the Scheduler is slightly misleading, because it implies
that the Scheduler is something that "happens over there".
It sounds like we'd be writing this:

// No scheduler running, this is probably an error
Async\runOnScheduler( something(...) );

Async\startScheduler();
// Great, now it's running...
Async\runOnScheduler( something(...) );

// If we can start it, we can stop it I guess?
Async\stopScheduler();

But that's not what we're talking about. As the RFC says:
Once the Scheduler is activated, it will take control of the
Null-Fiber context, and execution within it will pause until all Fibers,
all microtasks, and all event loop events have been processed.
The actual flow in the RFC is like this:

// This is queued somewhere special, ready for a scheduler to pick it up later
Async\enqueueForScheduler( something(...) );

// Only now does anything actually run
Async\runSchedulerUntilQueueEmpty();
// At this point, the scheduler isn't running any more

// If we add to the queue now, it won't run unless we run another scheduler
Async\enqueueForScheduler( something(...) );

Pondering this, I think one of the things we've been missing is what
Unix[-like] systems call "process 0". I'm not an expert, so may get
details wrong, but my understanding is that if you had a single-tasking
OS, and used it to bootstrap a Unix[-like] system, it would look
something like this:
- You would replace the currently running single process with the new
kernel / scheduler process
- That scheduler would always start with exactly one process in the
queue, traditionally called "init"
- The scheduler would hand control to process 0 (because it's the only
thing in the queue), and that process would be responsible for starting
all the other processes in the system: TTYs and login prompts, network
daemons, etc
Slightly off-topic, but you may find the following article interesting: https://manybutfinite.com/post/kernel-boot-process/
It's a bit old, but probably still relevant for the most part. At least for x86.
— Rob
That leaves the question of whether it would ever make sense to nest
those blocks (indirectly, e.g. something() itself contains an async{}
block, or calls something else which does).
I guess in our analogy, nested blocks could be like running Containers
within the currently running OS: they don't actually start a new
Scheduler, but they mark a namespace of related coroutines, that can be
treated specially in some way.
Alternatively, it could simply be an error, like trying to run the
kernel as a userland program.
Support for nested blocks is absolutely mandatory, whatever else we do. If you cannot nest one async block (scheduler instance, coroutine, whatever it is) inside another, then basically no code can do anything async except the top level framework.
This function needs to be possible, and work anywhere, regardless of whether there's an "open" async session five stack frames up.
function par_map(iterable $it, callable $c) {
    $result = [];
    async {
        foreach ($it as $val) {
            $result[] = $c($val);
        }
    }
    return $result;
}
However it gets spelled, the above code needs to be supported.
-- Larry Garfield
Support for nested blocks is absolutely mandatory, whatever else we do. If you cannot nest one async block (scheduler instance, coroutine, whatever it is) inside another, then basically no code can do anything async except the top level framework.
To stretch the analogy slightly, this is like saying that no Linux program could call fork() until containers were invented. That's quite obviously not true; in a system without containers, the forked process is tracked by the single global scheduler, and has a default relationship to its parent, as well as to other top-level processes.
Nested blocks are necessary if we want automatic resource management around user-selected parts of the program - which is close to being a tautology. If we don't provide them, we just need a defined start and end of the scheduler - and Edmond's current suggestion is that that could be an automatic part of the process / thread lifecycle, and not visible to the user at all.
This function needs to be possible, and work anywhere, regardless of whether there's an "open" async session five stack frames up.
function par_map(iterable $it, callable $c) {
    $result = [];
    async {
        foreach ($it as $val) {
            $result[] = $c($val);
        }
    }
    return $result;
}
This looks to me like an example where you should not be creating an extra context/nursery/whatever. A generic building block like map() should generally not impose resource restrictions on the code it's working with. In fact as written there's no reason for this function to exist at all - if $c returns a Future, a normal array_map will return an array of Futures, and can be composed with await_all, await_any, etc as necessary.
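For example, assuming $c returns a Future, and assuming composition helpers along the lines mentioned above exist (await_all / await_any are assumed names from this discussion, not a confirmed API):

```php
// Plain array_map already composes: no special par_map is needed.
$futures = array_map($c, $values);     // array of Future objects
$all     = Async\await_all($futures);  // wait for every result
$any     = Async\await_any($futures);  // or only the first to finish
```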
If an explicit nursery/context was required in order to use async features, you'd probably want instead to have a version of array_map which took one as an extra parameter, and passed it to along to the callback:
function par_map(iterable $it, callable $c, AsyncContext $ctx) {
    $result = [];
    async {
        foreach ($it as $val) {
            $result[] = $c($val, $ctx);
        }
    }
    return $result;
}
This is pretty much just coloured functions, but with uglier syntax, since par_map itself isn't doing anything useful with the context, just passing along the one from an outer scope. An awful lot of functions would be like this; maybe FP experts would like it, authors of existing PHP code would absolutely hate it.
The place I see nested async{} blocks potentially being useful is to have a handful of key "firewalls" in the application, where any accidentally orphaned coroutines can be automatically awaited before declaring a particular task "done". But Daniil is probably right to ask for concrete use cases, and I have not used enough existing async code (in PHP or any other language) to answer that confidently.
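A sketch of such a "firewall" (hypothetical async{} semantics, assuming the block does not complete until every coroutine spawned inside it, directly or indirectly, has finished):

```php
// Hypothetical: a boundary that catches orphaned coroutines.
function handle_request($request)
{
    async {
        $response = dispatch($request);
        // If dispatch() spawned a coroutine and forgot to await it,
        // the async {} block awaits the orphan here, before the
        // request is declared "done".
    }
    return $response;
}
```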
Rowan Tommins
[IMSoP]
function par_map(iterable $it, callable $c) {
    $result = [];
    async {
        foreach ($it as $val) {
            $result[] = $c($val);
        }
    }
    return $result;
}
If the assumption is that each call can be asynchronous and all elements
need to be processed, the only proper tool is a concurrent iterator.
Manually using a foreach loop is not the best idea because the iterator
does not necessarily create a coroutine for each iteration.
And, of course, such an iterator should have a getFuture method that allows
waiting for the result.
Yes, Kotlin has an explicit blocking Scope, but I don’t see much need for
it. So far, all the cases we’re considering fit neatly into a framework:
- I want to launch a coroutine and wait: await spawn
- I want to launch a coroutine and not wait: spawn
- I want to launch a group of coroutines and wait: await CoroutineScope
- I want to launch a group of coroutines and not wait: spawn
- I want a concurrent iteration: special iterator.
What else are we missing?
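For reference, those cases might look like this in the proposed syntax (entirely hypothetical; the CoroutineScope shape is guessed from the list above, and none of this is implemented):

```php
// 1. Launch a coroutine and wait for it:
$user = await spawn fetch_user(1);

// 2. Launch a coroutine and do not wait (fire and forget):
spawn log_metrics();

// 3. Launch a group of coroutines and wait for all of them:
await new Async\CoroutineScope(function () {
    spawn fetch_user(1);
    spawn fetch_user(2);
});

// 4. Launch a group of coroutines and do not wait:
spawn run_background_group();

// 5. Concurrent iteration: a special iterator that exposes
//    getFuture(), so the caller can await the combined result.
```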