Hi!
Historically, my company has used PHP as a CLI tool for project builds:
code generation, parsing configs and converting them into binary blobs,
calling dotnet and Go tools, etc. In general it's a cross-platform
'make on steroids'.
We are pretty happy with it except for one thing: convenient
cross-platform multi-threading. And we don't actually need true
multi-threading, but rather something similar to Worker threads in Node.js.
We tried the AMPHP Parallel package, but it wasn't stable enough for our
needs (it hangs or emits serialization errors on our workloads). It also
seems to use a special 'ProcessWrapper.exe' (bundled with the Composer
package) to work around some Windows-specific issues. Due to strict
anti-virus policies we had to manually add an exception for it... So we
ended up with some primitive wrappers around platform-specific tools: on
*nix we start background PHP processes with '&', on Windows we use
'powershell.exe Start-Process' for the same purpose. We pass data to and
from these background processes using serialization and temporary files,
and while it works, it's pretty inconvenient and cumbersome.
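For illustration, the temp-file handoff described above might look roughly like the sketch below. It is written in JavaScript only to match the Piscina snippet in this message; it is not the actual PHP wrapper. The helper name runViaTempFiles, the file names, and the toy "uppercase" job are all hypothetical.

```javascript
const { spawn } = require("node:child_process");
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Child script, passed via `node -e`. It reads the serialized input,
// does the (toy) work, and writes the serialized result.
const childSource = `
  const fs = require("node:fs");
  const [inFile, outFile] = process.argv.slice(1);
  const job = JSON.parse(fs.readFileSync(inFile, "utf8"));
  fs.writeFileSync(outFile, JSON.stringify({ upper: job.name.toUpperCase() }));
`;

// Hypothetical helper: hand one job to a separate process via temp files.
function runViaTempFiles(job) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "job-"));
  const inFile = path.join(dir, "in.json");
  const outFile = path.join(dir, "out.json");
  fs.writeFileSync(inFile, JSON.stringify(job));

  return new Promise((resolve, reject) => {
    const child = spawn(
      process.execPath,
      ["-e", childSource, inFile, outFile],
      { stdio: "ignore" }
    );
    child.once("error", reject);
    child.once("exit", (code) => {
      try {
        if (code !== 0) throw new Error(`child exited with code ${code}`);
        resolve(JSON.parse(fs.readFileSync(outFile, "utf8")));
      } catch (err) {
        reject(err);
      } finally {
        // Clean up the temp directory whether the job succeeded or not.
        fs.rmSync(dir, { recursive: true, force: true });
      }
    });
  });
}
```

Even in this small form, the caller has to manage temp directories, exit codes, and cleanup by hand, which is the kind of bookkeeping a built-in worker API would hide.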
Out of curiosity I tried Node.js Worker threads and it just worked as
expected for our sample loads both on *nix and Windows boxes. And there are
some high-level wrappers over Worker threads like Piscina which simplify
things even further:
    const worker = new Piscina({
      filename: path.resolve(__dirname, "worker.js"),
      maxThreads: 32,
      maxQueue: "auto",
    });
    const results = await Promise.all(
      files.map((file, idx) => worker.run([file]))
    );
I think PHP could benefit from having similar 'multi-processing'
support in the core. As far as I can tell, here's what Node.js does:
- Spawns/stops worker threads
- Passes serialized input/output between the worker manager and worker
threads
Since worker threads are fully isolated, you can't directly access their
internal data, so there are no data races and no need for any
synchronization primitives.
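The two steps above can be sketched with Node's built-in worker_threads module. This is a minimal illustration, not what Piscina does internally: the helper name runJob and the message shape are made up for the example, and the worker body is passed as a string (eval: true) only so the sketch fits in one file.

```javascript
const { Worker } = require("node:worker_threads");

// Worker body as a string; a real project would keep it in worker.js.
const workerSource = `
  const { parentPort, workerData } = require("node:worker_threads");
  // workerData is a copy of the parent's object, so mutating it here
  // cannot affect the parent.
  workerData.files.push("added-in-worker");
  parentPort.postMessage({ count: workerData.files.length });
`;

// Hypothetical helper: run one job in a fresh worker thread and resolve
// with the worker's reply.
function runJob(input) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true,
      workerData: input, // cloned on the way in
    });
    worker.once("message", resolve); // reply is cloned on the way out
    worker.once("error", reject);
  });
}
```

Calling runJob({ files: ["a.json", "b.json"] }) resolves with { count: 3 }, while the caller's array still has two entries, because only copies cross the thread boundary.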
I guess PHP could do the same even in non-ZTS mode? And the aforementioned
AMPHP could use this built-in Worker threads support to provide a nice,
clean interface without resorting to any platform-specific hacks.
--
Best regards, Pavel
Hello, Pavel.
PHP has the Parallel library, which already allows you to do quite a lot.
But you probably already know about it. From the multithreading
perspective, there is also the following research:
https://github.com/true-async/multithreaded-php/blob/main/mm-en.md
https://github.com/true-async/multithreaded-php/blob/main/mm-ru.md
This article discusses a direct analogue of the architecture used in Node.js.
I hope that this year we will be able to move TrueAsync forward, and
along with it possibly introduce convenient multithreading as well.
> Hello, Pavel.
> PHP has the Parallel library, which already allows you to do quite a lot.
> But you probably already know about it. From the multithreading
> perspective, there is also the following research:
> https://github.com/true-async/multithreaded-php/blob/main/mm-en.md
> https://github.com/true-async/multithreaded-php/blob/main/mm-ru.md
> This article discusses a direct analogue of the architecture used in
> Node.js.
Lots of cool ideas, thanks for sharing!
> I hope that this year we will be able to move TrueAsync forward, and
> along with it possibly introduce convenient multithreading as well.
Keep up the great work guys :)
--
Best regards, Pavel
Hi Pavel,
It isn’t directly possible to share zvals between threads. They must be copied/serialized, and the context reconstructed. That’s why the parallel extension works the way it does. There’s also the phasync library, which does this kind of thing in non-ZTS PHP, if you’re interested in a solution you can use today. I don’t know if it is used in production, but the maintainer is pretty responsive.
There’s some work underway to make this easier (me refactoring TSRM, TrueAsync), but it probably won’t be a thing until PHP 9.0.
— Rob