PHP is pretty unusual in comparison to most web platforms nowadays, as it
runs each request in an isolated process. Web development in other
languages is instead based around a long-lived set of processes that
serve multiple requests.

That model has advantages: it is very easy to cache data in-process,
and it should, in theory, be simpler to get good performance, since all code
can be loaded into memory once during startup. Autoloading therefore goes away.

There are userland implementations like PHP-PM, but I think it would be good
to have an official way of running code like this in PHP 8.
PHP is pretty unusual in comparison to most web platforms nowadays, as it
runs each request in an isolated process. Web development in other
languages is instead based around a long-lived set of processes that
serve multiple requests.

That model has advantages: it is very easy to cache data in-process,
and it should, in theory, be simpler to get good performance, since all code
can be loaded into memory once during startup. Autoloading therefore goes away.
Hi Robert,
Could you share some more thoughts on what you are thinking of here? I'm
guessing you're thinking along the lines of an "event-based" system,
where each request is a function call, rather than a whole script
invocation?
PHP is sometimes described as having a "shared nothing" architecture -
everything is wiped out at the end of each request - so perhaps one way
to look at this is to propose a "shared something" version.
The first question is what we want to share. We already have sharing
of compiled code, with OpCache bundled since 5.5, and the pre-loading
feature in 7.4 could make autoloading redundant for many applications.
What we can't share natively is any data structures, resources like
database connections, or other global state like set_error_handler.
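For anyone who hasn't used it, preloading (7.4) is just a couple of ini
directives plus a script that is compiled once at server startup; a minimal
sketch, with example paths:

; php.ini (example values)
opcache.preload=/var/www/app/preload.php
opcache.preload_user=www-data

<?php
// preload.php - runs once when the server starts; everything compiled here
// stays in shared memory and is available to every subsequent request.
foreach (glob(__DIR__ . '/src/*.php') as $file) {
    opcache_compile_file($file);
}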
The second question is how it should be shared - what would it look
like to the user? For instance, should simultaneous requests be able
to share state? That might mean making them explicitly aware of other
threads, or running them all on one thread but in a series of
asynchronous functions (as I understand it, that's how node.js works).
It's definitely an interesting question to explore, though.
Regards,
--
Rowan Tommins (né Collins)
[IMSoP]
Hi Rowan
Could you share some more thoughts on what you are thinking of here? I'm
guessing you're thinking along the lines of an "event-based" system,
where each request is a function call, rather than a whole script
invocation?
Yes, that is what I was thinking. For example, there is a userspace implementation,
'Swoole', that works in the following way; ReactPHP is similar, although I won't
include that example as well.
<?php
$http = new Swoole\HTTP\Server("127.0.0.1", 9501);

$http->on("start", function ($server) {
    echo "Swoole http server is started at http://127.0.0.1:9501\n";
});

$http->on("request", function ($request, $response) {
    $response->header("Content-Type", "text/plain");
    $response->end("Hello World\n");
});

$http->start();
PHP is sometimes described as having a "shared nothing" architecture -
everything is wiped out at the end of each request - so perhaps one way
to look at this is to propose a "shared something" version.

The first question is what we want to share. We already have sharing
of compiled code, with OpCache bundled since 5.5, and the pre-loading
feature in 7.4 could make autoloading redundant for many applications.
What we can't share natively is any data structures, resources like
database connections, or other global state like set_error_handler.
The concept of a shared-nothing architecture is a great basis for designing
scalable systems, and it does reduce/eliminate some types of bugs. However,
it may be better to approach this at a conceptual level instead of enforcing it
with the design of the language.
In my mind right now, everything should be shareable within a single process,
as one could do in the Swoole example above; there is nothing stopping you from
defining a global in that script to cache data in-process.
Node.js, Python (WSGI) and others work fine using this model and allow sharing
of data within the same process. Trying to limit it to only some types of things
would be more complex, as each type of thing would end up having a different
programmatic interface.
Changing the execution model would also allow PHP to handle WebSockets
natively without third-party implementations, which are all based
around long-running processes. I got the following from the Swoole docs:
<?php
$server = new Swoole\Websocket\Server("127.0.0.1", 9502);

$server->on('open', function($server, $req) {
    echo "connection open: {$req->fd}\n";
});

$server->on('message', function($server, $frame) {
    echo "received message: {$frame->data}\n";
    $server->push($frame->fd, json_encode(["hello", "world"]));
});

$server->on('close', function($server, $fd) {
    echo "connection close: {$fd}\n";
});

$server->start();
Yes, that is what I was thinking. For example, there is a userspace implementation,
'Swoole', that works in the following way; ReactPHP is similar, although I won't
include that example as well.
So trying to get concrete: the first "official" component we'd need
would be an outer event loop, mapping requests and responses to the
parameter and return values of userland callbacks. In principle, not too
difficult, although I'm sure there are plenty of devils in the details.
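Purely as a hypothetical sketch of that shape - none of these class names
exist, this is only to show what "request as callback" could look like:

<?php
// Hypothetical API - illustration only, nothing like this exists in core today.
$server = new Php\Http\EventServer('127.0.0.1', 8080);

$container = bootstrap_application();   // hypothetical helper: built once, reused for every request

$server->onRequest(function (Php\Http\Request $req) use ($container): Php\Http\Response {
    // each request is just a callback invocation; no per-request bootstrap
    return new Php\Http\Response(200, ['Content-Type' => 'text/plain'], "Hello World\n");
});

$server->run();   // the outer event loop would live in core rather than userland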
In my mind right now, everything should be shareable within a single process,
as one could do in the Swoole example above; there is nothing stopping you from
defining a global in that script to cache data in-process.

Node.js, Python (WSGI) and others work fine using this model and allow sharing
of data within the same process. Trying to limit it to only some types of things
would be more complex, as each type of thing would end up having a different
programmatic interface.
I may be wrong, but I think this is where it gets complicated. It's not
that we'd want to deliberately have different things have different
behaviour between requests, it's just that we've got a bunch of existing
stuff built on the assumptions of the current architecture.
In a single-threaded event loop, you want as much as possible to be
asynchronous, which is why both Swoole and React have a lot of modules
for things like network requests, file I/O, databases, and general
asynchronous programming.
Other things just wouldn't exist if PHP hadn't been modelled as shared
nothing from the beginning. Would set_time_limit() still be global, and
abort the server after a fixed number of seconds? Or would it configure
the event loop somehow?
I think there'd need to be at least a roadmap for sorting out those
questions in the official distribution before it felt like a properly
supported part of the language.
Regards,
--
Rowan Tommins (né Collins)
[IMSoP]
Yes, that is what I was thinking. For example, there is a userspace implementation,
'Swoole', that works in the following way; ReactPHP is similar, although I won't
include that example as well.

So trying to get concrete: the first "official" component we'd need would be an
outer event loop, mapping requests and responses to the parameter and return
values of userland callbacks. In principle, not too difficult, although I'm sure
there are plenty of devils in the details.

In my mind right now, everything should be shareable within a single process,
as one could do in the Swoole example above; there is nothing stopping you from
defining a global in that script to cache data in-process.

Node.js, Python (WSGI) and others work fine using this model and allow sharing
of data within the same process. Trying to limit it to only some types of things
would be more complex, as each type of thing would end up having a different
programmatic interface.

I may be wrong, but I think this is where it gets complicated. It's not that we'd
want to deliberately have different things have different behaviour between
requests, it's just that we've got a bunch of existing stuff built on the
assumptions of the current architecture.

In a single-threaded event loop, you want as much as possible to be asynchronous,
which is why both Swoole and React have a lot of modules for things like network
requests, file I/O, databases, and general asynchronous programming.

Other things just wouldn't exist if PHP hadn't been modelled as shared nothing
from the beginning. Would set_time_limit() still be global, and abort the server
after a fixed number of seconds? Or would it configure the event loop somehow?

I think there'd need to be at least a roadmap for sorting out those questions in
the official distribution before it felt like a properly supported part of the
language.
I'm not following the discussion 100% (more like 85%), but it seems like what we might be saying is the need for a user-land implementation of a long-running PHP request, one that does not time out?
If that is the case, could we consider allowing a PHP page to opt in to no timeout? These types of requests could then handle web sockets, etc.
Then we could look to prior art with GoLang channels where they "Communicate to share memory" and do not "Share memory to communicate." IOW, add an API that allows a regular PHP page to communicate with a long-running page. This would decouple and allow for better testing, and hopefully fewer hard to track down bugs.
Further, I would suggest that the long-running requests not be able to generate output, except when the ini setting display_errors is true, to ensure they are only used for communicating with regular "shared nothing" pages and not used in place of them.
Would this not be a workable approach?
-Mike
Then we could look to prior art with GoLang channels where they "Communicate to share memory" and do not "Share memory to communicate." IOW, add an API that allows a regular PHP page to communicate with a long-running page. This would decouple and allow for better testing, and hopefully fewer hard to track down bugs.
Go channels are about solving problems related to true concurrency:
Multiple threads concurrently handling requests in a single shared
memory environment. I think Robert is talking about sequential request
handling in a single shared memory environment.
Regards,
Dik Takken
Go channels are about solving problems related to true concurrency:
Multiple threads concurrently handling requests in a single shared
memory environment. I think Robert is talking about sequential request
handling in a single shared memory environment.
I think you are making a distinction without a difference. I am not saying to exactly copy everything about channels; I am saying to learn aspects of architecture design from them.

If we had one long-running process that manages WebSocket communication, then sequential requests could, via a constrained API, communicate with that long-running process in order to use WebSocket communications. This is as opposed to allowing all PHP requests to be long-running.
-Mike
I'm not following the discussion 100% (more like 85%), but it seems
like what we might be saying is the need for a user-land
implementation of a long-running PHP request, one that does not time out?
It's not about timing out, as such, it's about starting with a fresh
state each time a request comes in from the web server. So the
long-running script can't just "opt out of time outs", it's got to be
launched by something other than a request - in existing
implementations, it's basically started as a command-line script, and
then handles the networking itself.
Then we could look to prior art with GoLang channels where they "Communicate to share memory" and do not "Share memory to communicate." IOW, add an API that allows a regular PHP page to communicate with a long-running page. This would decouple and allow for better testing, and hopefully fewer hard to track down bugs.
In general, though, this is an interesting concept: keep each request
separate, but have a "master process" (initialised when the server
starts, or by something similar to a fork() call in a normal request)
that all requests can explicitly share things with. I'm not sure that
would work well for Web Sockets, because it still relies on the
traditional request-response cycle, but I've never really used them, so
don't know what kind of architectural patterns make sense for them.
Regards,
--
Rowan Tommins (né Collins)
[IMSoP]
It's not about timing out, as such, it's about starting with a fresh state each time a request comes in from the web server. So the long-running script can't just "opt out of time outs", it's got to be launched by something other than a request - in existing implementations, it's basically started as a command-line script, and then handles the networking itself.
Other than an assumption that we would use the same infrastructure we have for existing PHP requests, which is not an assumption I am making, why is it not technically possible to have an HTTP request be the trigger rather than a command-line script? The long-running process would then return an HTTP status code letting the caller know whether it was started or not.
Alternately an API called from a regular request could do the starting of the long-running process.
In that case we would of course need a regular request to be able to restart the long-running script too via an API, if needed. Or even terminate it.
In general, though, this is an interesting concept: keep each request separate, but have a "master process" (initialised when the server starts, or by something similar to a fork() call in a normal request) that all requests can explicitly share things with.
Yes. Thank you for acknowledging.
I'm not sure that would work well for Web Sockets, because it still relies on the traditional request-response cycle, but I've never really used them, so don't know what kind of architectural patterns make sense for them.
Considering the Swoole PHP extension (https://www.swoole.co.uk), the long-running process would take the place of its functionality, and new APIs for communicating with a long-running process could allow regular requests to send messages via the WebSocket server and read queued messages from it.

You won't be able to process notifications to clients via request-response, but where that is needed it would be done by the long-running process. So the request-response might "register" a notification listener with the long-running process, for example.
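Purely as an illustration of that idea - neither of these classes exists, this
is only meant to show the shape of such an API:

<?php
// Hypothetical sketch: a regular, shared-nothing request handing work to a
// long-running WebSocket process through some constrained IPC channel.
$channel = PersistentProcess\Channel::open('websocket-server');   // hypothetical class

// A normal request queues a notification...
$channel->send(['user_id' => 42, 'event' => 'report-ready']);

// ...and the long-running process, which owns the open WebSocket connections,
// reads the queue and pushes the message out to the relevant clients.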
-Mike
I'm not sure that would work well for Web Sockets, because it still
relies on the traditional request-response cycle, but I've never really
used them, so don't know what kind of architectural patterns make sense for
them.

Considering the Swoole PHP extension (https://www.swoole.co.uk), the
long-running process would take the place of its functionality, and new APIs
for communicating with a long-running process could allow regular requests
to send messages via the WebSocket server and read queued messages from it.

You won't be able to process notifications to clients via
request-response, but where that is needed it would be done by the
long-running process. So the request-response might "register" a
notification listener with the long-running process, for example.
A WebSocket is just a way of reusing the TCP connection from an HTTP
request and holding it open for bidirectional communication. Because of this,
you can make assumptions about its performance characteristics from other
long-running socket server approaches.

From what I understand, the main problem with using fork or a thread pool
for handling sockets synchronously is the overhead caused by switching
between userspace and OS kernel space, as well as high memory overhead.
Using fork would have the same behaviour as something like Apache prefork.
Handling a long-running connection entails a process per connection, which
could mean an awful lot of processes.

Pretty much everything seems to be switching to async I/O based on some
event loop, as it allows a single process to handle requests from a large
number of connections, and long-running but sparsely used connections don't
require holding open processes that are mostly doing nothing.
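As a rough plain-PHP illustration of that model, stream_select() lets a single
process service many connections; the event-loop libraries are essentially more
capable versions of this (sketch only, no error handling):

<?php
// Minimal single-process multiplexing with stream_select() - one process,
// many connections, none of them tying up a worker while idle.
$server  = stream_socket_server('tcp://127.0.0.1:9000', $errno, $errstr);
$clients = [];

while (true) {
    $read = array_merge([$server], $clients);
    $write = $except = null;
    if (stream_select($read, $write, $except, null) < 1) {
        continue;
    }
    foreach ($read as $stream) {
        if ($stream === $server) {
            $clients[] = stream_socket_accept($server);        // new connection
        } elseif (($data = fread($stream, 8192)) === '' || $data === false) {
            unset($clients[array_search($stream, $clients, true)]);
            fclose($stream);                                   // client went away
        } else {
            fwrite($stream, $data);                            // echo it back
        }
    }
}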
Based on what I see of the direction things are going in other languages,
just 'getting with the times' and switching to an event-based model
probably makes the most sense.

Anyway, I think PHP really needs to be able to handle WebSockets in core.
I'm not sure that would work well for Web Sockets, because it still
relies on the traditional request-response cycle, but I've never really
used them, so don't know what kind of architectural patterns make sense for
them.

Considering the Swoole PHP extension (https://www.swoole.co.uk), the
long-running process would take the place of its functionality, and new APIs
for communicating with a long-running process could allow regular requests
to send messages via the WebSocket server and read queued messages from it.

You won't be able to process notifications to clients via
request-response, but where that is needed it would be done by the
long-running process. So the request-response might "register" a
notification listener with the long-running process, for example.

A WebSocket is just a way of reusing the TCP connection from an HTTP
request and holding it open for bidirectional communication. Because of this,
you can make assumptions about its performance characteristics from other
long-running socket server approaches.

From what I understand, the main problem with using fork or a thread pool
for handling sockets synchronously is the overhead caused by switching
between userspace and OS kernel space, as well as high memory overhead.
Using fork would have the same behaviour as something like Apache prefork.
Handling a long-running connection entails a process per connection, which
could mean an awful lot of processes.

Pretty much everything seems to be switching to async I/O based on some
event loop, as it allows a single process to handle requests from a large
number of connections, and long-running but sparsely used connections don't
require holding open processes that are mostly doing nothing.

Based on what I see of the direction things are going in other languages,
just 'getting with the times' and switching to an event-based model
probably makes the most sense.

Anyway, I think PHP really needs to be able to handle WebSockets in core.
async IO and promises (sometimes wrapped into async/await) have been growing in popularity, but I would be highly cautious about adding them to PHP. Async functions are viral, and once you start down the async path, forever will it dominate your code base. Adding a single async call somewhere can force you to refactor large swaths of your code base.
This is an excellent writeup of why we should be very skeptical of that approach:
http://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/
I want to point out that there's 2 or 3 very different use cases and models being discussed in this thread, and it's important to keep them separate:
- Faster bootstrap time for shared-nothing requests. This is the "partial start" option that a few have mentioned. This would be mainly to avoid the cost of data initialization like the DIC (as preloading already has the potential to avoid the cost of code initialization), which in turn could allow for different optimizations, like pre-creating all service objects rather than coming up with complex lazy-creation and code compilation logic. But each request is logically separate and cannot interact.
- Long-lived processes. Currently this can only be done via the CLI, and is more the model used by most compiled languages. Usually this is done with a single process that multiplexes multiple requests, which requires some sort of multiplexing tool: either async IO, threads (including nicer models like Go and Rust use), or process pools. Let's please not assume async IO is the obvious best answer just because that's what Javascript does.
- Technically, process pools could allow for long-lived processes that don't require inter-process synchronization but still allow processes to stay open until explicitly closed. That might be a more approachable model for most current PHP devs.
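As a rough sketch of that third option, using only existing extensions (pcntl
and streams); build_container() here is a hypothetical bootstrap helper, not a
real function:

<?php
// Prefork-style pool sketch: each worker is long-lived and keeps its bootstrapped
// state between requests, but requests never share a process with each other.
$server = stream_socket_server('tcp://127.0.0.1:8080', $errno, $errstr);

for ($i = 0; $i < 4; $i++) {
    if (pcntl_fork() === 0) {                 // child: long-lived worker
        $container = build_container();       // hypothetical: run once per worker, reused for every request
        while ($conn = stream_socket_accept($server, -1)) {
            fwrite($conn, "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok");
            fclose($conn);
        }
        exit(0);
    }
}

while (pcntl_wait($status) > 0);              // parent just supervises the workers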
I cannot speak for how easy any of those would be to implement. But let's be careful which one we're talking about and not confuse them. Also, let's avoid the trap of assuming that a long-lived multiplexed process must be using async IO. That's only one model among many, and one with a lot of very unpleasant implications we should be very wary of.
--Larry Garfield
I want to point out that there's 2 or 3 very different use cases and
models being discussed in this thread, and it's important to keep them
separate:
[...]
Can I add another that's been mentioned in passing:
- A mechanism for working with WebSockets. This doesn't actually
require a different process model (although it would undoubtedly benefit
from one, in terms of system resources), but does require a new way to
talk to the web server / network (a new kind of "SAPI"). I can't think
of a fundamental reason why php-fpm couldn't leave a process running
that was connected as one end of a web socket, but doing so would be
pretty useless if there was no API to send and receive data over that
socket.
Regards,
--
Rowan Tommins (né Collins)
[IMSoP]
I'm not following the discussion 100% (more like 85%), but it seems
like what we might be saying is the need for a user-land
implementation of a long-running PHP request, one that does not time out?

It's not about timing out, as such, it's about starting with a fresh
state each time a request comes in from the web server. So the
long-running script can't just "opt out of time outs", it's got to be
launched by something other than a request - in existing
implementations, it's basically started as a command-line script, and
then handles the networking itself.

Then we could look to prior art with GoLang channels where they
"Communicate to share memory" and do not "Share memory to
communicate." IOW, add an API that allows a regular PHP page to
communicate with a long-running page. This would decouple and allow
for better testing, and hopefully fewer hard to track down bugs.

In general, though, this is an interesting concept: keep each request
separate, but have a "master process" (initialised when the server
starts, or by something similar to a fork() call in a normal request)
that all requests can explicitly share things with. I'm not sure that
would work well for Web Sockets, because it still relies on the
traditional request-response cycle, but I've never really used them, so
don't know what kind of architectural patterns make sense for them.

Regards,
WebSocket is a framing protocol over TCP/IP for long-lived (potentially
hours), mostly quiet, two-way connections to a web server, usually from a
compliant web browser that tends to slowly use more and more RAM on the
client for whatever reason. My favorite example of WebSocket usage is
lightningmaps.org, which accurately triangulates lightning strikes using
ground-based sensor stations thousands of miles apart.
People tend to write WebSocket servers in NodeJS partly because they
don't realize that PHP can already do the same. Example:
https://github.com/cubiclesoft/php-drc
In recent years, I've used PHP for a lot of non-web things. In fact, a
lot of my real PHP work these days is CLI-based usually running either
as cron jobs or as installable at-boot system services via Service Manager.
I recommend reading libev's documentation regarding various "special
problems." PHP would be implicitly taking on a lot of complicated
issues that even the authors of libev and libuv (NodeJS uses libuv)
haven't entirely solved and then passing those issues onto users. It
would be important to at least point the various issues out to users in
documentation.
I personally prefer to use userland isolated libraries even though it
takes me slightly further away from the metal. From my experience,
libev and subsequently PECL ev are a more natural transition for
existing socket code whereas Swoole/libuv basically demand a rewrite for
little gain. The only downside to using PECL ev is that its support for
Windows is actually non-functional despite PECL generating DLLs.
WebSocket generally introduces network and processing overhead - HTTP
headers and parsing for setup + framing protocol handling. In many
cases, a simpler "JSON blob per newline" approach works just as well (if
not better) and can afford better isolation and performance models (i.e.
not everything has to be WebSocket). There are plenty of flaws inherent
to the design of the WebSocket protocol itself (some are
security-oriented) and so anything built on it shares those flaws.
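For what it's worth, the "JSON blob per newline" style needs very little code
on either end; a client-side sketch, with no error handling:

<?php
// Newline-delimited JSON over a plain TCP connection - sketch only.
$conn = stream_socket_client('tcp://127.0.0.1:9000', $errno, $errstr);

fwrite($conn, json_encode(['action' => 'subscribe', 'topic' => 'storms']) . "\n");

while (($line = fgets($conn)) !== false) {
    $message = json_decode($line, true);   // one complete JSON document per line
    // ... handle $message ...
}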
WebSocket and WebRTC SAPIs could be an interesting direction to go in.
I'm not opposed to the idea. I do think though that people need to be
far more reserved when it comes to writing TCP/IP servers than they are
right now. A lot of careful thought needs to happen prior to writing
bind(). Network code is quite notably hard to get right and complexity
multiplies with multi-OS/platform support.
--
Thomas Hruska
CubicleSoft President
I've got great, time saving software that you will find useful.
People tend to write WebSocket servers in NodeJS partly because they
don't realize that PHP can already do the same. Example:
I didn't realize, so this is a great share. Thanks.
WebSocket generally introduces network and processing overhead - HTTP
headers and parsing for setup + framing protocol handling. In many
cases, a simpler "JSON blob per newline" approach works just as well (if
not better) and can afford better isolation and performance models (i.e.
not everything has to be WebSocket). There are plenty of flaws inherent
to the design of the WebSocket protocol itself (some are
security-oriented) and so anything built on it shares those flaws.
This critique of WebSockets sounds similar to that of https://mercure.rocks/,
which uses HTTP/2 and Server-Sent Events instead of WebSockets.
I'm interested in WebSockets because I've been following the development of
Phoenix LiveView [1] for the last 13 months. Something similar in PHP would
be awesome (though the statelessness of PHP will complicate matters), but
when I looked at Swoole et al they wouldn't work with my existing
framework-based code. An approach that avoids rewriting, like some form of
built-in support in the language, would be good for this alone.
Peter
On Tue, 28 Jan 2020 at 15:59, Peter Bowyer phpmailinglists@gmail.com
wrote:
An approach that avoids rewriting, like some form of
built-in support in the language would be good for this alone.
I'd just like to point out that those two things are orthogonal: the fact
that Swoole is distributed as an extension is not the reason it's
incompatible with your existing code, and building a similar implementation
into PHP under a different name wouldn't make the migration any easier.
Regards,
Rowan Tommins
[IMSoP]
I'd just like to point out that those two things are orthogonal: the fact
that Swoole is distributed as an extension is not the reason it's
incompatible with your existing code, and building a similar implementation
into PHP under a different name wouldn't make the migration any easier.
You're absolutely right. The difference I'm thinking of is that if there is
built-in support in the language, frameworks will embrace it. At the moment
I'd need to make my own fork to add compatibility with Swoole et al (or use
one of the experimental but unsupported forks out there), which isn't
attractive.
Peter
On Wed, 29 Jan 2020, 7:42 pm Peter Bowyer, phpmailinglists@gmail.com
wrote:
On Tue, 28 Jan 2020 at 17:12, Rowan Tommins rowan.collins@gmail.com wrote:

I'd just like to point out that those two things are orthogonal: the fact
that Swoole is distributed as an extension is not the reason it's
incompatible with your existing code, and building a similar implementation
into PHP under a different name wouldn't make the migration any easier.

You're absolutely right. The difference I'm thinking of is that if there is
built-in support in the language, frameworks will embrace it. At the moment
I'd need to make my own fork to add compatibility with Swoole et al (or use
one of the experimental but unsupported forks out there), which isn't
attractive.
This was why I raised the issue. If it is covered in the core, more people
are going to use it. It would be especially good if it could be supported
within the shared hosting settings where PHP is commonly used.

As Swoole etc. have already done good work, would it make sense to adopt
one of these as official?
PHP is pretty unusual in comparison to most web platforms nowadays, as it
runs each request in an isolated process. Web development in other
languages is instead based around a long-lived set of processes that
serve multiple requests.
The shared-nothing architecture of PHP is the very thing that makes it
simple and robust for web development by default. I do a lot of Python
web development as well, and the fact that it implicitly shares state
while handling multiple requests has bitten me more than once.
The same is true for writing multi-threaded code, which is very tricky
business in most programming languages. The PHP extension 'parallel' by Joe
Watkins leverages the shared-nothing architecture of PHP to make writing
multi-threaded code simpler and more robust. Very clever.
That model has advantages: it is very easy to cache data in-process,
and it should, in theory, be simpler to get good performance, since all code
can be loaded into memory once during startup. Autoloading therefore goes away.
Yes, getting better performance is easier but writing robust code
becomes harder.
There are userland implementations like PHP-PM, but I think it would be good
to have an official way of running code like this in PHP 8.
I'm not sure what you mean by an 'official way'. What is the problem
with using one of the userland implementations?
One thing I can imagine PHP could offer in this area is exposing the
existing internal mechanism to mark data as persistent. This mechanism
is used by extensions to offer persistent network connections for
instance. It could be used to implement a language feature to allow
sharing specific state between requests in an explicit way. When using
an event-loop-based framework, everything is shared implicitly. Now
suppose that we could declare a static variable like this:
static persistent $var;
When the request ends, the PHP interpreter data is cleared to handle the
next request. Only data that is explicitly marked as persistent
survives the request and is available in the next.
This would allow retaining the current shared-nothing architecture while
offering the means to break the rules in a well defined way.
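To illustrate how that might be used (again, the 'persistent' modifier is
purely hypothetical at this point):

<?php
// Hypothetical 'persistent' modifier - illustration of the proposal only.
function handle_request(): string
{
    static persistent $connection;    // survives from one request to the next
    static $requestCount = 0;         // ordinary static: cleared between requests as usual

    $connection ??= new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
    $requestCount++;

    return "served {$requestCount} request(s) on a reused connection\n";
}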
Regards,
Dik Takken