Hi,
PCS provides a fast and easy mechanism to mix C and PHP code in PHP
extensions (more about PCS at http://pcs.tekwire.net). Thanks to the PHP
7 performance improvement and the inclusion of opcache in the core, a
lot of existing non-performance-critical extension code may now be
converted to PHP without significant performance loss (this must be
measured case by case, of course, but tests show that opcode-cached PHP
code is often faster than C).
Another motivation is the lack of extension maintainers. It may be
complex to convert a C extension to PHP but, once it's done, maintenance
becomes much easier.
As one of PCS goals is to allow converting parts of existing core
extensions to PHP, it seems natural to start the movement by including
PCS in the core distribution. Then, others and I will start
proposing conversions of existing code. IMO, the PDO generic layer is a
perfect candidate, but there are many others.
Converting existing C code to PHP is not the only usage. With PCS,
adding an OO layer to a function-only extension becomes an easy task.
Sara recently wrote about a curl OOP layer
(https://gist.github.com/sgolemon/e95bfc34d34c4f63fa953ee9294ae02c).
Using PCS, adding such PHP code on top of the curl extension would take
less than one hour.
I haven't proposed this until now because the 'cache_key' operation,
currently proposed for 7.2, is a prerequisite: PCS exposes the PHP
code it manages via a stream wrapper.
So, please give me your thoughts. Suggestions of potential candidates to
be rewritten from C to PHP are welcome too.
Regards
François
Hi François
2017-06-05 19:46 GMT+02:00 François Laupretre francois@tekwire.net:
> Hi,
> PCS provides a fast and easy mechanism to mix C and PHP code in PHP
> extensions (more about PCS at http://pcs.tekwire.net). Thanks to the PHP 7
> performance improvement and the inclusion of opcache in the core, a lot of
> existing non-performance-critical extension code may now be converted to PHP
> without significant performance loss (this must be measured case by case, of
> course, but tests show that opcode-cached PHP code is often faster than C).
I think this is on the edge of being too late for 7.2; remember, we have
the first alpha coming this Thursday and the feature freeze on July
20th.
On topic, I think it looks very interesting, though I need to dig a bit
more into it and would look forward to a formal RFC.
--
regards,
Kalle Sommer Nielsen
kalle@php.net
> So, please give me your thoughts. Suggestions of potential candidates to
> be rewritten from C to PHP are welcome too.
> Regards
> François
Hi François!
I really, really like this. It would allow us to write most of the stuff
in PHP, especially the reflection part, while delegating to C where
appropriate.
I skimmed through your documentation. There is, however, one
question left: is it possible to have C code that is accessible only to
the PHP code of an extension, instead of to all user-level code?
Some things are simply easier in C than in PHP (binary data, for instance).
--
Richard "Fleshgrinder" Fussenegger
Hi,
On 05/06/2017 at 21:26, Fleshgrinder wrote:
> I skimmed through your documentation. There is, however, one
> question left: is it possible to have C code that is accessible only to
> the PHP code of an extension, instead of to all user-level code?
As extension PHP code is executed in the same context as user-level
code, there is no way to have such an access restriction. The workaround,
IMO, would be to expose undocumented methods or functions. But nothing
will prevent user-level code from calling them.
Regards
François
On 05/06/2017 at 19:46, François Laupretre wrote:
> Hi,
> PCS provides a fast and easy mechanism to mix C and PHP code in PHP
> extensions (more about PCS at http://pcs.tekwire.net). Thanks to the PHP
> 7 performance improvement and the inclusion of opcache in the core, a
> lot of existing non-performance-critical extension code may now be
> converted to PHP without significant performance loss (this must be
> measured case by case, of course, but tests show that opcode-cached PHP
> code is often faster than C).
Sorry, but I don't like the idea of having PHP code bundled in a C extension.
Having the low-level part written in C and the user-land part in PHP is indeed a
good approach (e.g. mongodb, phpiredis + phredis...), but having the PHP
library distributed via composer or any other way is enough.
Remi.
P.S. IIRC couchbase tried this approach, and reverted it.
> On 05/06/2017 at 19:46, François Laupretre wrote:
>> PCS provides a fast and easy mechanism to mix C and PHP code in PHP
>> extensions (more about PCS at http://pcs.tekwire.net). Thanks to the
>> PHP 7 performance improvement and the inclusion of opcache in the
>> core, a lot of existing non-performance-critical extension code may
>> now be converted to PHP without significant performance loss (this
>> must be measured case by case, of course, but tests show that
>> opcode-cached PHP code is often faster than C).
> Sorry, but I don't like the idea of having PHP code bundled in a C
> extension.
> Having the low-level part written in C and the user-land part in PHP is indeed
> a good approach (e.g. mongodb, phpiredis + phredis...), but having the PHP
> library distributed via composer or any other way is enough.
> P.S. IIRC couchbase tried this approach, and reverted it.
We did it as well, in the early early days. It wasn't great and we reverted
that too. I also believe PHP code should be distributed through
composer, which is much easier to upgrade, and also allows for
multiple versions running on the same server.
cheers,
Derick
Hi,
On 06/06/2017 at 11:13, Derick Rethans wrote:
>> Sorry, but I don't like the idea of having PHP code bundled in a C
>> extension.
>> Having the low-level part written in C and the user-land part in PHP is indeed
>> a good approach (e.g. mongodb, phpiredis + phredis...), but having the PHP
>> library distributed via composer or any other way is enough.
>> P.S. IIRC couchbase tried this approach, and reverted it.
> We did it as well, in the early early days. It wasn't great and we reverted
> that too. I also believe PHP code should be distributed through
> composer, which is much easier to upgrade, and also allows for
> multiple versions running on the same server.
I agree that, in the case of 3rd-party code, like mongodb, splitting the
C code and the PHP code brings benefits. The biggest one being that you
can release the PHP package more often than the C extension (at the
expense of maintaining compatibility between both packages). This also
takes place in the context of an application, whose dependencies are
managed by composer.
What I am proposing here is very different, as the main objective is to
dramatically reduce the line count of the core source, without
significant performance loss. If we had an army of C developers
maintaining every core extension, maybe we wouldn't need that, but we
don't (we even have fewer and fewer). What we have instead is thousands
of lines of C code without any active maintainer. 'phar' is an example
we talked about recently, but there are many others. Converting some of
this code to PHP without losing performance would improve the
situation, IMO. So, while I agree that 3rd-party extensions may have
very good reasons to maintain both an extension and a PHP package,
opposing this for core extensions is very different.
Regards
François
On 06.06.2017 at 12:27, François Laupretre wrote:
> What I am proposing here is very different, as the main objective is to
> dramatically reduce the line count of the core source, without
> significant performance loss. If we had an army of C developers
> maintaining every core extension, maybe we wouldn't need that, but we
> don't (we even have fewer and fewer). What we have instead is thousands
> of lines of C code without any active maintainer. 'phar' is an example
> we talked about recently, but there are many others. Converting some of
> this code to PHP without losing performance would improve the
> situation, IMO. So, while I agree that 3rd-party extensions may have
> very good reasons to maintain both an extension and a PHP package,
> opposing this for core extensions is very different.
But what is the difference? Just rewriting some code in a
different programming language doesn't grow maintainers for the future of
that code.
On 06/06/2017 at 12:33, lists@rhsoft.net wrote:
> On 06.06.2017 at 12:27, François Laupretre wrote:
>> What I am proposing here is very different, as the main objective is
>> to dramatically reduce the line count of the core source, without
>> significant performance loss. If we had an army of C developers
>> maintaining every core extension, maybe we wouldn't need that, but we
>> don't (we even have fewer and fewer). What we have instead is
>> thousands of lines of C code without any active maintainer. 'phar' is
>> an example we talked about recently, but there are many others.
>> Converting some of this code to PHP without losing performance would
>> improve the situation, IMO. So, while I agree that 3rd-party
>> extensions may have very good reasons to maintain both an extension
>> and a PHP package, opposing this for core extensions is very different.
> But what is the difference? Just rewriting some code in a
> different programming language doesn't grow maintainers for the future
> of that code.
Wrong. Moving code from C to PHP reduces the code size, improves
readability, and dramatically increases the number of potential
maintainers. How many times have we received messages on the list such as 'I
would love to improve/maintain the xxx extension but I cannot program
in C'?
Let's take the 'phar' extension as an example. The source code is about
18,000 lines of C. After a quick look, I consider that more than 90 % of
this code can be rewritten in PHP without losing ANY performance
(because this code is used during package creation only). Prior C-to-PHP
conversions show that the resulting PHP line count is about 10 % of the
original. So, we can transform 18,000 lines of very complex C code into
about 1,500 lines of PHP and probably fewer than 1,000 remaining lines of
C. From a maintainability POV, this makes the situation very different.
After such an operation, phar can attract active maintainers and evolve.
If it remains as it is now, experience shows that it will stay frozen for a
very long time.
Regards
François
On 06.06.2017 at 13:06, François Laupretre wrote:
> On 06/06/2017 at 12:33, lists@rhsoft.net wrote:
>> On 06.06.2017 at 12:27, François Laupretre wrote:
>>> What I am proposing here is very different, as the main objective is
>>> to dramatically reduce the line count of the core source, without
>>> significant performance loss. If we had an army of C developers
>>> maintaining every core extension, maybe we wouldn't need that, but we
>>> don't (we even have fewer and fewer). What we have instead is
>>> thousands of lines of C code without any active maintainer. 'phar' is
>>> an example we talked about recently, but there are many others.
>>> Converting some of this code to PHP without losing performance would
>>> improve the situation, IMO. So, while I agree that 3rd-party
>>> extensions may have very good reasons to maintain both an extension
>>> and a PHP package, opposing this for core extensions is very different.
>> But what is the difference? Just rewriting some code in a
>> different programming language doesn't grow maintainers for the future
>> of that code.
> Wrong. Moving code from C to PHP reduces the code size, improves
> readability, and dramatically increases the number of potential
> maintainers. How many times have we received messages on the list such as 'I
> would love to improve/maintain the xxx extension but I cannot program
> in C'?
> Let's take the 'phar' extension as an example. The source code is about
> 18,000 lines of C. After a quick look, I consider that more than 90 % of
> this code can be rewritten in PHP without losing ANY performance
> (because this code is used during package creation only). Prior C-to-PHP
> conversions show that the resulting PHP line count is about 10 % of the
> original. So, we can transform 18,000 lines of very complex C code into
> about 1,500 lines of PHP and probably fewer than 1,000 remaining lines of
> C. From a maintainability POV, this makes the situation very different.
> After such an operation, phar can attract active maintainers and evolve.
> If it remains as it is now, experience shows that it will stay frozen for a
> very long time.
Looking at the code quality (style, readability, robustness,
error handling) of 99% of PHP userland code out there - which is
horrible, to put it nicely - even if all that is true, I still doubt that
it improves quality in the long term. Sometimes it's better for working
things to stay unmaintained than to be badly maintained.
> Looking at the code quality (style, readability, robustness,
> error handling) of 99% of PHP userland code out there - which is
> horrible, to put it nicely - even if all that is true, I still doubt that
> it improves quality in the long term. Sometimes it's better for working
> things to stay unmaintained than to be badly maintained.
There is no reason to assume either that we would attract the worst possible PHP programmers, or that we currently attract the best possible C programmers. Indeed, it's likely that a lot of existing extensions have poor style, lack of robustness, etc, because they were written by people "speaking a second language", i.e. PHP programmers trying their hand at C.
I'm not even sure your last sentence is true very often - changes to the core require changes to extensions, so either the entire core stagnates (in fear of breaking things) or extensions get abandoned (because rather than working but unmaintained, they are now broken and unmaintained).
There are certainly details to be worked out, but I think the principle of making it easier to build and maintain a rich core library is a very good one.
Regards,
--
Rowan Collins
[IMSoP]
On 06.06.2017 at 15:33, Rowan Collins wrote:
>> Looking at the code quality (style, readability, robustness,
>> error handling) of 99% of PHP userland code out there - which is
>> horrible, to put it nicely - even if all that is true, I still doubt that
>> it improves quality in the long term. Sometimes it's better for working
>> things to stay unmaintained than to be badly maintained.
> There is no reason to assume either that we would attract the worst possible PHP programmers, or that we currently attract the best possible C programmers. Indeed, it's likely that a lot of existing extensions have poor style, lack of robustness, etc, because they were written by people "speaking a second language", i.e. PHP programmers trying their hand at C.
> I'm not even sure your last sentence is true very often - changes to the core require changes to extensions, so either the entire core stagnates (in fear of breaking things) or extensions get abandoned (because rather than working but unmaintained, they are now broken and unmaintained).
> There are certainly details to be worked out, but I think the principle of making it easier to build and maintain a rich core library is a very good one.
That's all nice, but in this thread even "composer" was brought into the
game once again - frankly, making composer mandatory will lead many
people to blacklist a lot of things that require it, and I am pretty
sure I speak for a silent majority in this context (the off-list
responses on other threads where I called composer a red line for me,
summed up as "and I thought I was the only one", are enough to back
this up).
Where will these PHP scripts be stored? How do they deal with open_basedir?
Do you need to add their location to open_basedir, while you normally
avoid adding anything outside your application there? And so on.
> That's all nice, but in this thread even "composer" was brought into the
> game once again - frankly, making composer mandatory ...
Don't panic. Composer was mentioned in the context of third parties (e.g. mongodb) which want to distribute both a low-level driver, and a high-level class library. It was then immediately clarified that that is not the use case being discussed here, and that this discussion is explicitly about code which can't be distributed and installed separately from the core.
Please have another look at what this thread is about before turning it into a platform for your favourite rants.
Regards,
--
Rowan Collins
[IMSoP]
Hi,
On 06/06/2017 at 17:19, lists@rhsoft.net wrote:
> Where will these PHP scripts be stored? How do they deal with open_basedir?
> Do you need to add their location to open_basedir, while you normally
> avoid adding anything outside your application there? And so on.
Your question proves you didn't even take the time to read the
'Introduction to PCS' document
(http://www.tekwire.net/joomla/projects/pcs/intro) before starting to
complain. Please read it first and you will understand that the PHP
scripts we are talking about would be embedded into the extension shared
library file and transparently loaded when needed. composer is also out
of scope here.
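The embedding idea can be illustrated with a minimal C sketch, assuming a build step that turns each PHP file into a string constant compiled into the extension's shared library. The table layout, the names `embedded_script`, `SCRIPT` and `find_embedded_script`, and the file names are all hypothetical, not taken from PCS; the sketch only shows that, in this model, the PHP source lives inside the library and is looked up by virtual path when needed, so no script file has to exist on disk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entry: one PHP script compiled into the extension's
 * shared library. A real build would generate this table from the
 * .php sources; here it is written by hand. */
typedef struct {
    const char *path;    /* virtual path inside the extension */
    const char *source;  /* PHP source text, kept in read-only data */
    size_t      length;  /* source length in bytes, without the final NUL */
} embedded_script;

/* sizeof on a string literal includes the terminating NUL, hence the -1. */
#define SCRIPT(p, s) { (p), (s), sizeof(s) - 1 }

static const embedded_script embedded_scripts[] = {
    SCRIPT("Curl/Handle.php",
           "<?php\nnamespace Curl;\nclass Handle {}\n"),
    SCRIPT("Curl/functions.php",
           "<?php\nnamespace Curl;\nfunction version() { return \\curl_version(); }\n"),
};

/* Return the embedded script stored under `path`, or NULL if there is
 * none. Nothing is parsed here: a loader (for example a stream wrapper)
 * would hand these bytes to the compiler only when the script is needed. */
const embedded_script *find_embedded_script(const char *path)
{
    assert(path != NULL);
    size_t n = sizeof embedded_scripts / sizeof embedded_scripts[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(embedded_scripts[i].path, path) == 0) {
            return &embedded_scripts[i];
        }
    }
    return NULL;
}
```

Keeping the sources as read-only data means the lookup costs nothing until a script is actually requested; compiling and caching the result would be the engine's (or opcache's) job.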
Regards
François
On 06.06.2017 at 18:49, François Laupretre wrote:
> On 06/06/2017 at 17:19, lists@rhsoft.net wrote:
>> Where will these PHP scripts be stored? How do they deal with open_basedir?
>> Do you need to add their location to open_basedir, while you normally
>> avoid adding anything outside your application there? And so on.
> Your question proves you didn't even take the time to read the
> 'Introduction to PCS' document
> (http://www.tekwire.net/joomla/projects/pcs/intro) before starting to
> complain. Please read it first and you will understand that the PHP
> scripts we are talking about would be embedded into the extension shared
> library file and transparently loaded when needed. composer is also out
> of scope here.
The point is that people should stop talking about composer in every
context as if it were the holy grail, because it has no place in proper
systems management, and I was not the one who first brought up composer in
this thread.
> The point is that people should stop talking about composer
They did, 7 hours ago, after François clarified why it's not relevant for this particular case: https://externals.io/thread/926#email-15428
Can we move on now please?
Thanks,
--
Rowan Collins
[IMSoP]
On Mon, Jun 5, 2017 at 7:46 PM, François Laupretre francois@tekwire.net
wrote:
> Hi,
> PCS provides a fast and easy mechanism to mix C and PHP code in PHP
> extensions (more about PCS at http://pcs.tekwire.net). Thanks to the PHP
> 7 performance improvement and the inclusion of opcache in the core, a lot
> of existing non-performance-critical extension code may now be converted to
> PHP without significant performance loss (this must be measured case by
> case, of course, but tests show that opcode-cached PHP code is often faster
> than C).
> Another motivation is the lack of extension maintainers. It may be complex
> to convert a C extension to PHP but, once it's done, maintenance becomes
> much easier.
> As one of PCS goals is to allow converting parts of existing core
> extensions to PHP, it seems natural to start the movement by including
> PCS in the core distribution. Then, others and I will start
> proposing conversions of existing code. IMO, the PDO generic layer is a
> perfect candidate, but there are many others.
> Converting existing C code to PHP is not the only usage. With PCS, adding
> an OO layer to a function-only extension becomes an easy task. Sara
> recently wrote about a curl OOP layer
> (https://gist.github.com/sgolemon/e95bfc34d34c4f63fa953ee9294ae02c). Using
> PCS, adding such PHP code on top of the curl extension would take less than
> one hour.
> I haven't proposed this until now because the 'cache_key' operation,
> currently proposed for 7.2, is a prerequisite: PCS exposes the PHP code it
> manages via a stream wrapper.
> So, please give me your thoughts. Suggestions of potential candidates to
> be rewritten from C to PHP are welcome too.
Hi,
First of all: I think the ability to implement parts of PHP extensions in
PHP is extremely important and will be a game changer in our ability to
maintain and improve our standard library.
There are essentially only two good reasons for implementing functionality
in C: One is performance, the other is FFI. Unfortunately, the requirement
to use C for everything inside an extension means that we write large
amounts of C code that does not fall into either of those categories. The
resulting code is hard to maintain, often subtly buggy and usually not
consistent with ordinary userland PHP code. Typical issues we see all the
time are bad or completely absent serialization support, lack of circular
garbage collection, crashes when the object is improperly initialized and
bugs appearing when internal classes are extended.
On top of that, implementing certain functionality in C actually makes the
resulting code slower than equivalent PHP code. While our virtual machine
is highly optimized, our internal APIs are often not, or not typically used
in their most efficient form. One case where internal code loses is the
invocation of userland callbacks. Another is access to properties.
The current situation also has a large and somewhat hidden impact on our
API design. Due to the large maintenance burden that implementing "proper"
APIs imposes on us, we tend to go with the simplest possible API. Usually
this means that we end up directly exposing C binding APIs, even if they
are a very bad fit for PHP. As already noted in this thread, the current
curl API is such an example. (I know that some people will argue that it's
better to expose simple procedural APIs rather than fancy object oriented
APIs -- however, that's a choice that should be made based on technical
arguments, not due to technical limitations.)
Some people have mentioned that this is better solved by shipping the PHP
code separately using composer. While this may be viable for 3rd party
extensions (and may be preferable if they have large fractions of PHP
code), this option does not exist for our standard library. We can hardly
tell people that they should go install a composer package in order to make
use of some APIs in our standard library.
Anyway, to get back to the topic of PCS. First, I would recommend targeting
PHP 7.3 for this change. Feature freeze for 7.2 is in a bit over a month
and I think we'll want to make some non-trivial changes to how this works
if we integrate it in PHP. If added to PHP, I think this should be
integrated into the core, rather than being an extension.
Here are some random thoughts:
- As far as I understand, PCS relies on autoloading. There are two issues
here: First, autoloading does not register symbols prior to autoloading.
This means that functions like get_declared_classes() will not behave as
expected. Second, autoloading does not support functions. I think both of
these problems can be solved with some up-front symbol analysis. Lazily
compiling internal functions should not run into any of the problems we
have with userland function autoloading.
- It has already been mentioned in the thread, but what seems to be lacking
right now is a good way of integrating PHP and C portions. As far as I
understand, PCS allows you to write an entire class in PHP, but it does not
allow you to offload parts of the functionality to C without exposing
additional public APIs. I think there are two things we can do here:
a) Provide a mechanism that makes certain functions only available inside
extension PHP code. This would allow exposing some private PHP functions
which are only used in the internal implementation of the extension.
b) Allow binding some methods specified in PHP to an internal
implementation. The way this works in HHVM is that the PHP file contains
just a signature, with an attribute that signifies that an internal
implementation will be bound to that function:
class Bar {
    <<__Native>>
    function foo($args);
}
This would be useful for other reasons as well. In particular, this could
become a replacement for the existing arginfo-based signature
specification, which is somewhat limited and causes discrepancies with
userland classes. For example, arginfo does not support default values.
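As a rough illustration of option b), the C side could keep a table of implementations keyed by the names the PHP file declares as native, and bind them once at load time. This is a minimal sketch under stated assumptions: the `native_fn` signature is deliberately simplified (a real binding would receive the engine's argument and return-value structures), and all names (`native_slot`, `bind_natives`, `bar_foo`) are hypothetical, not HHVM's or PHP's actual API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for a native method signature (hypothetical). */
typedef long (*native_fn)(long arg);

/* A method the PHP file declared as native, e.g. "Bar::foo". */
typedef struct {
    const char *name;
    native_fn   impl;   /* filled in by bind_natives() */
} native_slot;

/* C implementation of the signature-only Bar::foo from the PHP side. */
static long bar_foo(long arg) { return arg * 2; }

/* Implementations this (hypothetical) extension provides. */
static const struct {
    const char *name;
    native_fn   impl;
} native_impls[] = {
    { "Bar::foo", bar_foo },
};

/* Bind each declared native method to its C implementation. Returns the
 * number of declarations left unbound, so a missing implementation can
 * be reported when the extension loads instead of on first call. */
size_t bind_natives(native_slot *slots, size_t count)
{
    assert(slots != NULL || count == 0);
    size_t n_impls = sizeof native_impls / sizeof native_impls[0];
    size_t unbound = 0;
    for (size_t i = 0; i < count; i++) {
        slots[i].impl = NULL;
        for (size_t j = 0; j < n_impls; j++) {
            if (strcmp(slots[i].name, native_impls[j].name) == 0) {
                slots[i].impl = native_impls[j].impl;
                break;
            }
        }
        if (slots[i].impl == NULL) {
            fprintf(stderr, "no native implementation for %s\n", slots[i].name);
            unbound++;
        }
    }
    return unbound;
}
```

In this model the PHP declaration stays the single source of the signature and only the body lives in C; the sketch covers just the name-to-function binding, not argument conversion.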
Regards,
Nikita
Hi all,
On 06.06.2017 at 14:43, Nikita Popov wrote:
> First of all: I think the ability to implement parts of PHP extensions in
> PHP is extremely important and will be a game changer in our ability to
> maintain and improve our standard library.
I agree with you here 100%.
> - As far as I understand, PCS relies on autoloading. There are two issues
> here: First, autoloading does not register symbols prior to autoloading.
> This means that functions like get_declared_classes() will not behave as
> expected. Second, autoloading does not support functions. I think both of
> these problems can be solved with some up-front symbol analysis. Lazily
> compiling internal functions should not run into any of the problems we
> have with userland function autoloading.
(disclaimer: I have no deep knowledge of how the core really works)
From what I see, there is a difference between writing extension code in
PHP and placing those plain PHP files somewhere when installing. So I
would argue that, when compiling the extension or the core, all this PHP
extension code should be compiled to opcodes and put in a segment of
the resulting binary.
The OpCache could then be modified to allow it to use the opcodes from
these read-only segments (one for the core, one optionally for each
extension). That would avoid the problems you mentioned above, but maybe
I am missing how the OpCache and symbol registration work in PHP.
Greetings
Dennis
> First of all: I think the ability to implement parts of PHP extensions in
> PHP is extremely important and will be a game changer in our ability to
> maintain and improve our standard library.
Ditto this. The argument for having a MUCH larger pool of maintainers
is #1. The reduced footprint for crash-bugs is a very close #2.
> Anyway, to get back to the topic of PCS. First, I would recommend targeting
> PHP 7.3 for this change. Feature freeze for 7.2 is in a bit over a month
> and I think we'll want to make some non-trivial changes to how this works
> if we integrate it in PHP. If added to PHP, I think this should be
> integrated into the core, rather than being an extension.
As an RM for 7.2 and someone who's been daydreaming of something
like PCS for a while, I'm going to second this strongly. I literally
cut the alpha1 tag this morning (for Thursday's official release), and
something like PCS is pretty significant in its disruptive capacity
(not to mention being lower-value until we start actually moving
some of those extensions to PHP code). Save this for 7.3, please.
> - It has already been mentioned in the thread, but what seems to be lacking
> right now is a good way of integrating PHP and C portions. As far as I
> understand, PCS allows you to write an entire class in PHP, but it does not
> allow you to offload parts of the functionality to C without exposing
> additional public APIs. I think there are two things we can do here:
> b) Allow binding some methods specified in PHP to an internal
> implementation. The way this works in HHVM is that the PHP file contains
> just a signature, with an attribute that signifies that an internal
> implementation will be bound to that function:
As the author of HNI*, I look forward to iterating on this as a
potential design (maybe learning from HNI's experience and making
something better), but I regard it as a "version 2" thing. We can get
PCS good and stable first, THEN worry about bridging to C later.
> This would be useful for other reasons as well. In particular, this could
> become a replacement for the existing arginfo-based signature
> specification, which is somewhat limited and causes discrepancies with
> userland classes. For example, arginfo does not support default values.
100% this, though PHP's version of HNI will suffer a few shortcomings
due to the lack of a type_traits equivalent in C99. I'm not
suggesting we go C++11 just to get a better bridge, but it's a real
constraint in getting the same advantages that HNI has.
TL;DR - This is one giant "Me Too" response. :p
-Sara
* HHVM Native Interface, i.e. exactly what Niki just described.
I agree with Nikita and Sara here; interfacing between PHP and C, however,
would be very important. Take my current UUID proposal, for instance: doing
the bit shifting in PHP is a pain, doing it in C is a breeze. However,
doing the signatures and accessors in PHP would be MUCH simpler.
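The UUID proposal itself is not quoted in this thread, so, as an assumption, take RFC 4122 version-4 (random) UUIDs to show the kind of bit manipulation meant here: stamping the version and variant bits onto 16 random bytes and printing the canonical text form. The function names are hypothetical, and the random bytes are assumed to come from elsewhere.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Stamp RFC 4122 version-4 bits onto 16 (assumed random) bytes. */
void uuid_set_v4_bits(uint8_t bytes[16])
{
    assert(bytes != NULL);
    bytes[6] = (uint8_t)((bytes[6] & 0x0F) | 0x40); /* version 4 in the high nibble */
    bytes[8] = (uint8_t)((bytes[8] & 0x3F) | 0x80); /* variant: top bits 10 */
}

/* Read the version back out of the high nibble of byte 6. */
unsigned uuid_version(const uint8_t bytes[16])
{
    return bytes[6] >> 4;
}

/* Canonical 8-4-4-4-12 lowercase hex form; `out` must hold 37 bytes. */
void uuid_format(const uint8_t b[16], char out[37])
{
    snprintf(out, 37,
             "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-%02x%02x%02x%02x%02x%02x",
             b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7],
             b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15]);
}
```

Starting from sixteen 0xFF bytes, these helpers produce `ffffffff-ffff-4fff-bfff-ffffffffffff`: two masks and two ORs, where PHP would need string or integer juggling with `pack()`/`unpack()`.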
> There are essentially only two good reasons for implementing functionality
> in C: One is performance, the other is FFI. Unfortunately, the requirement
> to use C for everything inside an extension means that we write large
> amounts of C code that does not fall into either of those categories. The
> resulting code is hard to maintain, often subtly buggy and usually not
> consistent with ordinary userland PHP code. Typical issues we see all the
> time are bad or completely absent serialization support, lack of circular
> garbage collection, crashes when the object is improperly initialized and
> bugs appearing when internal classes are extended.
I think that one of the main reasons for this is that lots of the C code
reimplements this stuff in a custom way, instead of just using the
default mechanisms.
Regarding the maintainer problem:
PHP internals is very hard turf and has a very bad
reputation out there. It is very hard to get in, and it is very hard to
contribute. Other communities (Go, Rust, ...) are much more welcoming. I
think that the move to GitHub already helped a little, but it needs to
open up even more. Internals needs to encourage, support, guide, and not
simply turn down every idea. The internals book goes in the right
direction here. Small things, like an alternative to the mailing list
(maybe a forum that is easier to join) and a chat (maybe something like
gitter), can help a lot in becoming more community-oriented. We can learn
from the other communities. I think that there are more than enough people
out there who would be able to write some C.
> 100% this, though PHP's version of HNI will suffer a few shortcomings
> due to the lack of a type_traits equivalent in C99. I'm not
> suggesting we go C++11 just to get a better bridge, but it's a real
> constraint in getting the same advantages that HNI has.
Upgrading to C99 is imho long overdue! No clue why we are not finally
doing the switch.
I'd rather invest in Rust than C++11, seriously. C++ (regardless of
version) is as painful as C after all. Sure, RAII solves all problems,
but then we could also use Python instead of PHP if conventions are all we
ask for. ;)
--
Richard "Fleshgrinder" Fussenegger
Hi Nikita,
On 06/06/2017 at 14:43, Nikita Popov wrote:
> Anyway, to get back to the topic of PCS. First, I would recommend
> targeting PHP 7.3 for this change. Feature freeze for 7.2 is in a bit
> over a month and I think we'll want to make some non-trivial changes
> to how this works if we integrate it in PHP. If added to PHP, I think
> this should be integrated into the core, rather than being an extension.
Agreed. My initial choice was to keep it as a separate extension. This
was fine for a proof of concept but we cannot solve the points raised
below without a tighter integration with the core. I would have liked to
keep the coupling as loose as possible but solving the remaining issues
is more important. So, let's consider 7.3 as the new target.
> - As far as I understand, PCS relies on autoloading. There are two
> issues here: First, autoloading does not register symbols prior to
> autoloading. This means that functions like get_declared_classes() will
> not behave as expected. Second, autoloading does not support
> functions. I think both of these problems can be solved with some
> up-front symbol analysis. Lazily compiling internal functions should
> not run into any of the problems we have with userland function
> autoloading.
Right. PCS implements a workaround by loading the code containing
functions at the beginning of each request. So, my plan was a future
extension of the autoloading mechanism to functions (and constants).
The PCS autoloader already extracts symbols from the PHP code at
registration time. So, PCS already has a list of the class, function,
and constant names it manages. I don't know what you mean by 'Lazily
compiling internal functions'. Autoloading, combined with opcache,
avoids executing all the code at the beginning of each request, but
there are probably better alternatives.
- It has already been mentioned in the thread, but what seems to be lacking
right now is a good way of integrating PHP and C portions. As far as I
understand, PCS allows you to write an entire class in PHP, but it
does not allow you to offload parts of the functionality to C without
exposing additional public APIs. I think there are two things we can
do here:
a) Provide a mechanism that makes certain functions only available
inside extension PHP code. This would allow exposing some private PHP
functions which are only used in the internal implementation of the
extension.
b) Allow binding some methods specified in PHP to an internal
implementation. The way this works in HHVM is that the PHP file
contains just a signature, with an attribute that signifies that an
internal implementation will be bound to that function:
    class Bar {
        <<__Native>>
        function foo($args);
    }
This would be useful for other reasons as well. In particular, this
could become a replacement for the existing arginfo-based signature
specification, which is somewhat limited and causes discrepancies with
userland classes. For example, arginfo does not support default values.
Implementing this may be beyond my capacity but, if you agree, we can
work together to implement solutions to these issues. Other volunteers
are welcome too.
Regards
François
Hi,
-----Original Message-----
From: Nikita Popov [mailto:nikita.ppv@gmail.com]
Sent: Tuesday, June 6, 2017 2:43 PM
To: François Laupretre francois@tekwire.net
Cc: PHP internals internals@lists.php.net
Subject: Re: [PHP-DEV] Proposing inclusion of PCS in the 7.2 core distribution
On Mon, Jun 5, 2017 at 7:46 PM, François Laupretre francois@tekwire.net
wrote:
Hi,
PCS provides a fast and easy mechanism to mix C and PHP code in PHP
extensions (more about PCS at http://pcs.tekwire.net). Thanks to the
PHP 7 performance improvement and the inclusion of opcache in the core, a
lot of existing non-performance-critical extension code may now be
converted to PHP without significant performance loss (this must be
measured case by case, of course, but tests show that opcode-cached
PHP code is often faster than C).
Another motivation is the lack of extension maintainers. It may be
complex to convert a C extension to PHP but, once it's done,
maintenance becomes much easier.
As one of PCS goals is to allow converting parts of existing core
extensions to PHP, it seems natural to initiate the movement by an
inclusion of PCS in the core distribution. Then, I and others will
start proposing conversions of existing code. IMO, the PDO generic
layer is a perfect candidate, but there are many others.
Converting existing C code to PHP is not the only usage. With PCS,
adding an OO layer to a function-only extension becomes an easy task.
Sara recently told about a curl OOP layer
(https://gist.github.com/sgolemon/e95bfc34d34c4f63fa953ee9294ae02c).
Using PCS, adding such PHP code on top of the curl extension would take less
than one hour.
I hadn't proposed this so far because the 'cache_key' operation
currently proposed for 7.2 is a pre-requisite, as PCS exposes the PHP
code it manages via a stream wrapper.
So, please give me your thoughts. Suggestions of potential candidates
to be rewritten from C to PHP are welcome too.
Hi,
First of all: I think the ability to implement parts of PHP extensions in PHP is
extremely important and will be a game changer in our ability to maintain and
improve our standard library.
There are essentially only two good reasons for implementing functionality in C:
one is performance, the other is FFI. Unfortunately, the requirement to use C
for everything inside an extension means that we write a large amount of C
code that does not fall into either of those categories. The resulting code is hard
to maintain, often subtly buggy and usually not consistent with ordinary userland
PHP code. Typical issues we see all the time are bad or completely absent
serialization support, lack of circular garbage collection, crashes when the
object is improperly initialized and bugs appearing when internal classes are
extended.
On top of that, implementing certain functionality in C actually makes the
resulting code slower than equivalent PHP code. While our virtual machine is
highly optimized, our internal APIs are often not, or are not typically used in their
most efficient form. One case where internal code loses is the invocation of
userland callbacks. Another is access to properties.
The current situation also has a large and somewhat hidden impact on our API
design. Due to the large maintenance burden that implementing "proper"
APIs imposes on us, we tend to go with the simplest possible API. Usually this
means that we end up directly exposing C binding APIs, even if they are a very
bad fit for PHP. As already noted in this thread, the current curl API is such an
example. (I know that some people will argue that it's better to expose simple
procedural APIs rather than fancy object-oriented APIs; however, that's a
choice that should be made based on technical arguments, not due to technical
limitations.)
Some people have mentioned that this is better solved by shipping the PHP code
separately using composer. While this may be viable for 3rd party extensions
(and may be preferable if they have large fractions of PHP code), this option
does not exist for our standard library. We can hardly tell people that they
should go install a composer package in order to make use of some APIs in our
standard library.
Anyway, to get back to the topic of PCS. First, I would recommend targeting PHP
7.3 for this change. Feature freeze for 7.2 is in a bit over a month and I think
we'll want to make some non-trivial changes to how this works if we integrate it
in PHP. If added to PHP, I think this should be integrated into the core, rather
than being an extension.
Here are some random thoughts:
- As far as I understand, PCS relies on autoloading. There are two issues
here: First, autoloading does not register symbols prior to autoloading.
This means that functions like get_defined_classes() will not behave as
expected. Second, autoloading does not support functions. I think both of these
problems can be solved with some up-front symbol analysis. Lazily compiling
internal functions should not run into any of the problems we have with userland
function autoloading.
- It has already been mentioned in the thread, but what seems to be lacking right now
is a good way of integrating PHP and C portions. As far as I understand, PCS
allows you to write an entire class in PHP, but it does not allow you to offload
parts of the functionality to C without exposing additional public APIs. I think
there are two things we can do here:
a) Provide a mechanism that makes certain functions only available inside
extension PHP code. This would allow exposing some private PHP functions
which are only used in the internal implementation of the extension.
b) Allow binding some methods specified in PHP to an internal implementation.
The way this works in HHVM is that the PHP file contains just a signature, with an
attribute that signifies that an internal implementation will be bound to that
function:
    class Bar {
        <<__Native>>
        function foo($args);
    }
This would be useful for other reasons as well. In particular, this could become a
replacement for the existing arginfo-based signature specification, which is
somewhat limited and causes discrepancies with userland classes. For example,
arginfo does not support default values.
A mechanism like the one HHVM has would surely be useful. My concern regarding PHP is that, with the original proposal, a PHP interpreter is needed to process the PHP parts, and one is not expected to be available at compile time of the core. Furthermore, if the binary that was just compiled were used for this, for example, it could cause issues with FDO/PGO, because the required preparation tasks would produce training data that is not necessarily desired. Perhaps that is solvable by extending the build: an independent minimal binary could be produced just for this purpose. That, however, might not suffice depending on the complexity, for example for a PECL extension that depends on classes of another extension and tries to use type hints, classes from that dependency, etc. Perhaps this needs more evaluation for non-core extensions. It's somewhat of a chicken-and-egg issue.
IMO, in any case, being able to move pieces out into PHP code would be a huge win. It would reduce the complexity of the actual C parts and make development faster and of higher quality. There are always cases where moving part of the implementation to C is a win in speed or functionality, and cases where moving a C implementation into userland doesn't make things worse but simplifies a lot. Currently it's almost only one way: only moving parts to C is a win in most cases. Having the flexibility to do both is a huge win, too.
I also see this topic as tightly coupled with the previous discussions about integrating Opcache into the core. Looking at what Python does, the ability to redistribute opcode-cached binaries might make sense. Of course, there are differences in how it works; we could introduce some naming/configuration conventions to pursue this goal. That might also involve a change to the package.xml spec for PECL extensions. In general, I'd put Opcache integration first, as that is a long-standing topic that would make even more sense once the JIT branch is integrated. A PECL extension can ship sources only, but given that it's compiled for and on a specific platform, shipping the opcode binaries directly is better. Maybe supporting both pure PHP and opcode binaries would be useful, too. In any case, in my view, integrating Opcache into the core could be a game changer in many areas. Another related topic might be integrating libffi, again looking at Python's good example.
Thanks
Anatol
Hi,
thanks to all of you for taking the time to think about it and give your opinions.
It seems that we may reach a consensus on the concept, as most of us
seem to agree about the benefits a mechanism like PCS can bring to the
core development process in general.
It also appears that PCS is not ready for an official release yet. Among
other things, we need to refine the way PHP code is loaded and registered.
So, here is what I'm proposing:
- Keep enriching the discussion for about 2 weeks. Please give
examples of core extensions that, in your opinion, are susceptible to being
partially or fully converted to PHP. Suggestions to complement existing
core extensions (like an additional curl OO layer) are welcome too.
Please tell us if a PHP implementation of these features already exists
somewhere. I would be especially interested in a PHP implementation of
the PDO generic layer...
- Then, I'll write an RFC summarizing the discussion. The RFC will list
blocking and non-blocking issues, as well as an indicative timeline.
The target for an initial release will be 7.3. It should leave enough time
to solve blocking issues and start converting some existing code to PHP.
- This RFC will then be voted upon. The vote will just ensure we agree
on the concept and the objectives. It won't authorize the final merge in
advance; that will be validated by another vote when the implementation
is ready for release. The objective is to avoid restarting the whole
discussion when we're ready for a release.
Regards
François
Hi Francois,
I'm in favour of shipping PHP code as part of either PHP core or
extensions.
I would use this for Imagick. There are several 'helper' functions
that would make this library easier to use, that are trivial to write
in PHP but incredibly non-trivial to write in C.
Please give examples of core extensions that, in your opinion, are susceptible to being partially or fully converted to PHP.
Uh - I don't really agree that any existing core code should be
converted to PHP. If the code is already there and working as C code,
there are very few benefits to converting it to PHP now. If we ever get
around to refactoring how strings are stored in PHP, it's at that
point that converting existing functionality to PHP code might be a
good idea.
Suggestions to complement existing core extensions (like an additional curl OO layer) are welcome too.
If you want this RFC to pass, I'd strongly recommend having
conversations about adding those as separate RFCs once the "having
PHP code distributed with PHP" RFC has passed, as otherwise the
signal-to-noise ratio will be terrible.
cheers
Dan
Ack