All,
I spent a little bit of time today trying to debug an issue with PHP 7
that Drupal 8 was facing, specifically regarding an array index not
behaving correctly ($array["key"] returned null, even though the key
existed in the hash table).
I noticed that the hash table implementation has gotten orders of
magnitude more complex in recent times (since phpng was merged).
Specifically, arData and arHash are now the same block of memory,
and we're now doing negative indexing into arData to get the hash
map list. From Dmitry's commit message, it was done to keep the data
that's accessed most often in the same CPU cache line. While I am sure
that there are definite performance gains to doing this, I do worry
about the development and debugging costs of this added complexity,
as well as the way it worsens the bus factor of the project.
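To make the layout described above concrete, here is a minimal sketch of the general idea: one allocation that holds the hash slots immediately followed by the bucket array, with the slots addressed at negative offsets from the bucket pointer. All names (sketch_table, sketch_slot and so on) are hypothetical and the structure is heavily simplified; this is an illustration under those assumptions, not the actual zend_hash code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified types; NOT the real zend_hash.h definitions. */
#define SKETCH_INVALID ((uint32_t)-1)

typedef struct {
    uint32_t key;   /* stand-in for the real key and hash */
    int      value;
    uint32_t next;  /* index of the next bucket in the collision chain */
} sketch_bucket;

typedef struct {
    sketch_bucket *data; /* points just past the hash slots */
    uint32_t       size; /* bucket count == slot count, a power of two */
    uint32_t       used;
} sketch_table;

/* The slot for a key lives in front of data[0], at index -1 .. -size. */
static uint32_t *sketch_slot(sketch_table *t, uint32_t key)
{
    uint32_t *hash_end = (uint32_t *)t->data;
    return hash_end - 1 - (key & (t->size - 1));
}

static int sketch_init(sketch_table *t, uint32_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    size_t hash_bytes = (size_t)size * sizeof(uint32_t);
    char *block = malloc(hash_bytes + (size_t)size * sizeof(sketch_bucket));
    if (block == NULL) return -1;
    memset(block, 0xff, hash_bytes);            /* all slots = SKETCH_INVALID */
    t->data = (sketch_bucket *)(block + hash_bytes);
    t->size = size;
    t->used = 0;
    return 0;
}

static void sketch_free(sketch_table *t)
{
    /* The allocation starts at the first hash slot, not at data. */
    free((char *)t->data - (size_t)t->size * sizeof(uint32_t));
}

static int sketch_add(sketch_table *t, uint32_t key, int value)
{
    if (t->used == t->size) return -1;          /* no resizing in this sketch */
    uint32_t *slot = sketch_slot(t, key);
    uint32_t idx = t->used++;
    t->data[idx].key = key;
    t->data[idx].value = value;
    t->data[idx].next = *slot;                  /* chain onto the old head */
    *slot = idx;
    return 0;
}

static const int *sketch_find(sketch_table *t, uint32_t key)
{
    for (uint32_t i = *sketch_slot(t, key); i != SKETCH_INVALID; i = t->data[i].next) {
        if (t->data[i].key == key) return &t->data[i].value;
    }
    return NULL;
}
```

Note that the pointer stored in the table is not the pointer returned by malloc, which is part of what makes such a layout awkward to inspect by hand in a debugger.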
There is definitely a tradeoff there, as the change is pretty well
encapsulated behind macros. But that introduces a new level of
abstraction and, more importantly, it really makes debugging with gdb
a pain in the neck.
Without hard data on this particular patch, I'm not suggesting we roll
back the change or anything. I mostly just want to express concern about
the recent trend of significantly increasing complexity for developers
for the sake of performance.
While I'm definitely not saying performance doesn't matter, I also
think performance at all costs is dangerous. And I wonder if some of
the more fundamental (even if isolated) changes such as this should be
much better documented and include the performance justification for
them. I'm definitely not suggesting an RFC, but perhaps some level of
discussion should be required for these sorts of changes...
Thoughts?
On Fri, Apr 3, 2015 at 11:57 AM, Anthony Ferrara ircmaxell@gmail.com
wrote:
[...]
I think it is generally true that increased performance often requires more
sophisticated approaches.
Generally speaking I've observed that the faster, more modern runtime
engines all need to deal with that additional sophistication.
JIT runtime engines typically are the worst because they deal with hundreds
of micro-optimizations around code generation (register allocation, cache
line optimization, etc...).
So what you have in PHP 7 today is actually not "that" bad compared to some
of the other runtimes (IMO).
I think it can be partially addressed by a combination of documenting key
data structures (some of which has already been written) and maybe some
additional comments in areas of code where the complexity level goes up for
some very specific "tricks".
You can see from the level of interest in performance (whether one's opinion
is that this is fully warranted or not) around PHP 7, HHVM and other
languages that this is an area we need to invest in on an ongoing basis.
And sophistication will likely go up.
Andi
Andi,
Thanks for the reply. I'm not really saying everything needs to be
dead simple. Most of the issues I'm more talking about could be solved
through communication, documentation, tooling and refactoring. But
some I do question at a more fundamental level. The hash table is one
of them.
If we were using a pure abstraction (only accessing the hash table
information through the public API), then fine because it's isolated.
However, many extensions and even places in core access hash table
structure directly (as can be seen by the updates needed by
https://github.com/php/php-src/commit/2b42d719084631d255ec7ebb6c2928b9339915c2).
Meaning the complexity isn't encapsulated.
Sophistication is fine. What worries me though is magic. What worries
me is the growing inability to debug with normal tools. Perhaps we
need a GDB extension to provide tooling for common debugging tasks.
Heck, even dumping a zend_string requires a cast (p (char*)str->val).
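For context on why such a cast is needed, here is a minimal sketch, assuming the string payload is declared as a one-element trailing char array with the real bytes allocated past the end of the struct. The names are hypothetical stand-ins, not the real zend_string definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a length-prefixed string; fields simplified. */
typedef struct {
    size_t len;
    char   val[1]; /* declared as one byte, allocated as len + 1 bytes */
} sketch_string;

static sketch_string *sketch_string_new(const char *src)
{
    size_t len = strlen(src);
    /* Over-allocate so the characters follow the header in one block. */
    sketch_string *s = malloc(offsetof(sketch_string, val) + len + 1);
    if (s == NULL) return NULL;
    s->len = len;
    memcpy((char *)s + offsetof(sketch_string, val), src, len + 1);
    return s;
}

/* What the (char*) cast in the debugger amounts to: treat val as a
 * pointer to a NUL-terminated string instead of as a char[1] array. */
static const char *sketch_string_cstr(const sketch_string *s)
{
    return (const char *)s + offsetof(sketch_string, val);
}
```

Because the declared type of val is char[1], a tool that prints the field according to its type shows only the first byte; the cast is what tells it to read on to the terminating NUL.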
I am all for the performance improvements. I just don't think "at all
costs" is a viable model (nor do I think that's what people are
doing). I just think it's worth discussing (and hopefully mitigating)
the costs of them explicitly. At least for the more significant ones.
Anthony
Sophistication is fine. What worries me though is magic. What worries
me is the growing inability to debug with normal tools. Perhaps we
need a GDB extension to provide tooling for common debugging tasks.
Heck, even dumping a zend_string requires a cast (p (char*)str->val).
On that: we do already have a .gdbinit in php-src. I wonder if one
concrete thing we could do right now would be to extend it on master
to cover the sorts of common operations we're going to need when
debugging PHP 7.
Adam
From: Anthony Ferrara [mailto:ircmaxell@gmail.com]
If we were using a pure abstraction (only accessing the hash table
information through the public API), then fine because it's isolated.
However, many extensions and even places in core access hash table
structure directly (as can be seen by the updates needed by
https://github.com/php/php-src/commit/2b42d719084631d255ec7ebb6c2928b9339915c2).
Meaning the complexity isn't encapsulated.
IMHO, that's the main problem. If a piece of code is accessed only through an official published API, its internal complexity can grow, provided the API still provides the same services. So, the first step should be to define and publish an 'official' full-featured API. Phpinternalsbook.com is a fine place for this. The question is what we do for PHP 5: do we publish a PHP 7-only API? Do we backport it to 5.6, 5.5, 5.4?
Once we have an API, we can fix the code to use it exclusively. One approach I have already used to check zval access is an additional configure option that modifies the field names in the structure, so that any access outside the API fails at compile time. This can be a valuable tool for extension developers.
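A minimal sketch of that field-renaming idea, assuming a hypothetical build flag (SKETCH_STRICT_API) standing in for a real configure option, and a hypothetical two-field structure rather than the real zval or HashTable:

```c
#include <assert.h>

/* Hypothetical flag: when defined, every field name is mangled, so code
 * that writes ht->count directly no longer compiles. */
#ifdef SKETCH_STRICT_API
# define SKETCH_FIELD(name) name##_private_do_not_touch
#else
# define SKETCH_FIELD(name) name
#endif

typedef struct {
    unsigned SKETCH_FIELD(count);
    unsigned SKETCH_FIELD(capacity);
} sketch_ht;

/* The API itself always goes through the macro, so it builds either way. */
static void sketch_ht_init(sketch_ht *ht, unsigned capacity)
{
    ht->SKETCH_FIELD(count) = 0;
    ht->SKETCH_FIELD(capacity) = capacity;
}

static unsigned sketch_ht_count(const sketch_ht *ht)
{
    return ht->SKETCH_FIELD(count);
}

static unsigned sketch_ht_capacity(const sketch_ht *ht)
{
    return ht->SKETCH_FIELD(capacity);
}
```

Building an extension once with the flag defined would surface every direct field access as a compile error, while normal builds are unaffected.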
If you think the same, I'd be glad to participate.
Regards
François
François,
IMHO, that's the main problem. If a piece of code is accessed through an
official published API only, its internal complexity can grow, provided the
API still provides the same services. So, the first step should be to
define and publish an 'official' full-featured API.
We already have one. The public API defined in zend_hash.h (all methods and
macros not prefixed with "_").
And my argument would be if code outside of zend_hash.(c|h) needs to access
the internal hash table structure for anything (a public API doesn't serve
the needs), then a new macro or API should be introduced for that use-case
(so that it doesn't need to access the structure anymore).
Anthony
While I'm definitely not saying performance doesn't matter, I also
think performance at all costs is dangerous. And I wonder if some of
the more fundamental (even if isolated) changes such as this should be
way more documented and include the performance justification for
them. I'm definitely not suggesting an RFC, but perhaps some level of
discussion should be required for these sorts of changes...
The idea was described months ago, then implemented part by part.
https://www.mail-archive.com/internals@lists.php.net/msg72362.html
Thanks. Dmitry.
I agree with Anthony.
Many things however can be solved with a nice .gdbinit.
We already have dump_ht() and dump_htptr(), for example, which I use heavily
to debug hash tables in PHP 5, not to mention dump_bt().
I think one step is to improve our .gdbinit with many more features, and
obviously to port the current ones to work with PHP 7.
A second step is documentation.
Anthony, you know about our project phpinternalsbook.com, don't you ;-)
There have been recent discussions on IRC about actually merging this project
under php.net.
I'm really enthusiastic about helping with, or even taking the lead of, such
a project: I would like php.net to host real, detailed documentation
about internals.
I think PHP 7 should come with internals documentation, somewhere under
php.net, that explains our main internal structures and choices to a
C-aware developer, especially the performance optimisations.
Have you had a look at the new Zend Memory Manager? It has become insanely
complex, with a lot of performance-tuned code.
The same goes, to a lesser extent, for the executor: the executor stack frame
has changed a lot from PHP 5's, and is also not very easy to debug
(a long allocated buffer shrunk with many pointer tricks, which requires
you to keep a complete image of the memory buffer in your head).
I won't be able myself to document all those tricks, because I'm not the
author of them.
I think Zend, through Dmitry, Nikic, Bob or Laruence, should help us
understand some concepts, even if they are not around to help with the docs.
Julien.Pauli
Hi Julien,
It would be great if you led the PHP 7 internals documentation project.
You are always welcome to ask questions about implementation details.
I may also take care of documenting some features in more or less
complete form.
Thanks. Dmitry.
Yes, I know that you, as well as the other guys I mentioned in my last post,
are really open and answer our technical questions quickly and efficiently,
which is a nice point.
I'm OK to take the lead of such a project.
However, like PHP itself, the project should stay wide open, and everyone
may have something to say or bring.
Perhaps it's time to start a thread about this?
Julien.P
Julien,
+1 from me. That would go a long way towards mitigating some of these
issues.
Anthony