Hi,
First of all, thanks to everyone for voicing your opinion on this topic.
Due to the mailing system issues, we are extending the vote until next
Friday, to avoid any possible complaints about it.
Now, about the RFC itself. I would first like to recall some critical points:
- Memory usage does not increase by 8% but by ~4% in real-life usage:
WordPress master: 24.33 MB
WordPress str_size_and_int64: 25.72 MB
delta 5.4%
Symfony master: 26.59 MB
Symfony str_size_and_int64: 27.19 MB
delta 2.2%
Drupal master: 23.46 MB
Drupal str_size_and_int64: 24.60 MB
delta 4.63%
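As a quick cross-check of the figures above, the following Python sketch recomputes the deltas. It is only an arithmetic illustration: the names `percent_increase` and `summarize` are made up for this example and are not part of the RFC's benchmark tooling. The quoted deltas appear to be taken relative to the patched branch; taken relative to master, the same measurements give slightly higher percentages.

```python
# Peak memory figures (in MB) quoted in the mail: (master, str_size_and_int64).
MEASUREMENTS = {
    "WordPress": (24.33, 25.72),
    "Symfony": (26.59, 27.19),
    "Drupal": (23.46, 24.60),
}


def percent_increase(before: float, after: float, baseline: float) -> float:
    """Return the growth from `before` to `after` as a percentage of `baseline`."""
    return (after - before) / baseline * 100.0


def summarize(measurements):
    """Map each application to (delta vs. master, delta vs. patched branch)."""
    return {
        name: (
            round(percent_increase(master, patched, master), 2),
            round(percent_increase(master, patched, patched), 2),
        )
        for name, (master, patched) in measurements.items()
    }


if __name__ == "__main__":
    for name, (vs_master, vs_patched) in summarize(MEASUREMENTS).items():
        print(f"{name}: +{vs_master}% of master, +{vs_patched}% of patched")
```

Either baseline keeps all three applications well below the 8% figure that the mail argues against.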
The main priorities we have been working on are:
. Stability
This patch has been tested intensively since the very first proposal,
based on 5.5, 5.6 and master. Widely used frameworks and applications
have been used to validate the changes. Everything has been done
publicly: snapshot builds have been provided, and test results, updates,
etc. have been published and announced at regular intervals.
. Correctness and safeness
Correct typing and reduced risk from removing magic casting,
warnings about type juggling, etc. make PHP safer, cleaner and less
error-prone, while drastically reducing the work needed to catch new issues. I
highly recommend reading http://news.php.net/php.internals/74193 and
the references listed there.
. Performance
Performance is one of the keys to PHP's success. Everyone takes
performance seriously and so do we. The 64bit patch does not hurt
performance. It is at worst as fast as before and in many cases it
even runs faster.
Now that these points are hopefully clearer, I would like to explain what
we plan to do given the sudden announcement of phpng.
First of all, and I can never repeat it enough, cooperation is the key to
success. In case you did not notice, we have put effort into getting phpng
into a usable state as well, providing fixes, ports and tests (for what is
testable at this point).
As the current voting results show, it makes sense to target phpng
instead of master for the 64bit patch. Not only because it is the
community's will, but also because it will reduce the amount of work on
both sides while allowing more tweaks and improvements. We have
explained that already, and Nikita summarized it pretty well in this
post: http://news.php.net/php.internals/74284
As I said earlier, stability and correctness are the top
priorities in such work. It is much easier to tweak a stable,
well-tested and correct implementation than to tweak, optimize and
rearrange a stack of hacks. We have reached that level of stability and
correctness.
However, asking to take a moving and still not proposed target into
account before voting on a finished, stable patch is a hard thing. It
is not possible to merge it now without basically redoing everything
from scratch, possibly many times.
It is not uncommon to vote on a patch that will change. Let's take
opcache as an example: by the time it was proposed, it was months away
from being ready. We delayed the 5.5 release for it, much longer than
announced. The only difference here, and with phpng when it
comes to an RFC, is that we have ~2 years to get them rock solid. Please
keep that in mind while voting.
What we propose is the following:
- get phpng into an alpha state, somewhat testable in common scenarios
(should take 2-3 months max)
- merge the 64bit patch
- do the improvements and tweaks we discussed (be they ours or Nikita's);
they will be based on real results and not some random moving targets:
safer, cleaner, better (c)
Doing so will bring the benefits of both patches to PHP without
increasing the workload for any of us (well, except for my team; it
will cost us quite some time to do it). In the meantime, we will keep up
our effort to get phpng ready as soon as possible, by porting missing
parts, fixing it, etc.
I do think that this is a good and reasonable way of doing it.
Thanks for reading this far and for your upcoming
votes and feedback.
Cheers,
Pierre
@pierrejoye | http://www.libgd.org
Hi Pierre,
I appreciate your professionalism in helping with phpng.
I object to the proposal because, in my opinion, it causes significant
degradation even for master.
(Please don't argue about it again. You have your opinion, I have mine, and we
have already written a lot.)
I also think it makes sense to target at least the IS_LONG part of this patch
at phpng.
The other changes are questionable. In phpng we can relatively easily check the
impact of 64-bit string sizes on performance and memory consumption before
making a decision. I don't have any particular opinion right now.
I didn't get your position on zend_size_t in all core structures, and
this is the main question.
Thanks. Dmitry.
I think I was unclear again. This is the part we will drop. We
simply can't do it now, for the reasons I explained in the mail. But as
Nikita and I said earlier, there are areas for improvement and tweaks;
only the right time to do them is debatable. I prefer to have a stable,
testable branch before changing these parts again, with the risk
of making the whole thing hard to test, debug or improve. Is that
something you can live with?
Cheers,
Pierre
@pierrejoye | http://www.libgd.org
yes.
Thanks. Dmitry.
Hello internals!
I have been following this 64 bit stuff from the get-go, and have been
monitoring the PHPNG stuff since it was announced, so I think I have a pretty
clear picture of what's happening. At least I believe so :) I'm a userland
developer, so I'm not into compilers and low-level C stuff; I just have a basic
understanding of the principles.
So, to the point of my message.
I have a question that has been nagging me for weeks now about the 64 bit stuff
and how 64 bit processors handle things. There is no debate that a 64
bit pointer or integer takes double the memory of a 32 bit one, and
a direct comparison gives us a frightening answer - it's doubled!
In reality, 10 years later, we know that it's not that bad (it's common
knowledge).
My main question is - aren't two 32 bit integers, stuck into one 64 bit
register, handled slower than two 64 bit integers each occupying its own
register?
I'm concerned that some of you are so fixated on the memory
consumption aspect without any regard for the CPU cycle part of the
equation. It was pointed out by Pierre, and largely ignored by the vocal
advocates of PHPNG, that changing many 32 bit types to 64
bit not only didn't degrade performance (hello, 64 bit processors,
no? By design they are more efficient with 64 bit numbers than with two 32 bit
numbers stuck into a 64 bit register), but actually improved it. IMPROVED
THE PERFORMANCE. This is logical; this is what you expect when optimizing for 64
bit mode. And because of the PHP 5.4 memory reduction (interned strings, if
I'm not mistaken?) the memory growth is almost non-existent. A 2% to 5%
increase in memory usage against 5.5, after a 30% to 70% memory usage drop in
5.4, is, from my point of view, a drop in the ocean.
And PHPNG is a PoC at this point: not optimized, no advanced features
implemented, etc. Hell, it's a JIT engine; maybe after the initial release you
will find a way to shrink those ZVAL structures so that it
cuts memory usage by much more than what the
64 bit patch adds with those size_t types.
The future-proofing side was mentioned only a few times and I actually want
to point this out - memory is growing like crazy - DDR4 is already
shipping, and server platforms for DDR4 are being introduced this summer. The
first DDR4 modules already surpass DDR3 capacity with ease. And the amount of
data we handle in our PHP applications also grows a lot. Yes, I
agree that an array of 2^64 bit elements probably will not be handled by
PHP anytime soon, but did you consider the fact that an array's key,
indexed by an ID column from the database, can easily be bigger than a 32
bit unsigned integer (I tend to use 64 bit integers as my primary keys
these days)? Will it error out if you use a 32 bit array length integer, or
should you actually use a 64 bit one in this case? At least no one is
objecting that we need > 4GB file support.
For me, as a userland developer, this future proofing and consistency, with
the slight performance speedup, is much more important than a 2 to 5%
memory usage increase. And PHPNG, when actually done and optimized, will
bring much more improvement than cutting this 64 bit patch in half.
Most of us have memory to spare on our servers, especially when running an
nginx + php-fpm setup. And those who have memory problems are probably using the
wrong tool for the job (or poorly designed software). History shows
that PHP had its major problems with half-baked solutions or cut-down
implementations due to politics and obsessive behaviour over some parts of
it. I thought PHP had had its share of trouble with register_globals and many
other bad decisions, and now, at least from my point of view, another bad
decision is being forced on it, this time by Zend itself (at least it looks that
way). For god's sake - you are making a JIT for PHP. Implement god damn typehints
for scalars and optimize the memory usage with JIT optimizations, but leave
room to manoeuvre in the future. Think also about the extensions -
from their point of view, using a consistent type across the board is much
easier.
My 0.02$, thanks.
In a few words: today a CPU may work much faster than main memory.
We may save one cycle, but a cache or TLB miss would add hundreds of stall
cycles.
In this situation, compact data structures designed with data locality in
mind may have a much bigger effect.
In its very first steps phpng used more CPU cycles but worked faster out of the
box, see https://wiki.php.net/phpng#performance_evaluation
Thanks. Dmitry.
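The data-locality argument can be made concrete with a small Python sketch that uses `ctypes` to lay out two C-style structs. Both `Header32` and `Header64` are hypothetical layouts invented for this illustration, not the actual Zend or phpng structures, and the 64-byte cache line is an assumption that holds on common x86-64 hardware but is not universal.

```python
import ctypes

CACHE_LINE_BYTES = 64  # assumed cache-line size; typical on x86-64, not universal


class Header32(ctypes.Structure):
    """Hypothetical string header with a 32-bit length field."""
    _fields_ = [
        ("refcount", ctypes.c_uint32),
        ("length", ctypes.c_uint32),
        ("hash", ctypes.c_uint64),
    ]


class Header64(ctypes.Structure):
    """The same hypothetical header with the length widened to 64 bits."""
    _fields_ = [
        ("refcount", ctypes.c_uint32),
        ("length", ctypes.c_uint64),  # the compiler may add padding before this
        ("hash", ctypes.c_uint64),
    ]


def headers_per_cache_line(struct_type) -> int:
    """How many whole headers of this type fit into one assumed cache line."""
    return CACHE_LINE_BYTES // ctypes.sizeof(struct_type)


if __name__ == "__main__":
    for struct_type in (Header32, Header64):
        print(
            struct_type.__name__,
            ctypes.sizeof(struct_type), "bytes,",
            headers_per_cache_line(struct_type), "per cache line",
        )
```

Widening one field can cost more than the four extra bytes because of alignment padding, so fewer headers share a cache line. Whether that outweighs the cycles saved elsewhere is exactly what has to be measured.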
Hi Arvids,
On Mon, May 19, 2014 at 8:59 PM, Arvids Godjuks <arvids.godjuks@gmail.com> wrote:
Yes, I
agree that an array of 2^64 bit elements probably will not be handled by
PHP anytime soon, but did you consider the fact that an array's key,
indexed by an ID column from the database, can easily be bigger than 32
bit unsigned integer
Interesting point. I would not do this since I know the current limitation,
but this is perfectly valid and practical usage.
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
You can already use a key larger than 32 bits to index an array. The
limitation only applies if you want to actually store more than 2^32 (or
2^31 depending on the exact implementation) elements. That's the beauty of
a hashtable: The allowed key domain is totally independent of the hashtable
size ;)
Nikita
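A minimal Python sketch of that point, assuming a toy open-addressing table: the class `TinyHashTable` is hypothetical and is not how the Zend HashTable is laid out. The table has only a handful of slots, yet it accepts keys far beyond the 32-bit range, because a key is folded into the slot range before it is used as an index.

```python
class TinyHashTable:
    """Open-addressing hash table with a small, fixed number of slots.

    Hypothetical illustration only: the element count is bounded by the
    capacity, while the key domain is bounded only by the key type.
    """

    def __init__(self, capacity: int = 8):
        if capacity <= 0 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two")
        self._mask = capacity - 1
        self._slots = [None] * capacity  # each slot: None or (key, value)
        self._count = 0

    def _probe(self, key):
        # Fold the key into the slot range, then probe linearly.
        index = hash(key) & self._mask
        for _ in range(len(self._slots)):
            yield index
            index = (index + 1) & self._mask

    def set(self, key, value):
        for index in self._probe(key):
            slot = self._slots[index]
            if slot is None or slot[0] == key:
                if slot is None:
                    self._count += 1
                self._slots[index] = (key, value)
                return
        raise OverflowError("table is full")

    def get(self, key):
        for index in self._probe(key):
            slot = self._slots[index]
            if slot is None:
                raise KeyError(key)
            if slot[0] == key:
                return slot[1]
        raise KeyError(key)

    def __len__(self):
        return self._count


if __name__ == "__main__":
    table = TinyHashTable(capacity=4)
    table.set(2 ** 40 + 7, "row with a 64-bit primary key")
    print(table.get(2 ** 40 + 7), len(table))
```

Only the number of stored elements runs into the capacity limit; the size of an individual key never does.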
Hi Nikita,
Right and wrong.
When the key is a string, it can be anything. However, when it is an int, it cannot.
$ php -r '$arr[9999999999999999999999999999999]="abc";var_dump($arr);'
array(1) {
[-4571153621781053440]=>
string(3) "abc"
}
$ php -r '$arr["9999999999999999999999999999999"]="abc";var_dump($arr);'
array(1) {
["9999999999999999999999999999999"]=>
string(3) "abc"
}
This is the reason why DB data values and Web inputs should not be
converted to native PHP data types blindly :)
It could work, but it cannot work with a 64bit int.
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
This is the reason why DB data values and Web inputs should not be
converted to native PHP data type blindly :)
It could work, but it cannot work with 64bit int.
Array handling could be changed to fall back to a string key when an int key
overflows; then it would work.
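The fallback idea can be sketched in Python as follows. This is only an illustration of the rule "use an int key when it fits, otherwise keep the string"; `normalize_key` is a hypothetical helper and does not claim to reproduce PHP's actual key-casting code.

```python
INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def normalize_key(raw: str):
    """Return an int for canonical decimal strings that fit in a signed
    64-bit integer; otherwise return the original string unchanged."""
    digits = raw[1:] if raw.startswith("-") else raw
    is_canonical = (
        digits.isascii()
        and digits.isdigit()
        and (digits == "0" or not digits.startswith("0"))  # no leading zeros
        and raw != "-0"
    )
    if not is_canonical:
        return raw
    value = int(raw)  # Python ints are arbitrary precision, so this cannot wrap
    if INT64_MIN <= value <= INT64_MAX:
        return value
    return raw  # would overflow a 64-bit int: keep it as a string key


if __name__ == "__main__":
    for candidate in ("42", "9223372036854775807", "9999999999999999999999999999999"):
        print(repr(candidate), "->", repr(normalize_key(candidate)))
```

With a rule like this, an oversized identifier never silently turns into a different integer; it simply stays a string.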
However, many users are trying to use strict types. Users may experience
data validation errors for large ints with a 32bit int key. It would be better
to allow a 64bit int if int is 64bit. I prefer consistency, so it is better for
array int keys to support 64bit ints. IMHO.
A similar argument applies to strings, too. It would be a WTF when users try to
access a string offset beyond 32bit values. The data PHP deals with is getting
larger and larger. It would become an issue sooner or later.
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi!
64bit int if int is 64bit. I prefer consistency, so array int key is better
to support 64bit int key. IMHO.
Given that 99.9999% of PHP users will never need it, but 100% of PHP
users will pay in performance for each size increase, we need to be
careful here. "Consistency" is no more a magic word than "security".
Similar argument applies to string also. It would be WTF, when users try to
access string offset over 32bit values. Data dealt with PHP is getting
larger and larger. It would be an issue sooner or later.
Not likely, unless PHP somehow becomes the language of choice for processing
big data, which I don't exactly see happening. But if it ever happens,
we can deal with it then.
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227
64bit int if int is 64bit. I prefer consistency, so array int key is better
to support
64bit int key. IMHO.
Given that 99.9999% of PHP users will never need it, but 100% of PHP
users will pay in performance for each size increase, we need to be
careful here. "Consistency" is not more magic word than "security".
Consistency is the main problem here. Most database engines use 64bit
unique identifiers for SEQUENCE values. Upgrading systems that have been
using 32bit integer values for this to support 64bit ones is the problem
I've been living with. 32bit client apps get a string value where
previously it was a simple integer.
Similar argument applies to string also. It would be WTF, when users try to
access string offset over 32bit values. Data dealt with PHP is getting
larger
and larger. It would be an issue sooner or later.
Not likely, unless somehow PHP becomes language of choice for processing
big data. Which I don't see exactly happening. But if it ever happens,
we can deal with it then.
It is here today and has been for many years now. While the number of
records may not be large, using large offsets can result in passing the
32bit boundary and what was an integer flips ...
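The "flip" can be illustrated with a short Python sketch that uses `ctypes.c_int32` to mimic a signed 32-bit field. The helper names and the sample identifier are made up for this example.

```python
import ctypes

INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1

# Made-up identifier just above the signed 32-bit range.
HYPOTHETICAL_ID = 3_000_000_000


def store_as_int32(value: int) -> int:
    """Mimic writing `value` into a signed 32-bit field; ctypes performs no
    overflow checking, so out-of-range values wrap around silently."""
    return ctypes.c_int32(value).value


def fits_int32(value: int) -> bool:
    """Range check a caller would need before trusting a 32-bit field."""
    return INT32_MIN <= value <= INT32_MAX


if __name__ == "__main__":
    print(fits_int32(HYPOTHETICAL_ID), store_as_int32(HYPOTHETICAL_ID))
```

An identifier that crosses the boundary either has to be range-checked and carried as a string, or stored in a wider integer type.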
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
Any app which deals with Twitter is dealing with large, 64-bit numbers.
--
Andrea Faulds
http://ajf.me/
Hi Stas,
On Wed, May 21, 2014 at 6:17 AM, Stas Malyshev <smalyshev@sugarcrm.com> wrote:
64bit int if int is 64bit. I prefer consistency, so array int key is better
to support 64bit int key. IMHO.
Given that 99.9999% of PHP users will never need it, but 100% of PHP
users will pay in performance for each size increase, we need to be
careful here. "Consistency" is not more magic word than "security".
Similar argument applies to string also. It would be WTF, when users try to
access string offset over 32bit values. Data dealt with PHP is getting
larger and larger. It would be an issue sooner or later.
Not likely, unless somehow PHP becomes language of choice for processing
big data. Which I don't see exactly happening. But if it ever happens,
we can deal with it then.
I think many people who don't like PHP and/or are switching to other languages
are concerned about its consistency as a language. Users will just switch tools,
so these would be issues.
We would just lose users (I mean we lose developers).
I'm as concerned about performance as I am about security/consistency, like
you.
However, average web developers are not. Most developers prefer large and
slow web app frameworks. This is the reality. I think we must face our
users (developers).
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
While it is hard to argue with that given the number of large and slow
frameworks out there, there are also many large sites (I have worked for
or helped a bunch of them) that care a lot about performance. They
tend to be under-represented both here and in the blogosphere in general
because they don't like to draw attention to any specifics of their
particular technology stacks. Instead they tend to come to me or some of
the other core guys and ask for direct help.
So let's not assume that just because one segment of the userbase
screams the loudest that this segment is representative. We have a large
and diverse userbase and we need to make balanced decisions.
For this particular case of >32-bit string offsets, that would mean that
there is a string in memory that is longer than 4G. It's not like you
can have a sparse string, so you would actually have to fill it beyond
4G. That is an extremely rare use-case and chances are that manipulating
4G blobs of data is way more efficiently implemented through some sort
of streamed/chunked mechanism as opposed to pulling the entire thing
into memory at once. If we could support it with 0 performance penalty,
ok, but given that there is a penalty we have to weigh the likelihood of
someone needing this in the next 5-8 years against the cost. To me this
is an obvious case that I would balance towards performance.
-Rasmus
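A minimal Python sketch of the streamed/chunked approach mentioned above: the function `digest_stream` and the 1 MiB chunk size are assumptions chosen for illustration. The running total can exceed 4G, while the only buffer ever held in memory is one chunk.

```python
import hashlib
import io
from typing import BinaryIO, Tuple

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an arbitrary choice for this sketch


def digest_stream(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Tuple[int, str]:
    """Consume `stream` chunk by chunk; return (total_bytes, sha256 hex digest)."""
    hasher = hashlib.sha256()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read signals end of stream
            break
        hasher.update(chunk)
        total += len(chunk)
    return total, hasher.hexdigest()


if __name__ == "__main__":
    payload = io.BytesIO(b"example payload " * 1024)
    size, digest = digest_stream(payload, chunk_size=4096)
    print(size, digest)
```

The same pattern applies to files or sockets: no single in-memory string ever needs an offset beyond the chunk size.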
So let's not assume that just because one segment of the userbase
screams the loudest that this segment is representative. We have a large
and diverse userbase and we need to make balanced decisions.
Handling BIGINT variables as a single 64bit number rather than as a
string of characters would seem to be the more productive way of
working, but is incompatible with 32bit platforms? It's working out
exactly which path is the most productive which is the current question.
If a 64bit integer is supported as a native variable, would it be
practical to have it available on a 32bit platform? Having potentially
to have to code for two different scenarios is not the best way of
moving forward?
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
That has absolutely nothing to do with 64-bit string offsets which is
what we are discussing here.
-Rasmus
But integer is wrapped up in the very same RFC ... and the handling of
that is just as important ...
--
Lester Caine - G8HFL
Hello Yasuo,
I know a lot of companies that switched (or plan to switch) to 5.5 only because of better performance.
Regards,
Thomas
Yasuo Ohgaki wrote on 21.05.2014 00:35:
Hi Stas,
On Wed, May 21, 2014 at 6:17 AM, Stas Malyshev smalyshev@sugarcrm.com wrote:
64bit int if int is 64bit. I prefer consistency, so array int key is
better to support 64bit int key. IMHO.
Given that 99.9999% of PHP users will never need it, but 100% of PHP
users will pay in performance for each size increase, we need to be
careful here. "Consistency" is no more a magic word than "security".
Similar argument applies to string also. It would be WTF, when users try
to access string offset over 32bit values. Data dealt with PHP is getting
larger and larger. It would be an issue sooner or later.
Not likely, unless somehow PHP becomes language of choice for processing
big data. Which I don't see exactly happening. But if it ever happens,
we can deal with it then.
I think many people who don't like PHP and/or are switching to other
languages are concerned about consistency as a language. Users will just
switch tools, so these would be issues.
We just lose users (I mean we lose developers).
I'm really concerned about performance as much as security/consistency,
like you. However, average web developers are not. Most developers prefer
large and slow web app frameworks. This is the reality. I think we must
face our users (developers).
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Hi Thomas,
I know a lot of companies that switched (or plan to switch) to 5.5 only
because of better performance.
I know such companies, too. Companies do not care much what
language is used for their applications.
I'm worried about users who are willing to write PHP code.
Especially hackers who write great code.
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
FWIW, I've never come across companies who cared about 64-bit
anything, except for integer sizes. Not array sizes, string offsets
or integer array offsets. Even those who care about 64-bit binaries
typically do so because they are misinformed that it will somehow help
them, when in reality it'll give them nothing but reduced performance and
an increased memory footprint.
On the flip side, I've come across countless companies (and
developers) who care about performance. Actually it's rare to find
ones that don't.
Zeev
Right.
And even more companies I meet care about clean APIs, stable releases,
BC, clean and safe code, documentation (internals APIs too), etc. much
more than performance as it costs them much more than adding hardware
from a development and update processes point of view.
Right now, you care only about performance, raw performance. This is
good, but it is by far not the only thing companies and users look for.
--
Pierre
@pierrejoye | http://www.libgd.org
2014-05-21 9:23 GMT+03:00 Pierre Joye pierre.php@gmail.com:
On the flip side, I've come across countless companies (and
developers) who care about performance. Actually it's rare to find
ones that don't.
Right.
And even more companies I meet care about clean APIs, stable releases,
BC, clean and safe code, documentation (internals APIs too), etc. much
more than performance as it costs them much more than adding hardware
from a development and update processes point of view.
Right now, you care only about performance, raw performance, this is
good but it is by far not only what companies and users look for.
--
Pierre
Performance has always had, and always will have, a flip side: you either
write extremely fast but somewhat messy code, or you sacrifice some
performance and write better code. The same goes for consistency: you have
to sacrifice some performance for it. The hard part is always balancing it
all. A mere 2%-5% performance increase or decrease will not matter to a
poorly written system; it would probably get a triple-digit percentage
improvement if it were just redesigned and rebuilt.
Actually, vk.com (that's the Russian social network VKontakte; like
Facebook, it is PHP based) went the messy way: they completely excluded
OOP from their code base. It's all procedural, and they built something
similar to a HipHop translator on top of that to get their performance,
with additions like type hinting and other stuff.
I agree with Pierre here: thinking only about performance is a slippery
road to take. All that "incredible" performance is worth nothing if, on a
native 64 bit system, I suddenly hit a 32 bit limit in some part of the
language (good thing I don't have Windows servers, because I hit some of
those issues on my development PC that runs 64 bit Win7; that's beside
the point, it just shows that 32 bit integers are a limitation right here,
right now. I needed to do financial stuff with bitcoins, which meant I
needed to floor($bitcoin_sum * pow(10, 8)), sum some numbers up, and at
that point I got out of the 32 bit signed integer range). And at that
point I can't wait for the next major release for it to be fixed (if the
bug report is even taken into consideration and not dismissed as a rare
case).
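To put rough numbers on the bitcoin case above, here is a minimal sketch in Python rather than PHP. It assumes the build in question has a 32 bit signed integer type; the helper names to_satoshi and fits_int32 are made up for illustration and come from neither PHP nor the patch.

```python
import math

INT32_MIN = -2**31
INT32_MAX = 2**31 - 1          # 2147483647, largest 32 bit signed integer
SATOSHI_PER_BTC = 10**8        # the pow(10, 8) factor from the message


def to_satoshi(btc_amount: float) -> int:
    """Mirror floor($bitcoin_sum * pow(10, 8)): scale to the smallest unit."""
    return math.floor(btc_amount * SATOSHI_PER_BTC)


def fits_int32(value: int) -> bool:
    """True if the value is representable as a 32 bit signed integer."""
    return INT32_MIN <= value <= INT32_MAX


# 21.0 BTC is 2100000000 units and still fits.
print(to_satoshi(21.0), fits_int32(to_satoshi(21.0)))
# 21.5 BTC is 2150000000 units and is already out of range.
print(to_satoshi(21.5), fits_int32(to_satoshi(21.5)))
```

Under these assumptions the limit sits at roughly 21.47 BTC, so even a modest sum of balances crosses it.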
So, to sum it all up, this is a big change that can only come in a major
version, like PHP.next. So, if right now some part of the 64 bit patch
is throttled back to 32 bit int usage, you need to be damn sure that its
limits will not be hit in the next 7-10 years, because the next major
language update probably will not happen earlier (2 years at least to get
the current PHP.next, then 4-5 minor releases on a yearly cycle, and 2-3
more years for PHP.next.next).
What worries me, reading this list, is that I don't see a future-proofing
attitude at all. And mostly people just ignore what the userland developers
say: you just get dismissed on the grounds that some shadowy enterprise
guys come up to you and say that they need performance at all costs (I
read it just like that, sorry). OK, I can play that card too: I'm currently
building a geo-replicated system for a business that does not really want
to share its guts, but in 2 years it will have millions of clients across
the EU, handling a lot of data with global IDs and so on. And though I do
care about performance, I care a lot more about consistency: that I
don't hit some stupid 32 bit limitation somewhere in the coming years
because right now someone decided that they should save 32 bits of memory
on the datatype in a certain structure. At that point I'm just forced
to switch instruments. And I may not need to do that because of
performance: if PHP history shows anything, it is its ability to speed up
and actually deliver much better performance than expected. PHP 5.4 was a
big surprise, 5.5 did even more. The unoptimized PoC PHPNG shows a further
20% to 30% performance increase. PHP is one of the fastest scripting
languages right now, with tools like HHVM or Zephir to seriously speed it
up further. BTW, what about Zephir? I bet it would be affected by all this
too. It's basically C code when compiled, and C code can do much more,
much faster, than PHP does. Will I be able to handle an array bigger than
2^32 elements cleanly so I can access it from userland?
I would like to see some thoughts on all those questions from everyone.
Because, to be fair, it's not all about performance and memory. Tests
show that the 64 bit patch does not really affect those in any major way,
and a properly designed application written with the 5.4 optimizations in
mind is affected only at the measurement-leeway level and even has better
performance (I don't remember if actual numbers were provided; I think we
need a spreadsheet in which to consolidate all that info).
What I would also like to see is phpng tested with full integration of the
64 bit patch as it is now, and with the optimized version as the authors
of phpng see it, and the two compared. Because there are no hard
facts about the performance of 64 bit vs 32 bit type usage, just
speculation about CPU cache misses and so on. And I have learned over the
years that the people designing those CPUs are sneaky, and sometimes the
results defy logic. And there are hints that full 64 bit types may
actually be faster than using 32 bit types for memory optimization.
Thanks,
Arvids.
Having looked at the hardware side of things again, even older
32bit processors support the MMX/SSE registers and instructions, which
provide 64bit maths on these processors, and the AVX extensions have
provided 256bit maths on more modern processors since 2011. These
registers also provide parallel processing of smaller data sizes, being
part of the SIMD facility.
Compatibility across processors is a problem, but even the higher spec
VIA processors support it ... along with support for hardware SHA
hashing. If performance is critical, then targeting a specific hardware
configuration may help, but there may also be opportunities for
selective improvements targeting the available hardware.
--
Lester Caine - G8HFL
This is the reason why DB data values and Web inputs should not be
converted to native PHP data type blindly :)
It could work, but it cannot work with 64bit int.
It may be changed not to use int key (use string key) when int key
overflows, then it would work.
However, many users are trying to use strict type. Users may experience
data validation errors for large int with 32bit int key. It would be
better to allow 64bit int if int is 64bit. I prefer consistency, so array
int key is better to support 64bit int key. IMHO.
Similar argument applies to string also. It would be WTF, when users try to
access string offset over 32bit values. Data dealt with PHP is getting
larger and larger. It would be an issue sooner or later.
As I have already said: 64bit (or rather 63bit) integer keys are already
supported on LP64 and ILP64 platforms. This proposal supports them on LLP64
as well. The issue you're hitting has nothing to do with 64bit integers -
you are using a key that is way larger than that, i.e. a double key.
Double keys are handled by converting them to integers first. This
conversion is a wraparound conversion, not a clamp conversion, so you end
up with some meaningless negative number.
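As a rough illustration of the difference between the two conversions, here is a sketch in Python. It only models the idea (reduce modulo 2^64 and reinterpret as signed, versus saturate at the limits); it is not the engine's actual conversion code, and the function names are made up for the example.

```python
INT64_MIN = -2**63
INT64_MAX = 2**63 - 1


def wraparound_to_int64(value: float) -> int:
    """Wraparound: keep the low 64 bits and reinterpret them as signed."""
    wrapped = int(value) % 2**64
    return wrapped - 2**64 if wrapped > INT64_MAX else wrapped


def clamp_to_int64(value: float) -> int:
    """Clamp: saturate at the nearest representable 64 bit value."""
    return max(INT64_MIN, min(INT64_MAX, int(value)))


key = 1.5e19  # larger than INT64_MAX (about 9.22e18), so only a double can hold it

print(wraparound_to_int64(key))  # -3446744073709551616
print(clamp_to_int64(key))       # 9223372036854775807
```

The wraparound result is the kind of meaningless negative key described above; a clamp would instead collapse every oversized key onto the same maximum value.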
If you want to change the handling of double keys, that is - as far as I
can see - an entirely different proposal, that's only tangentially related
to 64bit support.
Nikita
Hi Nikita,
It was a wrong example. However, if the array int key size is 32bit, a
64bit value would never fit.
I agree with Rasmus's argument about string offsets. If anyone would
like to deal with extremely large data, they should use streams or the
like. It just wouldn't work well without them now.
Regardless of whether this RFC is accepted or not, it would be nice if all
of these limitations were documented in the manual under a single section.
Perhaps a new appendix?
Regards,
--
Yasuo Ohgaki
yohgaki@ohgaki.net
Of course what adds to the fun here is the mixing of 'array' keys and
'string' keys in some of the recent 'improvements' to the language. If
the array keys allow for 64bit integer values because 'integer' is a 64
bit value, then the mixing of string location as an additional dimension
of the array should ideally match that? Personally I don't think the two
have any place in the same context, but this has been added and now the
additional fallout needs documenting? That a string integer index is
different to an array integer index is the potential problem.
Personally I would prefer that the use of the string index in this way
was deprecated, and the nature of the string object was maintained
within that object. Which would allow additional performance
improvements if a string object limited to 16bit was also allowed? The
string object would also then allow for UTF-8 support because the
problem of string offsets are eliminated from the array indexing
calculations?
While discussions in isolation are all very well, many of these elements
are all interrelated ...
--
Lester Caine - G8HFL