Hi internals!
The discussions around the patch for improved 64bit support have
deteriorated from a semi-reasonable discussions to pointless accusations
and name-calling. I'd like to get back to discussing this issue from a
technical point of view and see which parts of the patch can be salvaged
and which can not.
As the current voting results show, the 64bit changes, if accepted, will be
integrated into phpng rather than into master. As such I will argue purely
from a phpng perspective.
Before going any further, it should be established that two aspects of the
64 bit patch are, as far as I can see, acceptable to everyone: The
introduction of 64bit integers on Win64 and other LLP64 architectures and
the use of an unsigned integer type for lengths and sizes.
The disputed aspect of the patch is the use of 64 bit lengths and sizes. It
has been argued that the use of 64bit sizes will significantly increase
memory consumption of the phpng branch and may also negatively impact
performance.
If you take a look at an early patch for porting the 64bit changes for
phpng 1 you'll find that these are very real concerns, as the changes
increase the sizes of many commonly used data structures.
However, this is only the case if 64bit sizes are blindly used in every
possible place. In the following I'll argue how a more careful choice of
the used size type in key places can make this patch a lot more realistic.
First of all it should be noted that the patch introduces size_t usage in
several places where, as far as I can see, it is not necessary. For example
it changes the storage type of line numbers from zend_uint to zend_size_t:
From zend_opline:
- uint lineno;
+ zend_size_t lineno;
From zend_class_entry:
- zend_uint line_start;
- zend_uint line_end;
+ zend_size_t line_start;
+ zend_size_t line_end;
I don't think we need to concern ourselves about files with more than 4
billion lines.
Similarly the patch also changes the length of argument names and class
typehints to 64bit:
From zend_arg_info:
- zend_uint name_len;
+ zend_size_t name_len;
const char *class_name;
- zend_uint class_name_len;
+ zend_size_t class_name_len;
This once again seem rather unnecessary and wasteful. Another case are the
number of arguments of a function:
From zend_op_array:
- zend_uint num_args;
- zend_uint required_num_args;
+ zend_size_t num_args;
+ zend_size_t required_num_args;
Again it seems doubtful whether functions with more than 4 billion
arguments are quite relevant.
While there are more examples than these, I'll not go into this point any
further, as the pattern should be clear. Instead we'll be moving on to the
zend_string and HashTable structures.
For the HashTable structure, the diff currently looks as follows:
- zend_uint nTableSize;
- zend_uint nTableMask;
- zend_uint nNumUsed;
- zend_uint nNumOfElements;
- long nNextFreeElement;
+ zend_uint_t nTableSize;
+ zend_uint_t nTableMask;
+ zend_uint_t nNumUsed;
+ zend_uint_t nNumOfElements;
+ zend_int_t nNextFreeElement;
Bucket *arData;
- zend_uint *arHash;
+ zend_uint_t *arHash;
dtor_func_t pDestructor;
zend_uint nInternalPointer;
This actually misses changing nInternalPointer. If we change that one as
well and account for differences in alignment, the total increase for the
HashTable structure is 6*4 = 24 bytes. For a heavily used structure, that's
a lot. However, even worse are the per-bucket memory increases:
From the diff above you can already see that the arHash array will be twice
as large, i.e. you'll get an additional 4 bytes per bucket. However - and
this is currently not represented in the patch stub - changing hashtables
to 64bit lengths will make the use of Z_NEXT(bucket->val) to store the next
collision resolution index impossible. As such another 8 bytes per bucket
will be needed.
This brings us to a total increase of 24 bytes per hashtable and 12 bytes
per bucket. Especially the latter seems unacceptable to me.
As such, I would suggest to not change sizes for the hashtable structure
and limit these changes to other parts of the engine (strings in
particular).
To put this into context, consider what it actually means to have an array
with more than 4 billion elements (which is the point where the different
size type becomes relevant). With hashtables as implemented in master with
144 bytes per bucket, this would require 2^32 * 144 = 576 GB of memory. If
we introduced 64bit hashtable sizes in phpng, you'd still need 206 GB of
memory for to create an array that can make use of those sizes.
I hope this visualizes that supporting 64bit sizes for hashtables is not
necessary. We can easily implement a size restriction in
zend_hash_do_resize. As hashtables go through a centralized API controlling
the size limitation is a lot more straightforward than for strings.
Now that we have covered the HashTable changes, we're only left with one
significant change. The diff for the zend_string structure is as follows:
zend_refcounted gc;
- zend_ulong h; /* hash value */
- int len;
+ zend_uint_t h; /* hash value */
+ zend_size_t len;
char val[1];
This means that a zend_string will on average be approximately 4 bytes
longer. (Approximately because the exact difference is subject to details
of the allocation process and depends on the string length distribution.)
I think that this difference might be acceptable, if this is the cost of
uniform string size usage in the codebase. I tested the impact that adding
4 bytes on the string structure has on one application and the result was a
change from 38.2 to 38.5 MiB, which translates to a 0.7% difference. That
difference is acceptable to me.
However I don't know how representative that result is and I don't
currently have time to set up a testing environment for something like
wordpress on a 64bit vm. I would appreciate it if somebody who already has
such a setup could run a test for this, to confirm that there is indeed no
large difference in memory consumption. Just add a "int unused" after the
len member and compare memory usage.
So, what I'm suggesting here is the following:
We implement 64bit integers on LLP64, we implement unsigned sizes, we
implement size_t for strings lengths. I honestly still don't fully
understand the necessity for the latter, but if the impact in phpng is
indeed low (in the 1% range), then I think I can compromise on that.
What we don't implement is indiscriminate usage of size_t or other 64bit
types all over the place (like for linenos, argument counts, argument
lengths etc) and usage of 64bit sizes for hashtables.
This would be acceptable for me and I hope that it would also be acceptable
for others.
While we're at it, I would also like to quickly touch on the topic of
naming and renaming. I hope it is clear now that we need both 64bit types
for some things and 32bit types for others (like HT sizes, linenos, etc).
As such I would suggest to use zend_(u)long_t as the 64bit type (on 64bit
platforms of course) and zend_(u)int_t as the 32bit type. This more or less
matches our current naming and hopefully avoids confusion.
With that naming convention, we should also continue using IS_LONG and
Z_LVAL instead of the suggested IS_INT and Z_IVAL. I'd also keep using
Z_STRLEN instead of Z_STRSIZE and wouldn't do changes to zpp, especially as
they conflict with other zpp changes in phpng. I understand why these
changes were beneficial in the original 64bit patch, however considering
that the usage of many APIs changed in phpng and extensions need to be
carefully reviewed anyway, I don't think these renames make a lot of sense
anymore.
So, that's my opinion on how we should proceed with the 64bit patch. I very
much hope that we can reach some consensus about this.
Credits: This mail is the result of a discussion between Anthony, Bob and
myself.
Nikita
Nikita,
I think your email is spot on, except for one thing. Thankfully, there
haven't any accusations or name calling - at least none that I've seen. And
that's a good thing, even when we have a heated debate.
I'll vote Yes on an RFC-translated version of this email in a heartbeat.
Zeev
-----Original Message-----
From: Nikita Popov [mailto:nikita.ppv@gmail.com]
Sent: Saturday, May 17, 2014 9:27 PM
To: PHP internals
Subject: [PHP-DEV] Rethinking 64bit sizes and PHP-NGHi internals!
The discussions around the patch for improved 64bit support have
deteriorated
from a semi-reasonable discussions to pointless accusations and
name-calling.
I'd like to get back to discussing this issue from a technical point of
view and see
which parts of the patch can be salvaged and which can not.As the current voting results show, the 64bit changes, if accepted, will
be
integrated into phpng rather than into master. As such I will argue purely
from a
phpng perspective.Before going any further, it should be established that two aspects of the
64 bit patch are, as far as I can see, acceptable to everyone: The
introduction of
64bit integers on Win64 and other LLP64 architectures and the use of an
unsigned integer type for lengths and sizes.The disputed aspect of the patch is the use of 64 bit lengths and sizes.
It has
been argued that the use of 64bit sizes will significantly increase memory
consumption of the phpng branch and may also negatively impact
performance.If you take a look at an early patch for porting the 64bit changes for
phpng 1
you'll find that these are very real concerns, as the changes increase the
sizes of
many commonly used data structures.However, this is only the case if 64bit sizes are blindly used in every
possible
place. In the following I'll argue how a more careful choice of the used
size type
in key places can make this patch a lot more realistic.First of all it should be noted that the patch introduces size_t usage in
several
places where, as far as I can see, it is not necessary. For example it
changes the
storage type of line numbers from zend_uint to zend_size_t:From zend_opline:
- uint lineno;
+ zend_size_t lineno;From zend_class_entry:
- zend_uint line_start;
- zend_uint line_end;
+ zend_size_t line_start;
+ zend_size_t line_end;I don't think we need to concern ourselves about files with more than 4
billion
lines.Similarly the patch also changes the length of argument names and class
typehints to 64bit:From zend_arg_info:
- zend_uint name_len;
+ zend_size_t name_len;
const char *class_name;
- zend_uint class_name_len;
+ zend_size_t class_name_len;This once again seem rather unnecessary and wasteful. Another case are the
number of arguments of a function:From zend_op_array:
- zend_uint num_args;
- zend_uint required_num_args;
+ zend_size_t num_args;
+ zend_size_t required_num_args;Again it seems doubtful whether functions with more than 4 billion
arguments
are quite relevant.While there are more examples than these, I'll not go into this point any
further,
as the pattern should be clear. Instead we'll be moving on to the
zend_string and
HashTable structures.For the HashTable structure, the diff currently looks as follows:
- zend_uint nTableSize; - zend_uint nTableMask; - zend_uint nNumUsed; - zend_uint nNumOfElements; - long nNextFreeElement; + zend_uint_t nTableSize; + zend_uint_t nTableMask; + zend_uint_t nNumUsed; + zend_uint_t nNumOfElements; + zend_int_t nNextFreeElement; Bucket *arData; - zend_uint *arHash; + zend_uint_t *arHash; dtor_func_t pDestructor; zend_uint nInternalPointer;
This actually misses changing nInternalPointer. If we change that one as
well
and account for differences in alignment, the total increase for the
HashTable
structure is 6*4 = 24 bytes. For a heavily used structure, that's a lot.
However,
even worse are the per-bucket memory increases:From the diff above you can already see that the arHash array will be
twice as
large, i.e. you'll get an additional 4 bytes per bucket. However - and
this is
currently not represented in the patch stub - changing hashtables to 64bit
lengths will make the use of Z_NEXT(bucket->val) to store the next
collision
resolution index impossible. As such another 8 bytes per bucket will be
needed.This brings us to a total increase of 24 bytes per hashtable and 12 bytes
per
bucket. Especially the latter seems unacceptable to me.As such, I would suggest to not change sizes for the hashtable structure
and
limit these changes to other parts of the engine (strings in particular).To put this into context, consider what it actually means to have an array
with
more than 4 billion elements (which is the point where the different size
type
becomes relevant). With hashtables as implemented in master with
144 bytes per bucket, this would require 2^32 * 144 = 576 GB of memory. If
we
introduced 64bit hashtable sizes in phpng, you'd still need 206 GB of
memory for
to create an array that can make use of those sizes.I hope this visualizes that supporting 64bit sizes for hashtables is not
necessary.
We can easily implement a size restriction in zend_hash_do_resize. As
hashtables go through a centralized API controlling the size limitation is
a lot
more straightforward than for strings.Now that we have covered the HashTable changes, we're only left with one
significant change. The diff for the zend_string structure is as follows:zend_refcounted gc; - zend_ulong h; /* hash value */ - int len; + zend_uint_t h; /* hash value */ + zend_size_t len; char val[1];
This means that a zend_string will on average be approximately 4 bytes
longer.
(Approximately because the exact difference is subject to details of the
allocation process and depends on the string length distribution.)I think that this difference might be acceptable, if this is the cost of
uniform
string size usage in the codebase. I tested the impact that adding
4 bytes on the string structure has on one application and the result was
a
change from 38.2 to 38.5 MiB, which translates to a 0.7% difference. That
difference is acceptable to me.However I don't know how representative that result is and I don't
currently
have time to set up a testing environment for something like wordpress on
a
64bit vm. I would appreciate it if somebody who already has such a setup
could
run a test for this, to confirm that there is indeed no large difference
in memory
consumption. Just add a "int unused" after the len member and compare
memory usage.So, what I'm suggesting here is the following:
We implement 64bit integers on LLP64, we implement unsigned sizes, we
implement size_t for strings lengths. I honestly still don't fully
understand the
necessity for the latter, but if the impact in phpng is indeed low (in the
1%
range), then I think I can compromise on that.What we don't implement is indiscriminate usage of size_t or other 64bit
types
all over the place (like for linenos, argument counts, argument lengths
etc) and
usage of 64bit sizes for hashtables.This would be acceptable for me and I hope that it would also be
acceptable for
others.While we're at it, I would also like to quickly touch on the topic of
naming and
renaming. I hope it is clear now that we need both 64bit types for some
things
and 32bit types for others (like HT sizes, linenos, etc).
As such I would suggest to use zend_(u)long_t as the 64bit type (on 64bit
platforms of course) and zend_(u)int_t as the 32bit type. This more or
less
matches our current naming and hopefully avoids confusion.With that naming convention, we should also continue using IS_LONG and
Z_LVAL instead of the suggested IS_INT and Z_IVAL. I'd also keep using
Z_STRLEN instead of Z_STRSIZE and wouldn't do changes to zpp, especially
as
they conflict with other zpp changes in phpng. I understand why these
changes
were beneficial in the original 64bit patch, however considering that the
usage
of many APIs changed in phpng and extensions need to be carefully reviewed
anyway, I don't think these renames make a lot of sense anymore.So, that's my opinion on how we should proceed with the 64bit patch. I
very
much hope that we can reach some consensus about this.Credits: This mail is the result of a discussion between Anthony, Bob and
myself.Nikita
On Sat, 17 May 2014 22:27:02 +0400, Nikita Popov nikita.ppv@gmail.com
wrote:
Hi internals!
The discussions around the patch for improved 64bit support have
deteriorated from a semi-reasonable discussions to pointless accusations
and name-calling. I'd like to get back to discussing this issue from a
technical point of view and see which parts of the patch can be salvaged
and which can not.
Hey,
First thank you for arranging all this points.
I just wanted to ask: could we also make nNextFreeElement be unsigned?
Because with your proposal it would be possible to overflow it with the
code like this: $array[pow(2, 32) + 1] = 123; (if I understand it
correctly)
And besides that we can't have negative keys anyway.
Hey,
First thank you for arranging all this points.
I just wanted to ask: could we also make nNextFreeElement be unsigned?
Because with your proposal it would be possible to overflow it with the
code like this: $array[pow(2, 32) + 1] = 123; (if I understand it correctly)And besides that we can't have negative keys anyway.
PHP arrays store integral keys larger than PHP_INT_MAX
as a string rather
than an integer. Auto-incrementing does not work past that point. If you
write something like this...
$array[PHP_INT_MAX] = 42;
$array[] = 43;
...you'll receive an error (this is intentional). You get that error
because the code explicitly makes sure that nNextFreeElement can never
overflow. (Here is an example of such an overflow check:
http://lxr.php.net/xref/PHP_TRUNK/Zend/zend_hash.c#435).
Nikita
On Sat, 17 May 2014 23:11:13 +0400, Nikita Popov nikita.ppv@gmail.com
wrote:
On Sat, May 17, 2014 at 9:01 PM, Nikita Nefedov inefedor@gmail.com
wrote:Hey,
First thank you for arranging all this points.
I just wanted to ask: could we also make nNextFreeElement be unsigned?
Because with your proposal it >>would be possible to overflow it with
the code like this: $array[pow(2, 32) + 1] = 123; (if I understand >>it
correctly)And besides that we can't have negative keys anyway.
PHP arrays store integral keys larger than
PHP_INT_MAX
as a string
rather than an integer. Auto->incrementing does not work past that
point. If you write something like this...$array[PHP_INT_MAX] = 42;
$array[] = 43;...you'll receive an error (this is intentional). You get that error
because the code explicitly makes >sure that nNextFreeElement can never
overflow. (Here is an example of such an overflow check:
http://>lxr.php.net/xref/PHP_TRUNK/Zend/zend_hash.c#435).Nikita
Ah well, my fault didn't see it :)
Hi internals!
The discussions around the patch for improved 64bit support have
deteriorated from a semi-reasonable discussions to pointless accusations
and name-calling. I'd like to get back to discussing this issue from a
technical point of view and see which parts of the patch can be salvaged
and which can not.As the current voting results show, the 64bit changes, if accepted, will be
integrated into phpng rather than into master. As such I will argue purely
from a phpng perspective.Before going any further, it should be established that two aspects of the
64 bit patch are, as far as I can see, acceptable to everyone: The
introduction of 64bit integers on Win64 and other LLP64 architectures and
the use of an unsigned integer type for lengths and sizes.The disputed aspect of the patch is the use of 64 bit lengths and sizes. It
has been argued that the use of 64bit sizes will significantly increase
memory consumption of the phpng branch and may also negatively impact
performance.If you take a look at an early patch for porting the 64bit changes for
phpng 1 you'll find that these are very real concerns, as the changes
increase the sizes of many commonly used data structures.However, this is only the case if 64bit sizes are blindly used in every
possible place. In the following I'll argue how a more careful choice of
the used size type in key places can make this patch a lot more realistic.First of all it should be noted that the patch introduces size_t usage in
several places where, as far as I can see, it is not necessary. For example
it changes the storage type of line numbers from zend_uint to zend_size_t:From zend_opline:
- uint lineno;
+ zend_size_t lineno;From zend_class_entry:
- zend_uint line_start;
- zend_uint line_end;
+ zend_size_t line_start;
+ zend_size_t line_end;I don't think we need to concern ourselves about files with more than 4
billion lines.Similarly the patch also changes the length of argument names and class
typehints to 64bit:From zend_arg_info:
- zend_uint name_len;
+ zend_size_t name_len;
const char *class_name;
- zend_uint class_name_len;
+ zend_size_t class_name_len;This once again seem rather unnecessary and wasteful. Another case are the
number of arguments of a function:From zend_op_array:
- zend_uint num_args;
- zend_uint required_num_args;
+ zend_size_t num_args;
+ zend_size_t required_num_args;Again it seems doubtful whether functions with more than 4 billion
arguments are quite relevant.While there are more examples than these, I'll not go into this point any
further, as the pattern should be clear. Instead we'll be moving on to the
zend_string and HashTable structures.For the HashTable structure, the diff currently looks as follows:
- zend_uint nTableSize; - zend_uint nTableMask; - zend_uint nNumUsed; - zend_uint nNumOfElements; - long nNextFreeElement; + zend_uint_t nTableSize; + zend_uint_t nTableMask; + zend_uint_t nNumUsed; + zend_uint_t nNumOfElements; + zend_int_t nNextFreeElement; Bucket *arData; - zend_uint *arHash; + zend_uint_t *arHash; dtor_func_t pDestructor; zend_uint nInternalPointer;
This actually misses changing nInternalPointer. If we change that one as
well and account for differences in alignment, the total increase for the
HashTable structure is 6*4 = 24 bytes. For a heavily used structure, that's
a lot. However, even worse are the per-bucket memory increases:From the diff above you can already see that the arHash array will be twice
as large, i.e. you'll get an additional 4 bytes per bucket. However - and
this is currently not represented in the patch stub - changing hashtables
to 64bit lengths will make the use of Z_NEXT(bucket->val) to store the next
collision resolution index impossible. As such another 8 bytes per bucket
will be needed.This brings us to a total increase of 24 bytes per hashtable and 12 bytes
per bucket. Especially the latter seems unacceptable to me.As such, I would suggest to not change sizes for the hashtable structure
and limit these changes to other parts of the engine (strings in
particular).To put this into context, consider what it actually means to have an array
with more than 4 billion elements (which is the point where the different
size type becomes relevant). With hashtables as implemented in master with
144 bytes per bucket, this would require 2^32 * 144 = 576 GB of memory. If
we introduced 64bit hashtable sizes in phpng, you'd still need 206 GB of
memory for to create an array that can make use of those sizes.I hope this visualizes that supporting 64bit sizes for hashtables is not
necessary. We can easily implement a size restriction in
zend_hash_do_resize. As hashtables go through a centralized API controlling
the size limitation is a lot more straightforward than for strings.Now that we have covered the HashTable changes, we're only left with one
significant change. The diff for the zend_string structure is as follows:zend_refcounted gc; - zend_ulong h; /* hash value */ - int len; + zend_uint_t h; /* hash value */ + zend_size_t len; char val[1];
This means that a zend_string will on average be approximately 4 bytes
longer. (Approximately because the exact difference is subject to details
of the allocation process and depends on the string length distribution.)I think that this difference might be acceptable, if this is the cost of
uniform string size usage in the codebase. I tested the impact that adding
4 bytes on the string structure has on one application and the result was a
change from 38.2 to 38.5 MiB, which translates to a 0.7% difference. That
difference is acceptable to me.However I don't know how representative that result is and I don't
currently have time to set up a testing environment for something like
wordpress on a 64bit vm. I would appreciate it if somebody who already has
such a setup could run a test for this, to confirm that there is indeed no
large difference in memory consumption. Just add a "int unused" after the
len member and compare memory usage.So, what I'm suggesting here is the following:
We implement 64bit integers on LLP64, we implement unsigned sizes, we
implement size_t for strings lengths. I honestly still don't fully
understand the necessity for the latter, but if the impact in phpng is
indeed low (in the 1% range), then I think I can compromise on that.What we don't implement is indiscriminate usage of size_t or other 64bit
types all over the place (like for linenos, argument counts, argument
lengths etc) and usage of 64bit sizes for hashtables.This would be acceptable for me and I hope that it would also be acceptable
for others.While we're at it, I would also like to quickly touch on the topic of
naming and renaming. I hope it is clear now that we need both 64bit types
for some things and 32bit types for others (like HT sizes, linenos, etc).
As such I would suggest to use zend_(u)long_t as the 64bit type (on 64bit
platforms of course) and zend_(u)int_t as the 32bit type. This more or less
matches our current naming and hopefully avoids confusion.With that naming convention, we should also continue using IS_LONG and
Z_LVAL instead of the suggested IS_INT and Z_IVAL. I'd also keep using
Z_STRLEN instead of Z_STRSIZE and wouldn't do changes to zpp, especially as
they conflict with other zpp changes in phpng. I understand why these
changes were beneficial in the original 64bit patch, however considering
that the usage of many APIs changed in phpng and extensions need to be
carefully reviewed anyway, I don't think these renames make a lot of sense
anymore.So, that's my opinion on how we should proceed with the 64bit patch. I very
much hope that we can reach some consensus about this.Credits: This mail is the result of a discussion between Anthony, Bob and
myself.Nikita
First of all: thank you for bringing back the discussion to the technical
level.
I would really like if Pierre and Anatol could reply what do they think
about this, as based on the previous discussion, the changes proposed here
would be acceptable for most people voting no for the original rfc, and it
is only sacrifices parts which was already considered as a "side-effect"
not a goal of the rfc anyways.
--
Ferenc Kovács
@Tyr43l - http://tyrael.hu
Hi Nikita,
Thanks for this deep analyzes.
I hope, this will lead us to agreement.
I see no problems with accepting IS_LONG part of the patch.
Then we may think about string length.
As you pointed, it makes no sense to use size_t everywhere, and may be
using it only in "right" places won't make serious harm.
Thanks. Dmitry.
Hi internals!
The discussions around the patch for improved 64bit support have
deteriorated from a semi-reasonable discussions to pointless accusations
and name-calling. I'd like to get back to discussing this issue from a
technical point of view and see which parts of the patch can be salvaged
and which can not.As the current voting results show, the 64bit changes, if accepted, will be
integrated into phpng rather than into master. As such I will argue purely
from a phpng perspective.Before going any further, it should be established that two aspects of the
64 bit patch are, as far as I can see, acceptable to everyone: The
introduction of 64bit integers on Win64 and other LLP64 architectures and
the use of an unsigned integer type for lengths and sizes.The disputed aspect of the patch is the use of 64 bit lengths and sizes. It
has been argued that the use of 64bit sizes will significantly increase
memory consumption of the phpng branch and may also negatively impact
performance.If you take a look at an early patch for porting the 64bit changes for
phpng 1 you'll find that these are very real concerns, as the changes
increase the sizes of many commonly used data structures.However, this is only the case if 64bit sizes are blindly used in every
possible place. In the following I'll argue how a more careful choice of
the used size type in key places can make this patch a lot more realistic.First of all it should be noted that the patch introduces size_t usage in
several places where, as far as I can see, it is not necessary. For example
it changes the storage type of line numbers from zend_uint to zend_size_t:From zend_opline:
- uint lineno;
+ zend_size_t lineno;From zend_class_entry:
- zend_uint line_start;
- zend_uint line_end;
+ zend_size_t line_start;
+ zend_size_t line_end;I don't think we need to concern ourselves about files with more than 4
billion lines.Similarly the patch also changes the length of argument names and class
typehints to 64bit:From zend_arg_info:
- zend_uint name_len;
+ zend_size_t name_len;
const char *class_name;
- zend_uint class_name_len;
+ zend_size_t class_name_len;This once again seem rather unnecessary and wasteful. Another case are the
number of arguments of a function:From zend_op_array:
- zend_uint num_args;
- zend_uint required_num_args;
+ zend_size_t num_args;
+ zend_size_t required_num_args;Again it seems doubtful whether functions with more than 4 billion
arguments are quite relevant.While there are more examples than these, I'll not go into this point any
further, as the pattern should be clear. Instead we'll be moving on to the
zend_string and HashTable structures.For the HashTable structure, the diff currently looks as follows:
- zend_uint nTableSize; - zend_uint nTableMask; - zend_uint nNumUsed; - zend_uint nNumOfElements; - long nNextFreeElement; + zend_uint_t nTableSize; + zend_uint_t nTableMask; + zend_uint_t nNumUsed; + zend_uint_t nNumOfElements; + zend_int_t nNextFreeElement; Bucket *arData; - zend_uint *arHash; + zend_uint_t *arHash; dtor_func_t pDestructor; zend_uint nInternalPointer;
This actually misses changing nInternalPointer. If we change that one as
well and account for differences in alignment, the total increase for the
HashTable structure is 6*4 = 24 bytes. For a heavily used structure, that's
a lot. However, even worse are the per-bucket memory increases:From the diff above you can already see that the arHash array will be twice
as large, i.e. you'll get an additional 4 bytes per bucket. However - and
this is currently not represented in the patch stub - changing hashtables
to 64bit lengths will make the use of Z_NEXT(bucket->val) to store the next
collision resolution index impossible. As such another 8 bytes per bucket
will be needed.This brings us to a total increase of 24 bytes per hashtable and 12 bytes
per bucket. Especially the latter seems unacceptable to me.As such, I would suggest to not change sizes for the hashtable structure
and limit these changes to other parts of the engine (strings in
particular).To put this into context, consider what it actually means to have an array
with more than 4 billion elements (which is the point where the different
size type becomes relevant). With hashtables as implemented in master with
144 bytes per bucket, this would require 2^32 * 144 = 576 GB of memory. If
we introduced 64bit hashtable sizes in phpng, you'd still need 206 GB of
memory for to create an array that can make use of those sizes.I hope this visualizes that supporting 64bit sizes for hashtables is not
necessary. We can easily implement a size restriction in
zend_hash_do_resize. As hashtables go through a centralized API controlling
the size limitation is a lot more straightforward than for strings.Now that we have covered the HashTable changes, we're only left with one
significant change. The diff for the zend_string structure is as follows:zend_refcounted gc; - zend_ulong h; /* hash value */ - int len; + zend_uint_t h; /* hash value */ + zend_size_t len; char val[1];
This means that a zend_string will on average be approximately 4 bytes
longer. (Approximately because the exact difference is subject to details
of the allocation process and depends on the string length distribution.)I think that this difference might be acceptable, if this is the cost of
uniform string size usage in the codebase. I tested the impact that adding
4 bytes on the string structure has on one application and the result was a
change from 38.2 to 38.5 MiB, which translates to a 0.7% difference. That
difference is acceptable to me.However I don't know how representative that result is and I don't
currently have time to set up a testing environment for something like
wordpress on a 64bit vm. I would appreciate it if somebody who already has
such a setup could run a test for this, to confirm that there is indeed no
large difference in memory consumption. Just add a "int unused" after the
len member and compare memory usage.So, what I'm suggesting here is the following:
We implement 64bit integers on LLP64, we implement unsigned sizes, we
implement size_t for strings lengths. I honestly still don't fully
understand the necessity for the latter, but if the impact in phpng is
indeed low (in the 1% range), then I think I can compromise on that.What we don't implement is indiscriminate usage of size_t or other 64bit
types all over the place (like for linenos, argument counts, argument
lengths etc) and usage of 64bit sizes for hashtables.This would be acceptable for me and I hope that it would also be acceptable
for others.While we're at it, I would also like to quickly touch on the topic of
naming and renaming. I hope it is clear now that we need both 64bit types
for some things and 32bit types for others (like HT sizes, linenos, etc).
As such I would suggest to use zend_(u)long_t as the 64bit type (on 64bit
platforms of course) and zend_(u)int_t as the 32bit type. This more or less
matches our current naming and hopefully avoids confusion.With that naming convention, we should also continue using IS_LONG and
Z_LVAL instead of the suggested IS_INT and Z_IVAL. I'd also keep using
Z_STRLEN instead of Z_STRSIZE and wouldn't do changes to zpp, especially as
they conflict with other zpp changes in phpng. I understand why these
changes were beneficial in the original 64bit patch, however considering
that the usage of many APIs changed in phpng and extensions need to be
carefully reviewed anyway, I don't think these renames make a lot of sense
anymore.So, that's my opinion on how we should proceed with the 64bit patch. I very
much hope that we can reach some consensus about this.Credits: This mail is the result of a discussion between Anthony, Bob and
myself.Nikita
Hi,
Hi Nikita,
Thanks for this deep analyzes.
I hope, this will lead us to agreement.I see no problems with accepting IS_LONG part of the patch.
Then we may think about string length.
As you pointed, it makes no sense to use size_t everywhere, and may be
using it only in "right" places won't make serious harm.Thanks. Dmitry.
On Sat, May 17, 2014 at 10:27 PM, Nikita Popov nikita.ppv@gmail.com
wrote:Hi internals!
The discussions around the patch for improved 64bit support have
deteriorated from a semi-reasonable discussions to pointless accusations
and name-calling. I'd like to get back to discussing this issue from a
technical point of view and see which parts of the patch can be
salvaged and which can not.As the current voting results show, the 64bit changes, if accepted,
will be integrated into phpng rather than into master. As such I will
argue purely from a phpng perspective.Before going any further, it should be established that two aspects of
the 64 bit patch are, as far as I can see, acceptable to everyone: The
introduction of 64bit integers on Win64 and other LLP64 architectures
and the use of an unsigned integer type for lengths and sizes.The disputed aspect of the patch is the use of 64 bit lengths and
sizes. It has been argued that the use of 64bit sizes will significantly
increase memory consumption of the phpng branch and may also negatively
impact performance.If you take a look at an early patch for porting the 64bit changes for
phpng [1] you'll find that these are very real concerns, as the changes
increase the sizes of many commonly used data structures.However, this is only the case if 64bit sizes are blindly used in every
possible place. In the following I'll argue how a more careful choice
of the used size type in key places can make this patch a lot more
realistic.First of all it should be noted that the patch introduces size_t usage
in several places where, as far as I can see, it is not necessary. For
example it changes the storage type of line numbers from zend_uint to
zend_size_t:From zend_opline:
- uint lineno;
- zend_size_t lineno;
From zend_class_entry:
- zend_uint line_start;
- zend_uint line_end;
- zend_size_t line_start;
- zend_size_t line_end;
I don't think we need to concern ourselves about files with more than 4
billion lines.Similarly the patch also changes the length of argument names and class
typehints to 64bit:From zend_arg_info:
- zend_uint name_len;
- zend_size_t name_len;
const char *class_name; - zend_uint class_name_len;- zend_size_t class_name_len;
This once again seem rather unnecessary and wasteful. Another case are
the number of arguments of a function:From zend_op_array:
- zend_uint num_args;
- zend_uint required_num_args;
- zend_size_t num_args;
- zend_size_t required_num_args;
Again it seems doubtful whether functions with more than 4 billion
arguments are quite relevant.While there are more examples than these, I'll not go into this point
any further, as the pattern should be clear. Instead we'll be moving on
to the zend_string and HashTable structures.For the HashTable structure, the diff currently looks as follows:
- zend_uint nTableSize;
- zend_uint nTableMask;
- zend_uint nNumUsed;
- zend_uint nNumOfElements;
- long nNextFreeElement;
- zend_uint_t nTableSize;
- zend_uint_t nTableMask;
- zend_uint_t nNumUsed;
- zend_uint_t nNumOfElements;
- zend_int_t nNextFreeElement;
Bucket *arData;
- zend_uint *arHash;
- zend_uint_t *arHash;
dtor_func_t pDestructor; zend_uint nInternalPointer;This actually misses changing nInternalPointer. If we change that one
as well and account for differences in alignment, the total increase for
the HashTable structure is 6*4 = 24 bytes. For a heavily used structure,
that's a lot. However, even worse are the per-bucket memory increases:From the diff above you can already see that the arHash array will be
twice as large, i.e. you'll get an additional 4 bytes per bucket.
However - and
this is currently not represented in the patch stub - changing
hashtables to 64bit lengths will make the use of Z_NEXT(bucket->val) to
store the next collision resolution index impossible. As such another 8
bytes per bucket will be needed.This brings us to a total increase of 24 bytes per hashtable and 12
bytes per bucket. Especially the latter seems unacceptable to me.As such, I would suggest to not change sizes for the hashtable
structure and limit these changes to other parts of the engine (strings
in particular).To put this into context, consider what it actually means to have an
array with more than 4 billion elements (which is the point where the
different size type becomes relevant). With hashtables as implemented in
master with 144 bytes per bucket, this would require 2^32 * 144 = 576 GB
of memory. If we introduced 64bit hashtable sizes in phpng, you'd still
need 206 GB of memory for to create an array that can make use of those
sizes.I hope this visualizes that supporting 64bit sizes for hashtables is
not necessary. We can easily implement a size restriction in
zend_hash_do_resize. As hashtables go through a centralized API
controlling the size limitation is a lot more straightforward than for
strings.Now that we have covered the HashTable changes, we're only left with
one significant change. The diff for the zend_string structure is as
follows:zend_refcounted gc; - zend_ulong h; /* hash
value */ - int len;
- zend_uint_t h; /* hash value */
- zend_size_t len;
char val[1];This means that a zend_string will on average be approximately 4 bytes
longer. (Approximately because the exact difference is subject to
details of the allocation process and depends on the string length
distribution.)I think that this difference might be acceptable, if this is the cost
of uniform string size usage in the codebase. I tested the impact that
adding 4 bytes on the string structure has on one application and the
result was a change from 38.2 to 38.5 MiB, which translates to a 0.7%
difference. That difference is acceptable to me.However I don't know how representative that result is and I don't
currently have time to set up a testing environment for something like
wordpress on a 64bit vm. I would appreciate it if somebody who already
has such a setup could run a test for this, to confirm that there is
indeed no large difference in memory consumption. Just add a "int
unused" after the len member and compare memory usage.So, what I'm suggesting here is the following:
We implement 64bit integers on LLP64, we implement unsigned sizes, we
implement size_t for strings lengths. I honestly still don't fully
understand the necessity for the latter, but if the impact in phpng is
indeed low (in the 1% range), then I think I can compromise on that.What we don't implement is indiscriminate usage of size_t or other
64bit
types all over the place (like for linenos, argument counts, argument
lengths etc) and usage of 64bit sizes for hashtables.This would be acceptable for me and I hope that it would also be
acceptable for others.While we're at it, I would also like to quickly touch on the topic of
naming and renaming. I hope it is clear now that we need both 64bit
types for some things and 32bit types for others (like HT sizes,
linenos, etc). As such I would suggest to use zend_(u)long_t as the
64bit type (on 64bit
platforms of course) and zend_(u)int_t as the 32bit type. This more or
less matches our current naming and hopefully avoids confusion.With that naming convention, we should also continue using IS_LONG and
Z_LVAL instead of the suggested IS_INT and Z_IVAL. I'd also keep using
Z_STRLEN instead of Z_STRSIZE and wouldn't do changes to zpp, especially
as they conflict with other zpp changes in phpng. I understand why these
changes were beneficial in the original 64bit patch, however
considering that the usage of many APIs changed in phpng and extensions
need to be carefully reviewed anyway, I don't think these renames make a
lot of sense anymore.So, that's my opinion on how we should proceed with the 64bit patch. I
very much hope that we can reach some consensus about this.Credits: This mail is the result of a discussion between Anthony, Bob
and myself.Nikita
I really appreciate you guys have looked inside the str_size_and_int64
patch. Dmitry even has compiled and tested it.
While the Nikitas notes sound pretty meaningful, let me ask you a one
simple question before proceeding with any details - how much time you
personally would spend on the integration?
Thanks
Anatol
I may take a part of the work. It's not a problem.
Thanks. Dmitry.
On Sun, May 18, 2014 at 12:37 AM, Anatol Belski anatol.php@belski.netwrote:
Hi,
Hi Nikita,
Thanks for this deep analyzes.
I hope, this will lead us to agreement.I see no problems with accepting IS_LONG part of the patch.
Then we may think about string length.
As you pointed, it makes no sense to use size_t everywhere, and may be
using it only in "right" places won't make serious harm.Thanks. Dmitry.
On Sat, May 17, 2014 at 10:27 PM, Nikita Popov nikita.ppv@gmail.com
wrote:Hi internals!
The discussions around the patch for improved 64bit support have
deteriorated from a semi-reasonable discussions to pointless accusations
and name-calling. I'd like to get back to discussing this issue from a
technical point of view and see which parts of the patch can be
salvaged and which can not.As the current voting results show, the 64bit changes, if accepted,
will be integrated into phpng rather than into master. As such I will
argue purely from a phpng perspective.Before going any further, it should be established that two aspects of
the 64 bit patch are, as far as I can see, acceptable to everyone: The
introduction of 64bit integers on Win64 and other LLP64 architectures
and the use of an unsigned integer type for lengths and sizes.The disputed aspect of the patch is the use of 64 bit lengths and
sizes. It has been argued that the use of 64bit sizes will significantly
increase memory consumption of the phpng branch and may also negatively
impact performance.If you take a look at an early patch for porting the 64bit changes for
phpng [1] you'll find that these are very real concerns, as the changes
increase the sizes of many commonly used data structures.However, this is only the case if 64bit sizes are blindly used in every
possible place. In the following I'll argue how a more careful choice
of the used size type in key places can make this patch a lot more
realistic.First of all it should be noted that the patch introduces size_t usage
in several places where, as far as I can see, it is not necessary. For
example it changes the storage type of line numbers from zend_uint to
zend_size_t:From zend_opline:
- uint lineno;
- zend_size_t lineno;
From zend_class_entry:
- zend_uint line_start;
- zend_uint line_end;
- zend_size_t line_start;
- zend_size_t line_end;
I don't think we need to concern ourselves about files with more than 4
billion lines.Similarly the patch also changes the length of argument names and class
typehints to 64bit:From zend_arg_info:
- zend_uint name_len;
- zend_size_t name_len;
const char *class_name; - zend_uint class_name_len;- zend_size_t class_name_len;
This once again seem rather unnecessary and wasteful. Another case are
the number of arguments of a function:From zend_op_array:
- zend_uint num_args;
- zend_uint required_num_args;
- zend_size_t num_args;
- zend_size_t required_num_args;
Again it seems doubtful whether functions with more than 4 billion
arguments are quite relevant.While there are more examples than these, I'll not go into this point
any further, as the pattern should be clear. Instead we'll be moving on
to the zend_string and HashTable structures.For the HashTable structure, the diff currently looks as follows:
- zend_uint nTableSize;
- zend_uint nTableMask;
- zend_uint nNumUsed;
- zend_uint nNumOfElements;
- long nNextFreeElement;
- zend_uint_t nTableSize;
- zend_uint_t nTableMask;
- zend_uint_t nNumUsed;
- zend_uint_t nNumOfElements;
- zend_int_t nNextFreeElement;
Bucket *arData;
- zend_uint *arHash;
- zend_uint_t *arHash;
dtor_func_t pDestructor; zend_uint nInternalPointer;This actually misses changing nInternalPointer. If we change that one
as well and account for differences in alignment, the total increase for
the HashTable structure is 6*4 = 24 bytes. For a heavily used structure,
that's a lot. However, even worse are the per-bucket memory increases:From the diff above you can already see that the arHash array will be
twice as large, i.e. you'll get an additional 4 bytes per bucket.
However - and
this is currently not represented in the patch stub - changing
hashtables to 64bit lengths will make the use of Z_NEXT(bucket->val) to
store the next collision resolution index impossible. As such another 8
bytes per bucket will be needed.This brings us to a total increase of 24 bytes per hashtable and 12
bytes per bucket. Especially the latter seems unacceptable to me.As such, I would suggest to not change sizes for the hashtable
structure and limit these changes to other parts of the engine (strings
in particular).To put this into context, consider what it actually means to have an
array with more than 4 billion elements (which is the point where the
different size type becomes relevant). With hashtables as implemented in
master with 144 bytes per bucket, this would require 2^32 * 144 = 576 GB
of memory. If we introduced 64bit hashtable sizes in phpng, you'd still
need 206 GB of memory for to create an array that can make use of those
sizes.I hope this visualizes that supporting 64bit sizes for hashtables is
not necessary. We can easily implement a size restriction in
zend_hash_do_resize. As hashtables go through a centralized API
controlling the size limitation is a lot more straightforward than for
strings.Now that we have covered the HashTable changes, we're only left with
one significant change. The diff for the zend_string structure is as
follows:zend_refcounted gc; - zend_ulong h; /* hash
value */ - int len;
- zend_uint_t h; /* hash value */
- zend_size_t len;
char val[1];This means that a zend_string will on average be approximately 4 bytes
longer. (Approximately because the exact difference is subject to
details of the allocation process and depends on the string length
distribution.)I think that this difference might be acceptable, if this is the cost
of uniform string size usage in the codebase. I tested the impact that
adding 4 bytes on the string structure has on one application and the
result was a change from 38.2 to 38.5 MiB, which translates to a 0.7%
difference. That difference is acceptable to me.However I don't know how representative that result is and I don't
currently have time to set up a testing environment for something like
wordpress on a 64bit vm. I would appreciate it if somebody who already
has such a setup could run a test for this, to confirm that there is
indeed no large difference in memory consumption. Just add a "int
unused" after the len member and compare memory usage.So, what I'm suggesting here is the following:
We implement 64bit integers on LLP64, we implement unsigned sizes, we
implement size_t for strings lengths. I honestly still don't fully
understand the necessity for the latter, but if the impact in phpng is
indeed low (in the 1% range), then I think I can compromise on that.What we don't implement is indiscriminate usage of size_t or other
64bit
types all over the place (like for linenos, argument counts, argument
lengths etc) and usage of 64bit sizes for hashtables.This would be acceptable for me and I hope that it would also be
acceptable for others.While we're at it, I would also like to quickly touch on the topic of
naming and renaming. I hope it is clear now that we need both 64bit
types for some things and 32bit types for others (like HT sizes,
linenos, etc). As such I would suggest to use zend_(u)long_t as the
64bit type (on 64bit
platforms of course) and zend_(u)int_t as the 32bit type. This more or
less matches our current naming and hopefully avoids confusion.With that naming convention, we should also continue using IS_LONG and
Z_LVAL instead of the suggested IS_INT and Z_IVAL. I'd also keep using
Z_STRLEN instead of Z_STRSIZE and wouldn't do changes to zpp, especially
as they conflict with other zpp changes in phpng. I understand why these
changes were beneficial in the original 64bit patch, however
considering that the usage of many APIs changed in phpng and extensions
need to be carefully reviewed anyway, I don't think these renames make a
lot of sense anymore.So, that's my opinion on how we should proceed with the 64bit patch. I
very much hope that we can reach some consensus about this.Credits: This mail is the result of a discussion between Anthony, Bob
and myself.Nikita
I really appreciate you guys have looked inside the str_size_and_int64
patch. Dmitry even has compiled and tested it.While the Nikitas notes sound pretty meaningful, let me ask you a one
simple question before proceeding with any details - how much time you
personally would spend on the integration?Thanks
Anatol
Hi!
Before going any further, it should be established that two aspects of the
64 bit patch are, as far as I can see, acceptable to everyone: The
introduction of 64bit integers on Win64 and other LLP64 architectures and
the use of an unsigned integer type for lengths and sizes.
I would add a third aspect - using a dedicated set of types to signify
those, instead of relying on (potentially unportable) mishmash of
standard ones. This is being overlooked, but I'd like to emphasize that
I actually like the idea very much - with all the disruption that it
brings - as a chance to weed out all these false assumptions and sloppy
types still lurking in the guts of the engine and the extensions. I
wonder if we can also make clang to tell us on such type conversions
even if they match on specific system (I know there are hacks to do this
but they are way uglier than I'd like to use).
As for the rest, I think this is an excellent proposal and I agree with
pretty much all of it. I am still not convinced of the need for 64-bit
string sizes, but if we will be able to deal with much more important
issues - such as hashtables, internal name lengths (function/class),
etc. - I don't care, the difference is too small. I am not sure how the
technical details with zend_string will work out but if we agree that's
what we're doing and try to find the way to make it happen I'd think it
worth a serious try.
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227
Hi Nikita,
I really like the way you write because you put realistic approaches when
considering facts. I do miss a lot of this in here.
Like you said, it is completely feasible to add 64bit types where
applicable and not degrade performance that much as some people are flaming
internals@ lately.
As Pierre said on emails earlier on other threads, that initial patch was a
possible initial approach, not the actual final one since lots of tweaks on
memory improvement could/would be required to make it optimized as phpng.
For this exact reason I voted +1 on the initial idea, but not as a final
patch.
Cheers,
On Sat, May 17, 2014 at 5:09 PM, Stas Malyshev smalyshev@sugarcrm.comwrote:
Hi!
Before going any further, it should be established that two aspects of
the
64 bit patch are, as far as I can see, acceptable to everyone: The
introduction of 64bit integers on Win64 and other LLP64 architectures and
the use of an unsigned integer type for lengths and sizes.I would add a third aspect - using a dedicated set of types to signify
those, instead of relying on (potentially unportable) mishmash of
standard ones. This is being overlooked, but I'd like to emphasize that
I actually like the idea very much - with all the disruption that it
brings - as a chance to weed out all these false assumptions and sloppy
types still lurking in the guts of the engine and the extensions. I
wonder if we can also make clang to tell us on such type conversions
even if they match on specific system (I know there are hacks to do this
but they are way uglier than I'd like to use).As for the rest, I think this is an excellent proposal and I agree with
pretty much all of it. I am still not convinced of the need for 64-bit
string sizes, but if we will be able to deal with much more important
issues - such as hashtables, internal name lengths (function/class),
etc. - I don't care, the difference is too small. I am not sure how the
technical details with zend_string will work out but if we agree that's
what we're doing and try to find the way to make it happen I'd think it
worth a serious try.Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227--
--
Guilherme Blanco
MSN: guilhermeblanco@hotmail.com
GTalk: guilhermeblanco
Toronto - ON/Canada
So, that's my opinion on how we should proceed with the 64bit patch. I very
much hope that we can reach some consensus about this.
Thanks for pulling this together Nikita. People have been asking me
about this issue and this was exactly what I told them we would likely
end up with as it is a sane and reasonable implementation of the two
efforts. I would guess an rfc vote between the alternatives, as I see
them, would favour this one by a wide margin.
As for >4G strings, it does seem unlikely, on current hardware, that you
would stick that much data in a variable. You might be able to get one
in there, but if you then do anything with it, the tmp copy is going to
make things fall over pretty quickly. But who knows in 4-5 years maybe
having a TB of ram in servers will be the norm, IO channels have become
much wider and we are manipulating 4K video files directly in PHP so
perhaps you can make a future-proofing argument there.
-Rasmus
So, that's my opinion on how we should proceed with the 64bit patch. I very
much hope that we can reach some consensus about this.Thanks for pulling this together Nikita. People have been asking me
about this issue and this was exactly what I told them we would likely
end up with as it is a sane and reasonable implementation of the two
efforts. I would guess an rfc vote between the alternatives, as I see
them, would favour this one by a wide margin.
It is exactly what we have been advocating during the whole process.
More about that in a mail I'm writing.
As for >4G strings, it does seem unlikely, on current hardware, that you
would stick that much data in a variable. You might be able to get one
in there, but if you then do anything with it, the tmp copy is going to
make things fall over pretty quickly. But who knows in 4-5 years maybe
having a TB of ram in servers will be the norm, IO channels have become
much wider and we are manipulating 4K video files directly in PHP so
perhaps you can make a future-proofing argument there.
The string length is a side effect, not a goal per se, never was. But
variables, buffers dealing with external libraries or functions are
sensible areas. A safe implementation will prevent many issues we
already had, many times. Also correct typing and less (if not none)
magic casting guessed by the compiler will also drastically increase
PHP code quality and safety. These two points, and 64bit integer in a
portable manner, are the top goals of this RFC.
Cheers,
Pierre
@pierrejoye | http://www.libgd.org
So, that's my opinion on how we should proceed with the 64bit patch. I very
much hope that we can reach some consensus about this.Thanks for pulling this together Nikita. People have been asking me
about this issue and this was exactly what I told them we would likely
end up with as it is a sane and reasonable implementation of the two
efforts. I would guess an rfc vote between the alternatives, as I see
them, would favour this one by a wide margin.It is exactly what we have been advocating during the whole process.
More about that in a mail I'm writing.As for >4G strings, it does seem unlikely, on current hardware, that you
would stick that much data in a variable. You might be able to get one
in there, but if you then do anything with it, the tmp copy is going to
make things fall over pretty quickly. But who knows in 4-5 years maybe
having a TB of ram in servers will be the norm, IO channels have become
much wider and we are manipulating 4K video files directly in PHP so
perhaps you can make a future-proofing argument there.The string length is a side effect, not a goal per se, never was. But
variables, buffers dealing with external libraries or functions are
sensible areas. A safe implementation will prevent many issues we
already had, many times. Also correct typing and less (if not none)
magic casting guessed by the compiler will also drastically increase
PHP code quality and safety. These two points, and 64bit integer in a
portable manner, are the top goals of this RFC.
You keep talking about side effects of the patch. It is these side
effects that we have a problem with. We need a version of the patch that
doesn't have all these undesired side effects.
You put the patch as it stands up for a vote and it is this patch we
have to vote on which is why I am voting no on it. It isn't some future
version with some unknown number of side effects removed.
-Rasmus
You keep talking about side effects of the patch.
No, what I have to keep saying, because this argument keeps coming, is
that the ability to store >2GB string is a side effect and not a goal
of this patch.
It is these side
effects that we have a problem with. We need a version of the patch that
doesn't have all these undesired side effects.
The only one, and it is disputable, is the memory usage, which is
around 4% under real world usage (WP, Symfony, Drupal, and some other
apps). It even performs better in many common situations.
You put the patch as it stands up for a vote and it is this patch we
have to vote on which is why I am voting no on it. It isn't some future
version with some unknown number of side effects removed.
I am not sure how to formulate it in a better way but let me try again.
- 64bit patch has been worked on and stabilized since months
- It is continuously tested using master, 5.5 and 5.6, using our full
tests suite (which is much more than just running phpt in cli if you
remember) - the patch is stable
Given these points, and as it is the case with every new
features/addition/improvements we have made, tweaks, optimization,
etc. will happen after it gets approved and applied. Stability and
correctness have the priority.
Asking to take care of phpng was hardly possible given its sudden
public availability. However, we know that phpng can bring PHP a lot
and that is why we gave the option to merge the 64bit patch to phpng,
once it is gets somehow usable in common situations (estimation: 1-2
months if not critical bugs exist in common extensions, but can't test
them in a good way as of now).
I will post a mail about what we plan to do and how, this pretty much
matches what Nikita said and what I have been said since the very day
after we proposed the 64 bit patch again. Most likely tonight or
tomorrow morning.
Cheers,
Pierre
@pierrejoye | http://www.libgd.org
On Sun, May 18, 2014 at 1:34 PM, Rasmus Lerdorf rasmus@lerdorf.com
wrote:You keep talking about side effects of the patch.
No, what I have to keep saying, because this argument keeps coming, is
that the ability to store >2GB string is a side effect and not a goal
of this patch.It is these side
effects that we have a problem with. We need a version of the patch that
doesn't have all these undesired side effects.The only one, and it is disputable, is the memory usage, which is
around 4% under real world usage (WP, Symfony, Drupal, and some other
apps). It even performs better in many common situations.You put the patch as it stands up for a vote and it is this patch we
have to vote on which is why I am voting no on it. It isn't some future
version with some unknown number of side effects removed.I am not sure how to formulate it in a better way but let me try again.
- 64bit patch has been worked on and stabilized since months
- It is continuously tested using master, 5.5 and 5.6, using our full
tests suite (which is much more than just running phpt in cli if you
remember)- the patch is stable
Given these points, and as it is the case with every new
features/addition/improvements we have made, tweaks, optimization,
etc. will happen after it gets approved and applied. Stability and
correctness have the priority.Asking to take care of phpng was hardly possible given its sudden
public availability. However, we know that phpng can bring PHP a lot
and that is why we gave the option to merge the 64bit patch to phpng,
once it is gets somehow usable in common situations (estimation: 1-2
months if not critical bugs exist in common extensions, but can't test
them in a good way as of now).I will post a mail about what we plan to do and how, this pretty much
matches what Nikita said and what I have been said since the very day
after we proposed the 64 bit patch again. Most likely tonight or
tomorrow morning.Cheers,
Pierre
@pierrejoye | http://www.libgd.org
--
Hi Pierre,
Thanks for the reply, based on your promise, I've voted yes to the RFC.
I think that we could have saved a bunch of arguing back and forth if we
would have sorted out the side-effects first (or at least explicitly state
that you guys are ok with dropping/willing to drop those for measurable
performance gains), but I'm glad that we were able to reach an agreement
eventually.
--
Ferenc Kovács
@Tyr43l - http://tyrael.hu
You keep talking about side effects of the patch.
No, what I have to keep saying, because this argument keeps coming, is
that the ability to store >2GB string is a side effect and not a goal
of this patch.It is these side
effects that we have a problem with. We need a version of the patch that
doesn't have all these undesired side effects.The only one, and it is disputable, is the memory usage, which is
around 4% under real world usage (WP, Symfony, Drupal, and some other
apps). It even performs better in many common situations.You put the patch as it stands up for a vote and it is this patch we
have to vote on which is why I am voting no on it. It isn't some future
version with some unknown number of side effects removed.I am not sure how to formulate it in a better way but let me try again.
- 64bit patch has been worked on and stabilized since months
- It is continuously tested using master, 5.5 and 5.6, using our full
tests suite (which is much more than just running phpt in cli if you
remember)- the patch is stable
Given these points, and as it is the case with every new
features/addition/improvements we have made, tweaks, optimization,
etc. will happen after it gets approved and applied. Stability and
correctness have the priority.
But that is for minor tweaks and optimizations. In this case the way to
optimize the patch is to undo the 64-bitness in a number of places where
it doesn't make sense. Putting in a software-imposed limit on class size
names while still keeping it a 64-bit value in the struct makes no
sense, for example. Same goes for lineno, line_start, line_end, num_args
and a couple more that Nikita pointed out.
And as far as I am concerned this has nothing to do with phpng. I'd
still be voting no on it as a 4% memory increase, which, by the way, you
don't even mention in the impacts section, is still too high for me when
I know parts of the 4% are completely unnecessary.
-Rasmus
But that is for minor tweaks and optimizations. In this case the way to
optimize the patch is to undo the 64-bitness in a number of places where
it doesn't make sense. Putting in a software-imposed limit on class size
names while still keeping it a 64-bit value in the struct makes no
sense, for example. Same goes for lineno, line_start, line_end, num_args
and a couple more that Nikita pointed out.
That's not what we discussed.
And as far as I am concerned this has nothing to do with phpng. I'd
still be voting no on it as a 4% memory increase, which, by the way, you
don't even mention in the impacts section, is still too high for me when
I know parts of the 4% are completely unnecessary.
We answer that already, be from Nikita, Dmitry or I. And yes, we agree
on these points already.
Cheers,
Pierre
@pierrejoye | http://www.libgd.org
But that is for minor tweaks and optimizations. In this case the way to
optimize the patch is to undo the 64-bitness in a number of places where
it doesn't make sense. Putting in a software-imposed limit on class size
names while still keeping it a 64-bit value in the struct makes no
sense, for example. Same goes for lineno, line_start, line_end, num_args
and a couple more that Nikita pointed out.That's not what we discussed.
And as far as I am concerned this has nothing to do with phpng. I'd
still be voting no on it as a 4% memory increase, which, by the way, you
don't even mention in the impacts section, is still too high for me when
I know parts of the 4% are completely unnecessary.We answer that already, be from Nikita, Dmitry or I. And yes, we agree
on these points already.
Ok, then please update the RFC with what you see as the way forward,
including adding actual memory impacts to it and restart the vote when
the RFC is ready.
-Rasmus
Ok, then please update the RFC with what you see as the way forward,
including adding actual memory impacts to it and restart the vote when
the RFC is ready.
Asking us, after months of discussions, work, tests, publications,
etc. to do change that could be done post acceptation easily, as it is
always the case for any change affecting more than a couple of LoC.
And with which guarantee? So far none, even the contrary, all
possible ways are used too block this RFC, while we keep trying to
find compromises and keep helping phpng moving forward in the
meantime. I hope you can understand that I have hard time to justify
more efforts on this patch at this stage, we are not an endless free
resource.
At some point trust, compromise and common sense must be followed. And
we are at this point here. We said numerous time on this list and IRC
what are the goals and what are the possibilities of improving both
the 64bit patch and phpng in the coming months/years. Let end this
unproductive debate and let us back to code, for the good of PHP.
Cheers,
Pierre
@pierrejoye | http://www.libgd.org
On Mon, May 19, 2014 at 6:48 PM, Rasmus Lerdorf rasmus@lerdorf.com
wrote:But that is for minor tweaks and optimizations. In this case the way
to optimize the patch is to undo the 64-bitness in a number of places
where it doesn't make sense. Putting in a software-imposed limit on
class size names while still keeping it a 64-bit value in the struct
makes no sense, for example. Same goes for lineno, line_start,
line_end, num_args and a couple more that Nikita pointed out.That's not what we discussed.
And as far as I am concerned this has nothing to do with phpng. I'd
still be voting no on it as a 4% memory increase, which, by the way,
you don't even mention in the impacts section, is still too high for
me when I know parts of the 4% are completely unnecessary.We answer that already, be from Nikita, Dmitry or I. And yes, we agree
on these points already.Ok, then please update the RFC with what you see as the way forward,
including adding actual memory impacts to it and restart the vote when the
RFC is ready.-Rasmus
Confused about what is happening. I thought we reached the agreement based
on what Nikita has suggested, which is pretty simple.
The points of the current patch which have to stay
- 64 bit int in zval(no mem issue, no perf issue)
- size_t in zend_string (.8% mem issue, no perf issue)
- use 32 bit string length as much as possible everywhere else
for myself i'd add also
- several ini options (like content, post, etc length, memory_limit and
several other) - file cursor and stream positions (off_t family)
That doesn't affect mem consumption much as Dmitry already mentioned.
Everything else, say
literal
ini
stack
header
lineno
class names in structs
path length in structs
hash
etc.
has to be adjusted with 32 bit string length while merging with the phpng
branch.
So question - why there are voices like "you have to implement it for
phpng and rewrite the RFC"? If there's the consent to take this RFC with
regard to improvement suggestion listed in short above, so what is wrong
again? Wasn't it said we do it collectively?
Have a nice day.
Anatol
On Mon, May 19, 2014 at 6:48 PM, Rasmus Lerdorf rasmus@lerdorf.com
wrote:But that is for minor tweaks and optimizations. In this case the way
to optimize the patch is to undo the 64-bitness in a number of places
where it doesn't make sense. Putting in a software-imposed limit on
class size names while still keeping it a 64-bit value in the struct
makes no sense, for example. Same goes for lineno, line_start,
line_end, num_args and a couple more that Nikita pointed out.That's not what we discussed.
And as far as I am concerned this has nothing to do with phpng. I'd
still be voting no on it as a 4% memory increase, which, by the way,
you don't even mention in the impacts section, is still too high for
me when I know parts of the 4% are completely unnecessary.We answer that already, be from Nikita, Dmitry or I. And yes, we agree
on these points already.Ok, then please update the RFC with what you see as the way forward,
including adding actual memory impacts to it and restart the vote when the
RFC is ready.-Rasmus
Confused about what is happening. I thought we reached the agreement based
on what Nikita has suggested, which is pretty simple.The points of the current patch which have to stay
- 64 bit int in zval(no mem issue, no perf issue)
- size_t in zend_string (.8% mem issue, no perf issue)
- use 32 bit string length as much as possible everywhere else
for myself i'd add also
- several ini options (like content, post, etc length, memory_limit and
several other)- file cursor and stream positions (off_t family)
That doesn't affect mem consumption much as Dmitry already mentioned.Everything else, say
literal
ini
stack
header
lineno
class names in structs
path length in structs
hash
etc.has to be adjusted with 32 bit string length while merging with the phpng
branch.So question - why there are voices like "you have to implement it for
phpng and rewrite the RFC"? If there's the consent to take this RFC with
regard to improvement suggestion listed in short above, so what is wrong
again? Wasn't it said we do it collectively?
I am fine saying we have a consensus on doing it this way, but it is
quite different from what the current RFC says. At least make a note of
this in the RFC then since many people use accepted RFCs as a resource
later on to figure out what has changed.
-Rasmus
Confused about what is happening. I thought we reached the agreement
based on what Nikita has suggested, which is pretty simple.
I'm confused too, mostly because with the ML problems, I missed a lot of
the discussion. I think it would be best to close the current RFC and
voting, make sure there is a new "correct" RFC and restart the vote. If
all the concerns that are raised by various people have been included,
then, I think that's a much clearer way to figure out what we'd be
voting for.
cheers,
Derick
--
http://derickrethans.nl | http://xdebug.org
Like Xdebug? Consider a donation: http://xdebug.org/donate.php
twitter: @derickr and @xdebug
Posted with an email client that doesn't mangle email: alpine
Derick Rethans derick@php.net schrieb:
Confused about what is happening. I thought we reached the agreement
based on what Nikita has suggested, which is pretty simple.I'm confused too, mostly because with the ML problems, I missed a lot of
the discussion. I think it would be best to close the current RFC and
voting, make sure there is a new "correct" RFC and restart the vote. If
all the concerns that are raised by various people have been included,
then, I think that's a much clearer way to figure out what we'd be
voting for.
I agree. It is rather confusing and I have a hard time keeping up with what
the current agreement is. Can we please stop the votes on the current RFC
work on it and vote again. I might want to remind people that it is a
Request for Comments. Comments we got and now it's time to integrate them
before rushing into a vote, which is why I haven't voted neither yes or no
so far. However I cannot vote "yes" on the current state of the RFC.
Confused about what is happening. I thought we reached the agreement
based on what Nikita has suggested, which is pretty simple.I'm confused too, mostly because with the ML problems, I missed a lot of
the discussion. I think it would be best to close the current RFC and
voting, make sure there is a new "correct" RFC and restart the vote. If
all the concerns that are raised by various people have been included,
then, I think that's a much clearer way to figure out what we'd be
voting for.
+1
I think we have pretty much consensus but not around the current RFC text.
Confused about what is happening. I thought we reached the agreement
based on what Nikita has suggested, which is pretty simple.I'm confused too, mostly because with the ML problems, I missed a lot
of the discussion. I think it would be best to close the current RFC
and voting, make sure there is a new "correct" RFC and restart the
vote. If all the concerns that are raised by various people have been
included, then, I think that's a much clearer way to figure out what
we'd be voting for.
Did anything happen with this?
cheers,
Derick
--
http://derickrethans.nl | http://xdebug.org
Like Xdebug? Consider a donation: http://xdebug.org/donate.php
twitter: @derickr and @xdebug
Posted with an email client that doesn't mangle email: alpine
Did anything happen with this?
Looks like it passed, 24 votes in favor to 21 against (53%). Voting
expired on May 26th.
--Kris
Confused about what is happening. I thought we reached the agreement
based on what Nikita has suggested, which is pretty simple.I'm confused too, mostly because with the ML problems, I missed a lot
of the discussion. I think it would be best to close the current RFC
and voting, make sure there is a new "correct" RFC and restart the
vote. If all the concerns that are raised by various people have been
included, then, I think that's a much clearer way to figure out what
we'd be voting for.Did anything happen with this?
Waiting that the ML is fixed to push an updated rfc and patch.
Confused about what is happening. I thought we reached the
agreement based on what Nikita has suggested, which is pretty
simple.I'm confused too, mostly because with the ML problems, I missed a
lot of the discussion. I think it would be best to close the
current RFC and voting, make sure there is a new "correct" RFC and
restart the vote. If all the concerns that are raised by various
people have been included, then, I think that's a much clearer way
to figure out what we'd be voting for.Did anything happen with this?
Waiting that the ML is fixed to push an updated rfc and patch.
That makes sense :-) I pinged Wez, lets see if he can help fixing the
ML.
cheers,
Derick
--
http://derickrethans.nl | http://xdebug.org
Like Xdebug? Consider a donation: http://xdebug.org/donate.php
twitter: @derickr and @xdebug
Posted with an email client that doesn't mangle email: alpine
Waiting that the ML is fixed to push an updated rfc and patch.
That makes sense :-) I pinged Wez, lets see if he can help fixing the
ML.
Is it the ML or the MTA? I just got an RDNS failure back from a test
message I sent a few days back.
--
Regards,
Mike
Waiting that the ML is fixed to push an updated rfc and patch.
That makes sense :-) I pinged Wez, lets see if he can help fixing the
ML.Is it the ML or the MTA? I just got an RDNS failure back from a test
message I sent a few days back.
Not sure if it's ML or MTA—sorry.
cheers,
Derick
--
http://derickrethans.nl | http://xdebug.org
Like Xdebug? Consider a donation: http://xdebug.org/donate.php
twitter: @derickr and @xdebug
Posted with an email client that doesn't mangle email: alpine
Waiting that the ML is fixed to push an updated rfc and patch.
That makes sense :-) I pinged Wez, lets see if he can help fixing the
ML.Is it the ML or the MTA? I just got an RDNS failure back from a test
message I sent a few days back.Not sure if it's ML or MTA—sorry.
cheers,
Derick--
http://derickrethans.nl | http://xdebug.org
Like Xdebug? Consider a donation: http://xdebug.org/donate.php
twitter: @derickr and @xdebug
Posted with an email client that doesn't mangle email: alpine--
the last info was that the mailing list on lists.php.net tries a local
delivery for @php.net addresses instead of sending it back to the php.netMX.
--
Ferenc Kovács
@Tyr43l - http://tyrael.hu
You keep talking about side effects of the patch. It is these side
effects that we have a problem with. We need a version of the patch that
doesn't have all these undesired side effects.You put the patch as it stands up for a vote and it is this patch we
have to vote on which is why I am voting no on it. It isn't some future
version with some unknown number of side effects removed.
Well put. In the strangeness of voting on an implementation RFC, I
don't see how it makes sense to talk about "side effects". They're
the very essence of the patch and the reason it's being opposed.
If there's a better way to so it, let's see it. The sensible thing to
do would be implementing it for phpng, seeing a final version of the
patch and it's consequences, and then making an informed decision.
Even without it, Nikita's proposal seems as a much better baseline
everyone can agree on.
Zeev
Well put. In the strangeness of voting on an implementation RFC, I
don't see how it makes sense to talk about "side effects". They're
the very essence of the patch and the reason it's being opposed.If there's a better way to so it, let's see it. The sensible thing to
do would be implementing it for phpng, seeing a final version of the
patch and it's consequences, and then making an informed decision.
If I get your point correctly, you are asking to provide the final
version of a patch against phpng, which is not even in alpha stage,
mvoing target and APIs, etc. That's not possible.
Even without it, Nikita's proposal seems as a much better baseline
everyone can agree on.
It would help if you actually read replies instead of systematically
negate this proposal and how things are supposed to work here. Let
alone the calls for -1 based on wrong numbers and facts. This is wrong
in so many ways.
Now, what would have happened if we did what you are saying for
opcache? It would most likely not be in the voting phase if we
consider all supported platforms or changes happening since the votes
began. It would be time to be a little bit serious and consistent if
you really want to agree on something.
Cheers,
Pierre
@pierrejoye | http://www.libgd.org