Hi there,
I didn't know whether to make an official bug report or send to the list --
so I'm trying this first. :-)
var_dump(array_count_values(array(' 001', 1, ' 1 ', '1')));
Expected result (and what PHP 4 gives):
array(3) {
[" 001"]=>
int(1)
[1]=>
int(2)
[" 1 "]=>
int(1)
}
Actual result:
array(2) {
[1]=>
int(3)
[" 1 "]=>
int(1)
}
Can this patch please be applied for 5.2's release? To maintain what I'm
sure is the correct behavior from PHP 4, the function needs to use the
zend_[u_]symtable* functions, which take care of correctly handling
numeric strings, as zend_hash_[find|update] did in 4.x. If they would've
been used in 5+ in the first place, there wouldn't have been bugs like
#34723 (still present with leading whitespace, as you can see), #30833,
#29808, etc. (I guess that's all, actually). :-)
The updated code should be faster, too...
http://realplain.com/php/array_count_values_bug.diff
http://realplain.com/php/array_count_values_bug_5_2.diff
Thanks,
Matt
Hello,
Hi there,
I didn't know whether to make an official bug report or send to the list --
so I'm trying this first. :-)
You already tried in the previous thread about is_numeric.
var_dump(array_count_values(array(' 001', 1, ' 1 ', '1')));
Expected result (and what PHP 4 gives):
array(3) {
[" 001"]=>
int(1)
[1]=>
int(2)
[" 1 "]=>
int(1)
}Actual result:
array(2) {
[1]=>
int(3)
[" 1 "]=>
int(1)
}
Can this patch please be applied for 5.2's release? To maintain what I'm
sure is the correct behavior from PHP 4, the function needs to use the
zend_[u_]symtable* functions, which take care of correctly handling
numeric strings, as zend_hash_[find|update] did in 4.x. If they would've
been used in 5+ in the first place, there wouldn't have been bugs like
#34723 (still present with leading whitespace, as you can see), #30833,
#29808, etc. (I guess that's all, actually). :-)
The updated code should be faster, too...
As I said in the previous discussion (why in the world have you
splitted it out?), I don't think it is reasonable or safe to apply
this patch in 5.2. Secondly it seems to wrong address the problem. You
work around the actual is_numeric "problems" by suppressing its usage.
Doing so reintroduce the numeric string keys problem (what addresses
the removed code).
http://realplain.com/php/array_count_values_bug.diff
http://realplain.com/php/array_count_values_bug_5_2.diff
I think it is much more safer to first determine the general policy
for is_numeric and then address this problem. It is too late in the
game to change such behaviors in 5.x (I care little about 4.x
compatibility).
Cheers,
--Pierre
Hi Pierre,
----- Original Message -----
From: "Pierre"
Sent: Sunday, August 13, 2006
Hello,
Hi there,
I didn't know whether to make an official bug report or send to the
list --
so I'm trying this first. :-)You already tried in the previous thread about is_numeric.
I was [originally] just mentioning it as an aside, since it uses
is_numeric_string(). But the "is_numeric..." subject isn't right IMO for
this separate problem (even though it uses is_numeric...). :-) And I
wouldn't send a patch for the array function in the other thread. Oh, and
finally, when I first mentioned it there I had just discovered it, and
didn't realize the difference with 4.x. Now I know the behavior is
wrong -- the fact that it was changed in 5 is the bug which caused the
previous reports.
var_dump(array_count_values(array(' 001', 1, ' 1 ', '1')));
Expected result (and what PHP 4 gives):
array(3) {
[" 001"]=>
int(1)
[1]=>
int(2)
[" 1 "]=>
int(1)
}Actual result:
array(2) {
[1]=>
int(3)
[" 1 "]=>
int(1)
}
Can this patch please be applied for 5.2's release? To maintain what
I'm
sure is the correct behavior from PHP 4, the function needs to use the
zend_[u_]symtable* functions, which take care of correctly handling
numeric strings, as zend_hash_[find|update] did in 4.x. If they
would've
been used in 5+ in the first place, there wouldn't have been bugs like
#34723 (still present with leading whitespace, as you can see), #30833,
#29808, etc. (I guess that's all, actually). :-)
The updated code should be faster, too...As I said in the previous discussion (why in the world have you
splitted it out?), I don't think it is reasonable or safe to apply
this patch in 5.2. Secondly it seems to wrong address the problem. You
work around the actual is_numeric "problems" by suppressing its usage.
Not reasonable or safe? Then why were the other bugs fixed instead of being
marked "Bogus" if array_count_values()
was supposed to behave differently
in 5? Because it's not supposed to! There should never have been a
change -- just simply changing the word "hash" to "symtable". That's it.
Functionality would've remained the same, as it should. But for some
reason, this wasn't realized, and instead of trying to figure out how the
function worked as it did in 4.x, this hack with is_numeric()
was added,
which still doesn't work correctly. (Also makes things a good amount
slower, which I verified since the last e-mail.)
And of course I'm suppressing is_numeric's usage, because it shouldn't be
there. You see that leading 0's are still being lost if preceded by
whitespace?
Doing so reintroduce the numeric string keys problem (what addresses
the removed code).
No, why would it?! :-) 4.x didn't have the problem. With the updated code,
the HANDLE_NUMERIC() macro (in symtable functions) will take care of true
numeric string keys (which is exactly what happened in 4.x), just as it
does with $array['123'] -- is_numeric isn't used there because it would
screw up stuff like it is in array_count_values()
.
I think it is much more safer to first determine the general policy
for is_numeric and then address this problem. It is too late in the
game to change such behaviors in 5.x (I care little about 4.x
compatibility).
There is nothing unsafe about this change, as it will simply be back to
4.x's bug-free behavior. Run the existing array_count_values()
tests and
they'll pass. That's what they're there for, right? This has nothing to do
with "4.x compatibility," but correct operation, which happened to be there
in 4.x. :-)
is_numeric should have nothing to do with this, because leading whitespace
can't be ignored, just like it isn't with $array[' 123']. HANDLE_NUMERIC()
does everything that's trying to be done now with "hacks" and then some (and
much faster and more efficiently).
Just compare my updated 5.2 code to 4.x's and you'll see it's the same
(except for layout changes, etc. and my using the ZVAL_LONG() macro in 2
places :-)) other than the word "symtable" instead of "hash". Then, compare
5.2's "symtable" functions to 4.x's "hash" ones, and you'll see that their
end result is exactly the same.
Cheers,
--Pierre
Matt
Hi,
Not reasonable or safe? Then why were the other bugs fixed instead of being
marked "Bogus" ifarray_count_values()
was supposed to behave differently
in 5?
Safe as it will not break working php5 code. Changing this behavior
now can break many things out there (for php 5 users, php4 users are
out of my worries for this problem). What I mean is that we should
first check the impact and see why this change has been done (I
remember something like a valid reason, but did not get the time to
verify).
In short, it is just what I said in the previous discussion, such
changes must be done carefully and with a huge amount of test cases
(and stick to them). I'm not choosing one solution or another, only
saying that the decision should be taken carefully and generally.
--Pierre
Hi Pierre,
sigh I'm thinking I better go through the official channels (Bug report)
then and see what happens in case nobody else really sees this here. (Since
that's what a regular user would do. :-) Although after seeing the repeated
disregard for Bug #37846, I don't know.) Have to see now that it's
Monday... More below.
----- Original Message -----
From: "Pierre"
Sent: Sunday, August 13, 2006
Hi,
Not reasonable or safe? Then why were the other bugs fixed instead of
being
marked "Bogus" ifarray_count_values()
was supposed to behave
differently
in 5?Safe as it will not break working php5 code. Changing this behavior
now can break many things out there (for php 5 users, php4 users are
out of my worries for this problem). What I mean is that we should
first check the impact and see why this change has been done (I
remember something like a valid reason, but did not get the time to
verify).In short, it is just what I said in the previous discussion, such
changes must be done carefully and with a huge amount of test cases
(and stick to them). I'm not choosing one solution or another, only
saying that the decision should be taken carefully and generally.
This "breaking things for PHP 5 users" concern doesn't make sense to me --
for normal changes, certainly, but for bug fixes? I didn't know there was
so much concern when fixing bugs that it may break things for someone
relying on a bug. :-/ And you're also saying that you don't care if code
from PHP 4 no longer works the same? Again, I point to the other bugs that
were fixed, which were accidentally caused by PHP 5's internal function
changes. I seriously doubt there was a valid reason for the behavior
change (after checking CVS changes; see below), and it should be documented
in the manual if there was (could've been forgotten, but still).
After a quick check of
http://cvs.php.net/viewvc.cgi/php-src/ext/standard/array.c?view=log I found
many examples (with other changes) to support my idea that this is all a
mistake of not doing a global search and replace of zend_hash_[find|update]
with zend_symtable_* when the "symtable" functions were added. I'm shocked
that with such concern over not breaking things that every use of the
"hash" functions wasn't carefully checked to see what would break by
removing the HANDLE_NUMERIC() macro and moving it to the "symtable"
functions. Oh, actually I don't think that would've been needed with a
simple search and replace in all files. :-) I guess somebody didn't
realize the differences that I did when I first looked at them a couple
months ago...
Now, let's look at what I found in the changelog for array.c:
*) 1.231 (Jun 5 '03): "fix array_key_exists()
from HANDLE_NUMERIC() changes"
(Came up with a "hack," zend_is_numeric_key(), before eventually changing to
the correct symtable function); and "need to go through this file and find
any stuff that relies on this change" (Obviously we know at least one place
that's still been missed until now ;-))
) 1.235 (Jul 22 '03): "Use the new infrastructure of zend_symtable_()"
(Oh, there's the correct array_key_exists()
fix!)
*) 1.237 (Aug 4 '03): "Fix bug #24652 - Sterling, do you begin to think that
maybe it wasn't such a good idea?" (Another switch to the correct
"symtable" function for array_flip()
. And no, the idea was fine, just
should've done a search and replace.)
*) 1.273 (Aug 26 '04): "Fixed bug #29808 (array_count_values() breaks with
numeric strings)" (First of the wrong fixes (is_numeric_string()) that were
unfortunately not realized like the others (that eventually used
"symtable"). This was for 5.0.2/5.1, so must not have been intentional
(changing in 5.0) or with good reason.)
*) 1.297 (Apr 12 '05): "fix #30833 (array_count_values modifying input
array)" (More unnecessary code added as a result of not fixing the root of
the problem ("hash" instead of "symtable"), which is the majority of what my
patch removes.)
*) 1.327 (Oct 4 '05): "fix #34723 (array_count_values() strips leading
zeroes)" (The last of the mess. And still didn't fix the problem when
leading zeros are preceded by whitespace.)
Seems pretty obvious...
--Pierre
Matt
Hello,
Seems pretty obvious...
It is pretty obvious that we miss the cases, informations and feedback
to take a final decisions. What this changelog tells me is that we
already messed too much with this part. With the risk to repeat myself
a last time, I do not think we should change again this behavior, not
now and not only this one. We need to have all required info and tests
before (read my previous mails for the details :) ).
--Pierre
Not reasonable or safe? Then why were the other bugs fixed instead
of being
marked "Bogus" ifarray_count_values()
was supposed to behave
differently
in 5?Safe as it will not break working php5 code. Changing this behavior
now can break many things out there (for php 5 users, php4 users are
out of my worries for this problem). What I mean is that we should
first check the impact and see why this change has been done (I
remember something like a valid reason, but did not get the time to
verify).In short, it is just what I said in the previous discussion, such
changes must be done carefully and with a huge amount of test cases
(and stick to them). I'm not choosing one solution or another, only
saying that the decision should be taken carefully and generally.
While I agree the change should be carefully reviewed, I must wonder
why the millions upon millions of lines of code in 4.x are going to be
ignored here...
What rationale is there for breaking the BC?
Obviously you think it should be broken -- I just don't understand why...
--
Like Music?
http://l-i-e.com/artists.htm
Hello,
While I agree the change should be carefully reviewed, I must wonder
why the millions upon millions of lines of code in 4.x are going to be
ignored here...What rationale is there for breaking the BC?
Obviously you think it should be broken -- I just don't understand why...
I did not say it should be broken. All I said (and confirmed by
Andrei) is that we should first sit down and figure out what, how and
when we can/should change something. It seems to me as the sanest
attitude given the insane amount of senseless changes.
There is no need to push a patch just to bring another entry to this
changelog. As I already suggested, we can work together on a proposal
and bring it to a conclusion.
--Pierre
Hi Pierre,
I will go ahead and enter a Bug report for this in a bit ('cause there's
definitely a bug), just so it's in the official place (preferred, I guess?)
and doesn't keep going here on the list. :-)
----- Original Message -----
From: "Pierre"
Sent: Monday, August 14, 2006
Hello,
While I agree the change should be carefully reviewed, I must wonder
why the millions upon millions of lines of code in 4.x are going to be
ignored here...What rationale is there for breaking the BC?
Obviously you think it should be broken -- I just don't understand
why...I did not say it should be broken. All I said (and confirmed by
Andrei) is that we should first sit down and figure out what, how and
when we can/should change something. It seems to me as the sanest
attitude given the insane amount of senseless changes.There is no need to push a patch just to bring another entry to this
changelog. As I already suggested, we can work together on a proposal
and bring it to a conclusion.
Just to be sure, you are just talking about the array_count_values()
function and not the is_numeric... stuff? Since the array function bug has
nothing to do with is_numeric -- since as I've said, it shouldn't be there.
Andrei confirmed about changing (or, not) array_count_values or
is_numeric...? I only saw the list e-mail about is_numeric...
This discussion is unbelievable to me, with being concerned about changing
clearly buggy behavior. Related to what I asked previously, will you guys
"un-fix" any bug that someone says broke their code by being fixed? In that
case, why fix any (non-security) bugs, since doing so can obviously change
existing PHP code?
Again, if it wasn't clear previously, YOU (not Pierre! :-)) caused the bug 2
years ago in file version 1.273 (PHP 5.0.2) by improperly fixing another
bug! Not 3 months before that, the changelog comment stated that you "need
to go through this file and find any stuff that relies on this change" (abou
t "hash" function behavior moving to "symtable"). Very simple to do, and it
would've prevented these other bug reports (and this thread...).
No matter what is done (or nothing), the function will behave differently
than either 4.0-5.0.1, or 5.0.2-present. No, this bug didn't exist before
5.0.2 (though the other ones did in 5.0.0).
Why do you guys think the HANDLE_NUMERIC() macro, that's used in "symtable"
functions, is named what it is? Because it does what the name suggests
perfectly/error-free.
Here is even more stuff that's getting screwed up (signs):
var_dump(array_count_values(array('-000', '-01', '+123')));
array(3) {
[0]=>
int(1)
[-1]=>
int(1)
[123]=>
int(1)
}
So the current mess of code, DOES handle 1) leading 0's; DOESN'T handle 1)
leading whitespace, 2) leading "-" with one or more 0's, 3) leading "+"
"symtable" functions with HANDLE_NUMERIC(), DOES handle... Oh wow,
everything! :-)
Out of array keys/indexes, array_combine, array_flip, array_fill_keys,
array_key_exists, and array_count_values, this is the only one that handles
these values differently -- WHY?! (Oh, I can answer: the others use the
"symtable" functions instead of is_numeric... + other crap. Geez.)
--Pierre
Matt
Hello,
Hi Pierre,
I will go ahead and enter a Bug report for this in a bit ('cause there's
definitely a bug), just so it's in the official place (preferred, I guess?)
and doesn't keep going here on the list. :-)
Please don't. I have no more ways or words to explain you what needs
to be done.... Please ask me if you still don't get it.
----- Original Message -----
From: "Pierre"
Sent: Monday, August 14, 2006Hello,
While I agree the change should be carefully reviewed, I must wonder
why the millions upon millions of lines of code in 4.x are going to be
ignored here...What rationale is there for breaking the BC?
Obviously you think it should be broken -- I just don't understand
why...I did not say it should be broken. All I said (and confirmed by
Andrei) is that we should first sit down and figure out what, how and
when we can/should change something. It seems to me as the sanest
attitude given the insane amount of senseless changes.There is no need to push a patch just to bring another entry to this
changelog. As I already suggested, we can work together on a proposal
and bring it to a conclusion.
Just to be sure, you are just talking about the
array_count_values()
function and not the is_numeric... stuff?
No, I'm talking about all cases :-). This function or the array
keys in general is only one issue, there is some more.
Which part of "We need to sit down and come up with a proposal?" did
you not understand? seriously?
--Pierre
Hi Pierre,
----- Original Message -----
From: "Pierre"
Sent: Tuesday, August 15, 2006
Hello,
Hi Pierre,
I will go ahead and enter a Bug report for this in a bit ('cause there's
definitely a bug), just so it's in the official place (preferred, I
guess?)
and doesn't keep going here on the list. :-)Please don't. I have no more ways or words to explain you what needs
to be done.... Please ask me if you still don't get it.
I was actually thinking the same thing. ;-) Don't what, don't enter a Bug
report? Sorry, I already had awhile before getting your reply (but you've
probably seen it). I didn't know it was bad to record the bug there -- so
anyone affected could find it easier, or it's noted in the PHP changelog,
assuming it's eventually fixed.
----- Original Message -----
From: "Pierre"
Sent: Monday, August 14, 2006Hello,
...
I did not say it should be broken. All I said (and confirmed by
Andrei) is that we should first sit down and figure out what, how and
when we can/should change something. It seems to me as the sanest
attitude given the insane amount of senseless changes.There is no need to push a patch just to bring another entry to this
changelog. As I already suggested, we can work together on a proposal
and bring it to a conclusion.Just to be sure, you are just talking about the
array_count_values()
function and not the is_numeric... stuff?No, I'm talking about all cases :-). This function or the array
keys in general is only one issue, there is some more.
But I've only been talking about this one specific array_count_values bug in
this thread... As I've said, I haven't seen "some more" places where this
specific behavior exists.
Which part of "We need to sit down and come up with a proposal?" did
you not understand? seriously?
Nothing. :-) But the patch is all I know of, so that was my proposal.
Maybe with the "sitting down," "if" to change this function would be the
decision, but not "what" to change, as I only see 2 options: Do nothing; or
fix the bug, which if there's a change, I'm pretty sure the behavior my
patch provides will be the final result. I hope you guys DO see that
there's a bug, and are simply wanting to be careful with changes.
BTW, I think my Bug report (#38464) shows the problem well (I mean, better
than my previous mails). I use this function to count words from text, and
other than it being slower than it could with my patch, the leading spaces +
numbers issue wouldn't really affect me (since I split on spaces), BUT after
also realizing that a leading sign is also an issue, then it obviously could
affect my code (if signs are allowed as part of a "word"), which is a
problem, as the "word" characters have been modified. :-(
Hmm, I haven't tried it, but I wonder if a userland foreach loop would be
faster than using the function (and it would work correctly), unlike in PHP
--Pierre
Matt
Hello Matt,
Which part of "We need to sit down and come up with a proposal?" did
you not understand? seriously?Nothing. :-) But the patch is all I know of, so that was my proposal.
Maybe with the "sitting down," "if" to change this function would be the
decision, but not "what" to change, as I only see 2 options: Do nothing; or
fix the bug, which if there's a change, I'm pretty sure the behavior my
patch provides will be the final result. I hope you guys DO see that
there's a bug, and are simply wanting to be careful with changes.
We are runing in circles, I do not have the times for that sorry. I
answered many times the questions regarding your hope, explaining the
best way to take to solve this specific issue and all other related to
numeric strings. Andrei did too. have nothing else to say for now :)
I will start the draft next week, let me know if you like to help.
--Pierre
Hi Pierre,
----- Original Message -----
From: "Pierre"
Sent: Wednesday, August 16, 2006
Hello Matt,
Which part of "We need to sit down and come up with a proposal?" did
you not understand? seriously?Nothing. :-) But the patch is all I know of, so that was my proposal.
Maybe with the "sitting down," "if" to change this function would be the
decision, but not "what" to change, as I only see 2 options: Do nothing;
or
fix the bug, which if there's a change, I'm pretty sure the behavior my
patch provides will be the final result. I hope you guys DO see that
there's a bug, and are simply wanting to be careful with changes.We are runing in circles, I do not have the times for that sorry. I
I know, I'm sorry! :-)
answered many times the questions regarding your hope, explaining the
best way to take to solve this specific issue and all other related to
numeric strings. Andrei did too. have nothing else to say for now :)I will start the draft next week, let me know if you like to help.
Sure, I could, but I'm not sure how much I'd help. :-/ If it's on the list,
I can offer comments or such regarding anything I think (if this is mostly
is_numeric* stuff you will draft) -- I thought I'd have a quick commented
example of what I thought of for is_numeric* by now, but didn't get to it
yet. I wanted to reply to your message from last week in the is_numeric*
thread...
One last thing about the array function I've not mentioned... ('Course I
realized after the last mail -- I tried this the other day, but it was my
patched code (oops), so it didn't mess up.) On 32-bit, using string value
of 2^32-1:
$a['2147483647']=1;
$b=array_count_values(array('2147483647'));
var_dump($a, $b);
array(1) {
["2147483647"]=>
int(1)
}
array(1) {
[2147483647]=>
int(1)
}
Again, it's behaving differently by converting that value to number. (Note:
To be fair, I'd say in this case, it's a tiny bug in regular array key
handling ("symtable" functions), but I'll keep that for later -- super
simple to fix, I think. And safe. :-))
--Pierre
Don't need to waste time replying to this. ;-) Thanks for your time,
Matt
BC has already been broken, and while BC breaks are OK in major
version changes (4 -> 5), they are not in minor version changes... and
if you change this behaviour again you are not "undoing" a BC break
but you are breaking it again.
Regards,
Stefan
Hi Stefan,
----- Original Message -----
From: "Stefan Walk"
Sent: Tuesday, August 15, 2006
BC has already been broken
Yes, and forget BC, it's a bug.
and while BC breaks are OK in major
version changes (4 -> 5)
OK, but 1) this wasn't intentionally changed, and 2) it wasn't in a major
version.
they are not in minor version changes...
This was really bad then since they "changed it" (broke it) in 5.0.2. And
they fixed other bugs (or "broke BC") in the same version.
and
if you change this behaviour again you are not "undoing" a BC break
Yeah, you're putting it back to how it was in 5.0.0 and 5.0.1 (and 4.x of
course), which is how it should be.
but you are breaking it again.
No. And they did more in 5.0.2 anyway.
Regards,
Stefan
Matt
Ilia,
Just wanted to thank you and anyone else for applying my patch and closing
Bug #38464. :-)
BTW, not a big deal, but in the NEWS entry about this, could someone replace
my e-mail address with "Matthew Wilmas" instead? So it's the same as 2
other places where Marcus put my name... ;-)
Thanks,
Matt
----- Original Message -----
From: "Matt W"
Sent: Sunday, August 13, 2006
[...]
http://realplain.com/php/array_count_values_bug.diff
http://realplain.com/php/array_count_values_bug_5_2.diff