Hello all.
I would like to propose a replacement for the current zend_u_strtod() implementation.
The patch: http://tony2001.phpclub.net/dev/tmp/u_strtod.diff
According to my tests, the new implementation is about 40 (forty) times faster.
The simple script below takes ~1 sec to run with the patch and ~40 seconds without.
<?php
$a = array(
    "0.1",
    "1.",
    "1.0",
    "23423423.234234",
    "0.00000E-2",
    "0.000002E+3",
    "121231312.1111",
    "000.11111111",
);
$start = microtime(true);
for ($i = 0; $i < 100000; $i++) {
    foreach ($a as $d) {
        $double = (double)$d;
    }
}
var_dump(microtime(true) - $start);
?>
The only question here is which locale to use for number parsing/formatting.
I used "en_US_POSIX" and it doesn't seem to create any new problems, though I'd like to hear your comments.
--
Wbr,
Antony Dovgal
Hi Antony,
----- Original Message -----
From: "Antony Dovgal"
Sent: Friday, November 10, 2006
Cool. :-) I'd just been thinking about zend_u_strtod() again -- if you see
my thread asking Andrei about the Unicode characters allowed as numbers...
I changed the function back in the summer to be about 8x faster by
converting FROM Unicode and using the regular zend_strtod(), but it was just
a quick hack, so maybe better that it wasn't applied!
http://realplain.com/php/zend_u_strtod.diff
BTW, your version would actually be 4000% faster, no? :-O
The only question here is which locale to use for number
parsing/formatting.
I used "en_US_POSIX" and it doesn't seem to create any new problems,
though I'd like to hear your comments.
Since Andrei and README.UNICODE say only ASCII characters in en_US_POSIX
locale, I'd think you're right. :-)
I was thinking, now that I know only basic ASCII digits are allowed,
zend_u_strtod() could manually grab the valid [prefix of] numbers, put them
in a char[some_size] buffer and use the regular zend_strtod(), etc. But it
doesn't seem like that would beat your version!
Just looking at the code again, I don't think it sets *endptr correctly with
a partial match (123.4foo)? Is "pos" set to the number of chars scanned?
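
For reference, here is a small sketch of the endptr contract being asked about, shown with the C library strtod(). It assumes zend_strtod() and zend_u_strtod() are meant to follow the same contract; the helper name chars_scanned() is hypothetical and is not part of either patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Parse the numeric prefix of `s` with strtod() and report how many
 * characters were consumed. For "123.4foo" the end pointer is left on
 * the 'f', so the count covers "123.4" only. If no number is found,
 * strtod() sets the end pointer to `s` and the count is 0.
 */
size_t chars_scanned(const char *s, double *value)
{
    char *end = NULL;

    assert(s != NULL && value != NULL);
    *value = strtod(s, &end);
    return (size_t)(end - s);
}
```

Calling chars_scanned("123.4foo", &v) leaves v close to 123.4 and reports the five characters of the prefix, which is the value *endptr should reflect after a partial match. Note that strtod() honours the current C locale's decimal point; the default "C" locale uses ".".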
Matt
Hmm.. Actually, your version seems to be slightly faster (~15%) than mine, and it has several advantages:
no formatter is required, non-numeric strings are not parsed, and there are fewer TSRMLS_FETCH() calls,
which leads to even better results on non-numeric strings.
Also, it looks like a good idea to use zend_strtod() in both native and Unicode modes.
BTW, your version would actually be 4000% faster, no? :-O
Yeah, not enough blood in my coffein =)
--
Wbr,
Antony Dovgal
Hi Antony,
Hmm, I don't understand how mine was faster if yours was 40x faster than the
current version, as when I tested mine, it was only ~8x. :-/
Well anyway, as I mentioned in my first reply, I had another idea since I
learned (from README.UNICODE and Andrei) that only ASCII digits are allowed
in Unicode. I just put this together:
http://realplain.com/php/zend_u_strtod.c I didn't test it (or even try to
compile), but you get the idea -- "manually" copying the relevant Unicode
chars to a char * (am I doing that right?). I've got the "char buf[64]"
there, assuming it's more efficient than using emalloc() (?), when the
number will fit. I didn't know what size to make it, just whatever catches
the vast majority of cases...
I'd really like to see a faster version implemented soon. :-)
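
To make the idea above concrete, here is a minimal sketch of the approach as described in this message. It is not the code from zend_u_strtod.c. Assumptions: uint16_t stands in for ICU's UChar, strtod() stands in for zend_strtod(), malloc()/free() stand in for emalloc()/efree(), and the names numeric_prefix_len() and u16_strtod() are made up for illustration. The scanner accepts only [+-]digits[.digits][(e|E)[+-]digits] and does not skip leading whitespace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define NUM_BUF_SIZE 64  /* same size as the "char buf[64]" mentioned above */

static int is_digit16(uint16_t c) { return c >= '0' && c <= '9'; }

/* Length, in 16-bit units, of the longest ASCII numeric prefix of s. */
static size_t numeric_prefix_len(const uint16_t *s, size_t len)
{
    size_t i = 0, digits = 0;

    if (i < len && (s[i] == '+' || s[i] == '-')) i++;
    while (i < len && is_digit16(s[i])) { i++; digits++; }
    if (i < len && s[i] == '.') {
        i++;
        while (i < len && is_digit16(s[i])) { i++; digits++; }
    }
    if (digits == 0) return 0;           /* no mantissa digits: not a number */

    if (i < len && (s[i] == 'e' || s[i] == 'E')) {
        size_t j = i + 1;
        if (j < len && (s[j] == '+' || s[j] == '-')) j++;
        if (j < len && is_digit16(s[j])) {   /* exponent counts only with digits */
            while (j < len && is_digit16(s[j])) j++;
            i = j;
        }
    }
    return i;
}

/* Narrow the numeric prefix to char and hand it to strtod(). */
double u16_strtod(const uint16_t *s, size_t len, size_t *consumed)
{
    char stack_buf[NUM_BUF_SIZE];
    char *buf = stack_buf;
    size_t n;
    double value;

    assert(s != NULL || len == 0);
    n = numeric_prefix_len(s, len);
    if (consumed) *consumed = 0;
    if (n == 0) return 0.0;

    if (n + 1 > sizeof stack_buf) {      /* rare case: number too long for the stack buffer */
        buf = malloc(n + 1);
        if (buf == NULL) return 0.0;
    }
    for (size_t i = 0; i < n; i++) buf[i] = (char)s[i];  /* safe: all ASCII */
    buf[n] = '\0';

    value = strtod(buf, NULL);
    if (buf != stack_buf) free(buf);
    if (consumed) *consumed = n;
    return value;
}
```

Because only the validated ASCII prefix is copied, non-numeric input such as "text" returns immediately without any conversion or allocation, and the consumed count gives the caller what it needs to set its end pointer.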
Matt
Matt and Antony,
I think we should go with Matt's patch, since we care only about ASCII
characters in strings.
-Andrei
Agree.
I'm going to spend some time on this, re-test it once again and commit it after that.
--
Wbr,
Antony Dovgal
Okay, I got some test results.
First of all, both patches seem to be fine; they both fix several failed tests:
Zend/tests/zend_strtod.phpt
ext/standard/tests/array/range.phpt
ext/standard/tests/general_functions/001.phpt
ext/standard/tests/math/abs.phpt
ext/standard/tests/math/bug30069.phpt
Now the test results (average time in seconds spent on running the test script):
Platform                         | current | Matt  | Tony
Linux intel64 (ICU 3.6, non-ZTS) | 104.20  | 18.93 | 29.93
Linux intel64 (ICU 3.6, ZTS)     | 106.38  | 19.78 | 31.97
Linux i386 (ICU 3.6, non-ZTS)    | 809.21  | 25.49 | 59.78
Linux i386 (ICU 3.6, ZTS)        | 708.43  | 30.22 | 59.90
Linux i386 (ICU 3.4, non-ZTS)    | 526.71  | 22.39 | 36.96
Linux i386 (ICU 3.4, ZTS)        | 435.27  | 26.20 | 37.87
FreeBSD i386 (ICU 3.6, non-ZTS)  | --      | 20.66 | 33.47
(Yes, I'm too lazy to rebuild PHP on FreeBSD four more times, as the result is clear.)
So we have a clear winner here - Matt's patch outperforms mine by ~30%.
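
For anyone who wants to recompute the comparison from the table, here is a small sketch of the two ratios involved. The function names are hypothetical. Taking the Linux intel64 non-ZTS row as an example, 18.93 s against 29.93 s works out to roughly 37% less run time, and 18.93 s against the current 104.20 s is roughly a 5.5x speedup; the exact margin varies from row to row.

```c
#include <assert.h>

/* Percentage of run time saved by `fast` relative to `slow` (both in seconds). */
double time_saved_percent(double fast, double slow)
{
    assert(slow > 0.0);
    return 100.0 * (1.0 - fast / slow);
}

/* How many times faster `fast` is than `slow` (both in seconds). */
double speedup_factor(double fast, double slow)
{
    assert(fast > 0.0);
    return slow / fast;
}
```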
The patches:
http://tony2001.phpclub.net/dev/tmp/u_strtod.diff - my patch
http://tony2001.phpclub.net/dev/tmp/u_strtod1.diff - Matt's patch
The test script used:
<?php
$a = array(
    "0.1",
    "1.",
    "1.0",
    "23423423.234234",
    "0.00000E-2",
    "0.000002E+3",
    "121231312.1111",
    "000.11111111",
    "",
    "text",
    str_repeat("text", 1000)
);
$start = microtime(true);
for ($i = 0; $i < 1000000; $i++) {
    foreach ($a as $d) {
        $double = (double)$d;
    }
}
var_dump(microtime(true) - $start);
?>
--
Wbr,
Antony Dovgal
Hi Antony!
Wow, lots of testing ya did there. :-) Nice. But, just wanted to mention
that "Matt's patch" is the one I did back in Aug. Did you see my last
message [1] the other day with a new version that only checks for ASCII
chars, doesn't do conversion from Unicode, etc.?
http://realplain.com/php/zend_u_strtod.c (Untested)
BTW, I see ICU 3.6 is a lot faster on the "current" code (saw a hint about
that in the changelog); though still not close to the manual methods, of
course.
[1] http://news.php.net/php.internals/26538
Matt
----- Original Message -----
From: "Antony Dovgal"
Sent: Friday, November 17, 2006
Nope, I must have missed it.
http://realplain.com/php/zend_u_strtod.c (Untested)
Could you please upload (or post) a unified diff? Thanks.
--
Wbr,
Antony Dovgal
Antony,
Sure, sorry. :-) I thought the straight code would be easier for you guys
to simply copy to a local file or such to test, but I don't really know
anything about it. :-P
http://realplain.com/php/zend_u_strtod-v2.diff
Matt
Ok, this one looks a bit better:
Platform                         | current | Matt  | Tony  | New patch by Matt
Linux intel64 (ICU 3.6, non-ZTS) | 104.20  | 18.93 | 29.93 | 13.98
Linux intel64 (ICU 3.6, ZTS)     | 106.38  | 19.78 | 31.97 | 14.39
Linux i386 (ICU 3.6, non-ZTS)    | 809.21  | 25.49 | 59.78 | 18.76
Linux i386 (ICU 3.6, ZTS)        | 708.43  | 30.22 | 59.90 | 18.96
Linux i386 (ICU 3.4, non-ZTS)    | 526.71  | 22.39 | 36.96 | --
Linux i386 (ICU 3.4, ZTS)        | 435.27  | 26.20 | 37.87 | --
FreeBSD i386 (ICU 3.6, non-ZTS)  | --      | 20.66 | 33.47 | 15.38
--
Wbr,
Antony Dovgal
Hi Antony,
Just to let you know, I made a couple minor layout changes to the code, and
also switched to the *_alloca() functions if memory needs allocating, which
I understand should be faster on systems that support it. (Patch file (v2)
is updated of course. :-))
Matt
I've just committed the patch, thanks a lot.
--
Wbr,
Antony Dovgal
Hi Antony,
----- Original Message -----
From: "Antony Dovgal"
Sent: Wednesday, December 06, 2006
I've just committed the patch, thanks a lot.
I see. Cool, thanks. :-) No problem.
Matt