zend_u_strtod() 400% speed up

19 years ago by Antony Dovgal

Hello all.

I would like to propose a replacement for the current zend_u_strtod() implementation.
The patch: http://tony2001.phpclub.net/dev/tmp/u_strtod.diff

According to my tests, the new implementation is about 40 (forty) times faster.
The simple script below takes ~1 sec to run with the patch and ~40 seconds without.

<?php
$a = array(
    "0.1",
    "1.",
    "1.0",
    "23423423.234234",
    "0.00000E-2",
    "0.000002E+3",
    "121231312.1111",
    "000.11111111",
);

$start = microtime(true);
for ($i = 0; $i < 100000; $i++) {
    foreach ($a as $d) {
        $double = (double)$d;
    }
}
var_dump(microtime(true) - $start);
?>

The only question here is which locale to use for number parsing/formatting.
I used "en_US_POSIX" and it doesn't seem to create any new problems, though I'd like to hear your comments.

--
Wbr,
Antony Dovgal

19 years ago by Matt Wilmas

Hi Antony,

----- Original Message -----
From: "Antony Dovgal"
Sent: Friday, November 10, 2006

Hello all.

I would like to propose a replacement for current zend_u_strtod()
implementation.
The patch: http://tony2001.phpclub.net/dev/tmp/u_strtod.diff

According to my tests, new implementation is faster in about 40 (forty)
times.
The simple script below takes ~1 sec to run with the patch and ~40 seconds
without.

Cool. :-) I'd just been thinking about zend_u_strtod() again -- if you see
my thread asking Andrei about the Unicode characters allowed as numbers...
I changed the function back in the summer to be about 8x faster by
converting FROM Unicode and using the regular zend_strtod(), but it was just
a quick hack, so maybe better that it wasn't applied!
http://realplain.com/php/zend_u_strtod.diff

BTW, your version would actually be 4000% faster, no? :-O

The only question here is which locale to use for number
parsing/formatting.
I used "en_US_POSIX" and it doesn't seem to create any new problems,
though I'd like to hear your comments.

Since Andrei and README.UNICODE say only ASCII characters in en_US_POSIX
locale, I'd think you're right. :-)

I was thinking, now that I know only basic ASCII digits are allowed,
zend_u_strtod() could manually grab the valid [prefix of] numbers, put them
in a char[some_size] buffer and use the regular zend_strtod(), etc. But it
doesn't seem like that would beat your version!

Just looking at the code again, I don't think it sets *endptr correctly with
a partial match (123.4foo)? Is "pos" set to the number of chars scanned?
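
For reference, the endptr contract in question can be illustrated with the standard C strtod(), used here only as a stand-in for zend_strtod()/zend_u_strtod(). The helper name chars_consumed() is made up for this sketch; it simply reports how far the parser advanced, which is what *endptr has to reflect on a partial match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: parse `s` with the standard strtod() and return
 * the number of characters it consumed. For "123.4foo" parsing stops at
 * the 'f', so the end pointer must point at that character. */
size_t chars_consumed(const char *s, double *value)
{
    char *end = NULL;

    assert(s != NULL && value != NULL);
    *value = strtod(s, &end);   /* end is set to the first unparsed char */
    return (size_t)(end - s);
}
```

With "123.4foo" the helper reports five consumed characters; with a non-numeric string such as "text" it reports zero, i.e. the end pointer equals the start of the input.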

--
Wbr,
Antony Dovgal

Matt

19 years ago by Antony Dovgal

Cool. :-) I'd just been thinking about zend_u_strtod() again -- if you see
my thread asking Andrei about the Unicode characters allowed as numbers...
I changed the function back in the summer to be about 8x faster by
converting FROM Unicode and using the regular zend_strtod(), but it was just
a quick hack, so maybe better that it wasn't applied!
http://realplain.com/php/zend_u_strtod.diff

Hmm.. Actually, your version seems to be slightly faster (~15%) than mine, and it has several advantages:
no formatter is required, non-numeric strings are not parsed, and there are fewer TSRMLS_FETCH() calls,
which leads to even better results on non-numeric strings.

Also, it looks like a good idea to use zend_strtod() in both native and unicode modes.

BTW, your version would actually be 4000% faster, no? :-O

Yeah, not enough blood in my coffein =)

--
Wbr,
Antony Dovgal

19 years ago by Matt Wilmas

Hi Antony,

----- Original Message -----
From: "Antony Dovgal"
Sent: Friday, November 10, 2006

Hmm.. Actually, your version seems to be slightly faster (~15%) than mine
and there are several advantages, like no formatter is required, avoiding
parsing non-numeric strings and less TSRMLS_FETCH() calls, which leads to
even better results on non-numeric strings.

Also, it looks like a good idea to use zend_strtod() in both native and
unicode modes.

Hmm, I don't understand how mine was faster if yours was 40x faster than the
current version, as when I tested mine, it was only ~8x. :-/

Well anyway, as I mentioned in my first reply, I had another idea since I
learned (from README.UNICODE and Andrei) that only ASCII digits are allowed
in Unicode. I just put this together:
http://realplain.com/php/zend_u_strtod.c I didn't test it (or even try to
compile), but you get the idea -- "manually" copying the relevant Unicode
chars to a char * (am I doing that right?). I've got the "char buf[64]"
there, assuming it's more efficient than using emalloc() (?), when the
number will fit. I didn't know what size to make it, just whatever catches
the vast majority of cases...
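
The approach described above might look roughly like the sketch below. This is not the code from the linked file: the names u16char, numeric_prefix_len() and sketch_u_strtod() are hypothetical, uint16_t stands in for ICU's UChar, the standard strtod() and malloc() stand in for zend_strtod() and emalloc(), and leading whitespace is not handled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint16_t u16char;   /* assumption: stand-in for ICU's UChar */

static int is_ascii_digit(u16char c) { return c >= '0' && c <= '9'; }

/* Length of the longest prefix shaped like [+-]digits[.digits][e[+-]digits]. */
size_t numeric_prefix_len(const u16char *s)
{
    size_t i = 0, digits = 0;

    if (s[i] == '+' || s[i] == '-') i++;
    while (is_ascii_digit(s[i])) { i++; digits++; }
    if (s[i] == '.') {
        i++;
        while (is_ascii_digit(s[i])) { i++; digits++; }
    }
    if (digits == 0) return 0;              /* no mantissa digits at all */
    if (s[i] == 'e' || s[i] == 'E') {
        size_t j = i + 1;
        if (s[j] == '+' || s[j] == '-') j++;
        if (is_ascii_digit(s[j])) {         /* accept exponent only if it has digits */
            while (is_ascii_digit(s[j])) j++;
            i = j;
        }
    }
    return i;
}

/* Copy the ASCII-only numeric prefix into a char buffer and parse it. */
double sketch_u_strtod(const u16char *s, const u16char **endptr)
{
    char stack_buf[64];                     /* covers the vast majority of inputs */
    char *buf = stack_buf;
    char *parsed_end;
    double value;
    size_t i, len;

    assert(s != NULL);
    len = numeric_prefix_len(s);
    if (len == 0) {                         /* nothing numeric: consume nothing */
        if (endptr) *endptr = s;
        return 0.0;
    }
    if (len >= sizeof(stack_buf)) {         /* too long for the stack buffer */
        buf = malloc(len + 1);
        if (buf == NULL) {
            if (endptr) *endptr = s;
            return 0.0;
        }
    }
    for (i = 0; i < len; i++) buf[i] = (char)s[i];   /* all code units are ASCII */
    buf[len] = '\0';
    value = strtod(buf, &parsed_end);
    if (endptr) *endptr = s + (parsed_end - buf);    /* map offset back to input */
    if (buf != stack_buf) free(buf);
    return value;
}
```

Because only ASCII characters are copied, the offset reached in the char buffer maps one-to-one onto the 16-bit input, which is how the sketch reports the end pointer for a partial match.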

I'd really like to see a faster version implemented soon. :-)

Matt

19 years ago by Andrei Zmievski

Matt and Antony,

I think we should go with Matt's patch, since we care only about ASCII
characters in strings.

-Andrei

19 years ago by Antony Dovgal

Matt and Antony,

I think we should go with Matt's patch, since we care only about ASCII
characters in strings.

Agree.
I'm going to spend some time on this, re-test it once again and commit it after that.

--
Wbr,
Antony Dovgal

19 years ago by Antony Dovgal

Okay, I got some test results.
First of all, both patches seem to be fine, they both fix several failed tests:
Zend/tests/zend_strtod.phpt
ext/standard/tests/array/range.phpt
ext/standard/tests/general_functions/001.phpt
ext/standard/tests/math/abs.phpt
ext/standard/tests/math/bug30069.phpt

Now the test results (average time in seconds spent on running the test script):

Platform                         | current |   Matt |   Tony

Linux intel64 (ICU 3.6, non-ZTS) |  104.20 |  18.93 |  29.93
Linux intel64 (ICU 3.6, ZTS)     |  106.38 |  19.78 |  31.97
Linux i386 (ICU 3.6, non-ZTS)    |  809.21 |  25.49 |  59.78
Linux i386 (ICU 3.6, ZTS)        |  708.43 |  30.22 |  59.90
Linux i386 (ICU 3.4, non-ZTS)    |  526.71 |  22.39 |  36.96
Linux i386 (ICU 3.4, ZTS)        |  435.27 |  26.20 |  37.87
FreeBSD i386 (ICU 3.6, non-ZTS)  |      -- |  20.66 |  33.47

(Yes, I'm too lazy to rebuild PHP on FreeBSD four more times, as the result is clear).

So we have a clear winner here: Matt's patch outperforms mine by ~30%.
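
To compare rows of the table, the two usual measures can be computed as in the sketch below; the function names are made up for illustration and the inputs are simply the run times in seconds.

```c
#include <assert.h>

/* How many times faster the new run is: old time divided by new time. */
double speedup_factor(double old_seconds, double new_seconds)
{
    assert(old_seconds > 0.0 && new_seconds > 0.0);
    return old_seconds / new_seconds;
}

/* Share of the old run time that was saved, in percent. */
double time_saved_percent(double old_seconds, double new_seconds)
{
    assert(old_seconds > 0.0 && new_seconds > 0.0);
    return (1.0 - new_seconds / old_seconds) * 100.0;
}
```

For the Linux intel64 non-ZTS row, speedup_factor(104.20, 18.93) is about 5.5, and time_saved_percent(29.93, 18.93) is about 37.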

The patches:
http://tony2001.phpclub.net/dev/tmp/u_strtod.diff - my patch
http://tony2001.phpclub.net/dev/tmp/u_strtod1.diff - Matt's patch

The test script used:
<?php

$a = array(
    "0.1",
    "1.",
    "1.0",
    "23423423.234234",
    "0.00000E-2",
    "0.000002E+3",
    "121231312.1111",
    "000.11111111",
    "",
    "text",
    str_repeat("text", 1000)
);

$start = microtime(true);
for ($i = 0; $i < 1000000; $i++) {
    foreach ($a as $d) {
        $double = (double)$d;
    }
}
var_dump(microtime(true) - $start);

--
Wbr,
Antony Dovgal

19 years ago by Matt Wilmas

Hi Antony!

Wow, lots of testing ya did there. :-) Nice. But, just wanted to mention
that "Matt's patch" is the one I did back in Aug. Did you see my last
message [1] the other day with a new version that only checks for ASCII
chars, doesn't do conversion from Unicode, etc.?
http://realplain.com/php/zend_u_strtod.c (Untested)

BTW, I see ICU 3.6 is a lot faster on the "current" code (saw a hint about
that in the changelog); though still not close to the manual methods, of
course.

[1] http://news.php.net/php.internals/26538

Matt

19 years ago by Antony Dovgal

Hi Antony!

Wow, lots of testing ya did there. :-) Nice. But, just wanted to mention
that "Matt's patch" is the one I did back in Aug. Did you see my last
message [1] the other day with a new version that only checks for ASCII
chars, doesn't do conversion from Unicode, etc.?

Nope, I must have missed it.

http://realplain.com/php/zend_u_strtod.c (Untested)

Could you plz upload (or post) unified diff? Thanks.

BTW, I see ICU 3.6 is a lot faster on the "current" code (saw a hint about
that in the changelog); though still not close to the manual methods, of
course.

[1] http://news.php.net/php.internals/26538

--
Wbr,
Antony Dovgal

19 years ago by Matt Wilmas

Antony,

Sure, sorry. :-) I thought the straight code would be easier for you guys
to simply copy to a local file or such to test, but I don't really know
anything about it. :-P

http://realplain.com/php/zend_u_strtod-v2.diff

Matt

19 years ago by Antony Dovgal

Antony,

Sure, sorry. :-) I thought the straight code would be easier for you guys
to simply copy to a local file or such to test, but I don't really know
anything about it. :-P

http://realplain.com/php/zend_u_strtod-v2.diff

Ok, this one looks a bit better:

Platform                         | current |   Matt |   Tony | New patch by Matt

Linux intel64 (ICU 3.6, non-ZTS) |  104.20 |  18.93 |  29.93 |             13.98
Linux intel64 (ICU 3.6, ZTS)     |  106.38 |  19.78 |  31.97 |             14.39
Linux i386 (ICU 3.6, non-ZTS)    |  809.21 |  25.49 |  59.78 |             18.76
Linux i386 (ICU 3.6, ZTS)        |  708.43 |  30.22 |  59.90 |             18.96
Linux i386 (ICU 3.4, non-ZTS)    |  526.71 |  22.39 |  36.96 |                --
Linux i386 (ICU 3.4, ZTS)        |  435.27 |  26.20 |  37.87 |                --
FreeBSD i386 (ICU 3.6, non-ZTS)  |      -- |  20.66 |  33.47 |             15.38

--
Wbr,
Antony Dovgal

18 years ago by Matt Wilmas

Hi Antony,

Just to let you know, I made a couple minor layout changes to the code, and
also switched to the *_alloca() functions if memory needs allocating, which
I understand should be faster on systems that support it. (Patch file (v2)
is updated of course. :-))

Matt

18 years ago by Antony Dovgal

Hi Antony,

Just to let you know, I made a couple minor layout changes to the code, and
also switched to the *_alloca() functions if memory needs allocating, which
I understand should be faster on systems that support it. (Patch file (v2)
is updated of course. :-))

I've just committed the patch, thanks a lot.

--
Wbr,
Antony Dovgal

18 years ago by Matt Wilmas

Hi Antony,

----- Original Message -----
From: "Antony Dovgal"
Sent: Wednesday, December 06, 2006

I've just committed the patch, thanks a lot.

I see. Cool, thanks. :-) No problem.

Wbr,
Antony Dovgal

Matt