Good evening,
Since I don’t want this to languish as a ‘Draft’ forever, despite the patch being incomplete, I am finally putting the Big Integer Support RFC “Under Discussion”.
The RFC can be found here: https://wiki.php.net/rfc/bigint
The patch is, as I mentioned, incomplete. Additionally, there are quite a few matters left to be decided (see Open Questions). However, I think I should put this formally under discussion now.
Any help with the patch (largely just updating extensions and the mountains of tests these changes break, though later I will need to deal with opcache) would be appreciated.
Thanks!
Andrea Faulds
http://ajf.me/
Hi!
Since I don’t want this to languish as a ‘Draft’ forever, despite the
patch being incomplete, I am finally putting the Big Integer Support
RFC “Under Discussion”.
The RFC can be found here: https://wiki.php.net/rfc/bigint
This introduces a new type, IS_BIGINT. However, given that GMP now
supports arithmetic operations, I wonder if it wouldn't be easier to do
it in a slightly different way: specifically, create a hook that is
called when an operation is about to overflow or underflow, and let GMP
hook in there and produce a GMP number (I'm not sure about the exact
details of how to actually do it, so it's just an idea for now, but if
it makes sense we can try to work out the technical details).
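For illustration, the user-visible effect might be something like this (purely a
sketch of the idea - nothing here exists yet, so the "with the hook" comment is
hypothetical):

<?php
var_dump(PHP_INT_MAX + 1);
// today:         float(9.2233720368548E+18) - precision is already lost
// with the hook: object(GMP) wrapping 9223372036854775808 - the exact value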
Of course, this would require some rough edges to be polished, such as
what happens if you try to use it as an int, or convert it, etc., but
these questions arise with IS_BIGINT too, and additionally we already
have conversion handlers for objects, which aren't consistently used in
all cases but can be made so. The benefit is that we're not creating
anything completely new, we're just improving how objects work.
This would also allow anybody who doesn't like GMP big integers to
easily implement their own module to replace them.
Moreover, this would also make it possible to make support for bigints
optional - i.e., if you don't need bigints, you don't have to carry GMP
and thus are not bound by its license.
What do you think?
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
I'm not sure what this would solve. Sure, you could just use objects instead of a new type, but both present exactly the same challenges. Adding a new type isn't hard in itself. The problem is updating everything which handles numbers and their associated tests. This doesn't make my job any easier. It also wouldn't cover a few places that a new type can, like constants. Another problem is this means that bigints are a separate thing from ints, meaning users have to worry about a new type which sometimes behaves differently. This isn't good. Under this RFC's proposal, however, bigints are a mere implementation detail. So far as the user cares, there are just ints.
Making it optional destroys most of the benefits of the RFC. Instead of reducing platform differences, it adds a massive new one. Now developers have to check whether or not bigints are enabled and have two different code paths. That's much worse than the status quo.
--
Andrea Faulds
http://ajf.me/
Hi!
I'm not sure what this would solve. Sure, you could just use objects
instead of a new type, but both present exactly the same challenges.
Adding a new type isn't hard in itself. The problem is updating
everything which handles numbers and their associated tests. This
Exactly. Since objects are convertible to numbers (and to anything, in
fact) we get a double profit here - we make objects work better and we
achieve big integer support. And we don't need to handle a new type in
places where we don't need numbers.
doesn't make my job any easier. It also wouldn't cover a few places
that a new type can, like constants. Another problem is this means
I'm not sure I see much of a case for bigint constants. It would be
pretty hard for me to come up with a case where you need such a
constant, and if you do, you could just have a string constant and
convert it to GMP at runtime.
that bigints are a separate thing from ints, meaning users have to
worry about a new type which sometimes behaves differently. This
isn't good. Under this RFC's proposal, however, bigints are a mere
implementation detail. So far as the user cares, there are just
ints.
No, they are not an implementation detail - they are a whole new type,
which means every extension and every piece of PHP code aware of types
now needs to know about it and needs special code to handle it. I.e. you
pass it to mysql - mysql needs to handle this type. You pass it to SOAP
- SOAP needs to handle this type. Etc. But if it's an object, they
already deal with objects, one way or another.
Making it optional destroys most of the benefits of the RFC. Instead
of reducing platform differences, it adds a massive new one. Now
I'm not saying we have to make it optional. I'm just saying it's possible.
developers have to check whether or not bigints are enabled and have
two different code paths. That's much worse than the status quo.
I don't see why you'd have two code paths. If you need bigints and they
are not there, then you just fail, like with any extension your code
needs and is not installed. If it's there, you just continue working.
All the code existing now doesn't need bigints, and even in the future
most code won't need it. But for some code it would just work like
before, only with unlimited range now.
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
I don't see why you'd have two code paths. If you need bigints and they
are not there, then you just fail, like with any extension your code
needs and is not installed. If it's there, you just continue working.
All the code existing now doesn't need bigints, and even in the future
most code won't need it. But for some code it would just work like
before, only with unlimited range now.
'bitinteger!'
I'm still waiting to see how we handle 'BIGINT' under this rfc since
that is something every database driver does need to handle.
--
Lester Caine - G8HFL
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
I'm still waiting to see how we handle 'BIGINT' under this rfc since
that is something every database driver does need to handle.
If you mean 64-bit ints, this RFC enables them to work on 32-bit too with exactly the same semantics. No more float overflow. On a 64-bit machine, they’re IS_LONG internally, and on 32-bit machines they’re IS_BIGINT, but the user doesn’t need to worry, they both act the same.
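For example, a sketch of the intended behaviour (not something the current patch guarantees for every driver yet):

<?php
// A typical BIGINT column value that does not fit in a 32-bit long:
$id = 9123372036854775807;

// Today a 32-bit build parses this literal as a float and loses precision;
// under the RFC it is an int on 32-bit and 64-bit alike:
var_dump($id);         // int(9123372036854775807)
var_dump(is_int($id)); // bool(true)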
Assuming I actually get round to updating the DB drivers.
Andrea Faulds
http://ajf.me/
Hi!
I'm not sure what this would solve. Sure, you could just use objects
instead of a new type, but both present exactly the same challenges.
Adding a new type isn't hard in itself. The problem is updating
everything which handles numbers and their associated tests. This
Exactly. Since objects are convertable to numbers (and to anything, in
fact) we get double profit here - we make objects work better and we
achieve big integer support. And we don't need to handle new type where
we don't need numbers
Handling a new type in cases where we don’t need numbers isn’t really a problem.
doesn't make my job any easier. It also wouldn't cover a few places
that a new type can, like constants. Another problem is this means
I'm not sure I see much case for bigint constants. Would be pretty hard
for me to come up with a case where you need such a constant, and if you
do, you could just have a string constant and convert it to GMP in runtime.
Still, it’s inconvenient. More for developers to worry about.
that bigints are a separate thing from ints, meaning users have to
worry about a new type which sometimes behaves differently. This
isn't good. Under this RFC's proposal, however, bigints are a mere
implementation detail. So far as the user cares, there are just
ints.
No, they are not implementation detail - they are whole new type, which
means every extension and every piece of PHP code aware of types now
needs to know about it and needs special code to handle it.
No, only extensions. It is completely transparent to userland. That’s the whole point.
I.e. you
pass it to mysql - mysql needs to handle this type. You pass it to SOAP
- SOAP needs to handle this type. Etc. But if it's an object, they
already deal with objects, one way or another.
Yes, but they don’t handle large integer objects already. So you pass it a GMP object, the extension converts it to a string, that string then overflows when converted back to a number, and you end up with a float. Which isn’t what you wanted. Or, it handles it as a string, which is also not ideal, as while a string and an int may be the same thing to some extensions, they are not to others.
developers have to check whether or not bigints are enabled and have
two different code paths. That's much worse than the status quo.
I don't see why you'd have two code paths. If you need bigints and they
are not there, then you just fail, like with any extension your code
needs and is not installed.
It’s not about “extensions your code needs”. If you need ext/gmp, you can already require it. This RFC is about removing cross-platform integer handling differences.
All the code existing now doesn't need bigints, and even in the future
most code won't need it. But for some code it would just work like
before, only with unlimited range now.
No, but existing code does have to handle float overflow. If you allow that to optionally be int overflow, you now need to worry about handling both.
--
Andrea Faulds
http://ajf.me/
Hi!
Still, it’s inconvenient. More for developers to worry about.
I still have no idea why one would need a bigint constant, could you
give an common example where you would do that?
No, only extensions. It is completely transparent to userland.
That’s the whole point.
I'm not sure how it can be completely transparent if it's a different
type. Is it still identifying as int? If so, this is dangerous, as some
functions may not be able to accept big integers when accepting int
arguments, but checks for is_int(), etc. would pass.
Yes, but they don’t handle large integer objects already. So you pass
it a GMP object, it converts to a string, then that overflows and you
end up with a float when it converts it to a number. Which isn’t what
you wanted. Or, it handles it as a string, which is also not ideal,
as while a string and an int may be the same thing to some
extensions, they are not to others.
If it's not, the extension has to handle it, the same way it has to
handle bigint anyway if it makes a difference for it. The point is that
many common cases are already covered, e.g. if the extension just needs
a string, or if the bigint actually represents a small int, etc.
It’s not about “extensions your code needs”. If you need ext/gmp, you
can already require it. This RFC is about removing cross-platform
integer handling differences.
But nothing changes there - it is still removing the diffs and it still
requires GMP. The only change is you're not paying for it if you don't
need it.
No, but existing code does have to handle float overflow. If you
allow that to optionally be int overflow, you now need to worry about
handling both.
What's "float overflow"? I'm not sure I'm getting your point here. You
don't need to handle anything - if your code doesn't care about big
ints, you just do math as usual. If it does, then you have to check big
ints are there, then do math as usual but be aware that int can be now
of two different types. I don't see any difference from the RFC here.
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
Hi!
Still, it’s inconvenient. More for developers to worry about.
I still have no idea why one would need a bigint constant, could you
give an common example where you would do that?
The main point is: why should you prohibit it? The point of bigints is to remove cross-platform integer differences. Why shouldn’t that apply here? Why should I have to conditionally do different things on 64-bit and 32-bit?
No, only extensions. It is completely transparent to userland.
That’s the whole point.
I'm not sure how it can be completely transparent if it's a different
type. Is it still identifying as int?
Yes.
In this case, this is dangerous as
some functions may not be able to accept big integers when accepting int
arguments, but checks for is_int, etc. would pass.
We already have this danger for another type: boolean. phpng got rid of IS_BOOL in favour of IS_TRUE and IS_FALSE. If we can update everything to handle the IS_BOOL change, surely we can update everything to handle bigints, too.
If it's not, the extension has to handle it, the same way it has to
handle bigint anyway if it makes difference for it. The point is many
common cases are already covered, e.g. if the extension just needs a
string, or if the bigint actually represents a small int, etc.
Many common cases are easily covered by a new type anyway. You overestimate the effort; I have already done the work, and it’s not much. Objects make nothing easier.
No, but existing code does have to handle float overflow. If you
allow that to optionally be int overflow, you now need to worry about
handling both.
What's "float overflow"?
Beyond PHP_INT_MAX, integers magically become floats in PHP. They have done so for a long time.
I'm not sure I'm getting your point here. You
don't need to handle anything - if your code doesn't care about big
ints, you just do math as usual.
Then get weird results when someone passes a large number in.
If it does, then you have to check big
ints are there, then do math as usual but be aware that int can be now
of two different types. I don't see any difference from the RFC here.
The main point of the RFC is to make integers completely consistent across platforms and to remove the need to worry about overflow. Adding optional overflow to GMP means you still have to worry about it. It doesn’t solve anything. You can already use GMP for applications which explicitly need to use large numbers. This RFC doesn’t exist for that purpose.
--
Andrea Faulds
http://ajf.me/
Hi!
We already have this danger for another type: boolean. phpng got rid
of IS_BOOL in favour of IS_TRUE and IS_FALSE. If we can update
everything to handle the IS_BOOL change, surely we can update
everything to handle bigints, too.
No, it's not the same thing at all. For bool, you still have only true
and false. For bigint, your function now should be able to handle
arbitrarily large integers internally, but what if it has fixed
resources that assume integers have a fixed range? For extensions that's
commonplace, but even in user code it can happen. That means any call
you make to an internal function with an int argument could now fail
because the internal function is unable to support bigint, and you can't
even guard against this since your code cannot distinguish a regular int
from a bigint. I don't think that is a good situation.
Then get weird results when someone passes a large number in.
Why would you get weird results? You describe some vague dangers but I
didn't see any concrete example of what is different.
The main point of the RFC is to make integers completely consistent
across platforms and to remove the need to worry about overflow.
Which does not change with my proposal.
Adding optional overflow to GMP means you still have to worry about
it. It doesn’t solve anything. You can already use GMP for
You seem to misunderstand what my proposal is. It doesn't add any
additional overflow, it just changes from using a separate type
masquerading as int to using objects. All the rest stays the same.
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
We already have this danger for another type: boolean. phpng got rid
of IS_BOOL in favour of IS_TRUE and IS_FALSE. If we can update
everything to handle the IS_BOOL change, surely we can update
everything to handle bigints, too.
No, it's not the same thing at all. For bool, you still have only true
and false. For bigint, your function now should be able to handle
infinite integers internally, but what if it has fixed resources that
assume integers have fixed range?
You throw an error. Just as plenty of functions already can’t handle ridiculously large integer arguments.
For extensions, it's a commonplace,
but even for user code that can happen. That means, any call that you do
to an internal function with int argument now could fail since the
internal function is unable to support bigint, and you can't even guard
for this since your code can not distinguish regular int from bigint. I
don't think it is a good situation.
If a function can’t support a large integer argument, this is usually for an obvious reason. I am not tormented daily in Python by the fact that I can’t seek by 2^69 bytes in a file, and I doubt any PHP developer would be.
Then get weird results when someone passes a large number in.
Why would you get weird results? You describe some vague dangers but I
didn't see any concrete example of what is different.
Integers beyond PHP_INT_MAX (2^63 - 1 on 64-bit systems, 2^31 - 1 on 32-bit systems) overflow to floats and lose accuracy. Then, if they’re cast back to integers, they are silently truncated and wrap around.
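For example, on a current 64-bit build:

<?php
$n = PHP_INT_MAX;            // int(9223372036854775807)
var_dump($n + 1);            // float(9.2233720368548E+18) - silently a float now
var_dump($n + 1 == $n + 2);  // bool(true) - the float can no longer tell them apart
var_dump((int) ($n + 1));    // platform-dependent; typically wraps to a large negative int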
The main point of the RFC is to make integers completely consistent
across platforms and to remove the need to worry about overflow.
Which does not change with my proposal.
No, it does: There are now integers, and objects that represent large integers, which behave differently.
--
Andrea Faulds
http://ajf.me/
Hi!
You throw an error. Just as plenty of functions already can’t handle
ridiculously large integer arguments.
The problem is, if your function can handle the int range and you checked
for is_int() and everything worked fine - now it's broken, because
is_int() no longer implies a fixed range and there's no way to check
whether you're dealing with a fixed-range number or an infinite-range
number.
No, it does: There are now integers, and objects that represent large
integers, which behave differently.
IS_INT and IS_BIGINT would necessarily behave differently too - since
some functions may support both and some only integers. Again, no change
here.
--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
You throw an error. Just as plenty of functions already can’t handle
ridiculously large integer arguments.
The problem is, if your function can handle the int range and you checked
for is_int() and everything worked fine - now it's broken, because
is_int() no longer implies a fixed range and there's no way to check
whether you're dealing with a fixed-range number or an infinite-range
number.
Yes, but you can check if the function errors. I don’t really think this
is a massive problem. People will probably not realistically expect that
all functions can accept really large numbers, whether that range cuts
off at 2**64-1 or something more arbitrary. It’s a problem if abs() or
sign() don’t work for bigints. It isn’t if str_repeat() doesn’t, because
a similarly-sized non-bigint would error too.
No, it does: There are now integers, and objects that represent large
integers, which behave differently.
IS_INT and IS_BIGINT would necessarily behave differently too - since
some functions may support both and some only integers.
All functions would “support” both for integer arguments. But some might choose to reject bigints which are larger than the internal integer type the function uses… much like a function written for PHP currently might reject longs larger than the internal integer type the function uses.
Again, no change
here.
The point is the degree to which they can act the same. Objects can only go so far.
Andrea Faulds
http://ajf.me/
I promise not to mail the list for every change I make to this RFC. ;)
But I do have quite a big one to announce. Previously, some issues with the GNU Multiple Precision Arithmetic Library (GMP) had been discovered. In particular, it is not liberally licensed (it’s LGPL), it supports only one global set of custom allocators, which causes segfaults in other libraries that use GMP because PHP defines its own allocators, and it immediately calls an un-hookable abort() in certain failure cases.
I was unaware of any good alternatives, however today I was pointed by Chris Wright (DaveRandom) on StackOverflow towards a new possibility: LibTomMath. It is liberally licensed (dual-licensed as Public Domain and WTFPL), written in pure C, packaged for multiple platforms, and it lacks the immediate abort() problem to the best of my knowledge. Plus, it will not cause any segfaults when we use custom allocators, as I do not believe PHP uses any libraries which use LibTomMath at present. If you’re worried about whether it’s battle-tested, it’s used by another dynamic language, Tcl.
Because it appears to solve all three major issues with GMP, I am currently porting my bigint branch to use it. This is possible because the entire implementation of bigints is abstracted, meaning you can swap out back-ends. If we wished to, we could quite simply allow the choice of GMP at compile-time, or indeed any other back-end.
I should note that LibTomMath certainly isn’t perfect. I don’t believe it is optimised to the same degree GMP is. That being said, again, it does seem to solve all the major problems I had with GMP. So I have few qualms in making the patch use it, especially given that it is easy to swap out the back-end.
I’ve updated the RFC to reflect this new state of affairs: https://wiki.php.net/rfc/bigint
Thoughts?
Andrea Faulds
http://ajf.me/
Hi Andrea,
Why don't you use the ability of operator overloading? (It's been in the
engine since 5.6.)
BIGINT doesn't have to be completely transparent. If users would like to
work with BIGINT, let them create PHP objects explicitly and then use
operator overloading, e.g.
<?php
function print_powers_of_two($bits) {
    $bit = BIGINT(1);            // BIGINT() is hypothetical: an explicit big integer constructor
    $last = BIGINT(2) ** $bits;
    while ($bit < $last) {
        $bit *= 2;
        echo "$bit\n";
    }
}
print_powers_of_two(256);
?>
Your solution would allow writing the same thing without BIGINT, but not
for free.
I expect it'll cause some slowdown for all PHP scripts, independently of
whether they use BIGINT or not.
I'll try to take a deeper look into the patch later...
Could you provide some benchmark results, comparing your patch with master?
Thanks. Dmitry.
Hi Andrea,
Why don't you use the ability of operator overloading? (it's in the engine since 5.6).
I've already answered this in this thread, but I'll answer it again if I must.
BIGINT don't have to be completely transparent. If user would like to work with BIGINT, let them crate PHP objects explicitly and then use operator overloading. e.g.
Well, they already can. ext/gmp exists.
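Explicit code along these lines already works today thanks to ext/gmp’s operator overloading (since 5.6), roughly:

<?php
$n = gmp_init(PHP_INT_MAX);
$n = $n * $n + 1;             // arithmetic operators work on GMP objects
var_dump($n instanceof GMP);  // bool(true)
echo gmp_strval($n), "\n";    // exact result, no float rounding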
Your solution would allows writing the same without BIGINT, but not for free.
I expect, it'll make some slowdown for all PHP scripts, independently, if they use BIGINT or not.
I'll try to take a deeper look into the patch later...
Could you provide some benchmark results, comparing your patch with master?
So, the point of this RFC is basically to make PHP a language where, like Python, Haskell, Prolog or (de jure but not de facto) Dart, integers can be arbitrarily large and you never have to worry about overflow. Instead of applications which definitely need bigints using them explicitly, all applications can now support integers of any size transparently, essentially for free. It also makes the language more intuitive in a way. Plus, it's one less cross-platform difference so code is more portable.
You're right it might not actually be free, though. I'll need to run some benchmarks - will do later today if I remember. It shouldn't be any slower than master, though. All it does is change what we do in our usual overflow checks, which we already had. Now, once you've overflowed and got a bigint, obviously they're slower than floats. However if you need floating-point performance you can explicitly cast to double and deliberately lose accuracy.
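In other words, something like this (describing the intended behaviour under the RFC, not what master does):

<?php
$exact = PHP_INT_MAX * 2;  // under the RFC: int(18446744073709551614), a bigint internally
$fast  = (float) $exact;   // explicitly opt back into faster, lossy float arithmetic
var_dump($fast);           // float(1.844674407371E+19)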
Andrea Faulds
http://ajf.me/
I expect, it'll make some slowdown for all PHP scripts, independently, if they use BIGINT or not.
I'll try to take a deeper look into the patch later...
Could you provide some benchmark results, comparing your patch with master?
I finally made the requested benchmarks. There’s barely a noticeable difference, though the bigint branch is apparently marginally faster (most likely from getting rid of fast_increment_function):
          master        bigint
          0.344788074   0.339091063
          0.34658289    0.361176014
          0.376623154   0.346175194
          0.35006094    0.359763861
          0.352533817   0.341754198
          0.354025841   0.357409
          0.360356092   0.379124165
          0.367921829   0.351316929
          0.370724916   0.373735189
          0.351090908   0.346349001
          0.355952978   0.356275797
average   0.357332858   0.355651855
(Times in seconds, smaller is better.)
Script:
<?php
$start = microtime(true);
for ($i = 0; $i < 1000000; $i++) {
    $a = 2 * 3;
    $b = $a - 3;
    $c = $a * $b;
    $d = $c / $c;
}
$end = microtime(true);
echo "took ", $end - $start, " secs\n";
?>
I ran the script several times, then took the results and put them into Excel to produce the above table with its averages.
So common scripts are either unaffected, or will run ever-so-slightly faster.
Andrea Faulds
http://ajf.me/
Just to be clear, though, that didn’t tell the whole story. With that number of iterations, there’s no speed difference that isn’t within the margin of error. However, up the iterations by 100x and the bigint branch is consistently very slightly slower. Remove the body of the loop so it’s just for ($i = 0; $i < 100000000; $i++) {} and the bigint branch is consistently very slightly faster. No idea why either of these is the case.
So, apparently, the bigint branch both makes things slower and makes them faster! But it’s not a big enough difference for me to be worried about it. The differences that do exist might disappear if the fast_* functions can have their inline asm rewritten and be uncommented. Currently, master has custom asm for these, while the bigint branch has to use the probably slower C implementations because I don’t understand x86 or x64 asm and am unable to rewrite it.
--
Andrea Faulds
http://ajf.me/
Hi Andrea,
Synthetic benchmarks do not always reflect the impact on real-life
performance.
Unfortunately, I wasn't able to run any big real-life apps with your bigint
branch, because it misses support for commonly used extensions
(ext/session, ext/json, ext/pdo).
I ran bench.php and it's a bit slower with bigint.
master 1.210 sec
bigint 1.330 sec
I also measured the number of executed instructions using valgrind
--tool=callgrind (less is better)
master 1,118M
bigint 1,435M
Maybe part of this difference is caused by missing the latest master
improvements, but anyway, introducing a new core type can't be done for free.
I also was able to run qdig, and it showed about 2% slowdown.
[master] $ sapi/cgi/php-cgi -T 1000 /var/www/html/bench/qdig/index.php >
/dev/null
Elapsed time: 3.327445 sec
[bigint] $ sapi/cgi/php-cgi -T 1000 /var/www/html/bench/qdig/index.php >
/dev/null
Elapsed time: 3.382823 sec
It would be great to measure the difference on wordpress, drupal, ZF...
Thanks. Dmitry.
Hi!
Hi Andrea,
The synthetic benchmarks are not always reflect the impact on real-life performance.
Unfortunately, I wasn't able to run any big real-life apps with your bigint branch, because it misses support for commonly used extensions (ext/session, ext/json, ext/pdo).
Yes, that’s unfortunate. ext/json is first on my list to update once I’m done with ext/standard, I particularly want large integers in JSON to decode to bigints (though allow disabling this if you desire). I really should’ve finished porting ext/standard months ago, I’ve been dragging my heels on that one.
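For instance, the behaviour I’m aiming for (not implemented yet, so the "intended" comment below is the goal rather than what the branch currently does):

<?php
$json = '{"id": 9223372036854775808}';  // one more than PHP_INT_MAX

var_dump(json_decode($json)->id);
// today:    float(9.2233720368548E+18), or a string with JSON_BIGINT_AS_STRING
// intended: int(9223372036854775808), i.e. an exact bigint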
I ran bench.php and it's a bit slower with bigint.
master 1.210 sec
bigint 1.330 sec
I also measured the number of executed instructions using valgrind --tool=callgrind (less is better)
master 1,118M
bigint 1,435M
May be part of this difference is caused by missing latest master improvements, but anyway, introducing new core type, can't be done for free.
I’m not really sure about whether a new core type can’t be free. For switch statements, if they’re compiled to a jump table, they shouldn’t be any slower when a new case is added. But I’m not certain on that, I don’t spend much time reading generated asm.
Does bench.php do any float operations? I’m not sure from reading the source, but I think it might end up having ints overflow and become floats in master or bigints in my branch. If that’s the case, it would obviously be slower as bigints trade performance for accuracy. This particular issue can’t really be helped. Although these apps, if they want floats, can just ask for them explicitly by marking their numbers with a dot.
Another source of slowdown is, as previously mentioned, the asm functions not being updated and hence me having to disable them. Particularly for things like multiplication, addition and so on, the C code we have is far less efficient. I believe the asm code simply checks for an overflow flag after the operation, which should be very fast. On the other hand, the C code converts the ints to doubles, does a double operation, sees if the result of that is greater than PHP_INT_MAX
converted to a double, and then does the operation if it won’t overflow. This means that, until the asm code is updated, all integer operations may be significantly slower, which is unfortunate. However, I think that if the asm were to be updated, the slowdown for integer ops would completely, or at least mostly, disappear.
I also was able to run qdig, and it showed about 2% slowdown.
[master] $ sapi/cgi/php-cgi -T 1000 /var/www/html/bench/qdig/index.php > /dev/null
Elapsed time: 3.327445 sec
[bigint] $ sapi/cgi/php-cgi -T 1000 /var/www/html/bench/qdig/index.php > /dev/null
Elapsed time: 3.382823 sec
It would be great to measure the difference on wordpress, drupal, ZF…
The reasons for the dig slowdown are likely the same.
I’ve so far been scared to touch the asm… but actually, I don’t think it could be that hard. It’s not doing something especially complex. The bigint API looks fairly stable now and I’m unlikely to change it much further, so there’s little worry about having to change the asm a second time. The main problem with asm, I suppose, is testing it. I do have a 32-bit Ubuntu VM set up, but I’d also need to set up Windows VMs, and possibly others (don’t we have PowerPC in the source just now?).
I might experiment with it tonight, or sometime later this week.
Thanks.
--
Andrea Faulds
http://ajf.me/
Does bench.php do any float operations? I’m not sure from reading the
source, but I think it might end up having ints overflow and become floats
in master or bigints in my branch.
bench.php does some math on long and floats, but I don't think overflow is
involved.
Another source of slowdown is, as previously mentioned, the asm functions
not being updated and hence me having to disable them. Particularly for
things like multiplication, addition and so on, the C code we have is far
less efficient. I believe the asm code simply checks for an overflow flag
after the operation, which should be very fast.
yes, this may be a reason.
The reasons for the dig slowdown are likely the same.
2% is not a big difference (it may be even a measurement mistake), but more
tests should be done.
The main problem with asm, I suppose, is testing it. I do have a
32-bit Ubuntu VM set up, but I’d also need to set up Windows VMs, and
possibly others (don’t we have PowerPC in the source just now?).
change asm for 32-bit Linux and add TODO marks for others. I don't test PHP
on PPC as well.
Thanks. Dmitry.
Hey Dmitry,
change asm for 32-bit Linux and add TODO marks for others. I don't test PHP
on PPC as well.
After procrastinating about this for a long time, I finally went and updated the overflow checks today and ran bench.php.
I still haven’t touched the inline asm, I’ve just removed it, since clang and GCC (only in GCC 5.0, sadly) have checked arithmetic intrinsics. If someone wants to, they can rewrite the inline asm for compilers that have no overflow-checking intrinsics, but this is good enough for now, at least for the purposes of performance checking on my machine. I’m using clang, by the way. If you want to replicate these results, you’ll probably also need it, since GCC 5.0 isn’t out yet, unfortunately.
I compiled the bigint-libtommath branch (theoretically this was just a branch, but actually all the new changes have gone there, I’ll merge it into the bigint branch once LibTomMath port is done), and the current master branch.
For bigint-libtommath, I used ./configure --enable-debug --enable-phpdbg --disable-all --enable-bigint-gmp
Because of the --enable-bigint-gmp flag, it’s using the GMP backend, not the LibTomMath one. I’m doing this since there’s still one or two small things I haven’t finished implementing for LibTomMath, e.g. the binary bitwise ops have the wrong behaviour just now.
For master, I used ./configure --enable-debug --enable-phpdbg --disable-all
Then, I ran bench.php four times, and each time I ran it first on ./php-bigint-gmp, then on ./php-bigint-master.
On each run, the bigint branch turned out faster, as well as overall:
            bigint        master
            6.593         6.659
            6.424         6.661
            6.414         6.588
            6.381         6.673
AVERAGE     6.453         6.64525
DIFFERENCE  -0.19225      0.19225
RATIO       0.971069561   1.0297923446
So master is 2.9% slower! Full output here: https://gist.github.com/TazeTSchnitzel/759c1513b442571f5e26
I can’t actually explain why bigints would be faster. It might just be because I got rid of fast_increment_function in favour of just checking whether op1 == ZEND_LONG_MAX in zend_vm_execute.h, ditto for fast_decrement_function. Maybe using overflow intrinsics is faster than inline asm. Maybe it’s something completely different. I honestly don’t know.
The result surprised me, as I expected bigints to be slower, so I redid it. Again, bigints came out on top:
            bigint        master
            6.55          6.779
            6.353         6.738
            6.326         6.674
            6.144         6.177
AVERAGE     6.34325       6.592
DIFFERENCE  -0.24875      0.24875
RATIO       0.9622648665  1.0392149135
This time master was around 3.9% slower. Full log here: https://gist.github.com/TazeTSchnitzel/59c190b86c9dd5b20570
If we combine the two runs:
            bigint        master
            6.593         6.659
            6.424         6.661
            6.414         6.588
            6.381         6.673
            6.55          6.779
            6.353         6.738
            6.326         6.674
            6.144         6.177
AVERAGE     6.398125      6.618625
DIFFERENCE  -0.2205       0.2205
RATIO       0.9666849232  1.0344632216
master’s 3.4% slower.
Just to check I named the files correctly:
oa-res-26-240:php-src ajf$ ./php-master -r 'var_dump(PHP_INT_MAX * 2);'
float(1.844674407371E+19)
oa-res-26-240:php-src ajf$ ./php-bigint-gmp -r 'var_dump(PHP_INT_MAX * 2);'
int(18446744073709551614)
Yes, it’s definitely the bigint branch.
So, at least by these preliminary results, the bigint branch would appear to be faster than master. This is merely bench.php, but it’s still a good sign. :)
Thanks!
Andrea Faulds
http://ajf.me/
I'm really surprised by the results :)
I'll try to find time for bigint next week and play with it a bit.
Thanks. Dmitry.
Hi Andrea,
Where can I get the code?
Thanks. Dmitry.
Hey Dmitry,
The bigint-libtommath branch was merged back into the bigint branch since I figured there was no point keeping them separate, even if the LibTomMath backend isn’t quite complete.
So, the pull request is here: https://github.com/php/php-src/pull/876
Or, the branch directly: https://github.com/TazeTSchnitzel/php-src/tree/bigint
When configuring, you can use --enable-bigint-gmp to use GMP for bigints. Otherwise it will use LibTomMath. GMP is probably faster, and it has all operations implemented (I still need to do bitwise ops for LibTomMath). For GMP, you’ll need to have the library installed.
Thanks.
Andrea Faulds
http://ajf.me/
Oh, it's still in draft state.
Too many extensions are missing: ext/session, ext/json, ext/pdo.
Only very simple tests may be done now, and they can't predict the impact on
real-life applications.
Thanks. Dmitry.
We may as well try to help here.
This patch is anything but simple. I really do not want to see
Andrea go through the pain we had with the 64-bit patch. So let's
organize ourselves to avoid that.
Step 1:
Which extensions do we consider critical to actually get a clue
about the impact?
I see session, standard ( ;) ), json off the top of my head. Which others?
Let's help Andrea port these exts and do the others once we know if
the RFC is accepted or not.
Cheers,
Pierre
ext/session and ext/json are required by most apps.
Actually I stopped attempts to build it when I saw compilation errors in
ext/session.
Thanks. Dmitry.
ext/session and ext/json are required by most apps.
Right.
The question is: Do you see any other we must have before discussing
that any further?
I don't know which ones are supported or not.
Of course we need some extension to connect to a database: mysql, mysqli or
pdo_mysql.
Thanks. Dmitry.
Hey Dmitry,
ext/session and ext/json are required by most apps.
Actually I stopped attempts to build it when I saw compilation errors in ext/session.
Thanks. Dmitry.
Oh dear, does ext/session not build? :/
So far I've only built the branch with --disable-all.
In the case of most extensions, the main source of compilation errors will be changes to certain Zend Engine functions. In particular, is_numeric_string_ex needs to support bigints now and has an extra parameter. I don't think I changed very many other functions.
Porting extensions should for the most part be relatively simple. Most extensions are just sets of functions and use zpp. If they're using the 'l' specifier (Z_PARAM_LONG) they'll continue to work. In most cases there is no need to update an 'l' parameter to support bigints. The length of a string can't exceed PHP's max integer size, for example. Of course, there are some functions where it would have a clear benefit to add bigint support.
The main problem with extensions is 'z'
BTW: why not wrap big integers into a special IS_OBJECT?
It would keep everything working out of the box (without BIGINT), and would
make more than half of the changes unnecessary.
In the past we made similar decision for closures.
Thanks. Dmitry.
Hi Dmitry,
BTW: why not to wrap big integers into special IS_OBJECT?
It would keep everything working out of the box (without BIGINT), and would allow to eliminate more than half of unnecessary changes.
In the past we made similar decision for closures.
In retrospect that might have been a good idea. Though objects can't quite do everything our primitive types can. To get bigints to work that way, you'd need to improve the support for objects a lot. You'd still need to update virtually every zval-accepting extension. The signature of is_numeric_string_ex would still have to change. You would need to make constants support objects, too. You'd still need to change a lot of things, unfortunately.
At this stage, switching to using objects is probably a waste of time.
Thanks.
--
Andrea Faulds
http://ajf.me/
(Sorry, accidentally sent too early)
The main problem with most extensions is the 'z' format specifier which accepts any value. If it accepts IS_LONG then it needs to accept IS_BIGINT too. In many cases you can just convert the bigint to a long and maybe reject it or wrap it if it won't fit, if the function doesn't need to support large integers:
case IS_BIGINT:
    if (!zend_bigint_can_fit_long(Z_BIG_P(some_zval))) {
        /* the value is too big for a native long, so warn and bail out */
        zend_error(E_WARNING, "$some_zval too large");
        RETURN_FALSE;
    } else {
        lval = zend_bigint_to_long(Z_BIG_P(some_zval));
    }
    break;
Something like that would work in most cases. There is also convert_to_long.
I probably should have focussed more on extension support; maybe I'll start trying to port some of them, there's not that much Zend stuff left to do really. I would have ported ext/json, but there's now the jsond RFC.
Any help would be appreciated. I am panicking a bit as there's not that long to go before PHP7 feature freeze, assuming Zeev's timetable is actually followed. Though I think this feature should be doable: as I said, there's not much Zend stuff left to do, and most extensions should be quite simple to port.
Thanks.
Andrea Faulds
http://ajf.me/
Hey everyone,
Anatol (aka welting) has done some excellent work, and a lot more extensions now build on the bigint branch, even if not all of them are fully ported:
https://wiki.php.net/rfc/bigint#todo
This should mean that testing “real-world applications” for performance is now possible.
Thanks!
--
Andrea Faulds
http://ajf.me/
Hi,
I’m a little worried that nobody has responded to this yet. Feature freeze is looming… :(
Andrea Faulds
http://ajf.me/