Hi,
while refining the big string support, it turned out that we've an issue.
The syntax like $s[42] = 'x'; is currently inconsistend, because we have
uint32 for string offsets. This actually means, the behaviour is currently
only available in the old style and can handle not more than 2gb big
strings.
Also discussed with Laruence on IRC and he actually expressed the concern
that we pay overhead on that. From my side I was investigating on that and
could suggest several solutions for that:
- stay with the old behavior (indexes would be able to handle only 2gb
strings, this is the status quo) - implement a function like in JS String.charAt() as alternative
- turn to some temp_variable solution we currently have in PHP5. Laruence
told be that dropping temp_variable was one of the improvementes.
Actually, the string index functionality is utilized in two opcodes, so
maybe that were just a local case.
Anyway not talking about touching zval, as that would grow by 8 bytes with
a size_t str_offset. Just maybe there were another solution I oversee?
Regards
Anatol
Hi,
while refining the big string support, it turned out that we've an issue.
The syntax like $s[42] = 'x'; is currently inconsistend, because we have
uint32 for string offsets. This actually means, the behaviour is currently
only available in the old style and can handle not more than 2gb big
strings.Also discussed with Laruence on IRC and he actually expressed the concern
that we pay overhead on that. From my side I was investigating on that and
could suggest several solutions for that:
- stay with the old behavior (indexes would be able to handle only 2gb
strings, this is the status quo)- implement a function like in JS String.charAt() as alternative
- turn to some temp_variable solution we currently have in PHP5. Laruence
told be that dropping temp_variable was one of the improvementes.
Actually, the string index functionality is utilized in two opcodes, so
maybe that were just a local case.Anyway not talking about touching zval, as that would grow by 8 bytes with
a size_t str_offset. Just maybe there were another solution I oversee?
We don’t need to actually support >2GB string indexing realistically. As I understand it, we’re using size_t because it’s the proper type for string lengths, not because we need >2GB strings.
I’d just leave things as they are… though I suppose there might be some benefit to switching to size_t for string offsets. Does that avoid a cast in the generated assembly?
--
Andrea Faulds
http://ajf.me/
Hi,
while refining the big string support, it turned out that we've an
issue.
The syntax like $s[42] = 'x'; is currently inconsistend, because we
have uint32 for string offsets. This actually means, the behaviour is
currently
only available in the old style and can handle not more than 2gb big
strings.Also discussed with Laruence on IRC and he actually expressed the
concern
that we pay overhead on that. From my side I was investigating on that
and
could suggest several solutions for that:
- stay with the old behavior (indexes would be able to handle only 2gb
strings, this is the status quo) - implement a function like in JS
String.charAt() as alternative- turn to some temp_variable solution we currently have in PHP5.
Laruence
told be that dropping temp_variable was one of the improvementes.
Actually, the string index functionality is utilized in two opcodes, so
maybe that were just a local case.Anyway not talking about touching zval, as that would grow by 8 bytes
with
a size_t str_offset. Just maybe there were another solution I oversee?
We don’t need to actually support >2GB string indexing realistically. As
I
understand it, we’re using size_t because it’s the proper type for string
lengths, not because we need >2GB strings.
The goal of the 64 bit RFC was to make everything work consistent across
64 bit platforms. That involved size_t and dynamic integer, for big
strings and integers. Now that we've merged it some parts might differ
from the original RFC. Whereby security and correct types are as well in
the background.
I’d just leave things as they are… though I suppose there might be some
benefit to switching to size_t for string offsets. Does that avoid a cast
in the generated assembly?
My point is just that it should be consistent as we have it, in less or or
prefferably more plausible way. Though i didn't quite understand the
question about the compiled assembly - you mean whether it'll be binary
compatible with the previous bins? then no.
Regards
Anatol
I’d just leave things as they are… though I suppose there might be some
benefit to switching to size_t for string offsets. Does that avoid a cast
in the generated assembly?
My point is just that it should be consistent as we have it, in less or or
prefferably more plausible way. Though i didn't quite understand the
question about the compiled assembly - you mean whether it'll be binary
compatible with the previous bins? then no.
If I recall correctly, one of the reasons to use size_t for string lengths was avoid casts in the generated machine code. That’s what I was referring to: on 64-bit platforms, you’d need to do a 32-bit to 64-bit cast.
--
Andrea Faulds
http://ajf.me/
I’d just leave things as they are… though I suppose there might be
some benefit to switching to size_t for string offsets. Does that
avoid a
cast
in the generated assembly?
My point is just that it should be consistent as we have it, in less oror
prefferably more plausible way. Though i didn't quite understand the
question about the compiled assembly - you mean whether it'll be binary
compatible with the previous bins? then no.If I recall correctly, one of the reasons to use size_t for string
lengths was avoid casts in the generated machine code. That’s what I was
referring to: on 64-bit platforms, you’d need to do a 32-bit to 64-bit
cast.
Generally this is true - when only 64 bit types are used, then twice fewer
operations are necessary. Despite that, some cases might show 64 bit
slower than 32 bit because of cache misses. Still, highly dependent on
concrete case. With size_t it's additionally the security as it cannot be
overflown.
Currently, we still mix 32 and 64 bit integers to reduce the memory
consumption on the internal structures and keep the efficiency. So using a
very fine comb to touch the places really relevant.
Regards
Anatol
Hi,
while refining the big string support, it turned out that we've an issue.
The syntax like $s[42] = 'x'; is currently inconsistend, because we have
uint32 for string offsets. This actually means, the behaviour is currently
only available in the old style and can handle not more than 2gb big
strings.Also discussed with Laruence on IRC and he actually expressed the concern
that we pay overhead on that. From my side I was investigating on that and
could suggest several solutions for that:
- stay with the old behavior (indexes would be able to handle only 2gb
strings, this is the status quo)
I think it's okey. maybe throw a warning if it's bigger than 32bits?
if a string is bigger than 2^32... I think there must be a bug.... :)
thanks
- implement a function like in JS String.charAt() as alternative
- turn to some temp_variable solution we currently have in PHP5. Laruence
told be that dropping temp_variable was one of the improvementes.
Actually, the string index functionality is utilized in two opcodes, so
maybe that were just a local case.Anyway not talking about touching zval, as that would grow by 8 bytes with
a size_t str_offset. Just maybe there were another solution I oversee?Regards
Anatol
--
Xinchen Hui
@Laruence
http://www.laruence.com/
On Fri, Aug 29, 2014 at 11:49 PM, Anatol Belski anatol.php@belski.net
wrote:Hi,
while refining the big string support, it turned out that we've an
issue. The syntax like $s[42] = 'x'; is currently inconsistend, because
we have uint32 for string offsets. This actually means, the behaviour is
currently only available in the old style and can handle not more than
2gb big
strings.Also discussed with Laruence on IRC and he actually expressed the
concern that we pay overhead on that. From my side I was investigating
on that and could suggest several solutions for that:
- stay with the old behavior (indexes would be able to handle only 2gb
strings, this is the status quo)I think it's okey. maybe throw a warning if it's bigger than 32bits?
if a string is bigger than 2^32... I think there must be a bug.... :)
Only for this case you mean, or generally? As we safe with memory_limit
anyway.
Regards
anatol
if a string is bigger than 2^32... I think there must be a bug.... :)
Only for this case you mean, or generally? As we safe with memory_limit
anyway.
Even though size_t allows "huge" strings, would it be so bad to throw an
error when one tries to create a string longer than 2^32 bytes,
regardless of memory_limit?
Cheers
Matteo Beccati
Development & Consulting - http://www.beccati.com/
if a string is bigger than 2^32... I think there must be a bug.... :)
Only for this case you mean, or generally? As we safe with memory_limit
anyway.Even though size_t allows "huge" strings, would it be so bad to throw an
error when one tries to create a string longer than 2^32 bytes,
regardless of memory_limit?
This would be an unnecessary and somewhat arbitrary limitation. Yes,
loading >2GB of anything into memory in PHP probably makes no
sense (probably 99% of real-world installations have memory_limit set
much lower than this anyway), but just because we cannot think of a
valid use case does not mean there isn't one.
If the string index deref issue cannot be solved, it can simply be
documented as only working for offsets up to 2^31, but I personally am
not in favour of imposing limitations on the size of a string for a
non-technical reason.
Thanks, Chris
Even though size_t allows "huge" strings, would it be so bad to throw an
error when one tries to create a string longer than 2^32 bytes,
regardless of memory_limit?This would be an unnecessary and somewhat arbitrary limitation. Yes,
loading >2GB of anything into memory in PHP probably makes no
sense (probably 99% of real-world installations have memory_limit set
much lower than this anyway), but just because we cannot think of a
valid use case does not mean there isn't one.If the string index deref issue cannot be solved, it can simply be
documented as only working for offsets up to 2^31, but I personally am
not in favour of imposing limitations on the size of a string for a
non-technical reason.
Well, the only use case I can think of for strings >2GB involves
modifying it in place. Crazy, but valid nonetheless. Too bad string
offsets can't be used for the part exceeding the 2GB limit ;)
Non-working string offsets to me sounds like a technical reason and not
something arbitrary.
Cheer
Matteo Beccati
Development & Consulting - http://www.beccati.com/
Even though size_t allows "huge" strings, would it be so bad to throw
an
error when one tries to create a string longer than 2^32 bytes,
regardless of memory_limit?This would be an unnecessary and somewhat arbitrary limitation. Yes,
loading >2GB of anything into memory in PHP probably makes no
sense (probably 99% of real-world installations have memory_limit set
much lower than this anyway), but just because we cannot think of a
valid use case does not mean there isn't one.If the string index deref issue cannot be solved, it can simply be
documented as only working for offsets up to 2^31, but I personally am
not in favour of imposing limitations on the size of a string for a
non-technical reason.Well, the only use case I can think of for strings >2GB involves
modifying it in place. Crazy, but valid nonetheless. Too bad string
offsets can't be used for the part exceeding the 2GB limit ;)Non-working string offsets to me sounds like a technical reason and not
something arbitrary.
And the tech reason is? I do not know it yet but if it is only about using
another type in the implementation it will not have any impact. But if it
has an impact, let forget it, users should rely on stream then.
Cheers,
On Fri, Aug 29, 2014 at 11:49 PM, Anatol Belski anatol.php@belski.net
wrote:Hi,
while refining the big string support, it turned out that we've an
issue. The syntax like $s[42] = 'x'; is currently inconsistend, because
we have uint32 for string offsets. This actually means, the behaviour is
currently only available in the old style and can handle not more than
2gb big
strings.Also discussed with Laruence on IRC and he actually expressed the
concern that we pay overhead on that. From my side I was investigating
on that and could suggest several solutions for that:
- stay with the old behavior (indexes would be able to handle only 2gb
strings, this is the status quo)I think it's okey. maybe throw a warning if it's bigger than 32bits?
Here's a patch http://belski.net/phpz/64_str_offset.patch fixing the the
64 bit offsets. Please take a look, maybe we'd have a perspective with it.
So far the test suite tells no difference.
Thanks
Anatol
Hi Anatol,
Thanks!
For what I see it should have no impact, either mem usage or perf but when
such offset is used, in 64bit.
However some numbers are better, could you provide some using exclusively
this syntax and using common apps pls? Only to be sure.
Cheers,
Pierre
On Fri, Aug 29, 2014 at 11:49 PM, Anatol Belski anatol.php@belski.net
wrote:Hi,
while refining the big string support, it turned out that we've an
issue. The syntax like $s[42] = 'x'; is currently inconsistend, because
we have uint32 for string offsets. This actually means, the behaviour is
currently only available in the old style and can handle not more than
2gb big
strings.Also discussed with Laruence on IRC and he actually expressed the
concern that we pay overhead on that. From my side I was investigating
on that and could suggest several solutions for that:
- stay with the old behavior (indexes would be able to handle only 2gb
strings, this is the status quo)I think it's okey. maybe throw a warning if it's bigger than 32bits?
Here's a patch http://belski.net/phpz/64_str_offset.patch fixing the the
64 bit offsets. Please take a look, maybe we'd have a perspective with it.
So far the test suite tells no difference.Thanks
Anatol
Hi Pierre,
Hi Anatol,
Thanks!
For what I see it should have no impact, either mem usage or perf but
when such offset is used, in 64bit.However some numbers are better, could you provide some using exclusively
this syntax and using common apps pls? Only to be sure.
while testing drupal and wp, i see a regression of .3% on mem usage and
.1% on perf. When using the exclusive syntax, i see a perf difference
almost in ms range, so even not worth to mention. The functional part
shows no diff so far.
Regards
Anatol
Hi Pierre,
Hi Anatol,
Thanks!
For what I see it should have no impact, either mem usage or perf but
when such offset is used, in 64bit.However some numbers are better, could you provide some using
exclusively this syntax and using common apps pls? Only to be sure.while testing drupal and wp, i see a regression of .3% on mem usage and
.1% on perf. When using the exclusive syntax, i see a perf difference
almost in ms range, so even not worth to mention. The functional part shows
no diff so far.
Laruence on the IRC suggested to implement this as a new zval type, so I'm
going to try it out yet as that's a cleaner way do go.
Regards
Anatol
Hi Anatol,
what do you mean? heap allocated structure?
I think, it's not a good option :(
I didn't have time to think about this yet.
Thanks. Dmitry.
Hi Pierre,
Hi Anatol,
Thanks!
For what I see it should have no impact, either mem usage or perf but
when such offset is used, in 64bit.However some numbers are better, could you provide some using
exclusively this syntax and using common apps pls? Only to be sure.while testing drupal and wp, i see a regression of .3% on mem usage and
.1% on perf. When using the exclusive syntax, i see a perf difference
almost in ms range, so even not worth to mention. The functional part
shows
no diff so far.Laruence on the IRC suggested to implement this as a new zval type, so I'm
going to try it out yet as that's a cleaner way do go.Regards
Anatol
Hi Dmitry,
Hi Anatol,
what do you mean? heap allocated structure? I think, it's not a good
option :(I didn't have time to think about this yet.
I thought about creating an extended zend_string struct with an appended
offset member. Pointer to that structure could be injected into
_zend_value and were compatible with the normal zend_string and would be
used only for the offset case. But then, we could get rid of u2.str_offset
and change the meaning of the IS_STR_OFFSET related macros.
That struct could be initialized the same way as any other in an opcode.
However I have yet to figure out where in the compiler this has to happen
(somewhere near the zend_compile_assign() i suppose).
Regards
Anatol
Hi Dmitry,
Hi Anatol,
what do you mean? heap allocated structure? I think, it's not a good
option :(I didn't have time to think about this yet.
I thought about creating an extended zend_string struct with an appended
offset member. Pointer to that structure could be injected into _zend_value
and were compatible with the normal zend_string and would be used only for
the offset case. But then, we could get rid of u2.str_offset and change
the meaning of the IS_STR_OFFSET related macros.That struct could be initialized the same way as any other in an opcode.
However I have yet to figure out where in the compiler this has to happen
(somewhere near the zend_compile_assign() i suppose).
So, the approach described here isn't worky. While trying to implement it,
I saw that it's not possible to isolate the exact case on compile time. A
syntax like $s[0] = 's' can be related to too many things and on
compilation time are abstract. Say $s = 'abc' is put somewhere in the
script, but $s[0] = 'd' can be right the next statement or x lines later.
This kind of things can be only catched on execution time.
And, while working on that, I've realized that usage of
zend_assign_to_string_offset() in the ASSIGN opcode is a dead case. So
effectively, the only opcode where it has to be handled is ASSIGN_DIM.
Also, I've rethought the solution - previously I was trying to integrate
some overloaded struct to carry the correct offset. After I saw that's
only one place where the actualy offset assignment can happen, I've
changed the tactics. Now, reading the offset directly on execution time
and passing to zend_assign_to_string_offset(). This one has no impact
neither in perf nor in memory. No new structs or heap allocations, just a
slightly modified flow.
http://belski.net/phpz/64_str_offset_better.patch
Please take a look. I'd suggest to give this last patch a try and would
apply this WE if there's no objection.
Regards
anatol