Hi internals,
I was wondering if the maintainers of the Opcache JIT would be open to optimizing functions such as intdiv(), fdiv(), and spl_object_id() into assembly code in the JIT when it is safe to do so.
These functions have simple C implementations, so the performance overhead of calling PHP functions in the generated assembly
would be noticeable compared to emitting optimized assembly.
I'd expect that it'd be safe to do so under the following circumstances:
- All arguments are Compiled Variables (CV) in the opcodes (i.e. $var)
  (http://nikic.github.io/2017/04/14/PHP-7-Virtual-machine.html#variable-types)
- The types and count of the function arguments are known to be strictly correct, e.g.
  - Opcache infers they will not throw a TypeError or emit undefined variable warnings
  - maybe strictly accept floats but not integers for fdiv
- The return value of the expression won't throw
  (e.g. don't perform this optimization for $y = spl_object_id($x) when a CV is the return value)
- The function in question exists and is not disabled.
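The conditions above could be expressed as a single predicate. The following is only a sketch under stated assumptions: every type and name in it (operand_kind, type_mask, arg_info, candidate_function, can_inline_call) is hypothetical and simplified, and none of it is taken from the Opcache sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for what the optimizer knows about a call. */
typedef enum { OPERAND_CONST, OPERAND_TMP, OPERAND_VAR, OPERAND_CV } operand_kind;

typedef enum {
    TYPE_LONG   = 1 << 0,
    TYPE_DOUBLE = 1 << 1,
    TYPE_OBJECT = 1 << 2,
    TYPE_UNDEF  = 1 << 3  /* the variable may be undefined (would emit a warning) */
} type_mask;

typedef struct {
    operand_kind kind;
    unsigned     possible_types; /* bitmask of type_mask values inferred for the argument */
} arg_info;

typedef struct {
    size_t   expected_args;
    unsigned accepted_types; /* types accepted without coercion or TypeError */
    bool     is_disabled;
} candidate_function;

/* Returns true only if every condition from the list holds. */
bool can_inline_call(const candidate_function *fn, const arg_info *args,
                     size_t num_args, bool result_is_cv)
{
    assert(fn != NULL && (args != NULL || num_args == 0));
    if (fn->is_disabled || num_args != fn->expected_args || result_is_cv) {
        return false;
    }
    for (size_t i = 0; i < num_args; i++) {
        if (args[i].kind != OPERAND_CV) {
            return false; /* only compiled variables are considered */
        }
        if (args[i].possible_types == 0 ||
            (args[i].possible_types & ~fn->accepted_types) != 0) {
            return false; /* unknown type, possibly undefined, or a type that needs checks */
        }
    }
    return true;
}
```

For spl_object_id(), accepted_types would be TYPE_OBJECT only, so an argument that might be undefined or might be an integer is rejected and the normal call path is kept.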
For example, these operands would be seen by the JIT for this sequence of opcodes:
/*
0000 CV0($x) = RECV 1
0001 INIT_FCALL 1 96 string("spl_object_id")
0002 SEND_VAR CV0($x) 1
0003 V2 = DO_ICALL
0004 T1 = INIT_ARRAY 1 (packed) CV0($x) V2
0005 RETURN T1
*/
function create_set(stdClass $x) : array {
return [spl_object_id($x) => $x];
}
I expect it to be technically feasible to check for the sequence of opcodes INIT_FCALL, SEND_VAR (repeated), and DO_ICALL in ext/opcache/jit/zend_jit.c. I'd expect the resulting assembly to be smaller and much faster.
(This would be done before emitting any assembly: skip over the opcodes for INIT_FCALL, SEND_VAR, and DO_ICALL.)
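As a rough illustration of that check, here is a minimal sketch that scans a simplified opcode array for INIT_FCALL, zero or more SEND_VAR, then DO_ICALL. The sketch_op struct, the opcode_kind enum, and match_simple_icall() are hypothetical and far simpler than the real zend_op; the actual code in zend_jit.c would look different.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced opcode set; not the real zend_op layout. */
typedef enum {
    OP_RECV, OP_INIT_FCALL, OP_SEND_VAR, OP_DO_ICALL, OP_INIT_ARRAY, OP_RETURN
} opcode_kind;

typedef struct {
    opcode_kind opcode;
    unsigned    num_args; /* only meaningful for OP_INIT_FCALL */
} sketch_op;

/*
 * If ops[start] begins an INIT_FCALL / SEND_VAR* / DO_ICALL sequence whose
 * SEND_VAR count matches the declared argument count, return the number of
 * opcodes in the sequence (so the caller can skip them). Otherwise return 0.
 */
size_t match_simple_icall(const sketch_op *ops, size_t count, size_t start)
{
    assert(ops != NULL);
    if (start >= count || ops[start].opcode != OP_INIT_FCALL) {
        return 0;
    }
    size_t pos = start + 1;
    unsigned sends = 0;
    while (pos < count && ops[pos].opcode == OP_SEND_VAR) {
        sends++;
        pos++;
    }
    if (pos >= count || ops[pos].opcode != OP_DO_ICALL) {
        return 0; /* nested call, SEND_VAL, or something else in between */
    }
    if (sends != ops[start].num_args) {
        return 0;
    }
    return pos - start + 1;
}
```

For the create_set() opcodes above, a match starting at opcode 0001 would cover three opcodes (0001 to 0003), and anything else between INIT_FCALL and DO_ICALL makes the matcher fall back to the normal path.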
- What do the maintainers of the JIT module think of this idea?
  (e.g. are there concerns about this making it harder to understand the JIT codebase, adding potential bugs, or slowing down the generation of assembly)
- Are there any general guidelines (or talks/articles) you'd recommend for new contributors to the Opcache JIT? I couldn't find a README and https://wiki.php.net/rfc/jit doesn't seem to make recommendations that apply to my question.
// core fdiv C implementation (does not throw for a divisor of 0)
RETURN_DOUBLE(dividend / divisor);
// core of spl_object_id C implementation
RETURN_LONG((zend_long)Z_OBJ_HANDLE_P(obj));
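To make the "does not throw" comment concrete, here is a standalone sketch of the two division cases. It assumes IEEE 754 floating-point arithmetic (which is what makes division by zero yield INF or NAN), and the names fdiv_sketch, intdiv_is_safe, and intdiv_sketch are hypothetical, not PHP internals. As far as I understand, intdiv() differs in that it throws for a zero divisor and for PHP_INT_MIN / -1, so emitted assembly for it would need those two guards.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* Float division needs no guard under IEEE 754: x/0 gives +/-INF, 0/0 gives NAN. */
double fdiv_sketch(double dividend, double divisor)
{
    return dividend / divisor;
}

/* Integer division has two inputs that must take the slow (throwing) path. */
bool intdiv_is_safe(int64_t dividend, int64_t divisor)
{
    return divisor != 0 && !(dividend == INT64_MIN && divisor == -1);
}

/* Fast path only; the caller is expected to check intdiv_is_safe() first. */
int64_t intdiv_sketch(int64_t dividend, int64_t divisor)
{
    assert(intdiv_is_safe(dividend, divisor));
    return dividend / divisor;
}
```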
I started looking into this because I work on an application that heavily uses spl_object_id()
(https://github.com/phan/phan/search?q=spl_object_id&unscoped_q=spl_object_id)
and would see a small performance benefit from this.
I was reminded of it when looking at the get_resource_id() proposal.
Thanks,
- Tyson
On Sat, Apr 25, 2020 at 6:44 PM tyson andre tysonandre775@hotmail.com wrote:
[...]
Hi Tyson,
Our general approach to this is to first add a VM opcode for the operation,
which will also provide a benefit if the JIT is not used. There's already
plenty of those, see ZEND_STRLEN for example. Adding JIT support for the
opcode would then be the natural second step.
There aren't any hard rules for when this should be done, but I believe
having some evidence that the operation is both common and performance
critical would be good.
Nikita
Hi Nikita,
> Our general approach to this is to first add a VM opcode for the operation,
> which will also provide a benefit if the JIT is not used.
> There's already plenty of those, see ZEND_STRLEN for example.
> Adding JIT support for the opcode would then be the natural second step.
I've seen those - I'd assume there'd be a long tail of functions such as preg_match() without references, spl_object_id(), substr(), etc., that are called frequently, but not often enough to have their own opcodes.
What about adding a new opcode type that combines INIT_FCALL, SEND_VAR (or SEND_VAL for constants), and DO_ICALL for known internal functions?
Has that been proposed before?
This could be used when the arguments were inferred by opcache to have correct types.
Old (in function where $x is known to be a non-reference object):
0001 INIT_FCALL 1 96 string("spl_object_id")
0002 SEND_VAR CV0($x) 1
0003 V2 = DO_ICALL
0004 T1 = INIT_ARRAY 1 (packed) CV0($x) V2
0005 RETURN T1
New (after opcache):
0001 T1 = INIT_AND_DO_ICALL_1_ARG $x // op1=$x, op2 is unused, extended_value = enum value corresponding to `spl_object_id()`
0002 RETURN T1
And fdiv() could become INIT_AND_DO_ICALL_2_ARG $x $y // op1=$x, op2=$y, extended_value = enum value corresponding to `fdiv()`
This should give opcache the information it needs to continue tracking the definitions and uses of variables, as well as function return types.
I'd hope there'd be a performance improvement, due to not needing to look up the zend_function or perform param type checks, and due to using the native C stack for zvals instead of a dynamic stack, etc.
This would only accept constants and compiled variables of known good types for functions that wouldn't emit notices or cause re-entry - temporary values (e.g. spl_object_id(some_function())) wouldn't be accepted.
The C implementation for the new opcode could be a switch on extended_value, with enum values for dozens of commonly used functions that weren't common enough for individual opcodes.
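A very rough sketch of what such a handler could look like follows. Everything here is hypothetical: fused_function_id, sketch_value, and fused_icall() stand in for the extended_value enum, zvals, and the opcode handler, and the real VM handler would be written against zend_vm_def.h conventions instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical enum stored in extended_value. */
typedef enum { FUSED_SPL_OBJECT_ID, FUSED_FDIV } fused_function_id;

/* Hypothetical, heavily reduced stand-in for a zval. */
typedef enum { VAL_LONG, VAL_DOUBLE, VAL_OBJECT } value_kind;

typedef struct {
    value_kind kind;
    union {
        int64_t  lval;
        double   dval;
        uint32_t object_handle;
    } u;
} sketch_value;

/*
 * Dispatch on the function id. Argument types are assumed to have been
 * proven by the optimizer already, so they are only asserted, not checked.
 * Returns false for an id this sketch does not know about.
 */
bool fused_icall(fused_function_id id, const sketch_value *op1,
                 const sketch_value *op2, sketch_value *result)
{
    switch (id) {
    case FUSED_SPL_OBJECT_ID:
        assert(op1 != NULL && op1->kind == VAL_OBJECT);
        result->kind = VAL_LONG;
        result->u.lval = (int64_t)op1->u.object_handle;
        return true;
    case FUSED_FDIV:
        assert(op1 != NULL && op2 != NULL);
        assert(op1->kind == VAL_DOUBLE && op2->kind == VAL_DOUBLE);
        result->kind = VAL_DOUBLE;
        result->u.dval = op1->u.dval / op2->u.dval;
        return true;
    }
    return false;
}
```

The arguments live in ordinary C locals here, which is the part I'd hope saves time compared to pushing them onto the VM's call frame.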
Thanks,
- Tyson