Runtime JIT Proposals

18 years ago by Sara Golemon

For reasons best left on IRC, it looks like I'll be working on runtime
JIT. To that end, I've come up with a few proposals of varying
complexity and feature-set completeness:

Option 1:
Dump support for compile-time JIT and replace it with a call at runtime
using the same semantics.

Advantages: No change in the API (well, no further change anyway,
Unicode support pretty much guarantees that things will change regardless).

Disadvantages: Could someone be relying on compile-time JIT for
something already? Maybe activation triggers an action which has to
take place prior to script execution? From what I've seen, JIT isn't in
heavy use, but my perception of the topic isn't definitive.

Option 2:
Leave compile-time JIT alone, and add a second callback for runtime JIT.

Advantages: Doesn't break BC, and offers extensions the chance to know
that the code contains autoglobal references without actually having to
act on them unless they're needed.

Disadvantages: Adds to complexity/confusion by having two separate
callbacks for essentially the same action.

Option 3:
Extend the JIT signature with a "stage" parameter to indicate whether the
JIT trigger is occurring at compile time or at run time. The callback can
decide when/if it performs processing using the current return-value
disarming semantics.

Option 4:
Include the fetch type and subelement in the runtime JIT callback, allowing
the callback to do only the work necessary to prepare for the read/write
call being performed.

e.g.

int php_example_jit_callback(int str_type, zstr str, int str_len,
        int stage, zval *container, int fetch_type, zval *element);

Where str_type/str/str_len indicate the name of the variable being
JITed, stage is one of COMPILETIME or RUNTIME, and container is the
autoglobal itself.
fetch_type is ZEND_FETCH_(DIM_|OBJ_)?(R|W|RW), and element is the
specific property/offset (only applicable for DIM/OBJ fetches, NULL for
a plain fetch).

Advantages: Gives maximum flexibility to the implementation. In the
case of HTTP request encoding, it allows the decoder to differentiate
between requests for a single element and fetches which want to retrieve
the entire array (e.g. foreach).

Disadvantages: Adds a lot of complexity to the fetching of autoglobals
and effectively doubles the amount of callback work being done for
autoglobal objects. Will also confuse implementers about the
difference between this fetch callback and the
(read|write)_(dimension|property) callbacks used by objects.
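
To make the Option 4 signature more concrete, here is a minimal, self-contained sketch of how a callback with that parameter order might branch on the stage and fetch type. Everything in it is a stand-in for illustration: the zval_stub and zstr_stub types, the JIT_STAGE_* and FETCH_* constants, the counters, and the return convention (1 = nothing done yet, 0 = work done) are assumptions, not the real Zend definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the engine types named in the proposal. */
typedef struct { const char *key; } zval_stub; /* stands in for zval */
typedef const char *zstr_stub;                 /* stands in for zstr */

enum jit_stage { JIT_STAGE_COMPILETIME, JIT_STAGE_RUNTIME };
enum fetch_kind {
    FETCH_R, FETCH_W, FETCH_RW,
    FETCH_DIM_R, FETCH_DIM_W, FETCH_DIM_RW
};

/* Counters so the effect of each call can be observed. */
static int whole_array_decodes = 0;
static int single_element_decodes = 0;

/* Sketch of a callback using the proposed parameter order.
 * Assumed return convention: 1 = deferred (no work done), 0 = work done. */
int example_jit_callback(int str_type, zstr_stub str, int str_len,
                         int stage, zval_stub *container,
                         int fetch_type, zval_stub *element)
{
    (void)str_type;
    (void)str_len;
    (void)container;
    assert(str != NULL);

    if (stage == JIT_STAGE_COMPILETIME) {
        /* Defer all work until the variable is actually used. */
        return 1;
    }

    int is_dim_fetch = (fetch_type == FETCH_DIM_R ||
                        fetch_type == FETCH_DIM_W ||
                        fetch_type == FETCH_DIM_RW);

    if (is_dim_fetch && element != NULL) {
        /* Only one offset is wanted: prepare just that element. */
        single_element_decodes++;
    } else {
        /* Plain fetch (e.g. foreach or assignment): prepare everything. */
        whole_array_decodes++;
    }
    return 0;
}
```

With this shape, a fetch of a single offset and a plain fetch of the whole autoglobal take different paths, which is the distinction the advantages paragraph relies on.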

In response to the suggestion to just turn $_REQUEST (et al.) into
objects with overloaded array access, the big danger there is that the
following behavior would change:

$postdata = $_REQUEST;
foreach ($postdata as $idx => $val) {
    $postdata[$idx] = some_filter_func($val);
}

Were $_REQUEST turned into an object with overloaded array access, these
changes to $postdata would modify the values in the original $_REQUEST
(due to the reference-like behavior of PHP5+ objects).

Personally, I like Option 4, but then I like complexity. I can
certainly see going for any of the others, but I want to go with
something that the rest of the group can see being appropriately useful.

If I can get something approaching a semi-consensus on direction, I can
have an implementation (or a couple, depending on feelings on the
matter) in a few days.

-Sara

18 years ago by Andi Gutmans

Hi Sara,

Sorry, but I wasn't on IRC, so I don't quite understand what you're
trying to accomplish ;)
Can you please explain? Once I understand, I'll be more than happy to
provide feedback on the options list.

Thanks,

Andi

18 years ago by Rasmus Lerdorf

Sara Golemon wrote:

> For reasons best left on IRC, it looks like I'll be working on runtime
> JIT. To that end, I've come up with a few proposals of varying
> complexity and feature-set completeness:
>
> Option 1:
> Dump support for compile-time JIT and replace it with a call at runtime
> using the same semantics.
>
> Advantages: No change in the API (well, no further change anyway,
> Unicode support pretty much guarantees that things will change regardless).
>
> Disadvantages: Could someone be relying on compile-time JIT for
> something already? Maybe activation triggers an action which has to
> take place prior to script execution? From what I've seen, JIT isn't in
> heavy use, but my perception of the topic isn't definitive.

I have a feeling this won't break much, if anything, but I am not sure
this is the best approach for Unicode encoding (see my response to
Option 4).

> Option 2:
> Leave compile-time JIT alone, and add a second callback for runtime JIT.
>
> Advantages: Doesn't break BC, and offers extensions the chance to know
> that the code contains autoglobal references without actually having to
> act on them unless they're needed.
>
> Disadvantages: Adds to complexity/confusion by having two separate
> callbacks for essentially the same action.

What would compile-time JIT do here? Just create a bunch of binary
elements that are then overwritten at runtime with the encoded elements
on access? This doesn't seem like a good idea either as the
compile-time version would almost always be completely redundant,
wouldn't it?

> Option 3:
> Extend the JIT signature with a "stage" parameter to indicate whether the
> JIT trigger is occurring at compile time or at run time. The callback can
> decide when/if it performs processing using the current return-value
> disarming semantics.

I think we'd confuse people with that. We should pick one and stick
with it.

> Option 4:
> Include the fetch type and subelement in the runtime JIT callback, allowing
> the callback to do only the work necessary to prepare for the read/write
> call being performed.

I like this approach. Getting right down to the individual GPC entries
avoids what could potentially be crippling overhead iterating through a
lot of fields which may never be used. It also solves the issue of what
to do in case of a conversion error. When you convert an entire array
at once, as the current compile-time JIT does, what happens when a single
entry has a conversion error? How do you propagate the error to the
user? And what if the error is on an element the user doesn't care
about? In fact, a bad guy could simply add random elements full of
bogus data to trigger these errors. By taking this approach we avoid
these poisonous entries and any encoding errors can be reported back to
the user right when they happen. When you toss error handling into the
mix I don't think this is the most complex solution as you indicated. I
think this actually simplifies things a lot.
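
The error-handling difference described above can be sketched in a few lines. This is only an illustration under stated assumptions: the gpc_entry struct, the function names, and the 7-bit ASCII check (used here as a stand-in for a real encoding conversion) are all hypothetical and not part of the engine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical raw request entry; not an engine type. */
typedef struct {
    const char *name;
    const char *raw;
} gpc_entry;

/* Stand-in "conversion": succeeds only if every byte is 7-bit ASCII. */
static int convert_ok(const char *raw)
{
    for (const unsigned char *p = (const unsigned char *)raw; *p != '\0'; p++) {
        if (*p > 0x7F) {
            return 0;
        }
    }
    return 1;
}

/* Eager approach: convert every entry up front.
 * Returns how many entries failed, including ones nobody will read. */
size_t convert_all(const gpc_entry *entries, size_t count)
{
    size_t failures = 0;
    assert(entries != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (!convert_ok(entries[i].raw)) {
            failures++;
        }
    }
    return failures;
}

/* Per-element approach: convert only the entry being fetched.
 * Returns 1 on success, 0 on a conversion error, -1 if not present. */
int convert_one(const gpc_entry *entries, size_t count, const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(entries[i].name, name) == 0) {
            return convert_ok(entries[i].raw);
        }
    }
    return -1;
}
```

In the eager version a single bogus entry shows up as a failure even if the script never reads it; in the per-element version the failure is only seen by a fetch of that specific entry.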

-Rasmus

18 years ago by Pierre

Hello Sara,

> For reasons best left on IRC, it looks like I'll be working on runtime
> JIT. To that end, I've come up with a few proposals of varying
> complexity and feature-set completeness:
>
> Option 1:
> Dump support for compile-time JIT and replace it with a call at runtime
> using the same semantics.
>
> Advantages: No change in the API (well, no further change anyway,
> Unicode support pretty much guarantees that things will change regardless).
>
> Disadvantages: Could someone be relying on compile-time JIT for
> something already? Maybe activation triggers an action which has to
> take place prior to script execution? From what I've seen, JIT isn't in
> heavy use, but my perception of the topic isn't definitive.

As I told you, there was already a consensus on this solution, check
my initial proposal (solution #2):

http://news.php.net/php.internals/26965

> In response to the suggestion to just turn $_REQUEST (et al.) into
> objects with overloaded array access, the big danger there is that the
> following behavior would change:

It will bring a BC break as well, unless is_array($arrayaccessobject)
returns true, and we would have to be sure about its implementation
(for example, property access does not always work well).

> Personally, I like Option 4, but then I like complexity. I can
> certainly see going for any of the others, but I want to go with
> something that the rest of the group can see being appropriately useful.

I like my initial proposal. All it needs is an extra function and to
move the JIT management to runtime. The complexity is the same as what
we have now.

--Pierre

18 years ago by Andrei Zmievski

I like Option 4.

-Andrei

18 years ago by Sara Golemon

> Option 4:
> Include the fetch type and subelement in the runtime JIT callback, allowing
> the callback to do only the work necessary to prepare for the read/write
> call being performed.
>
> e.g.
>
> int php_example_jit_callback(int str_type, zstr str, int str_len,
>         int stage, zval *container, int fetch_type, zval *element);
>
> Where str_type/str/str_len indicate the name of the variable being
> JITed, stage is one of COMPILETIME or RUNTIME, and container is the
> autoglobal itself.
> fetch_type is ZEND_FETCH_(DIM_|OBJ_)?(R|W|RW), and element is the
> specific property/offset (only applicable for DIM/OBJ fetches, NULL for
> a plain fetch).
>
> Advantages: Gives maximum flexibility to the implementation. In the
> case of HTTP request encoding, it allows the decoder to differentiate
> between requests for a single element and fetches which want to retrieve
> the entire array (e.g. foreach).
>
> Disadvantages: Adds a lot of complexity to the fetching of autoglobals
> and effectively doubles the amount of callback work being done for
> autoglobal objects. Will also confuse implementers about the
> difference between this fetch callback and the
> (read|write)_(dimension|property) callbacks used by objects.

Okay, in attempting an implementation of this, I got reminded
none-too-gently by the engine that it's not quite as simple as I'd
remembered:

<?php
$g = $_GET;
$f = $_POST['foo'];
?>

compiled vars:  !0 = $g, !1 = $f
line  #  op           fetch   operands

2     0  FETCH_R      global  $0, '_GET'
      1  ASSIGN               $1, !0, $0
3     2  FETCH_R      global  $2, '_POST'
      3  FETCH_DIM_R          $3, $2, 'foo'
      4  ASSIGN               $4, !1, $3

Autoglobals aren't treated as CVs (as, for some reason, I was thinking
they were), so at the time of the initial fetch it's difficult to know if
the whole var is being fetched (as in the case of the line 2 assignment)
or if it's being fetched so that a subelement can be fetched later (as
in the case of the line 3 assignment).

The solution I'm tempted to pursue for this is to back up yet another
step and make autoglobals be CVs by extending the zend_compiled_variable
struct to contain a flag indicating how the var should be fetched (the
determination for which happens in fetch_simple_var during
compilation). This would then yield an opcode stack like the following:

compiled vars:  !0* = $_GET, !1 = $g, !2* = $_POST, !3 = $f
line  #  op           fetch   operands

2     0  ASSIGN               $0, !1, !0*
3     1  FETCH_DIM_R          $1, !2*, 'foo'
      2  ASSIGN               $2, !3, $1

(The * notation indicating that cv->fetch_type == ZEND_FETCH_GLOBAL)
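
As a rough illustration of the flag idea, the following self-contained sketch assigns CV slots at "compile time" and marks the ones whose names are autoglobals. It is not the actual patch: compiled_variable_stub, op_array_stub, the CV_FETCH_* constants, lookup_cv() and the hard-coded name list are hypothetical stand-ins for zend_compiled_variable and the real compiler logic.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fetch flag; stands in for the proposed cv->fetch_type. */
enum cv_fetch { CV_FETCH_LOCAL, CV_FETCH_GLOBAL };

/* Hypothetical stand-in for an extended zend_compiled_variable. */
typedef struct {
    const char *name;
    enum cv_fetch fetch_type;
} compiled_variable_stub;

#define MAX_CVS 16

typedef struct {
    compiled_variable_stub vars[MAX_CVS];
    int count;
} op_array_stub;

/* Assumed list of autoglobal names, for the sketch only. */
static int is_autoglobal_name(const char *name)
{
    static const char *const known[] = { "_GET", "_POST", "_REQUEST" };
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++) {
        if (strcmp(name, known[i]) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Return the CV slot for name, creating it on first use.
 * The fetch flag is decided once, when the slot is created.
 * Returns -1 if the table is full. */
int lookup_cv(op_array_stub *ops, const char *name)
{
    assert(ops != NULL && name != NULL);
    for (int i = 0; i < ops->count; i++) {
        if (strcmp(ops->vars[i].name, name) == 0) {
            return i;
        }
    }
    if (ops->count == MAX_CVS) {
        return -1;
    }
    ops->vars[ops->count].name = name;
    ops->vars[ops->count].fetch_type =
        is_autoglobal_name(name) ? CV_FETCH_GLOBAL : CV_FETCH_LOCAL;
    return ops->count++;
}
```

Looking up _GET, g, _POST and f in that order yields slots 0 through 3 with only slots 0 and 2 flagged as global fetches, which mirrors the starred entries in the dump above.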

Once that's applied (basically as a stand-alone speed improvement, since
globals are turned into CV fetches), the RT-JIT can be done using the
plan I'd already formulated for Option 4. THEN we can apply
runtime-JIT to HTTP input encoding detection.

Just keeping the conversation in the open and hoping anyone with
critiques will voice them sooner rather than later.

-Sara

Two steps forward, one step back.

18 years ago by Sara Golemon

> The solution I'm tempted to pursue for this is to back up yet another
> step and make autoglobals be CVs by extending the zend_compiled_variable
> struct to contain a flag indicating how the var should be fetched (the
> determination for which happens in fetch_simple_var during
> compilation). This would then yield an opcode stack like the following:

Okay, here's a shockingly simple patch for allowing auto globals to be
treated as CVs. The one question mark I've got in here is: Why the last
check in fetch_simple_var_ex() for the ZEND_BEGIN_SILENCE opcode? This
seems completely unnecessary from what I can tell and shouldn't bar a
variable (global or not) from being treated as a CV...

Am I missing something really obvious?

-Sara
