Hello internals,
For the past few weeks I have been refactoring the PDO extension to reduce the complexity of the implementation but also lower the memory requirement of the _pdo_stmt_t struct.
And in this process I rediscovered an oddity which prevents removing the usage of the FCI struct (which takes 64 bytes) from it.
The constructor arguments for the class are passed as a PHP array to PDOStatement::setFetchMode(), PDOStatement::fetchAll(), and PDOStatment::fetchObject().
However, this array of parameters does not have the same semantics as the $args parameter of call_user_func_array()
. (CUFA for short.)
The reason for this is that it uses a rather unintuitive and obscure Zend API called zend_fcall_info_args_ex(),
this API sets the arguments of a call by copying each argument one after the other as a positional argument.
And if a zend_function pointer is provided, it will check if the function argument should be passed by reference and automatically wrap values into a reference without a warning.
This behaviour is at odds with passing a keyed array to CUFA, where the key will act as a named argument look-up, and with normal calling semantics where a by-ref argument being passed by value will trigger an E_WARNING.
Moreover, this API is effectively unused. A SourceGraph search [1] came up with 9 results, where only 2 of them are relevant:
- The UOPZ extension
- The Zephir project, which allows generating a PHP C extension from PHPesque code
- All other 7 projects are extensions generated by using Zephir
I have submitted PRs to Zephir [2] and UOPZ [3] that have been merged that remove the usage of this API by instead passing the HashTable pointer of the array as the named_params field of the FCI.
Thus, the only remaining usage of this API I can find is now in PDO.
This conversion of using the named_params field of the FCI for calling the constructor can also be done for PDO,
allowing the removal of the FCI from _pdo_stmt_t, convert the ctor_arg zval (which takes 16 bytes) to a HashTable* (taking only 8 bytes), and remove various copies of the constructor arguments.
Doing this causes some minor BC breaks as this aligns the behaviour with CUFA which can be seen in the latest PR implementation. [4]
It is possible to emulate the previous behaviour, but this requires doing a deep copy of the constructor arguments, negating that benefit.
To me, these minor BC breaks are acceptable as:
- by-ref arguments in a constructor seems already rare, even more so in an object that represents a database row
- having an argument array behave differently than CUFA or what you would receive with variadic arugments is somewhat a language inconsistency.
Thoughts?
For reference, this is effectively the same point I raised in a previous internals thread, which didn't really get much attention. [5]
Best regards,
Gina P. Banyard
[1] https://sourcegraph.com/search?q=context:global+lang:C+zend_fcall_info_args_ex+-file:zend_API.h+-file:zend_API.c+-file:pdo_stmt.c&patternType=keyword&sm=0
[2] Zephir PR: https://github.com/zephir-lang/zephir/pull/2443
[3] UOPZ PR: https://github.com/krakjoe/uopz/pull/188
[4] https://github.com/php/php-src/pull/17427
[5] https://externals.io/message/118847
Hi Gina,
Your proposal makes sense to me. I agree that the behaviour should be
consistent.
If I understand correctly, an associative array currently binds only
to positional arguments. If you make this change, how will it interact
with PDO::FETCH_PROPS_LATE? Am I correct in assuming that this will
also be a BC break? Will the values coming from the DB take over if no
matching param is found in the constructor array?
Regards,
Kamil