Hi,
I would like to start a discussion on a new RFC: "Add MariaDB-specific
features to mysqlnd and mysqli."
RFC:
https://wiki.php.net/rfc/mariadb_rfc
A proof of concept is available on my forked github-repository:
https://github.com/9EOR9/php-src/tree/8.6-mariadb
Benchmarks and results:
https://github.com/9EOR9/php-mariadb-rfc
/Georg
--
Georg Richter, Staff Software Engineer
Client Connectivity
MariaDB Corporation Ab
Hi,
I would like to start a discussion on a new RFC: "Add MariaDB-specific features to mysqlnd and mysqli."
RFC:
https://wiki.php.net/rfc/mariadb_rfcA proof of concept is available on my forked github-repository:
https://github.com/9EOR9/php-src/tree/8.6-mariadbBenchmarks and results:
https://github.com/9EOR9/php-mariadb-rfc/Georg
--
Georg Richter, Staff Software Engineer
Client Connectivity
MariaDB Corporation Ab
Hi Georg,
Thank you for presenting this RFC. I have a lot of comments:
First of all, regardless of what I am going to say next, I believe all
of these features should be treated in isolation and implemented in
separate PRs. It makes PR reviewing and git blaming easier, but also
performance changes can be tested in isolation.
Authentication Plugins:
You mention only one new plugin: parsec. Adding authentication plugins
does not require RFC. You can submit a separate PR without the RFC
process. Even if MySQL doesn't support the new method, adding a new
plugin isn't going to break anything.
Extended Metadata:
It looks like you have some formatting issues there. I don't
understand what this feature is, how to use it, nor how it will be
exposed. It needs more explanation.
In the table, you mention that it doesn't require any user-facing
changes, but it does. The returned structure is extended with two new
fields from what I can see. Additionally, the PoC changes existing
field values too.
Prepared Statement Metadata Caching:
In theory I am all for adding transparent improvements and they don't
usually require an RFC, but in this case, I don't understand how the
caching will be done. Are you saying the user will need to cache the
metadata somewhere? Is the cache shared across requests? How will the
cache be invalidated/cleaned?
From the implementation, it seems like the cache is only per prepared
statement, i.e. only the first execute() call will fetch the metadata.
If that's all it is then it can be made into a simple PR without RFC.
But in the table you say "Avoids resending metadata for repeated
prepares" which confuses me.
You also said:
Performance Improvements: Existing PHP functions that prepare and execute statements, such as mysql_execute_query() (introduced in PHP 8.2), could be updated to use execute_direct when connected to a MariaDB server. This would allow one-time statements to bypass the extra round-trip and metadata fetch, providing immediate performance gains.
Does this refer to caching? How does caching improve the performance
of a one-time executed prepared statement? If that's not what it
refers to, then what did you mean?
Progress Indication:
I'm quite worried about this one. PHP doesn't have native asynchronous
support yet. How do you intend for this to work when a built-in PHP
function can be suspended and possibly interrupted completely while
the MySQL is processing the request?
PDO Considerations:
Why is PDO out of scope? PDO is the main database extension in PHP. If
any new feature is added, then it should probably be exposed via PDO
first. What do you mean by "PDO_MySQL currently does not support array
binding or bulk execution semantics"?
Server capabilities:
I don't think it's more robust to detect MariaDB server by the lack of
CLIENT_LONG_PASSWORD than it is by the server_version. What if MySQL
decides to remove that flag in the future? What if derived products
don't follow the same convention?
Why is it necessary to expose this information to the PHP user via
server_capabilities and ext_server_capabilities? I can't find anywhere
that you explain what information is presented in these properties or
how to use it.
Memory Consumption:
Why is using generators going to use less memory? How does the new
function use generators that it's able to provide performance
improvements without reading all of the data from the generator
beforehand? This needs to be clearly explained so that it can be
explained in the PHP manual later. What is the difference between
these two:
$stmt->executemany("sd", $generator);
$stmt->executemany("sd", iterator_to_array($generator));
mysqli_stmt::executemany implementation:
You only mention the OO-style method in the RFC. Will the procedural
style counterpart also be implemented?
The name doesn't follow mysqli naming standard. It should be called
mysqli_stmt::execute_many. However, I wonder what made you land on
this name instead of mysqli_stmt::bulk_execute?
There should be no $types parameter IMHO. The data should either
always be treated as text, or it should be bound using the PHP type,
i.e. autodetected. I believe it was a mistake that bind_param() has a
$types parameter. All it serves is to cast the data silently into the
specified type. It has been a source of many silent bugs and a lot of
confusion. Additionally, giving users more type choices in
executemany() will not only create an inconsistency with bind_param(),
but also confuse PHP users who don't know if a variable is 8, or 16,
or 32, or 64-bit.
Return Values: Returns true on success or false on failure.
Please be explicit what the failure conditions may be.
How does get_result() work with executemany()? For example, if I
execute SELECT ?,? with [[1,2], [3,4]] will I get that same structure
in the result set? Is the sequence guaranteed? Is there a way to
distinguish which row came from which input data row? What if one
SELECT produces two rows and the other zero? Will it still be a single
mysqli_result?
How does it work with store_result() and also with unbuffered result
sets? I assume it's all treated the same as a normal execute(),
correct?
How does this affect insert_id? MariaDB documentation says that only
"the first auto-generated ID" is returned, which is contrary to what I
expect: the last. Does the value propagate to PHP? If so, why?
From what I can see, it is possible to request that MariaDB reply with
individual result sets. Wouldn't that be better? How would that impact
the performance?
How about atomicity? If I make 3 INSERTS and the middle one fails,
will the other two execute?
Indicator enum should contain only two values: IGNORE and DEFAULT.
Null is already a PHP data type and does not require an indicator.
None serves no purpose (what would even happen if someone used it).
Adding an Indicator enum will create an inconsistency with the
existing feature set which do not support it.
An enum should be defined by the extension in which it will be used,
i.e. mysqli. The mysqlnd extension shuld not expose any user-facing
APIs.
When connected to the MySQL server, how will executemany() behave? I
assume there will be an emulated support for it in mysqlnd, but how
will you ensure consistency between these two implementations? Making
the function no-op with MySQL should be out of the question.
Regards,
Kamil Tekiela