[RFC][Discussion] Add MariaDB-specific features to mysqlnd and mysqli

1 month ago by Georg Richter — view source — reply

unread

Hi,

I would like to start a discussion on a new RFC: "Add MariaDB-specific
features to mysqlnd and mysqli."

RFC:
https://wiki.php.net/rfc/mariadb_rfc

A proof of concept is available on my forked github-repository:
https://github.com/9EOR9/php-src/tree/8.6-mariadb

Benchmarks and results:
https://github.com/9EOR9/php-mariadb-rfc

/Georg

--
Georg Richter, Staff Software Engineer
Client Connectivity
MariaDB Corporation Ab

1 month ago by Kamil Tekiela — view source — reply

unread

Hi,

I would like to start a discussion on a new RFC: "Add MariaDB-specific features to mysqlnd and mysqli."

RFC:
https://wiki.php.net/rfc/mariadb_rfc

A proof of concept is available on my forked github-repository:
https://github.com/9EOR9/php-src/tree/8.6-mariadb

Benchmarks and results:
https://github.com/9EOR9/php-mariadb-rfc

/Georg

--
Georg Richter, Staff Software Engineer
Client Connectivity
MariaDB Corporation Ab

Hi Georg,

Thank you for presenting this RFC. I have a lot of comments:

First of all, regardless of what I am going to say next, I believe all
of these features should be treated in isolation and implemented in
separate PRs. It makes PR reviewing and git blaming easier, but also
performance changes can be tested in isolation.

Authentication Plugins:
You mention only one new plugin: parsec. Adding authentication plugins
does not require RFC. You can submit a separate PR without the RFC
process. Even if MySQL doesn't support the new method, adding a new
plugin isn't going to break anything.

Extended Metadata:
It looks like you have some formatting issues there. I don't
understand what this feature is, how to use it, nor how it will be
exposed. It needs more explanation.

In the table, you mention that it doesn't require any user-facing
changes, but it does. The returned structure is extended with two new
fields from what I can see. Additionally, the PoC changes existing
field values too.

Prepared Statement Metadata Caching:
In theory I am all for adding transparent improvements and they don't
usually require an RFC, but in this case, I don't understand how the
caching will be done. Are you saying the user will need to cache the
metadata somewhere? Is the cache shared across requests? How will the
cache be invalidated/cleaned?

From the implementation, it seems like the cache is only per prepared
statement, i.e. only the first execute() call will fetch the metadata.
If that's all it is then it can be made into a simple PR without RFC.
But in the table you say "Avoids resending metadata for repeated
prepares" which confuses me.

You also said:

Performance Improvements: Existing PHP functions that prepare and execute statements, such as mysql_execute_query() (introduced in PHP 8.2), could be updated to use execute_direct when connected to a MariaDB server. This would allow one-time statements to bypass the extra round-trip and metadata fetch, providing immediate performance gains.

Does this refer to caching? How does caching improve the performance
of a one-time executed prepared statement? If that's not what it
refers to, then what did you mean?

Progress Indication:
I'm quite worried about this one. PHP doesn't have native asynchronous
support yet. How do you intend for this to work when a built-in PHP
function can be suspended and possibly interrupted completely while
the MySQL is processing the request?

PDO Considerations:
Why is PDO out of scope? PDO is the main database extension in PHP. If
any new feature is added, then it should probably be exposed via PDO
first. What do you mean by "PDO_MySQL currently does not support array
binding or bulk execution semantics"?

Server capabilities:
I don't think it's more robust to detect MariaDB server by the lack of
CLIENT_LONG_PASSWORD than it is by the server_version. What if MySQL
decides to remove that flag in the future? What if derived products
don't follow the same convention?

Why is it necessary to expose this information to the PHP user via
server_capabilities and ext_server_capabilities? I can't find anywhere
that you explain what information is presented in these properties or
how to use it.

Memory Consumption:
Why is using generators going to use less memory? How does the new
function use generators that it's able to provide performance
improvements without reading all of the data from the generator
beforehand? This needs to be clearly explained so that it can be
explained in the PHP manual later. What is the difference between
these two:

$stmt->executemany("sd", $generator);
$stmt->executemany("sd", iterator_to_array($generator));

mysqli_stmt::executemany implementation:
You only mention the OO-style method in the RFC. Will the procedural
style counterpart also be implemented?

The name doesn't follow mysqli naming standard. It should be called
mysqli_stmt::execute_many. However, I wonder what made you land on
this name instead of mysqli_stmt::bulk_execute?

There should be no $types parameter IMHO. The data should either
always be treated as text, or it should be bound using the PHP type,
i.e. autodetected. I believe it was a mistake that bind_param() has a
$types parameter. All it serves is to cast the data silently into the
specified type. It has been a source of many silent bugs and a lot of
confusion. Additionally, giving users more type choices in
executemany() will not only create an inconsistency with bind_param(),
but also confuse PHP users who don't know if a variable is 8, or 16,
or 32, or 64-bit.

Return Values: Returns true on success or false on failure.

Please be explicit what the failure conditions may be.

How does get_result() work with executemany()? For example, if I
execute SELECT ?,? with [[1,2], [3,4]] will I get that same structure
in the result set? Is the sequence guaranteed? Is there a way to
distinguish which row came from which input data row? What if one
SELECT produces two rows and the other zero? Will it still be a single
mysqli_result?

How does it work with store_result() and also with unbuffered result
sets? I assume it's all treated the same as a normal execute(),
correct?

How does this affect insert_id? MariaDB documentation says that only
"the first auto-generated ID" is returned, which is contrary to what I
expect: the last. Does the value propagate to PHP? If so, why?

From what I can see, it is possible to request that MariaDB reply with
individual result sets. Wouldn't that be better? How would that impact
the performance?

How about atomicity? If I make 3 INSERTS and the middle one fails,
will the other two execute?

Indicator enum should contain only two values: IGNORE and DEFAULT.
Null is already a PHP data type and does not require an indicator.
None serves no purpose (what would even happen if someone used it).
Adding an Indicator enum will create an inconsistency with the
existing feature set which do not support it.
An enum should be defined by the extension in which it will be used,
i.e. mysqli. The mysqlnd extension shuld not expose any user-facing
APIs.

When connected to the MySQL server, how will executemany() behave? I
assume there will be an emulated support for it in mysqlnd, but how
will you ensure consistency between these two implementations? Making
the function no-op with MySQL should be out of the question.

Regards,
Kamil Tekiela

1 month ago by Georg Richter — view source — reply

unread

Hi Kamil,

Thank you for the detailed feedback. I’ve updated the RFC to reflect
several of your points, specifically regarding the separation of concerns.
To keep things clean, everything not directly visible to the user (metadata
caching, extended metadata, auth plugins) will be moved to separate PRs.

Extended Metadata:
It looks like you have some formatting issues there. I don't
understand what this feature is, how to use it, nor how it will be
exposed. It needs more explanation.

In the table, you mention that it doesn't require any user-facing
changes, but it does. The returned structure is extended with two new
fields from what I can see. Additionally, the PoC changes existing
field values too.

Extended metadata will go to a separate PR - we only need to map
MYSQL_TYPE_LONGTYPE + json to MYSQL_TYPE_JSON. and MYSQL_TYPE_BLOB + vector
to MYSQL_TYPE_VECTOR.

Prepared Statement Metadata Caching:
In theory I am all for adding transparent improvements and they don't
usually require an RFC, but in this case, I don't understand how the
caching will be done. Are you saying the user will need to cache the
metadata somewhere? Is the cache shared across requests? How will the
cache be invalidated/cleaned?

Right this will also be implemented via PR

From the implementation, it seems like the cache is only per prepared
statement, i.e. only the first execute() call will fetch the metadata.
If that's all it is then it can be made into a simple PR without RFC.
But in the table you say "Avoids resending metadata for repeated
prepares" which confuses me.

The caching mechanism is designed to be transparent within the mysqlnd
layer. Historically, MySQL and older MariaDB versions sent metadata during
the PREPARE phase and again for every subsequent EXECUTE.

MariaDB has optimized this: the server now only sends metadata during
EXECUTE if the table structure or definition has changed. By implementing
caching in mysqlnd, we can skip the redundant allocation and parsing of
metadata packets. This is managed internally by the driver and does not
require the user to manually clean or invalidate the cache.

Performance Improvements: Existing PHP functions that prepare and
execute statements, such as mysql_execute_query() (introduced in PHP 8.2),
could be updated to use execute_direct when connected to a MariaDB server.
This would allow one-time statements to bypass the extra round-trip and
metadata fetch, providing immediate performance gains.

Does this refer to caching? How does caching improve the performance
of a one-time executed prepared statement? If that's not what it
refers to, then what did you mean?

Sorry I forgot to remove this before I published it. MariaDB supports
direct execution by sending prepare and execute together - this doesn't
seem to work with php_streams.

Progress Indication:
I'm quite worried about this one. PHP doesn't have native asynchronous
support yet. How do you intend for this to work when a built-in PHP
function can be suspended and possibly interrupted completely while
the MySQL is processing the request?

To clarify: this is not asynchronous in the Fiber sense. It is a
synchronous "keep-alive" mechanism. During long-running operations (like
ALTER TABLE, LOAD DATA, or complex INSERT...SELECT statements), the server
sends "progress" packets.

Currently, mysqlnd ignores these, which can lead to connection timeouts or
a lack of feedback for the user. By supporting these packets, the driver
stays active, preventing timeouts and allowing the application to provide
real-time feedback to the user or logs, ensuring the client knows the
server is still alive and working.

PDO Considerations:
Why is PDO out of scope? PDO is the main database extension in PHP. If
any new feature is added, then it should probably be exposed via PDO
first. What do you mean by "PDO_MySQL currently does not support array
binding or bulk execution semantics"?

PDO is currently out of scope only to ensure a manageable and focused RFC.
Implementing these features in mysqlnd provides the necessary binary
foundation. A future "PDO Bulk RFC" can then leverage this work across all
drivers (PostgreSQL, SQLite, etc.) without needing to reinvent the
protocol-specific logic for the MySQL/MariaDB ecosystem. We are building
the engine first; the unified PDO interface can follow.

Server capabilities:
I don't think it's more robust to detect MariaDB server by the lack of
CLIENT_LONG_PASSWORD than it is by the server_version. What if MySQL
decides to remove that flag in the future? What if derived products
don't follow the same convention?

Why is it necessary to expose this information to the PHP user via
server_capabilities and ext_server_capabilities? I can't find anywhere
that you explain what information is presented in these properties or
how to use it.

There is a fundamental difference between what a server might do (Version)
and what it can do (Capabilities). A version string is a static label.
Capability flags, however, reflect the server’s current build-time features
and runtime configuration.

Exposing ext_server_capabilities allows developers and frameworks to
implement reliable feature detection. Without these flags, a developer
cannot safely use functions without risking a fatal error or needing to
wrap every call in a try-catch block.

Memory Consumption:
Why is using generators going to use less memory? How does the new
function use generators that it's able to provide performance
improvements without reading all of the data from the generator
beforehand? This needs to be clearly explained so that it can be
explained in the PHP manual later. What is the difference between
these two:
$stmt->executemany("sd", $generator);
$stmt->executemany("sd", iterator_to_array($generator));
The performance improvement is not derived from the Generator itself, but
from the reduction of network round-trips.

standard foreach + execute(): 1,000 rows = 1,000 "Send" packets + 1,000
"Response" packets.
execute_many(): The driver pulls rows from the iterable and serializes
them into a large internal network buffer. 1,000 rows = 1 large Send packet

1 Response packet.

Regarding iterator_to_array($generator): This forces PHP to allocate memory
for every single row simultaneously. If you are importing 1 million rows,
PHP must allocate memory for 1 million arrays before the first byte is ever
sent. By using the Generator directly, we maintain O(1) memory usage,
fetching and serializing only one row at a time.

mysqli_stmt::executemany implementation:
You only mention the OO-style method in the RFC. Will the procedural
style counterpart also be implemented?

yes - OO and procedural functions use the same code.

The name doesn't follow mysqli naming standard. It should be called
mysqli_stmt::execute_many. However, I wonder what made you land on
this name instead of mysqli_stmt::bulk_execute?

There should be no $types parameter IMHO. The data should either
always be treated as text, or it should be bound using the PHP type,
i.e. autodetected. I believe it was a mistake that bind_param() has a
$types parameter. All it serves is to cast the data silently into the
specified type. It has been a source of many silent bugs and a lot of
confusion. Additionally, giving users more type choices in
executemany() will not only create an inconsistency with bind_param(),
but also confuse PHP users who don't know if a variable is 8, or 16,
or 32, or 64-bit.

You are absolutely right—the name should follow the underscore convention,
so it will be execute_many. Regarding the parameter order and the $types
question, the proposed signature could be:

public mysqli_stmt::execute_many(iterable $data, ?iterable $control = null,
?string $types = null): bool
public mysqli_stmt_execute_many(mysqli_stmt $stmt, iterable $data,
?iterable $control = null, ?string $types = null): bool

By placing $control and $types at the end as optional parameters, we can
leverage PHP 8 named parameters to resolve the conflict between ease-of-use
and technical optimization.

The "All-Strings" Path (Implicit Autodetection)

To address your concern about autodetection: providing $types is not
mandatory. If the user simply calls $stmt->execute_many($data), the driver
will implicitly treat all scalar values as strings. Since MariaDB and MySQL
handle implicit casting efficiently, this provides the seamless "it just
works" experience you requested for the majority of use cases.

The "Strict-Typed" Path (Memory & Bandwidth Optimization)

While autodetection is convenient, specifying $types remains critical when
resource optimization is a factor:

Binary Efficiency: By specifying types like i (LONG) or 1 (TINYINT), we
significantly reduce the network payload compared to sending everything as
strings (up to an 80% reduction for certain numeric columns).
O(1) Memory Safety for Streams: Because the MariaDB bulk protocol
requires type declarations in the packet header before the data rows are
sent, the driver cannot "autodetect" binary widths from a Generator without
reading the entire stream into memory first. Providing the $types string
acts as a contract that allows for true, constant-memory streaming.

The "Control" Parameter and Indicators

I agree that the Indicator Enum should be moved to mysqli
(mysqli_indicator). The $control array serves three critical architectural
purposes:

Non-Destructive Transformation: It allows developers to apply logic (like
DEFAULT or NULL overrides) without modifying the original $data array or
stream. This is vital if the data source is a read-only Generator or is
being reused elsewhere in the application.
Security (Sensitive Data Overwrite): The $control array acts as a secure
template. A developer can pass a raw data array but use the control array
to force a status column to "active" or a last_modified_by column to a
specific ID, ensuring sensitive fields are overwritten regardless of the
input data's content.
-Flexibility: It supports both a Global Mode (one row that acts as a mask
for the entire batch) and a Streaming Mode (an iterable that advances in
parallel with the data).

This design ensures that execute_many remains simple for beginners while
providing the granular control required by high-performance implementations.

Please be explicit what the failure conditions may be.

Like for all other functions you have to retrieve error messages via
mysqli_stmt::error/errno.

How does get_result() work with executemany()? For example, if I
execute SELECT ?,? with [[1,2], [3,4]] will I get that same structure
in the result set? Is the sequence guaranteed? Is there a way to
distinguish which row came from which input data row? What if one
SELECT produces two rows and the other zero? Will it still be a single
mysqli_result?

As mentioned in the beginning of the RFC, execute_many is used for UPSERT
(insert, replace,delete and update) operations. However it might contain a
result set, e.g. by having a RETURNING clause.

How does it work with store_result() and also with unbuffered result
sets? I assume it's all treated the same as a normal execute(),
correct?

Yep

How does this affect insert_id? MariaDB documentation says that only
"the first auto-generated ID" is returned, which is contrary to what I
expect: the last. Does the value propagate to PHP? If so, why?

This behavior matches the standard MySQL C API and protocol where
mysql_insert_id() returns the first ID of a multi-row insert, not the last:

MySQL [test]> truncate table t1;
Query OK, 0 rows affected (0.075 sec)

MySQL [test]> insert into t1 (name) values ('foo'),('bar'),('PHP');
Query OK, 3 rows affected (0.017 sec)
Records: 3 Duplicates: 0 Warnings: 0

MySQL [test]> select last_insert_id()\G;
*************************** 1. row ***************************
last_insert_id(): 1

From what I can see, it is possible to request that MariaDB reply with
individual result sets. Wouldn't that be better? How would that impact
the performance?

If you require the IDs of every row in a bulk insert, the MariaDB RETURNING
clause should be used:

INSERT INTO t1 (name) VALUES (?) RETURNING id

How about atomicity? If I make 3 INSERTS and the middle one fails,
will the other two execute?

For transactional engines like InnoDB, atomicity is guaranteed. If a
protocol error occurs or a constraint is hit mid-batch, the server handles
the rollback, ensuring the database remains in a consistent state.

Indicator enum should contain only two values: IGNORE and DEFAULT.
Null is already a PHP data type and does not require an indicator.
None serves no purpose (what would even happen if someone used it).
Adding an Indicator enum will create an inconsistency with the
existing feature set which do not support it.
An enum should be defined by the extension in which it will be used,
i.e. mysqli. The mysqlnd extension shuld not expose any user-facing
APIs.

See my comments about new execute_many parameters (control). Even None is
used to indicate to use the original value from $data block.
NULL and mysqli_indicator::Null are indeed the same - however it is more
consistant with other drivers (e.g. ODBC and OCI) to have a Null indicator.

When connected to the MySQL server, how will executemany() behave? I
assume there will be an emulated support for it in mysqlnd, but how
will you ensure consistency between these two implementations? Making
the function no-op with MySQL should be out of the question.

When connected to a server that does not support the native bulk protocol
(such as standard MySQL), execute_many() will transition to a fallback
emulation layer. This ensures that the function remains a reliable,
portable API for developers regardless of the backend. However, there are
inherent limitations to this emulation:

Limited Indicator Support: Since the standard MySQL COM_STMT_EXECUTE
protocol does not understand MariaDB-specific indicators, the fallback will
only support mysqli_indicator::Null (translated to a standard SQL NULL) and
mysqli_indicator::None. Indicators like DEFAULT and IGNORE are technically
impossible to implement in the fallback without complex SQL string
manipulation/rewriting, which would introduce unacceptable CPU overhead.
Transaction Safety & Atomicity: In native MariaDB bulk mode, the entire
batch is sent as a single unit. In the fallback emulation, rows are
executed one by one. For non-transactional engines, a failure on row 500
would leave the first 499 rows committed. To maintain consistency, we
should recommend that users wrap execute_many() in an explicit transaction
when portability across MySQL and MariaDB is required.
Performance Delta: While the fallback provides a consistent API, it does
not offer the network round-trip savings of the native protocol. Each row
in the fallback requires a separate request/response cycle, effectively
acting as a "C-speed" foreach loop.

/Georg

--
Georg Richter, Staff Software Engineer
Client Connectivity
MariaDB Corporation Ab

1 month ago by Kamil Tekiela — view source — reply

unread

Hi Georg,

Thank you for addressing my questions.

Regarding the progress indicator: it looks to me like it will be very
difficult to implement properly in PHP, but I might be completely
misunderstanding the design. If you think a PoC is possible, could you
please prepare one and maybe then we can come back to this discussion.

There is a fundamental difference between what a server might do (Version) and what it can do (Capabilities). A version string is a static label. Capability flags, however, reflect the server’s current build-time features and runtime configuration.

Exposing ext_server_capabilities allows developers and frameworks to implement reliable feature detection. Without these flags, a developer cannot safely use functions without risking a fatal error or needing to wrap every call in a try-catch block.

This is exactly what exceptions and errors are for. The PHP user
SHOULD NOT be concerned with what capabilities the server offers and
which are compatible with PHP. Even if it could be potentially useful
to PHP developers, we should not expose this information as part of
mysqli API. Either mysqli supports the feature or it doesn't; there
should be no maybe.

The performance improvement is not derived from the Generator itself, but from the reduction of network round-trips.

standard foreach + execute(): 1,000 rows = 1,000 "Send" packets + 1,000 "Response" packets.

execute_many(): The driver pulls rows from the iterable and serializes them into a large internal network buffer. 1,000 rows = 1 large Send packet + 1 Response packet.

Regarding iterator_to_array($generator): This forces PHP to allocate memory for every single row simultaneously. If you are importing 1 million rows, PHP must allocate memory for 1 million arrays before the first byte is ever sent. By using the Generator directly, we maintain O(1) memory usage, fetching and serializing only one row at a time.

If mysqlnd is going to serialize it all into a large internal buffer
then it still needs to read all the data from the iterator before
making the query? It's still going to use the same amount of memory.
How is it going to maintain constant memory usage?

A generator might be a neat trick for users to prepare data on the go,
but it doesn't reduce the memory usage if the consumer needs to use
all of the data in one go. As I understand, execute_many will send it
all in one batch, so the memory footprint stays the same.

Re execute_many parameters:

Despite your convincing arguments for better network utilization by
providing the types, I still think we should not offer the possibility
of specifying the types. I don't know what other PHP developers on
this mailing list think about it, but for me the type feature goes
against the nature of PHP. Making the parameter optional is very good
choice and eases my concerns slightly, so if I am outnumbered in my
opinion, I won't be upset.

The number of mysqli users grows increasingly smaller. Out of this,
the number of people who will use execute_many and who will need to
optimize for TINYINT is unbelievably tiny. Any string easily
overshadows the numerical data. Thus, this feature won't see much
legitimate use.

Because the MariaDB bulk protocol requires type declarations in the packet header before the data rows are sent, the driver cannot "autodetect" binary widths from a Generator without reading the entire stream into memory first. Providing the $types string acts as a contract that allows for true, constant-memory streaming.

Isn't that what it does anyway? You need to read all the data
(serialize) before you make the EXECUTE command, correct? I don't
understand why you can't prepare the type specification automatically
while serializing the data.

Regarding the control parameter:

Why not make it a callback? Provide the $row as an argument and let
the user modify it inside the callback, substituting values for
mysqli_indicator, and returning the row to be inserted. It would offer
a lot more flexibility to the user and would make the implementation
simpler. This way, you don't need to implement Null or None anymore.

Please be explicit what the failure conditions may be.

Like for all other functions you have to retrieve error messages via mysqli_stmt::error/errno.

I was asking you to list in the RFC all the possible client errors
that are added as part of this implementation. For example, "Row %lu
is not an array". This should be part of the RFC, in my opinion, as we
may want to discuss the error conditions and messages too.

Result sets:

As mentioned in the beginning of the RFC, execute_many is used for UPSERT (insert, replace,delete and update) operations. However it might contain a result set, e.g. by having a RETURNING clause.

It can also be used with SELECT. The SQL is irrelevant.

From what I can see, it is possible to request that MariaDB reply with
individual result sets. Wouldn't that be better? How would that impact
the performance?

If you require the IDs of every row in a bulk insert, the MariaDB RETURNING clause should be used:

That's not what I meant. I was asking whether it could be implemented
with MARIADB_CLIENT_BULK_UNIT_RESULTS instead. When users execute a
SELECT with 2 data rows, I would like to see it return 2 mysqli_result
objects. Same with INSERT statements, it should return a separate
result for each insertion. If there are arguments against that, they
should be explained in the RFC.

Anyway, the RFC should clearly explain how result fetching works with
all 3 methods (unbuffered, stored, and get_result) and what are the
possible gotchas.

For transactional engines like InnoDB, atomicity is guaranteed. If a protocol error occurs or a constraint is hit mid-batch, the server handles the rollback, ensuring the database remains in a consistent state.

Transaction Safety & Atomicity: In native MariaDB bulk mode, the entire batch is sent as a single unit. In the fallback emulation, rows are executed one by one. For non-transactional engines, a failure on row 500 would leave the first 499 rows committed. To maintain consistency, we should recommend that users wrap execute_many() in an explicit transaction when portability across MySQL and MariaDB is required.

I am confused. Aren't both of these statements stating the same? Why
can't you wrap the fallback in an automatic transaction to make it
work exactly the same as the native MariaDB solution?

If execute_many implies an automatic transaction but only with
transactional engines, it should be clearly stated in the RFC so that
it can be later documented in PHP manual too.

Limited Indicator Support: Since the standard MySQL COM_STMT_EXECUTE protocol does not understand MariaDB-specific indicators, the fallback will only support mysqli_indicator::Null (translated to a standard SQL NULL) and mysqli_indicator::None. Indicators like DEFAULT and IGNORE are technically impossible to implement in the fallback without complex SQL string manipulation/rewriting, which would introduce unacceptable CPU overhead.

And for this reason, I think that maybe we shouldn't even implement
the control parameter at all. It sounds like a neat feature, but it
costs performance in a function that is all about improving
performance, and it is DB-version specific. The new execute_many
function doesn't need the control parameter to function properly, and
in my opinion, it would be better to keep it as simple as possible.
But I am curious to see what others think.

Regards,
Kamil

1 month ago by go.al.ni@gmail.com — view source — reply

unread

Regarding the progress indicator: it looks to me like it will be very
difficult to implement properly in PHP, but I might be completely
misunderstanding the design. If you think a PoC is possible, could you
please prepare one and maybe then we can come back to this discussion.

Example of such feature already present in PHP 1. Callback function
should post progress data into external storage (redis, session). That
data may be then exposed to user via separate endpoint call.

IMHO this is super useful feature.

1 month ago by Georg Richter — view source — reply

unread

Hi Kamil,

Regarding the progress indicator: it looks to me like it will be very
difficult to implement properly in PHP, but I might be completely
misunderstanding the design. If you think a PoC is possible, could you
please prepare one and maybe then we can come back to this discussion.

We have callback function support in several extension - the closest
matching one is probably curl:

curl_setopt($resource, CURLOPT_PROGRESSFUNCTION, 'progressCallback');

Even if no progress callback function was specified, we will always read
the progress information from server to avoid possible read timeouts for
long running operations.

This is exactly what exceptions and errors are for. The PHP user
SHOULD NOT be concerned with what capabilities the server offers and
which are compatible with PHP. Even if it could be potentially useful
to PHP developers, we should not expose this information as part of
mysqli API. Either mysqli supports the feature or it doesn't; there
should be no maybe.

I disagree — relying on try/catch is a waste of resources, especially in
high-load environments, when the information is already available locally.
It forces the application to parse error messages or perform unnecessary
network round-trips just to 'discover' server limitations.

More importantly, this approach is dangerous for data integrity. For
example, if you attempt to insert 1,000 rows via execute_many() on a MyISAM
table and the last row contains a feature (like an indicator variable) that
the server does not support, 999 rows will be committed before the error is
triggered. You cannot 'catch' your way out of a partially written batch in
a non-transactional engine.
Capability checking is a standard and necessary pattern in PHP. ext/gd uses
a bitmask for supported formats so developers can choose the correct logic
path before processing begins. Furthermore, ext/mysqli already exposes
several capability flags. While some were historically used for internal
purposes or inherited from libmysql (like SSL_VERIFY_SERVER_CERT), the
precedent for exposing capabilities to the user is already firmly
established in the current API.

If mysqlnd is going to serialize it all into a large internal buffer
then it still needs to read all the data from the iterator before
making the query? It's still going to use the same amount of memory.
How is it going to maintain constant memory usage?

Exactly. Because mysqlnd sits on top of PHP streams — which do not expose
direct access to their internal network buffers (at least not at the time
when Andrey wrote mysqlnd) —mysqlnd must allocate its own memory buffer for
each command before transferring it via the Stream API.

The memory usage is determined by the size of the protocol packet being
sent. To maintain efficiency, mysqlnd manages this buffer dynamically: it
reallocates the buffer only when necessary using an exponential growth
strategy to minimize overhead.

Regardless of whether the input is an array or a stream, the buffer must be
large enough to hold the serialized command. The 'constant memory' aspect
refers to the fact that we aren't duplicating the entire dataset multiple
times in different formats; rather, we are streaming the serialized data
into a managed internal buffer that mysqlnd already relies on for network
communication.

A generator might be a neat trick for users to prepare data on the go,
but it doesn't reduce the memory usage if the consumer needs to use
all of the data in one go. As I understand, execute_many will send it
all in one batch, so the memory footprint stays the same.

No, the peak memory usage is significantly lower. Consider the difference
between file() and fread():

file()/Array: You must store the entire dataset in PHP memory as zvals,
then duplicate it into the mysqlnd network buffer. You are essentially
double-buffering.
Generator/Stream: PHP memory remains constant because it only holds one
row at a time. mysqlnd pulls that row, serializes it directly into the
network buffer, and moves on.
By using a stream, you eliminate the massive overhead of the intermediate
PHP array, even if the final network packet requires a large buffer.

Re execute_many parameters:

Despite your convincing arguments for better network utilization by
providing the types, I still think we should not offer the possibility
of specifying the types. I don't know what other PHP developers on
this mailing list think about it, but for me the type feature goes
against the nature of PHP. Making the parameter optional is very good
choice and eases my concerns slightly, so if I am outnumbered in my
opinion, I won't be upset.

The number of mysqli users grows increasingly smaller. Out of this,
the number of people who will use execute_many and who will need to
optimize for TINYINT is unbelievably tiny. Any string easily
overshadows the numerical data. Thus, this feature won't see much
legitimate use.

I agree that 99% of users likely won't specify types. However, there will
always be cases—such as limited memory or restricted CPU—where this
optional parameter is essential. It reduces the footprint and eliminates
the overhead of type conversions on both the client and the server.

Even if the primary use case is 'tiny,' a low-level driver like mysqlnd
should provide the tools for maximum efficiency, especially when the
implementation cost for the engine is minimal but the potential performance
gain for the user is high.

Because the MariaDB bulk protocol requires type declarations in the
packet header before the data rows are sent, the driver cannot "autodetect"
binary widths from a Generator without reading the entire stream into
memory first. Providing the $types string acts as a contract that allows
for true, constant-memory streaming.

Isn't that what it does anyway? You need to read all the data
(serialize) before you make the EXECUTE command, correct? I don't
understand why you can't prepare the type specification automatically
while serializing the data.

That works for arrays, but not for streams or generators. Unlike an array,
a stream is a one-way 'pull' mechanism—we don't know the type of the second
or hundredth row until we have already consumed the last.

To determine types automatically, we would have to buffer the entire stream
into memory first to inspect it, which completely defeats the purpose of
using a stream to save memory. Providing the types upfront allows mysqlnd
to serialize the stream directly to the wire in a single pass.

Regarding the control parameter:

Why not make it a callback? Provide the $row as an argument and let
the user modify it inside the callback, substituting values for
mysqli_indicator, and returning the row to be inserted. It would offer
a lot more flexibility to the user and would make the implementation
simpler. This way, you don't need to implement Null or None anymore.

A callback would actually be a major step backward for performance.
Invoking a PHP user-land closure 100,000 times in a single execute_many()
call introduces a massive overhead due to context switching between the
C-engine and the PHP VM. This would effectively negate the performance
gains we are trying to achieve.

My planned implementation already solves the flexibility problem via
Generators. Since execute_many() accepts an iterable, a user can already
use a generator to perform row-by-row logic. This is far more efficient
than a callback.

Furthermore, a callback does not eliminate the need for
mysqli_indicator::None. Even inside a function, the driver still needs a
clear 'metadata signal' to know whether to pull a value from the data
source or to treat it as an override.

I was asking you to list in the RFC all the possible client errors

that are added as part of this implementation. For example, "Row %lu
is not an array". This should be part of the RFC, in my opinion, as we
may want to discuss the error conditions and messages too.

I haven't introduced any new error codes; all data validation errors use
CR_INVALID_PARAMETER_NO with UNKNOWN_SQLSTATE. I didn't see a requirement
in the RFC guidelines to list every specific error message, but if it is
mandatory for the process, I can add them to the document of course.

Result sets:

That's not what I meant. I was asking whether it could be implemented
with MARIADB_CLIENT_BULK_UNIT_RESULTS instead. When users execute a
SELECT with 2 data rows, I would like to see it return 2 mysqli_result
objects. Same with INSERT statements, it should return a separate
result for each insertion. If there are arguments against that, they
should be explained in the RFC.

Expecting 1,000 separate result objects for 1,000 inserted rows would cause
a massive performance collapse. Each result set would require its own
network packet and redundant metadata headers, completely defeating the
purpose of a bulk execution API.
Regarding MARIADB_CLIENT_BULK_UNIT_RESULTS: this is a very recnt feature
(introduced in MariaDB 11.5) that allows the server to return a single
result package containing multiple status rows. I did not include it in the
current RFC because it is not yet widely available in LTS releases.
Furthermore, for the MySQL fallback (where this capability is absent),
mysqlnd would have to 'artificially' construct these result sets in memory,
which adds significant overhead. The goal of execute_many is maximum
throughput, which is best achieved by providing a summary of the bulk
operation rather than individual results for every row.

Anyway, the RFC should clearly explain how result fetching works with
all 3 methods (unbuffered, stored, and get_result) and what are the
possible gotchas.

The RFC notes that execute_many() can return a result set (e.g., when using
a RETURNING clause or similar). The PHP documentation already clearly
defines how result sets are retrieved from prepared statements: via
bind_result() with store_result()/use_result(), or via get_result().

Since execute_many() follows the existing mysqli_stmt behavior and does not
change how results are buffered or fetched, adding a redundant explanation
of standard mysqli mechanisms would only clutter the RFC.

For transactional engines like InnoDB, atomicity is guaranteed. If a
protocol error occurs or a constraint is hit mid-batch, the server handles
the rollback, ensuring the database remains in a consistent state.

Transaction Safety & Atomicity: In native MariaDB bulk mode, the entire
batch is sent as a single unit. In the fallback emulation, rows are
executed one by one. For non-transactional engines, a failure on row 500
would leave the first 499 rows committed. To maintain consistency, we
should recommend that users wrap execute_many() in an explicit transaction
when portability across MySQL and MariaDB is required.

I am confused. Aren't both of these statements stating the same? Why
can't you wrap the fallback in an automatic transaction to make it
work exactly the same as the native MariaDB solution?

If execute_many implies an automatic transaction but only with
transactional engines, it should be clearly stated in the RFC so that
it can be later documented in PHP manual too.

I am against wrapping the fallback in an automatic transaction because a
low-level driver should not modify the session's transactional state behind
the scenes.

If mysqli were to automatically inject START TRANSACTION and COMMIT, it
could unexpectedly commit a user's existing work or interfere with their
manual transaction logic. Furthermore, an 'automatic' transaction would be
a false promise on engines like MyISAM, where BEGIN and COMMIT are simply
ignored, still resulting in partial inserts.

I've updated the RFC to clarify this.

Limited Indicator Support: Since the standard MySQL COM_STMT_EXECUTE
protocol does not understand MariaDB-specific indicators, the fallback will
only support mysqli_indicator::Null (translated to a standard SQL NULL) and
mysqli_indicator::None. Indicators like DEFAULT and IGNORE are technically
impossible to implement in the fallback without complex SQL string
manipulation/rewriting, which would introduce unacceptable CPU overhead.

And for this reason, I think that maybe we shouldn't even implement
the control parameter at all. It sounds like a neat feature, but it
costs performance in a function that is all about improving
performance, and it is DB-version specific. The new execute_many
function doesn't need the control parameter to function properly, and
in my opinion, it would be better to keep it as simple as possible.
But I am curious to see what others think.

Could you please explain why you expect a performance loss? I believe the
opposite is the case.

Consider the following common scenario: A system collects records from
various external APIs. The 'IDs' from these sources are not unique, so we
need the database to generate its own AUTO_INCREMENT keys. Additionally,
for privacy compliance (GDPR/CCPA), we must mask phone numbers during the
import.

The 'expensive' way would be to modify the existing data source. In the
worst case—for instance, if you need to log the original data after the
import—you would have to create a full copy of the data first. You then
have to iterate over the entire dataset to:

Set every id to null to trigger AUTO_INCREMENT.
Overwrite every phone number with a masked string (e.g., +XX XXX-XXXXXXX).

By specifying a control parameter, the source data remains completely
untouched. The driver handles both the nullification and the masking at the
C-level.

Example:

/* Raw external data: [External_ID, Name, Phone] */
$external_data = [
[101, 'John Doe', '555-1234'],
[102, 'Jane Doe', '555-5678'],
...
[100000, "Rasmus Lerdorf', '431-1233939']
];

/* Control parameter:

Column 0 (ID): Force NULL to trigger AUTO_INCREMENT
Column 1 (Name): Use mysqli_indicator::None (Keep original data)
Column 2 (Phone): Scalar override for privacy masking */
$control = [
mysqli_indicator::Null,
mysqli_indicator::None,
"+XX XXX-XXXXXXX"
];

$stmt->execute_many($external_data, control: $control);

/Georg

Georg Richter, Staff Software Engineer
Client Connectivity
MariaDB Corporation Ab

1 month ago by Morgan — view source — reply

unread

If mysqli were to automatically inject START TRANSACTION and COMMIT, it
could unexpectedly commit a user's existing work or interfere with their
manual transaction logic.

Couldn't a SAVEPOINT command be injected instead, with a RELEASE
SAVEPOINT on success and ROLLBACK TO SAVEPOINT on error? Of course now
the driver has to be aware of any current transaction to know it's to
use savepoints instead of a whole transaction...

Furthermore, an 'automatic' transaction would
be a false promise on engines like MyISAM, where BEGIN and COMMIT are
simply ignored, still resulting in partial inserts.

At which point the recommendation to users to wrap bulk executions in
such commands is ineffective anyway. But if they're using a
nontransactional engine they should already be aware of these pitfalls.

1 month ago by Georg Richter — view source — reply

unread

Hi,

Couldn't a SAVEPOINT command be injected instead, with a RELEASE
SAVEPOINT on success and ROLLBACK TO SAVEPOINT on error? Of course now
the driver has to be aware of any current transaction to know it's to
use savepoints instead of a whole transaction...

Transaction safety should always be guaranteed in the application code, not
in the driver. The driver cannot know how transactions are handled by the
user, whether the underlying tables even support transactions, or what the
specific application logic requires.

Adding transaction logic into the driver leads to unpredictable state
management. For example, if the driver were to automatically issue a
RELEASE SAVEPOINT, the application would lose the ability to roll back the
execute_many operation as part of its own broader error-handling logic.

At which point the recommendation to users to wrap bulk executions in

such commands is ineffective anyway. But if they're using a
nontransactional engine they should already be aware of these pitfalls.

Exactly!

/Georg

--
Georg Richter, Staff Software Engineer
Client Connectivity
MariaDB Corporation Ab

1 month ago by Robert Humphries — view source — reply

unread

Despite your convincing arguments for better network utilization by
providing the types, I still think we should not offer the possibility
of specifying the types. I don't know what other PHP developers on
this mailing list think about it, but for me the type feature goes
against the nature of PHP. Making the parameter optional is very good
choice and eases my concerns slightly, so if I am outnumbered in my
opinion, I won't be upset.
The number of mysqli users grows increasingly smaller. Out of this,
the number of people who will use execute_many and who will need to
optimize for TINYINT is unbelievably tiny. Any string easily
overshadows the numerical data. Thus, this feature won't see much
legitimate use.

I think the risk here is that currently the types parameter is
supported in existing code (via bind_param); and it is not
deprecated / no warnings exist on its usage in the manual. I think
having the parameter in new code be optional was a sensible change;
but until mysqli_stmt::bind_param is deprecated or the types
parameter is removed there; allowing users to specify the types in new
functions where there is a benefit to doing so (even if only a small
number of people will use the benefit) makes sense to do unless there
are technical or other reasons not to. If setting types for a prepared
statement shouldn't be done in PHP, then where that currently can be
done should be deprecated and in the future removed.

1 month ago by Georg Richter — view source — reply

unread

Hi Robert,

When I designed and wrote mysqli (and parts of libmysql 4.1) nearly 25
years ago, the computing landscape was fundamentally different. Memory was
a severe constraint; allocating large buffers to send data all at once was
often impossible. This necessitated the use of mysql_stmt_send_long_data to
stream larger BLOBs in portions—which was a primary reason for requiring
the explicit types specification in mysqli_stmt::bind_param.

Furthermore, CPU power was significantly more limited. The overhead of
converting types between the client and server was expensive. By providing
explicit type hints (like "isss"), we gave the engine a shortcut, bypassing
the need for the driver to "guess" or perform trial-and-error casts, thus
saving precious cycles at both ends of the connection.

Twenty-five years later, the landscape has evolved. Physical memory is
rarely the bottleneck; we are now primarily limited by the
max_allowed_packet size. Modern hardware and the maturity of libraries mean
that the CPU cost of internal type-juggling is now negligible for the vast
majority of applications.

However, we must avoid the "sledgehammer" approach of deprecation. In
high-speed, high-concurency environments where packet size and execution
speed remain fundamental requirements, the ability to explicitly define
types is still important. It allows for the most efficient binary
serialization possible, which is critical for the "power-user" tier of the
PHP ecosystem.

/Georg

On Sun, Apr 19, 2026 at 3:45 PM Robert Humphries <
contact@developer-rob.co.uk> wrote:

Despite your convincing arguments for better network utilization by
providing the types, I still think we should not offer the possibility
of specifying the types. I don't know what other PHP developers on
this mailing list think about it, but for me the type feature goes
against the nature of PHP. Making the parameter optional is very good
choice and eases my concerns slightly, so if I am outnumbered in my
opinion, I won't be upset.
The number of mysqli users grows increasingly smaller. Out of this,
the number of people who will use execute_many and who will need to
optimize for TINYINT is unbelievably tiny. Any string easily
overshadows the numerical data. Thus, this feature won't see much
legitimate use.

I think the risk here is that currently the types parameter is
supported in existing code (via bind_param); and it is not
deprecated / no warnings exist on its usage in the manual. I think
having the parameter in new code be optional was a sensible change;
but until mysqli_stmt::bind_param is deprecated or the types
parameter is removed there; allowing users to specify the types in new
functions where there is a benefit to doing so (even if only a small
number of people will use the benefit) makes sense to do unless there
are technical or other reasons not to. If setting types for a prepared
statement shouldn't be done in PHP, then where that currently can be
done should be deprecated and in the future removed.

--
Georg Richter, Staff Software Engineer
Client Connectivity
MariaDB Corporation Ab