Hi internals,
For the past few days I've been thinking about whether to make a suggestion to add two-phase commit functionality to pdo.
I would like to hear everyone's opinions.
About two-phase commit
Regular transactions do not guarantee consistency across multiple DBs. For example, if you update DB1 and DB2 at the same time, only DB1 may be updated and the DB2 update may fail.
When using two-phase commit, it is as if a common transaction has been started for multiple DBs, and their consistency can be guaranteed.
Currently, pdo does not have such a feature as a method, but it is possible to use two-phase commit by executing a query. Think of it as adding such functionality as a method.
Although I have only written about the advantages, two-phase commit also has disadvantages, such as the possibility that locks may not be released forever under certain conditions.
Regards.
Saki
Hi internals,
For the past few days I've been thinking about whether to make a suggestion
to add two-phase commit functionality to pdo.I would like to hear everyone's opinions.
Hello,
Would the implementation use the underlying RDBMS two-phase commit? Or would the proposal be to try to build something at the PDO layer to mimic two-phase commits? The former seems good, the latter, not so much.
Best,
-Jeff
Hi Jeff,
Would the implementation use the underlying RDBMS two-phase commit? Or would the proposal be to try to build something at the PDO layer to mimic two-phase commits? The former seems good, the latter, not so much.
When implementing this, if each DB driver's API has a two-phase commit function, I would like to use it. I don't mean "emulate" it with PDO.
I haven't checked all the DB APIs because I'm still thinking about whether to propose it, but I've confirmed that at least Firebird's API has that functionality.
Regards.
Saki
Hi Jeff,
Would the implementation use the underlying RDBMS two-phase commit? Or would the proposal be to try to build something at the PDO layer to mimic two-phase commits? The former seems good, the latter, not so much.
When implementing this, if each DB driver's API has a two-phase commit
function, I would like to use it. I don't mean "emulate" it with PDO.I haven't checked all the DB APIs because I'm still thinking about
whether to propose it, but I've confirmed that at least Firebird's API
has that functionality.Regards.
Saki
I have not heard of this functionality before, so I don't know how common it is. If it's only lightly supported and in different ways, perhaps this is a use case for the new per-DB subclasses?
--Larry Garfield
On Thu, Nov 30, 2023 at 7:22 PM Larry Garfield larry@garfieldtech.com
wrote:
I have not heard of this functionality before, so I don't know how common
it is. If it's only lightly supported and in different ways, perhaps this
is a use case for the new per-DB subclasses?
Postgre (https://www.postgresql.org/docs/current/two-phase.html), MySQL (
https://dev.mysql.com/doc/refman/8.0/en/xa.html) and Oracle (
https://docs.oracle.com/en/database/oracle/oracle-database/21/admin/distributed-transactions-concepts.html#GUID-8152084F-4760-4B89-A91C-9A84F81C23D1)
all support it.
--
Best regards,
Bruce Weirdan mailto:
weirdan@gmail.com
On Thu, Nov 30, 2023 at 7:22 PM Larry Garfield larry@garfieldtech.com
wrote:I have not heard of this functionality before, so I don't know how common
it is. If it's only lightly supported and in different ways, perhaps
this
is a use case for the new per-DB subclasses?Postgre (https://www.postgresql.org/docs/current/two-phase.html), MySQL (
https://dev.mysql.com/doc/refman/8.0/en/xa.html) and Oracle (https://docs.oracle.com/en/database/oracle/oracle-database/21/admin/distributed-transactions-concepts.html#GUID-8152084F-4760-4B89-A91C-9A84F81C23D1
)
all support it.--
Best regards,
Bruce Weirdan mailto:
weirdan@gmail.com
Having MySQL XA Transactions exposed via PDO methods through a simplified
user experience would be amazing.
--
Marco Deleu
One thing I'd like to point out is regarding second phase (commit phase)
It's important to understand that once this phase has started, there's no
way back. After code execution is behind commit point, the transactions
MUST commit no matter what. Data must be committed regardless of network
issues even if such last for minutes.
Let's say we have two databases.
On the first phase we have asked them and received a confirmation of being
ready to commit.
Then second (Commit) phase starts:
- We have sent commit to the first database and it responded us with OK
(comitted) - Next we send commit to the second database, and it may be the case that
in that very moment we lose connection. Hence, if connection is lost before
final commit request, second database won't receive commit, and since It's
not aware of whether transaction should be committed or rolled back, data
is never made persistent.
FWIK, this case is usually handled at the application level using async
commit in message consumers. Meaning commit operation will be retied again,
and again, and again until the connection is restored.
Therefore, commit point is the point of no-return and network issues is the
problem we have to deal with.
Do you have any ideas how to handle this case at the core level in the
regard of php lifecycle?
Hi Eugene.
- We have sent commit to the first database and it responded us with OK (comitted)
- Next we send commit to the second database, and it may be the case that in that very moment we lose connection. Hence, if connection is lost before final commit request, second database won't receive commit, and since It's not aware of whether transaction should be committed or rolled back, data is never made persistent.
FWIK, this case is usually handled at the application level using async commit in message consumers. Meaning commit operation will be retied again, and again, and again until the connection is restored.
Therefore, commit point is the point of no-return and network issues is the problem we have to deal with.
This point highlights a critical and common challenge in deciding
whether to commit to a database.
Moreover, this consideration applies broadly to any side effects on
external systems, not solely to databases or two-phase commits.
Since the solution method differs depending on the system requirements
and specifications, I think developers should ensure safety through
their own implementation, and in other words, there is no need to
consider it as a capability of PHP itself.
Consider, for instance more generally, the following scenarios where
unintended loss of connectivity and processing interruptions are
critical.
- Send multiple HTTP requests with side-effects to external APIs at
the same time. If the connection is lost before the response is
returned. - Processes that acquire locks, such as by creating a file in the
filesystem, rather than using locks from RDBMS, Redis, etc. If the
process is terminated abnormally by the OS while locked, it becomes
difficult to determine if the file's presence is due to an unfinished
process or an abnormal termination.
In both scenarios, consistency will be checked using another process
equivalent to the message consumers you mentioned, or some other
method.
In the case of a two-phase commit as well, I think that PHP itself MAY
do nothing (or SHOULD NOT do anything).
Regarts.
Kentaro Takeda
Hi,
Do you have any ideas how to handle this case at the core level in the
regard of php lifecycle?
I agree with Kentaro on this. This can be said to be a weakness of two-phase commit itself, but I think it should be addressed in userland or user operations.
BTW, I said that I was thinking of implementing two-phase commit in pdo core, but that may be a bit unreasonable. Perhaps because it is not standardized in standard SQL, the method of implementing this function varies from database to database. The steps leading up to the final commit vary greatly depending on the DB.
Looking at the current implementation for each DB, it is possible to forcefully create a method in the core, but this will break down when the DB specifications change in the future.
Perhaps two-phase commit is a feature that should be implemented in pdo subclasses with "retaining the personality of the DB''.
Regards.
Saki
Hi there,
This functionality is common for data intensive applications that leverage
distributed systems for coping with load.
In other words, having it at core level is desirable for enterprise-level
systems.
Ordinary web-sites are unlikely to use this feature.
Though, I'd highly appreciate it if added to the core.
Hi Larry,
I have not heard of this functionality before, so I don't know how common it is. If it's only lightly supported and in different ways, perhaps this is a use case for the new per-DB subclasses?
Many DBs have this functionality, so I was thinking of implementing it in the same way as beginTransaction()
and commit()
, rather than using subclasses.
SQLite may not have a two-phase commit feature. I think to make it to be an error if I use these methods in such cases. (In fact, commit()
etc. will also generate an error if it is not supported)
Regards.
Saki