Hi all,
I have submitted hundreds of bug reports (see https://github.com/php/php-src/issues/created_by/YuanchengJiang) over the past months, and I would first like to thank all the developers who took the time to fix these issues and make PHP better.
I am thrilled to introduce FlowFusion, a fully automated fuzz-testing tool for discovering bugs in the PHP interpreter.
The core idea behind FlowFusion is to use dataflow as an effective representation of the test cases (.phpt files) maintained by PHP developers, and to merge two (or more) test cases into fused test cases with more complex code semantics. We connect the test cases by interleaving their dataflows, i.e., bringing the code context of one test case into another. This enables interactions among existing test cases, most of which are unit tests that verify a single piece of functionality, which makes the fused test cases interesting because they combine the semantics of their sources.
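To make the fusion idea concrete, here is a toy sketch in Python. It is not FlowFusion's actual code: it skips real dataflow analysis and simply picks one variable from each test body and bridges them with an assignment. The helper names `variables` and `fuse` are hypothetical and exist only for this illustration.

```python
import random
import re

# Matches PHP variable names such as $x or $my_var.
VAR_RE = re.compile(r"\$[A-Za-z_][A-Za-z0-9_]*")


def variables(php_code: str) -> list[str]:
    """Return the distinct PHP variable names in order of first appearance."""
    seen = []
    for name in VAR_RE.findall(php_code):
        if name not in seen and name != "$this":
            seen.append(name)
    return seen


def fuse(test_a: str, test_b: str, rng: random.Random) -> str:
    """Concatenate two test bodies and link one variable of A to one of B.

    Hypothetical simplification: a real tool would pick the connection
    points from an actual dataflow analysis rather than at random.
    """
    vars_a, vars_b = variables(test_a), variables(test_b)
    if not vars_a or not vars_b:
        return test_a + "\n" + test_b  # nothing to connect; plain concatenation
    src, dst = rng.choice(vars_a), rng.choice(vars_b)
    bridge = f"{dst} = {src};  // fusion point: a value from A is assigned to a variable of B"
    return "\n".join([test_a, bridge, test_b])


# Two made-up test bodies standing in for the code sections of .phpt files.
a = "$x = [1, 2, 3];\n$y = array_sum($x);"
b = "$s = 'abc';\necho strlen($s);"
print(fuse(a, b, random.Random(0)))
```

Because the bridge is chosen at random, the same pair of tests can be fused in several different ways.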
FlowFusion additionally fuzzes all defined functions and class methods using the code contexts of the fused test cases. The available functions, classes, and methods are collected in advance and stored in sqlite3 together with the necessary information, such as the number of parameters. FlowFusion improves automatically as the .phpt files are updated: a single new test can yield thousands of new fused tests.
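As a rough illustration of the catalog idea, the sketch below stores a few functions and their parameter counts in an in-memory sqlite3 database and builds a random call from variables that are assumed to exist in the fused test. The table name, columns, and the `random_call` helper are assumptions made for this example, not FlowFusion's real schema.

```python
import random
import sqlite3

# Hypothetical catalog: one row per PHP function with its parameter count.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE functions (name TEXT PRIMARY KEY, num_params INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO functions VALUES (?, ?)",
    [("strlen", 1), ("str_repeat", 2), ("array_fill", 3)],
)


def random_call(conn, context_vars, rng):
    """Pick a random function and fill its parameters with context variables."""
    rows = conn.execute(
        "SELECT name, num_params FROM functions ORDER BY name"
    ).fetchall()
    name, num_params = rng.choice(rows)
    args = ", ".join(rng.choice(context_vars) for _ in range(num_params))
    return f"{name}({args});"


# "$x" and "$s" stand for variables taken from the fused test's code context.
call = random_call(conn, ["$x", "$s"], random.Random(1))
print(call)
```

The generated call may well be invalid for the chosen argument types; for a fuzzer that is the point, since the interpreter should reject it gracefully rather than crash.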
The search space of FlowFusion is huge, which means it can cover many corner cases. There are three reasons for this: (i) combining any two of the roughly 20,000 existing test cases already yields about 400,000,000 fused test cases, and we can combine more than two; (ii) the interleaving is randomized, so for any two given test cases there can be multiple ways to connect them; and (iii) FlowFusion also mutates the test cases and fuzzes the runtime environment and configuration, such as the JIT settings.
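The 400,000,000 figure corresponds to counting ordered pairs of tests. A quick check of the arithmetic, assuming exactly 20,000 tests (the count above is approximate):

```python
from math import comb

n_tests = 20_000  # approximate number of .phpt files cited above

ordered_pairs = n_tests ** 2        # (A, B) and (B, A) counted separately
unordered_pairs = comb(n_tests, 2)  # distinct pairs, order ignored
ordered_triples = n_tests ** 3      # fusing three tests instead of two

print(ordered_pairs)    # 400000000
print(unordered_pairs)  # 199990000
print(ordered_triples)  # 8000000000000
```

These counts cover only the choice of tests; the randomized interleaving and the mutations multiply them further.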
I can open-source the tool under my personal repository, but I wonder whether I could instead contribute it as an official PHP tool under https://github.com/php. I would be happy to maintain it long-term.
Best,
Yuancheng
Hi Yuancheng,
Awesome! I noticed the impressive number of opened issues, and given your background I guessed you were working on some new fuzzer.
I’ve been thinking that the fuzzing corpus of this new fuzzer (and of the existing oss-fuzz fuzzer SAPI, currently being run by Google) could be improved by also adding all of the code of the community libraries (i.e. those defined by the nightly GitHub workflow at https://github.com/php/php-src/blob/master/.github/workflows/nightly.yml#L485).
I understand this might cause issues due to the volume of additional code being permuted by the fuzzer, but even running the community tests themselves without fuzzing already uncovers multiple segfaults (some of which are still failing on master as of today).
Some time ago, I submitted https://github.com/php/php-src/pull/12406, which does multiple things to improve the coverage of the JIT compiler:
- Add a few more popular CPU-intensive community libs like phpseclib, psalm, phpstan, etc.: all prime candidates for addition to the nightly tests, as they are extremely CPU-intensive libraries which benefit a lot from JIT but suffer from its bugs (e.g. the psalm/phpseclib phpunit testsuites currently fail with segfaults with tracing JIT, phpseclib even explicitly disables JIT on Windows to avoid a still-unsolved bug, etc.).
- Parallelise community tests using a new custom runner, addressing concerns from Ilya, who was worried about the CI run times caused by the addition of new community tests.
- Add --repeat 2 to all tests (including community tests), which catches some nasty JIT bugs by invoking the same script twice without recompiling the PHP code and Zend bytecode, exposing issues caused by side effects of the first compilation.
- Improve the JIT flags of community and phpt tests to detect more JIT bugs, mainly copying them from https://github.com/danog/jit_bugs/blob/master/php.ini
I don’t currently have the free time to clean up the pull request and fix the numerous JIT bugs it currently detects (at least without a support contract), but all of these approaches may be reused if you or anyone else decides to upstream FlowFusion (though I would love at least a @danog mention in the pull request :)
Regards,
Daniil Gentili
—
Daniil Gentili - Senior software engineer
Portfolio: https://daniil.it
Telegram: https://t.me/danogentili
Hi Yuancheng
Thanks for all the reports you made, certainly an impressive feat!
I don't know what other maintainers think, but FWIW I'd be in favor of incorporating this into our toolchain.
Kind regards
Niels
Hi Yuancheng
I have been submitting hundreds of bugs (see https://github.com/php/php-src/issues/created_by/YuanchengJiang) during the past months and I first thank all the developers who take time to fix these issues to make PHP better.
I am thrilled to introduce one fully automated fuzz testing tool, FlowFusion, for discovering various bugs of the PHP interpreter.
I can open-source the tool under my personal repository. I wonder by any chance if I can contribute it as the official PHP tool under https://github.com/php, and I would be happy to maintain it for a long time.
Thank you very much for your continued effort in finding and reporting
these bugs! Congratulations on this impressive tool. It has certainly
proven helpful. A few questions:
Are you happy adopting an appropriate license, e.g. the PHP license?
[1] (Or potentially some other license the community agrees with.) Can we
assume this tool remains a PHP-specific tool, or are you planning on
expanding it to other programming languages, now that the concept has
proven useful? Provided these two things are not a problem, I don't
see a reason not to move it into the PHP organization.
Could you also expand on hosting? Will infrastructure be provided
(assuming we want continuous fuzzing) or is this something we will
need to set up?
It would also be nice to know how issues are reported, how many
false positives there are, how we can tweak the fuzzing configuration,
etc. This discussion doesn't need to happen on a big public list like
this one. You can contact me directly if you wish to move this
forward.
Ilija
On Tue, Nov 19, 2024 at 12:22 AM Ilija Tovilo tovilo.ilija@gmail.com wrote:
Hi Yuancheng
On Fri, Nov 15, 2024 at 2:21 PM Yuancheng Jiang 0599jiangyc@gmail.com wrote:
I have been submitting hundreds of bugs (see https://github.com/php/php-src/issues/created_by/YuanchengJiang) during the past months and I first thank all the developers who take time to fix these issues to make PHP better.
I am thrilled to introduce one fully automated fuzz testing tool, FlowFusion, for discovering various bugs of the PHP interpreter.
I can open-source the tool under my personal repository. I wonder by any chance if I can contribute it as the official PHP tool under https://github.com/php, and I would be happy to maintain it for a long time.
Thank you very much for your continued effort in finding and reporting
these bugs! Congratulations on this impressive tool. It has certainly
proven helpful. A few questions:
Are you happy adopting an appropriate license, e.g. the PHP license?
Please don't use the PHP license; prefer a BSD, MIT, or Apache license.
Thanks
Jakub
Thanks for acknowledging FlowFusion.
The license looks fine to me. Ilija, I will message you privately with the details.
A FlowFusion beta is available; let me know if you want to try it.
Best,
Yuancheng