Can we add the BLAKE3 hash?
Created a PR here: https://github.com/php/php-src/pull/13194
BLAKE3 is a very fast ("blazing fast") cryptographically secure hash. It is
the latest iteration of the BLAKE hash, which was a SHA3 finalist; see
https://github.com/BLAKE3-team/BLAKE3 for more info on BLAKE3.
The PR contains a portable C implementation, along with optimized ARM Neon and
x86_64 SSE2, SSE4.1, AVX2, and AVX512 implementations for GCC+Unix,
GCC+Windows, and MSVC. (MSVC currently uses only the portable
implementation, but it should be easy for a developer equipped with MSVC to
enable the optimized implementations; I don't have MSVC personally.)
That means the PR includes ~35 copies of the same algorithm, in
hand-written assembly, optimized for various CPU/compiler/OS combinations.
Which means the PR is huge.
It would be possible to ship only a subset of them. For example, keeping
just the GCC+Unix SSE2, GCC+Unix AVX2, and ARM Neon versions and dropping the
rest would benefit a lot of systems in the wild and reduce the size of the PR
substantially.
It would also be possible to ship only the portable pure C implementation,
but that would be detrimental to performance, which is the main
motivator for adding BLAKE3 in the first place.
But the groundwork to ship them all is already done (see the PR).
Thoughts?
BLAKE3 has 2 default sizes, BLAKE3_256 and BLAKE3_512. Internally the
hash block size is 512.
With other algos we have added these different hash sizes; would it be
possible for you to expose the 2 hash sizes?
"BLAKE3 has 2 default sizes"
Nope, only 1 canonical size: 256 bits.
But BLAKE3 is an XOF (extendable-output function), so the output can be exactly as long as you want it to be:
$ echo test | b3sum --length 5
dea2b412aa -
$ echo test | b3sum --length 10
dea2b412aa90f1b43a06 -
$ echo test | b3sum --length 32
dea2b412aa90f1b43a06ca5e8b8feafec45ae1357971322749480f4e1572eaa2 -
$ echo test | b3sum --length 64
dea2b412aa90f1b43a06ca5e8b8feafec45ae1357971322749480f4e1572eaa2ea67cf3c73a3acbfa2bdab694345d8ecf5e353dd1a3d5a9628aec9bffc3e4cca
$ echo test | b3sum --length 999
dea2b412aa90f1b43a06ca5e8b8feafec45ae1357971322749480f4e1572eaa2ea67cf3c73a3acbfa2bdab694345d8ecf5e353dd1a3d5a9628aec9bffc3e4ccaa32f434df18da6161cabb08b6278dcebca9833fe8d9f65d64db922cecf78c55b521f60dbd77d8ad8378a8f481f2941fedc817d7e1fdeb9c9c9915f3e0a8a8b3cbd4849e21dbe4e359b21224dee5b75bcee0f2083bb8c25559b109727d23b02bde4d2e212529106a1b23be564007909fa23e39c8fdca42a86e75f1568d77a85b0efb0acfa0258907f6d9bfae259234d782d53276f823fe32e29b7165818cbc75e4860188d60f6bb31b00308b1a7293b75e007eaf2de846709bb1856ed398e1c354a093b4f4853b9127ba2e9d85b5336b3e09eb802eef8168f1954c34cc9c61bb933de56790caaff3e03b43f85febfc175e3534e687527a757c2b2e5474efa6db51873da140f5ebc65dca5545b73dd64ac7585fe1d123475e128878962ff8952cd2c8372c4808c4893c8038e6ffb52ef7cf9416ad71588d779c8d60d19c997524b6f756b1d0d5934d41a8e3644fb3fc23e2403bf8b94b95a36f66fb108b6ed824b117f3de9314566bd7042bdd5116e096f0846121ba7034559b234074eac403d2d0f9a4386745375c54d2c22cc970a1cd9836cc9ad1bc3b8c511e5674f05cd5cb8d844c3e802199f0d8b9f3b6e2abd8e830b5768c1539b2d445181fbdcf77c51c330c67aa7b62691d18ecdb7d3124ac4e5fd83a8251ec072740aa4029624ad0a51ebfc8281a5e098ceda2b468e0f936a93b3498b0f11484c4e04cd7be657614ddebe9c08eb0c912431239605e1924009d32afeb965e9c7bbde77bc8efc2ebbc7eb3555286bb7b97fc30fe33806b36aef129d975251a737f0a285fd7cb617b9326211d22924704a2760e235ffa0c125eabb556698120229880b3af0f6dc81336af17fc90f3e889142a5e338a28816c0b6b3944d2f05b7a70189d3e8a19a1e6f6ca0041d4eb165ab4e4aad2f6ec87dc2986263e395c5a5d626bf8847d8b4a70126858f6adda1f39ce0cacf266895856c9ea118418b80c1a37260c7ef73598beb6b2cb3665eece981e249fec4ab8ad2424f1243b0835a7f079a3a9e9c288395a88e70f75eb5610251a416a7189d6e1c3c25a6729d3c9bae65970f8fa48d3ef8f8469ab62c19c3adc04a5c7debea10a910df7d389b183c18cd33fe6b946ebfc5b8a0505968a63122fe0f618e8cf07a978777381bdbafac8024226eee532b76d63ee4a0b45f1f623928afcce21977284868747d7949dd912c8b0894b6a782d2985085f0e629c0c7be7ab19b37e4c5f01a1636f62ee55783b86df8d53698e8b4bbe03fd69322609bb6fdee35cb433d44ec7322d6f1d040f87072bba06ab793bd857c7f754b080b8b04b28c
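One thing the b3sum runs above suggest is that a shorter output is simply a prefix of a longer one. PHP has no built-in BLAKE3 to call here, so the following is only a minimal sketch that checks this on the hex strings copied from this message; the function name `is_prefix_chain` is made up for illustration.

```php
<?php
// Hex digests copied from the b3sum output above (input: "test\n").
$outputs = [
    5  => 'dea2b412aa',
    10 => 'dea2b412aa90f1b43a06',
    32 => 'dea2b412aa90f1b43a06ca5e8b8feafec45ae1357971322749480f4e1572eaa2',
];

// Hypothetical helper: true when each digest has the expected hex length
// and starts with the next-shorter digest.
function is_prefix_chain(array $digestsByLength): bool
{
    ksort($digestsByLength);
    $previous = '';
    foreach ($digestsByLength as $length => $hex) {
        if (strlen($hex) !== $length * 2 || !str_starts_with($hex, $previous)) {
            return false;
        }
        $previous = $hex;
    }
    return true;
}

var_dump(is_prefix_chain($outputs));
```

This needs PHP 8.0 or later for str_starts_with().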
And what's more, thanks to the $options argument that PHP 8.1.0 added to hash(),
we could expose BLAKE3's XOF like this:
hash("blake3", "test", options: ["length"=>512/8]): blake3_512
hash("blake3", "test", options: ["length"=>256/8]): blake3_256
hash("blake3", "test", options: ["length"=>8/8]): blake3_8
hash("blake3", "test", options: ["length"=>1000]): blake3_8000
That shouldn't be too difficult to implement either! Good idea.
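For comparison, here is how the $options argument is already used by an existing algorithm. This is a small sketch using the "seed" option of xxh32, which exists since PHP 8.1; the "length" option and the "blake3" algorithm name are only what is proposed in this thread, so that call is left commented out as hypothetical.

```php
<?php
// Existing use of $options (PHP >= 8.1): a seed for xxh32.
$unseeded = hash('xxh32', 'test');
$seeded   = hash('xxh32', 'test', options: ['seed' => 42]);

var_dump($unseeded !== $seeded); // the seed changes the digest
var_dump(strlen($seeded));       // xxh32 yields 8 hex characters

// HYPOTHETICAL: "blake3" and a "length" option are only proposed in this thread.
// $digest = hash('blake3', 'test', options: ['length' => 512 / 8]);
```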
Having looked into it, it seems difficult after all.
I would want a new $options argument for hash_final(), and some
internal changes to struct php_hash_blake3_ops,
and that internal change would have to be applied to all the other hashes
PHP supports.
I'm not up for doing that now.
And I think it should be a separate PR, after the initial support gets merged.
That's why I suggested implementing separate length algorithms like we have
for SHA3.
Just an etiquette note, please don't top post on the mailing list. [1]
I have no idea what this sentence is replying to, and it makes following the discussion difficult.
Best regards,
Gina P. Banyard
[1] https://github.com/php/php-src/blob/master/docs/mailinglist-rules.md
Yea I forgot to top post that reply. I just quick replied.
That's why I suggested implementing separate lengths, like we have
for SHA3, so we could have BLAKE3_256 and BLAKE3_512 and maybe sizes in between.
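For reference, this is roughly what the SHA3 precedent looks like from userland: one algorithm name per output size. A minimal sketch using the existing sha3-* names; the analogous BLAKE3_256 / BLAKE3_512 names are only suggested in this thread.

```php
<?php
// SHA-3 is exposed as one algorithm name per output size.
$sha3Algos = ['sha3-224', 'sha3-256', 'sha3-384', 'sha3-512'];

foreach ($sha3Algos as $algo) {
    $hex = hash($algo, 'test');
    // Each hex character encodes 4 bits.
    printf("%s: %d bits\n", $algo, strlen($hex) * 4);
}
```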
We can look into exposing BLAKE3's XOF (arbitrary-length) capabilities
after (and if) initial BLAKE3 support gets merged.
It would probably look something like
hash_final($ctx, options: ["length"=>512/8]);
hash("blake3", "x", options: ["length"=>512/8]);
and it deserves its own dedicated pull request.
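To make the gap concrete, here is a short sketch of the existing streaming API with an algorithm PHP already ships (sha3-256 is used purely as a stand-in). The output size is fixed by the algorithm name, and hash_final() has no $options parameter, which is why the call proposed above stays a commented-out, hypothetical line.

```php
<?php
// Existing streaming API: output size is determined by the algorithm name.
$ctx = hash_init('sha3-256');
hash_update($ctx, 'te');
hash_update($ctx, 'st');
$streamed = hash_final($ctx);

// Streaming and one-shot hashing of the same bytes give the same digest.
var_dump($streamed === hash('sha3-256', 'test'));

// HYPOTHETICAL shape from this thread; hash_final() accepts no $options:
// $digest = hash_final($ctx, options: ['length' => 512 / 8]);
```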
Just an etiquette note, please don't top post on the mailing list. [1]
I have no idea what this sentence is replying to, and it makes following the discussion difficult.
I see, thanks for the heads up.
Maybe vote on it? (that was suggested in the PR too,
https://github.com/php/php-src/pull/13194#issuecomment-1900430400 )
I can think of 6 things to vote on:
1: Should BLAKE3 be added to PHP? yes/no
2: Should ARM Neon (2007) optimized implementation be bundled? yes/no
3: Should x86_64 SSE2 (2000) optimized implementation be bundled? yes/no
4: Should x86_64 SSE4.1 (2007) optimized implementation be bundled? yes/no
5: Should x86_64 AVX2 (2013) optimized implementation be bundled? yes/no
6: Should x86_64 AVX512 (2016) optimized implementation be bundled? yes/no
(I wrote the year the processors were actually released, not the year the
instructions were proposed/announced.)
Argument against SSE2 and SSE4.1: pretty much all modern CPUs supporting
SSE2/SSE4.1 also support AVX2.
Argument against AVX512: Cloudflare said in a blog post that when a core
starts executing AVX512 instructions, it decreases the clock speed of
neighboring cores so much that, quote:
"OpenSSL serves 10% fewer requests per second. And that is a huge number!
It is equivalent to giving up on two cores, for nothing"
and another quote:
"If you do not require AVX-512 for some specific high performance tasks, I
suggest you disable AVX-512 execution on your server or desktop, to avoid
accidental AVX-512 throttling."
(ref https://blog.cloudflare.com/on-the-dangers-of-intels-frequency-scaling )
That AVX512 issue is probably CPU-specific and will probably be mitigated
in newer CPU releases
(I don't know if AMD is even affected, or if it is purely an Intel issue).
Thus they may be worthy of a vote.
Should we even be considering the specific instruction implementations?
I've always been in the camp of "you are not smarter than the compiler",
as even the best human-written ASM code can be slower
than the obscure instructions the compiler might choose to use in a weird
and wonderful way.
Depends on the actual numbers: is there any way to make a comparison that
is relatively stable across architectures?
Would it be feasible to start with the
cross-platform-let-the-compiler-do-its-job version (that somebody may
actually be capable of auditing), and then introduce other versions when
the jump is significant enough?
Marco Pivetta
I don't know about "relatively stable across architectures", but I wrote
some benchmarking code; keep reading.
The BLAKE3 team is smarter than GCC 11.4: even with -march=native
-mtune=native, which is not commonly used for PHP builds,
the compiler didn't stand a chance against the hand-optimized assembly versions.
I wrote some benchmarks, but the TL;DR is:
portable -O2 (usually used by PHP) managed 1126MB/s,
portable -O2 -march=native managed 533MB/s (wtf? GCC obviously got
something wrong here),
hand-written -O2 SSE2 managed 3144MB/s,
hand-written -O2 SSE4.1 managed 3332MB/s,
hand-written -O2 AVX2 managed 6554MB/s,
hand-written -O2 AVX512 managed 8913MB/s,
on my AMD Ryzen 9 7950x.
Benchmarking code:
https://gist.github.com/divinity76/5729472dd5d77e94cd0acb245aac2226
Raw output:
array(6) {
  ["O2-portable-march"]=>
  array(2) {
    ["microseconds_for_16_kib"]=>
    int(29295)
    ["mb_per_second"]=>
    float(533.3674688513398)
  }
  ["O2-portable"]=>
  array(2) {
    ["microseconds_for_16_kib"]=>
    int(13876)
    ["mb_per_second"]=>
    float(1126.0449697319111)
  }
  ["O2-sse2"]=>
  array(2) {
    ["microseconds_for_16_kib"]=>
    int(4969)
    ["mb_per_second"]=>
    float(3144.4958744214127)
  }
  ["O2-sse41"]=>
  array(2) {
    ["microseconds_for_16_kib"]=>
    int(4688)
    ["mb_per_second"]=>
    float(3332.977815699659)
  }
  ["O2-avx2"]=>
  array(2) {
    ["microseconds_for_16_kib"]=>
    int(2384)
    ["mb_per_second"]=>
    float(6554.1107382550335)
  }
  ["O2-avx512"]=>
  array(2) {
    ["microseconds_for_16_kib"]=>
    int(1753)
    ["mb_per_second"]=>
    float(8913.291500285226)
  }
}
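For anyone wanting to sanity-check the figures: the two columns in the raw output are consistent with each run hashing 16 * 1024 * 1000 bytes and "MB" meaning 1024 * 1024 bytes. That is an inference from the numbers themselves, not something taken from the gist, so treat the constant below as an assumption. The function name is made up for illustration.

```php
<?php
// ASSUMPTION (inferred from the output above, not from the gist):
// each run hashed 16 * 1024 * 1000 bytes, and "MB" is 1024 * 1024 bytes.
const BYTES_PER_RUN = 16 * 1024 * 1000;

// Hypothetical helper: converts an elapsed time into throughput.
function throughput_mb_per_second(int $microseconds): float
{
    $seconds = $microseconds / 1_000_000;
    return BYTES_PER_RUN / $seconds / (1024 * 1024);
}

// Timings copied from the raw output above.
foreach (['O2-portable' => 13876, 'O2-avx512' => 1753] as $label => $us) {
    printf("%s: %.1f MB/s\n", $label, throughput_mb_per_second($us));
}
```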
Oh yes, the AVX jump is impressive 😵
I just tested the ARM Neon optimizations on Oracle Cloud's cheapest ARM VPS
(VM.Standard.A1.Flex, Ubuntu 22.04, GCC 11.4).
Results:
-O2 portable: 596MB/s
-O2 -march=native portable: 601MB/s
-O2 ARM Neon optimized implementation: 1138MB/s
Again, even with -march=native, the compiler cannot make the portable
implementation nearly as fast as the hand-optimized cpu-specific
implementation.
With
https://github.com/php/php-src/commit/52dba99d47563f38d8ed5f84690a3cb2c1785475
the PR ( https://github.com/php/php-src/pull/13194 ) got its first merge
conflict. I fixed it, but what's next?
We could vote on it, if there is little more to discuss?
Quick recap:
AMD Ryzen 9 7950x:
portable -O2: 1126MB/s
SSE2: 3144MB/s
SSE4.1: 3332MB/s
AVX2: 6554MB/s
AVX512: 8913MB/s
Oracle Cloud's ARM VM.Standard.A1.Flex:
portable -O2: 596MB/s
Neon: 1138MB/s
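To put the recap in relative terms, a small sketch that divides each figure by the portable -O2 baseline on the same machine. The numbers are copied from the recap; the function name `speedups` is made up for illustration.

```php
<?php
// Throughput figures (MB/s) copied from the recap above.
$results = [
    'AMD Ryzen 9 7950x' => [
        'portable -O2' => 1126, 'SSE2' => 3144, 'SSE4.1' => 3332,
        'AVX2' => 6554, 'AVX512' => 8913,
    ],
    'Oracle ARM VM.Standard.A1.Flex' => ['portable -O2' => 596, 'Neon' => 1138],
];

// Hypothetical helper: ratio of each figure to the baseline, rounded to 2 decimals.
function speedups(array $mbPerSecond, string $baseline = 'portable -O2'): array
{
    $ratios = [];
    foreach ($mbPerSecond as $name => $value) {
        $ratios[$name] = round($value / $mbPerSecond[$baseline], 2);
    }
    return $ratios;
}

foreach ($results as $machine => $figures) {
    echo $machine, ': ', json_encode(speedups($figures)), "\n";
}
```

On these figures, AVX512 comes out at roughly 8x the portable build on the Ryzen, and Neon at roughly 2x on the ARM VM.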