Hi,
Attached is a quick patch for PHP 5.2.5 that replaces RSA's copyrighted
implementation of MD5 with my public domain one:
http://cvsweb.openwall.com/cgi/cvsweb.cgi/Owl/packages/popa3d/popa3d/md5/
This also results in faster and slightly smaller code (both source and
binary). On a Pentium 3 that I've tested this on, the speedup is 12%
for the portable hashes as implemented in phpass:
http://www.openwall.com/phpass/
Given that most CPU time is spent on various overhead of the PHP
interpreter rather than on MD5 hashing itself (and gprof confirms this),
the 12% speedup seen on the PHP script as a whole means that the speedup
for the MD5 implementation alone is much higher than that. The speedup
should be similar on other little-endian CPUs (other x86 CPUs, x86-64,
Alpha), but smaller on big-endian.
The code can be made cleaner by taking my md5.[ch] files as they are and
introducing two files more for PHP's added functions. I did not do it
for this patch in order for my changes to be less invasive.
I also did not similarly replace the MD5 implementation in hash_md.c,
which obviously will need to be done (once again, I'd prefer that
separate md5.[ch] files are used - and perhaps only one instance of them
included in the PHP distribution).
I also wrote a similar public domain implementation of MD4, which I can
provide for inclusion in hash_md.c if there's any interest.
Finally, it'd be nice if PHP could optionally link against OpenSSL
libcrypto to take advantage of the architecture-specific implementations
of these hashes. My implementations of MD5 and MD4 are function
prototype compatible with OpenSSL's.
I'd appreciate being CC'ed on any follow-ups as I am not on the list.
Thanks,
--
Alexander Peslyak <solar at openwall.com>
GPG key ID: 5B341F15 fp: B3FB 63F4 D7A3 BCCC 6F6E FC55 A2FC 027C 5B34 1F15
http://www.openwall.com - bringing security into open computing environments
On Sun, 9 Dec 2007 04:05:52 +0300, in php.internals solar@openwall.com
(Solar Designer) wrote:
Attached is a quick patch for PHP 5.2.5 that replaces RSA's copyrighted
implementation of MD5 with my public domain one:http://cvsweb.openwall.com/cgi/cvsweb.cgi/Owl/packages/popa3d/popa3d/md5/
A bit on a side note regarding the php function md5()
: In general it
is possible to supply an arbitrary number of bits as input to MD5.
However, the implementation of md5()
only takes a sting with octets as
the smallest fragments.
Would it somehow be possible to supply an input where the number of
bits does not add up to a number divisible by eight? Or is this
feature of md5 simply not relevant to anybody?
--
- Peter Brodersen
A bit on a side note regarding the php function
md5()
: In general it
is possible to supply an arbitrary number of bits as input to MD5.
However, the implementation ofmd5()
only takes a sting with octets as
the smallest fragments.Would it somehow be possible to supply an input where the number of
bits does not add up to a number divisible by eight? Or is this
feature of md5 simply not relevant to anybody?
I don't think there's any demand for this. Even the reference
implementation only works on whole octets, so there are no protocol
specs or anything (that I am aware of) that would require computation
of MD5 on a bitstream of arbitrary length.
Alexander
A few days ago, I wrote:
Attached is a quick patch for PHP 5.2.5 that replaces RSA's copyrighted
implementation of MD5 with my public domain one:http://cvsweb.openwall.com/cgi/cvsweb.cgi/Owl/packages/popa3d/popa3d/md5/
This also results in faster and slightly smaller code (both source and
binary). On a Pentium 3 that I've tested this on, the speedup is 12%
for the portable hashes as implemented in phpass:
The code from phpass is now being used by phpBB3 (3.0.0 release) and
WordPress (development). It is being considered for Drupal 7.
Wouldn't it be nice for PHP to be more efficient when running those
popular apps, and at the same time have cleaner and smaller code?
Of course, it'd be great if PHP would also integrate the CRYPT_BLOWFISH
code from Suhosin, but that can be discussed separately (if it needs a
discussion).
Given that most CPU time is spent on various overhead of the PHP
interpreter rather than on MD5 hashing itself (and gprof confirms this),
the 12% speedup seen on the PHP script as a whole means that the speedup
for the MD5 implementation alone is much higher than that. The speedup
should be similar on other little-endian CPUs (other x86 CPUs, x86-64,
Alpha), but smaller on big-endian.
A correction: actually, Alpha falls into the same category with
big-endian CPUs here (a smaller speedup - but nevertheless a speedup)
because it lacks hardware support for unaligned accesses. I must have
been too tired when I wrote the above.
I am not quoting the rest of my posting, although I'd appreciate
comments on the suggestions I had made there.
Thanks,
Alexander
Attached is a quick patch for PHP 5.2.5 that replaces RSA's copyrighted
implementation of MD5 with my public domain one:http://cvsweb.openwall.com/cgi/cvsweb.cgi/Owl/packages/popa3d/popa3d/md5/
Tried that one and it is about 30% faster indeed (on md5-only benchmark,
32-bit Linux on AMD Opteron. Anybody objects to accepting this?
--
Stanislav Malyshev, Zend Software Architect
stas@zend.com http://www.zend.com/
(408)253-8829 MSN: stas@zend.com
2007/12/20, Stanislav Malyshev stas@zend.com:
Attached is a quick patch for PHP 5.2.5 that replaces RSA's copyrighted
implementation of MD5 with my public domain one:http://cvsweb.openwall.com/cgi/cvsweb.cgi/Owl/packages/popa3d/popa3d/md5/
Tried that one and it is about 30% faster indeed (on md5-only benchmark,
32-bit Linux on AMD Opteron. Anybody objects to accepting this?
Just one, but is a mere formality. I didn't see any copyright notice on the
code nor a licensing document attached. I have not much of the legal
mumbo-jumbo comprehension, so, correct me if I'm wrong, shouldn't external
code that's to be included in the php codebase meet these legal formalities
so it doesn't become a liability?
That's all.
Best Regards,
Martin Alterisio
PS: Does anyone knows if using a nickname for authorship is considered
legally valid? I believe it might be valid, think about writers that use
such pseudonyms, but I'm not sure...
http://cvsweb.openwall.com/cgi/cvsweb.cgi/Owl/packages/popa3d/popa3d/md5/
Tried that one and it is about 30% faster indeed (on md5-only benchmark,
32-bit Linux on AMD Opteron. Anybody objects to accepting this?Just one, but is a mere formality. I didn't see any copyright notice on the
code nor a licensing document attached. I have not much of the legal
mumbo-jumbo comprehension, so, correct me if I'm wrong, shouldn't external
code that's to be included in the php codebase meet these legal formalities
so it doesn't become a liability?
Good point, the license should be explicitly specified.
I'm going to test the patch on Linux/64bit today.
--
Wbr,
Antony Dovgal
isn't "public domain" specific enough?
2007/12/20, Stanislav Malyshev stas@zend.com:
Attached is a quick patch for PHP 5.2.5 that replaces RSA's copyrighted
implementation of MD5 with my public domain one:http://cvsweb.openwall.com/cgi/cvsweb.cgi/Owl/packages/popa3d/popa3d/md5/
Tried that one and it is about 30% faster indeed (on md5-only benchmark,
32-bit Linux on AMD Opteron. Anybody objects to accepting this?Just one, but is a mere formality. I didn't see any copyright notice on the
code nor a licensing document attached. I have not much of the legal
mumbo-jumbo comprehension, so, correct me if I'm wrong, shouldn't external
code that's to be included in the php codebase meet these legal formalities
so it doesn't become a liability?That's all.
Best Regards,
Martin Alterisio
PS: Does anyone knows if using a nickname for authorship is considered
legally valid? I believe it might be valid, think about writers that use
such pseudonyms, but I'm not sure...
--
Alexey Zakhlestin
http://blog.milkfarmsoft.com/
Alexey Zakhlestin wrote:
isn't "public domain" specific enough?
Public Domain actually isn't a universal concept, and it isn't
recognized everywhere in law. A very open BSD style license first,
followed by granting it to the public domain, should cover all bases.
Even somewhere that doesn't recognize public domain should fail over
to accept your BSD license.
No, public domain isn't recognized by copyright law as a copyright notice.
An extra line should be added that says something like "no copyright is
claimed", and attach a year range where the copyright is in effect (since
copyright expires, if I remember correctly). Anyway, the keyword here is
"copyright", which is recognized internationally (which means you should use
this word even if writing the copyright notice in another language).
Without a copyright notice, the code isn't apt for distribution. The author
could claim that he did not give his consent for public distribution and
that the code was an in-house development. Anyway, it's too complicate and I
understand very little. To get the real facts here, you should talk to your
lawyer.
2007/12/21, Alexey Zakhlestin indeyets@gmail.com:
isn't "public domain" specific enough?
2007/12/20, Stanislav Malyshev stas@zend.com:
Attached is a quick patch for PHP 5.2.5 that replaces RSA's
copyrighted
implementation of MD5 with my public domain one:http://cvsweb.openwall.com/cgi/cvsweb.cgi/Owl/packages/popa3d/popa3d/md5/
Tried that one and it is about 30% faster indeed (on md5-only
benchmark,
32-bit Linux on AMD Opteron. Anybody objects to accepting this?Just one, but is a mere formality. I didn't see any copyright notice on
the
code nor a licensing document attached. I have not much of the legal
mumbo-jumbo comprehension, so, correct me if I'm wrong, shouldn't
external
code that's to be included in the php codebase meet these legal
formalities
so it doesn't become a liability?That's all.
Best Regards,
Martin Alterisio
PS: Does anyone knows if using a nickname for authorship is considered
legally valid? I believe it might be valid, think about writers that use
such pseudonyms, but I'm not sure...--
Alexey Zakhlestin
http://blog.milkfarmsoft.com/
Not entirely correct. As I pointed out in the other thread, not all
countries have the concept of transferrable copyright. Therefor, a
note should be added that explicitly states that everyone is free to
use it without permission, fees etc. Much like a BSD or MIT license,
but without the additional conditions of preserving copyright notices.
David
Am 21.12.2007 um 14:29 schrieb Martin Alterisio:
No, public domain isn't recognized by copyright law as a copyright
notice.
An extra line should be added that says something like "no copyright
is
claimed", and attach a year range where the copyright is in effect
(since
copyright expires, if I remember correctly). Anyway, the keyword
here is
"copyright", which is recognized internationally (which means you
should use
this word even if writing the copyright notice in another language).Without a copyright notice, the code isn't apt for distribution. The
author
could claim that he did not give his consent for public distribution
and
that the code was an in-house development. Anyway, it's too
complicate and I
understand very little. To get the real facts here, you should talk
to your
lawyer.2007/12/21, Alexey Zakhlestin indeyets@gmail.com:
isn't "public domain" specific enough?
2007/12/20, Stanislav Malyshev stas@zend.com:
Attached is a quick patch for PHP 5.2.5 that replaces RSA's
copyrighted
implementation of MD5 with my public domain one:http://cvsweb.openwall.com/cgi/cvsweb.cgi/Owl/packages/popa3d/popa3d/md5/
Tried that one and it is about 30% faster indeed (on md5-only
benchmark,
32-bit Linux on AMD Opteron. Anybody objects to accepting this?Just one, but is a mere formality. I didn't see any copyright
notice on
the
code nor a licensing document attached. I have not much of the legal
mumbo-jumbo comprehension, so, correct me if I'm wrong, shouldn't
external
code that's to be included in the php codebase meet these legal
formalities
so it doesn't become a liability?That's all.
Best Regards,
Martin Alterisio
PS: Does anyone knows if using a nickname for authorship is
considered
legally valid? I believe it might be valid, think about writers
that use
such pseudonyms, but I'm not sure...--
Alexey Zakhlestin
http://blog.milkfarmsoft.com/
Thanks! I didn't know about that.
2007/12/21, David Zülke dz@bitxtender.com:
Not entirely correct. As I pointed out in the other thread, not all
countries have the concept of transferrable copyright. Therefor, a
note should be added that explicitly states that everyone is free to
use it without permission, fees etc. Much like a BSD or MIT license,
but without the additional conditions of preserving copyright notices.David
Am 21.12.2007 um 14:29 schrieb Martin Alterisio:
No, public domain isn't recognized by copyright law as a copyright
notice.
An extra line should be added that says something like "no copyright
is
claimed", and attach a year range where the copyright is in effect
(since
copyright expires, if I remember correctly). Anyway, the keyword
here is
"copyright", which is recognized internationally (which means you
should use
this word even if writing the copyright notice in another language).Without a copyright notice, the code isn't apt for distribution. The
author
could claim that he did not give his consent for public distribution
and
that the code was an in-house development. Anyway, it's too
complicate and I
understand very little. To get the real facts here, you should talk
to your
lawyer.2007/12/21, Alexey Zakhlestin indeyets@gmail.com:
isn't "public domain" specific enough?
2007/12/20, Stanislav Malyshev stas@zend.com:
Attached is a quick patch for PHP 5.2.5 that replaces RSA's
copyrighted
implementation of MD5 with my public domain one:http://cvsweb.openwall.com/cgi/cvsweb.cgi/Owl/packages/popa3d/popa3d/md5/
Tried that one and it is about 30% faster indeed (on md5-only
benchmark,
32-bit Linux on AMD Opteron. Anybody objects to accepting this?Just one, but is a mere formality. I didn't see any copyright
notice on
the
code nor a licensing document attached. I have not much of the legal
mumbo-jumbo comprehension, so, correct me if I'm wrong, shouldn't
external
code that's to be included in the php codebase meet these legal
formalities
so it doesn't become a liability?That's all.
Best Regards,
Martin Alterisio
PS: Does anyone knows if using a nickname for authorship is
considered
legally valid? I believe it might be valid, think about writers
that use
such pseudonyms, but I'm not sure...--
Alexey Zakhlestin
http://blog.milkfarmsoft.com/
Hello Solar,
thanks again for offering this. What we require is to have the PHP License
in the code. If you are fine with this we will gladly replace the existing
code with yours. You can of course put more header information in there to
stress out further that it is your code and where it can be found. The key
is that we indeed try to get rid of anything that is not under the PHP
License.
marcus
Sunday, December 9, 2007, 2:05:52 AM, you wrote:
Hi,
Attached is a quick patch for PHP 5.2.5 that replaces RSA's copyrighted
implementation of MD5 with my public domain one:
http://cvsweb.openwall.com/cgi/cvsweb.cgi/Owl/packages/popa3d/popa3d/md5/
This also results in faster and slightly smaller code (both source and
binary). On a Pentium 3 that I've tested this on, the speedup is 12%
for the portable hashes as implemented in phpass:
http://www.openwall.com/phpass/
Given that most CPU time is spent on various overhead of the PHP
interpreter rather than on MD5 hashing itself (and gprof confirms this),
the 12% speedup seen on the PHP script as a whole means that the speedup
for the MD5 implementation alone is much higher than that. The speedup
should be similar on other little-endian CPUs (other x86 CPUs, x86-64,
Alpha), but smaller on big-endian.
The code can be made cleaner by taking my md5.[ch] files as they are and
introducing two files more for PHP's added functions. I did not do it
for this patch in order for my changes to be less invasive.
I also did not similarly replace the MD5 implementation in hash_md.c,
which obviously will need to be done (once again, I'd prefer that
separate md5.[ch] files are used - and perhaps only one instance of them
included in the PHP distribution).
I also wrote a similar public domain implementation of MD4, which I can
provide for inclusion in hash_md.c if there's any interest.
Finally, it'd be nice if PHP could optionally link against OpenSSL
libcrypto to take advantage of the architecture-specific implementations
of these hashes. My implementations of MD5 and MD4 are function
prototype compatible with OpenSSL's.
I'd appreciate being CC'ed on any follow-ups as I am not on the list.
Thanks,
Best regards,
Marcus
I also wrote a similar public domain implementation of MD4, which I can
provide for inclusion in hash_md.c if there's any interest.
No objections from me on replacing the implementations in ext/hash.
Those were written with functional correctness in mind, not speed. So
long as the test vectors in ext/hash/tests continue to pass, I don't see
why faster algorithms wouldn't be quite welcome...
-Sara
Hi,
We are going to include your md5()
implementation into php-5.3.0.
I confirm at least 25% md5()
speedup on my Core2 3GHz, however license
issues are not clear.
We are going to distribute files under standard PHP license including
your original copyright notes.
The files which are going to be committed are attached.
Please confirm your agreement.
Thanks. Dmitry.
Solar Designer wrote:
Hi,
Attached is a quick patch for PHP 5.2.5 that replaces RSA's copyrighted
implementation of MD5 with my public domain one:http://cvsweb.openwall.com/cgi/cvsweb.cgi/Owl/packages/popa3d/popa3d/md5/
This also results in faster and slightly smaller code (both source and
binary). On a Pentium 3 that I've tested this on, the speedup is 12%
for the portable hashes as implemented in phpass:http://www.openwall.com/phpass/
Given that most CPU time is spent on various overhead of the PHP
interpreter rather than on MD5 hashing itself (and gprof confirms this),
the 12% speedup seen on the PHP script as a whole means that the speedup
for the MD5 implementation alone is much higher than that. The speedup
should be similar on other little-endian CPUs (other x86 CPUs, x86-64,
Alpha), but smaller on big-endian.The code can be made cleaner by taking my md5.[ch] files as they are and
introducing two files more for PHP's added functions. I did not do it
for this patch in order for my changes to be less invasive.I also did not similarly replace the MD5 implementation in hash_md.c,
which obviously will need to be done (once again, I'd prefer that
separate md5.[ch] files are used - and perhaps only one instance of them
included in the PHP distribution).I also wrote a similar public domain implementation of MD4, which I can
provide for inclusion in hash_md.c if there's any interest.Finally, it'd be nice if PHP could optionally link against OpenSSL
libcrypto to take advantage of the architecture-specific implementations
of these hashes. My implementations of MD5 and MD4 are function
prototype compatible with OpenSSL's.I'd appreciate being CC'ed on any follow-ups as I am not on the list.
Thanks,
Dmitry,
/*
- This is an OpenSSL-compatible implementation of the RSA Data Security,
- Inc. MD5 Message-Digest Algorithm (RFC 1321).
[...]
looks like your editor has gone wild on that comment in md5.c ;-)
johannes
Hi Dmitry and all,
First of all, please accept my apologies for failing to find the time to
participate in the licensing issues discussion in December. It is a
topic that I would like to discuss and arrive at a conclusion as I often
happen to write code that I'd like to release to the public under the most
relaxed terms possible. I thought that not claiming copyright (or even
disclaiming copyright) and placing the code into the public domain would
be it, but apparently in many (most? all?) jurisdictions there's no
explicitly specified way for someone to place their works into the
public domain (although the concept of public domain does exist - and
stuff "falls" there as old copyrights expire) and now it has also been
mentioned that some jurisdictions don't even recognize public domain at
all (I have not yet seen/heard a lawyer state that, though).
A possible solution could be to simultaneously try to place stuff in the
public domain with a statement to that extent and license it to the
public under very liberal terms. One issue with it is that I have to
not claim or disclaim copyright in order to place a work of mine into
the public domain, yet I have to be the copyright holder in order to
license that work. Maybe this can be taken care of with a severability
clause, making either the public domain or the license work in any given
jurisdiction. But I'd rather see/hear a lawyer comment on that before I
possibly go that route.
That said, a lot of software that we use has been placed in the public
domain by its authors. This includes some software by D. J. Bernstein,
perhaps best known as the author of qmail, who is also known for the
Bernstein vs. United States litigation - http://cr.yp.to/export.html -
so perhaps he should know the law. Then, public domain is officially
recognized as being compatible with GNU GPL by the FSF -
http://www.fsf.org/licensing/licenses/ - and is apparently recognized by
the OSI - http://opensource.org/node/239
We are going to include your
md5()
implementation into php-5.3.0.
Great!
I confirm at least 25%
md5()
speedup on my Core2 3GHz, however license
issues are not clear.
We are going to distribute files under standard PHP license including
your original copyright notes.
The files which are going to be committed are attached.Please confirm your agreement.
Confirmed. Please note, however, that there were no "copyright notes"
on my original files; instead, there was an authorship note and a public
domain statement.
I also have some comments on the modified files:
| Copyright (c) 1997-2008 The PHP Group |
...
| Author: Solar Designer <solar at openwall.com> |
So you claim copyright to a modified version of my code, that I had
placed in the public domain. This is fine by me.
I do not formally require it (in fact, I can't), but maybe the "Author"
line could be changed to either:
| Original author: Solar Designer <solar at openwall.com> |
or:
| Authors: Solar Designer <solar at openwall.com> with further |
| modifications by others. |
(or you can make it more explicit, e.g. "... by The PHP Group" if that
is appropriate - or whatever).
/* MD5 context. */
typedef struct {
php_uint32 lo, hi;
php_uint32 a, b, c, d;
unsigned char buffer[64];
php_uint32 block[16];
} PHP_MD5_CTX;
Maybe it would be better to do:
typedef php_uint32 MD5_u32plus;
and use the latter type. This would reduce the number of changes
between my version of the code and yours, making it easier for you to
sync to any newer versions of the code that I might make.
| Author: Solar Designer <solar at openwall.com> |
If you do choose to change this in the .h file, then do the same in the
.c, obviously.
#if (defined(APPLE) || defined(APPLE_CC)) && (defined(BIG_ENDIAN) || defined(LITTLE_ENDIAN))
if defined(LITTLE_ENDIAN)
undef WORDS_BIGENDIAN
else
if defined(BIG_ENDIAN)
define WORDS_BIGENDIAN
endif
endif
#endif
This looks wrong to me. One of the specific properties of my
implementation is that it does not strictly depend on the endianness
being correctly specified at compile-time (and at all, for that matter).
However, if you do happen to use the (little-endian and unaligned-OK)
optimized code on a system that is not in fact little-endian or does not
in fact tolerate unaligned accesses, then problems will arise! So any
#if's you use must assume (might-be-big-endian and might-disallow-unaligned)
by default.
In fact, I am only aware of three widespread and general-purpose
architectures that satisfy the criteria for the optimized code:
#if defined(i386) || defined(x86_64) || defined(vax)
Thus, I suggest that you leave the above #if intact, the way it was in
the patch that I submitted. Do not explicitly check for any endianness
macros - this is bound to cause problems.
/*
- SET reads 4 input bytes in little-endian byte order and stores them
- in a properly aligned word in host byte order.
* The check for little-endian architectures that tolerate unaligned
* memory accesses is just an optimization. Nothing will break if it
* doesn't work.
*/
#ifndef WORDS_BIGENDIAN
define SET(n) \
(*(php_uint32 *)&ptr[(n) * 4])
define GET(n) \
SET(n)
#else
...
As explained above, I strongly recommend that you revert your "#ifndef
WORDS_BIGENDIAN" to my "#if ..."
What if an architecture is big-endian, but WORDS_BIGENDIAN just happens
to not be specified? You'll have incorrect results (not MD5), whereas
with my version of the code, everything will be just fine.
Similarly, regardless of endianness, if WORDS_BIGENDIAN is not specified
(maybe because the architecture is in fact little-endian), but the
architecture does not tolerate unaligned accesses (at all or supports
them with kernel emulation), things will go wrong (SIGBUS or very poor
performance and a flood of kernel messages). This issue can't occur
with my original #if that only lists specific known-safe architectures.
data = body(ctx, data, size & ~(unsigned long)0x3f);
If you change all of my unsigned long's to size_t, you should change
this one as well.
When on a 64-bit system (userland pointer size), your size_t better be
64-bit as well (I have not checked whether this is necessarily the case;
I hope so).
PHPAPI void PHP_MD5Final(unsigned char *result, PHP_MD5_CTX *ctx)
{
unsigned long used, free;
Here's another one.
Thanks,
Alexander Peslyak <solar at openwall.com>
GPG key ID: 5B341F15 fp: B3FB 63F4 D7A3 BCCC 6F6E FC55 A2FC 027C 5B34 1F15
http://www.openwall.com - bringing security into open computing environments
Hi Alexander,
Stupid question maybe, but why can't you use your given name for this?
Caveat: please don't let this email stand in the way of your giving your
gift.
- Steph
----- Original Message -----
From: "Solar Designer" solar@openwall.com
To: "Dmitry Stogov" dmitry@zend.com
Cc: "Stanislav Malyshev" stas@zend.com; "Andi Gutmans" andi@zend.com;
"PHP Internals List" internals@lists.php.net
Sent: Tuesday, February 05, 2008 11:50 PM
Subject: Re: [PHP-DEV] faster & public domain MD5 implementation
Hi Dmitry and all,
First of all, please accept my apologies for failing to find the time to
participate in the licensing issues discussion in December. It is a
topic that I would like to discuss and arrive at a conclusion as I often
happen to write code that I'd like to release to the public under the most
relaxed terms possible. I thought that not claiming copyright (or even
disclaiming copyright) and placing the code into the public domain would
be it, but apparently in many (most? all?) jurisdictions there's no
explicitly specified way for someone to place their works into the
public domain (although the concept of public domain does exist - and
stuff "falls" there as old copyrights expire) and now it has also been
mentioned that some jurisdictions don't even recognize public domain at
all (I have not yet seen/heard a lawyer state that, though).A possible solution could be to simultaneously try to place stuff in the
public domain with a statement to that extent and license it to the
public under very liberal terms. One issue with it is that I have to
not claim or disclaim copyright in order to place a work of mine into
the public domain, yet I have to be the copyright holder in order to
license that work. Maybe this can be taken care of with a severability
clause, making either the public domain or the license work in any given
jurisdiction. But I'd rather see/hear a lawyer comment on that before I
possibly go that route.That said, a lot of software that we use has been placed in the public
domain by its authors. This includes some software by D. J. Bernstein,
perhaps best known as the author of qmail, who is also known for the
Bernstein vs. United States litigation - http://cr.yp.to/export.html -
so perhaps he should know the law. Then, public domain is officially
recognized as being compatible with GNU GPL by the FSF -
http://www.fsf.org/licensing/licenses/ - and is apparently recognized by
the OSI - http://opensource.org/node/239We are going to include your
md5()
implementation into php-5.3.0.Great!
I confirm at least 25%
md5()
speedup on my Core2 3GHz, however license
issues are not clear.
We are going to distribute files under standard PHP license including
your original copyright notes.
The files which are going to be committed are attached.Please confirm your agreement.
Confirmed. Please note, however, that there were no "copyright notes"
on my original files; instead, there was an authorship note and a public
domain statement.I also have some comments on the modified files:
| Copyright (c) 1997-2008 The PHP Group
|
...
| Author: Solar Designer <solar at openwall.com>
|So you claim copyright to a modified version of my code, that I had
placed in the public domain. This is fine by me.I do not formally require it (in fact, I can't), but maybe the "Author"
line could be changed to either:| Original author: Solar Designer <solar at openwall.com> |
or:
| Authors: Solar Designer <solar at openwall.com> with further |
| modifications by others. |(or you can make it more explicit, e.g. "... by The PHP Group" if that
is appropriate - or whatever)./* MD5 context. */
typedef struct {
php_uint32 lo, hi;
php_uint32 a, b, c, d;
unsigned char buffer[64];
php_uint32 block[16];
} PHP_MD5_CTX;Maybe it would be better to do:
typedef php_uint32 MD5_u32plus;
and use the latter type. This would reduce the number of changes
between my version of the code and yours, making it easier for you to
sync to any newer versions of the code that I might make.| Author: Solar Designer <solar at openwall.com>
|If you do choose to change this in the .h file, then do the same in the
.c, obviously.#if (defined(APPLE) || defined(APPLE_CC)) &&
(defined(BIG_ENDIAN) || defined(LITTLE_ENDIAN))if defined(LITTLE_ENDIAN)
undef WORDS_BIGENDIAN
else
if defined(BIG_ENDIAN)
define WORDS_BIGENDIAN
endif
endif
#endif
This looks wrong to me. One of the specific properties of my
implementation is that it does not strictly depend on the endianness
being correctly specified at compile-time (and at all, for that matter).
However, if you do happen to use the (little-endian and unaligned-OK)
optimized code on a system that is not in fact little-endian or does not
in fact tolerate unaligned accesses, then problems will arise! So any
#if's you use must assume (might-be-big-endian and
might-disallow-unaligned)
by default.In fact, I am only aware of three widespread and general-purpose
architectures that satisfy the criteria for the optimized code:#if defined(i386) || defined(x86_64) || defined(vax)
Thus, I suggest that you leave the above #if intact, the way it was in
the patch that I submitted. Do not explicitly check for any endianness
macros - this is bound to cause problems./*
- SET reads 4 input bytes in little-endian byte order and stores them
- in a properly aligned word in host byte order.
* The check for little-endian architectures that tolerate
unaligned
* memory accesses is just an optimization. Nothing will break if
it
* doesn't work.
*/
#ifndef WORDS_BIGENDIAN
define SET(n) \
(*(php_uint32 *)&ptr[(n) * 4])
define GET(n) \
SET(n)
#else
...As explained above, I strongly recommend that you revert your "#ifndef
WORDS_BIGENDIAN" to my "#if ..."What if an architecture is big-endian, but WORDS_BIGENDIAN just happens
to not be specified? You'll have incorrect results (not MD5), whereas
with my version of the code, everything will be just fine.Similarly, regardless of endianness, if WORDS_BIGENDIAN is not specified
(maybe because the architecture is in fact little-endian), but the
architecture does not tolerate unaligned accesses (at all or supports
them with kernel emulation), things will go wrong (SIGBUS or very poor
performance and a flood of kernel messages). This issue can't occur
with my original #if that only lists specific known-safe architectures.data = body(ctx, data, size & ~(unsigned long)0x3f);
If you change all of my unsigned long's to size_t, you should change
this one as well.When on a 64-bit system (userland pointer size), your size_t better be
64-bit as well (I have not checked whether this is necessarily the case;
I hope so).PHPAPI void PHP_MD5Final(unsigned char *result, PHP_MD5_CTX *ctx)
{
unsigned long used, free;Here's another one.
Thanks,
Alexander Peslyak <solar at openwall.com>
GPG key ID: 5B341F15 fp: B3FB 63F4 D7A3 BCCC 6F6E FC55 A2FC 027C 5B34
1F15
http://www.openwall.com - bringing security into open computing
environments
Hi Steph,
Stupid question maybe, but why can't you use your given name for this?
I can, and I don't mind you replacing all occurrences of "Solar
Designer" with "Alexander Peslyak" in these two files - or would you
like me to do that myself? I don't care much about this. I am simply
better known in the "security community" as Solar Designer, although
this is changing and the e-mail address probably gives me away despite
of your attempt to hide me under my real name. ;-)
Caveat: please don't let this email stand in the way of your giving your
gift.
Your question is just fine, and I might have more "gifts" for PHP
(also related to cryptographic hashes) if this one goes in well.
Thanks,
Alexander
Hi Alexander,
I've updated the files according to your notes, except
php_uint32/MD5_u32plus.
However new version breaks ext/hash md4. (ext/hash/tests/md4.phpt is
broken).
Sara, did you have a solution for this issue?
Or could you look into it please?
Thanks. Dmitry.
Solar Designer wrote:
Hi Dmitry and all,
First of all, please accept my apologies for failing to find the time to
participate in the licensing issues discussion in December. It is a
topic that I would like to discuss and arrive at a conclusion as I often
happen to write code that I'd like to release to the public under the most
relaxed terms possible. I thought that not claiming copyright (or even
disclaiming copyright) and placing the code into the public domain would
be it, but apparently in many (most? all?) jurisdictions there's no
explicitly specified way for someone to place their works into the
public domain (although the concept of public domain does exist - and
stuff "falls" there as old copyrights expire) and now it has also been
mentioned that some jurisdictions don't even recognize public domain at
all (I have not yet seen/heard a lawyer state that, though).A possible solution could be to simultaneously try to place stuff in the
public domain with a statement to that extent and license it to the
public under very liberal terms. One issue with it is that I have to
not claim or disclaim copyright in order to place a work of mine into
the public domain, yet I have to be the copyright holder in order to
license that work. Maybe this can be taken care of with a severability
clause, making either the public domain or the license work in any given
jurisdiction. But I'd rather see/hear a lawyer comment on that before I
possibly go that route.That said, a lot of software that we use has been placed in the public
domain by its authors. This includes some software by D. J. Bernstein,
perhaps best known as the author of qmail, who is also known for the
Bernstein vs. United States litigation - http://cr.yp.to/export.html -
so perhaps he should know the law. Then, public domain is officially
recognized as being compatible with GNU GPL by the FSF -
http://www.fsf.org/licensing/licenses/ - and is apparently recognized by
the OSI - http://opensource.org/node/239We are going to include your
md5()
implementation into php-5.3.0.Great!
I confirm at least 25%
md5()
speedup on my Core2 3GHz, however license
issues are not clear.
We are going to distribute files under standard PHP license including
your original copyright notes.
The files which are going to be committed are attached.Please confirm your agreement.
Confirmed. Please note, however, that there were no "copyright notes"
on my original files; instead, there was an authorship note and a public
domain statement.I also have some comments on the modified files:
| Copyright (c) 1997-2008 The PHP Group |
...
| Author: Solar Designer <solar at openwall.com> |So you claim copyright to a modified version of my code, that I had
placed in the public domain. This is fine by me.I do not formally require it (in fact, I can't), but maybe the "Author"
line could be changed to either:| Original author: Solar Designer <solar at openwall.com> |
or:
| Authors: Solar Designer <solar at openwall.com> with further |
| modifications by others. |(or you can make it more explicit, e.g. "... by The PHP Group" if that
is appropriate - or whatever)./* MD5 context. */
typedef struct {
php_uint32 lo, hi;
php_uint32 a, b, c, d;
unsigned char buffer[64];
php_uint32 block[16];
} PHP_MD5_CTX;Maybe it would be better to do:
typedef php_uint32 MD5_u32plus;
and use the latter type. This would reduce the number of changes
between my version of the code and yours, making it easier for you to
sync to any newer versions of the code that I might make.| Author: Solar Designer <solar at openwall.com> |
If you do choose to change this in the .h file, then do the same in the
.c, obviously.#if (defined(APPLE) || defined(APPLE_CC)) && (defined(BIG_ENDIAN) || defined(LITTLE_ENDIAN))
if defined(LITTLE_ENDIAN)
undef WORDS_BIGENDIAN
else
if defined(BIG_ENDIAN)
define WORDS_BIGENDIAN
endif
endif
#endif
This looks wrong to me. One of the specific properties of my
implementation is that it does not strictly depend on the endianness
being correctly specified at compile-time (and at all, for that matter).
However, if you do happen to use the (little-endian and unaligned-OK)
optimized code on a system that is not in fact little-endian or does not
in fact tolerate unaligned accesses, then problems will arise! So any
#if's you use must assume (might-be-big-endian and might-disallow-unaligned)
by default.In fact, I am only aware of three widespread and general-purpose
architectures that satisfy the criteria for the optimized code:#if defined(i386) || defined(x86_64) || defined(vax)
Thus, I suggest that you leave the above #if intact, the way it was in
the patch that I submitted. Do not explicitly check for any endianness
macros - this is bound to cause problems./*
- SET reads 4 input bytes in little-endian byte order and stores them
- in a properly aligned word in host byte order.
* The check for little-endian architectures that tolerate unaligned
* memory accesses is just an optimization. Nothing will break if it
* doesn't work.
*/
#ifndef WORDS_BIGENDIAN
define SET(n) \
(*(php_uint32 *)&ptr[(n) * 4])
define GET(n) \
SET(n)
#else
...As explained above, I strongly recommend that you revert your "#ifndef
WORDS_BIGENDIAN" to my "#if ..."What if an architecture is big-endian, but WORDS_BIGENDIAN just happens
to not be specified? You'll have incorrect results (not MD5), whereas
with my version of the code, everything will be just fine.Similarly, regardless of endianness, if WORDS_BIGENDIAN is not specified
(maybe because the architecture is in fact little-endian), but the
architecture does not tolerate unaligned accesses (at all or supports
them with kernel emulation), things will go wrong (SIGBUS or very poor
performance and a flood of kernel messages). This issue can't occur
with my original #if that only lists specific known-safe architectures.data = body(ctx, data, size & ~(unsigned long)0x3f);
If you change all of my unsigned long's to size_t, you should change
this one as well.When on a 64-bit system (userland pointer size), your size_t better be
64-bit as well (I have not checked whether this is necessarily the case;
I hope so).PHPAPI void PHP_MD5Final(unsigned char *result, PHP_MD5_CTX *ctx)
{
unsigned long used, free;Here's another one.
Thanks,
Alexander Peslyak <solar at openwall.com>
GPG key ID: 5B341F15 fp: B3FB 63F4 D7A3 BCCC 6F6E FC55 A2FC 027C 5B34 1F15
http://www.openwall.com - bringing security into open computing environments
Hi Dmitry,
I've updated the files according to your notes, except
php_uint32/MD5_u32plus.
...and except for adding a note that I'm not the only author (not of the
modified code). That's fine, if it's your preference, I don't care much
either way.
However new version breaks ext/hash md4. (ext/hash/tests/md4.phpt is
broken).
That's weird. Maybe you meant ext/hash/tests/md5.phpt, not md4? That
would be somewhat weird, too, as ext/hash/hash_md.c contains its own
copy of the MD5 code (to be replaced later) - but I suppose there could
be a symbol clash.
Did you check that PHP md5()
still works correctly after your changes?
(The patch I had submitted was fine in this respect, passing my tests.)
Some more comments on the files:
- This is an OpenSSL-compatible implementation of the RSA Data Security,
- Inc. MD5 Message-Digest Algorithm (RFC 1321).
* Written by Solar Designer <solar at openwall.com> in 2001, and placed
* in the public domain. There's absolutely no warranty.
*
* This differs from Colin Plumb's older public domain implementation in
...
* and avoid compile-time configuration.
Why are you re-formatting these comments in such a weird way, making
lines exceed 80 chars for long comments?
PHPAPI void PHP_MD5Final(unsigned char *result, PHP_MD5_CTX *ctx)
{
php_uint32 used, free;
These should be size_t, not php_uint32.
PHPAPI void make_digest_ex(char *md5str, const unsigned char *digest, int len);
You could want to change this to "size_t len" as well, although this
function existed before and used "int". Obviously, if you do change
it, you need to do so in both files and perhaps also in some callers.
Maybe this should be done with a separate commit (a change not directly
related to replacing the MD5 implementation).
Thanks,
--
Alexander Peslyak <solar at openwall.com>
GPG key ID: 5B341F15 fp: B3FB 63F4 D7A3 BCCC 6F6E FC55 A2FC 027C 5B34 1F15
http://www.openwall.com - bringing security into open computing environments
A followup on my own posting:
...and except for adding a note that I'm not the only author (not of the
modified code). That's fine, if it's your preference, I don't care much
either way.
Actually, I am also not the author of some of the code that you're
leaving intact in PHP's md5.c - that's functions make_digest(),
make_digest_ex(), php_if_md5(), and php_if_md5_file(). My contribution
is limited to replacing RSA's implementation of MD5 itself; I did not
contribute replacements for these wrapper functions that are specific
to PHP.
The original PHP's md5.h had "Author: Rasmus Lerdorf" on it, but I am
not sure what that applied to (the header file is small and is hardly
subject to copyright). The original PHP's md5.c had:
| Author: Lachlan Roche
...
- md5.c - Copyright 1997 Lachlan Roche
-
md5_file()
added by Alessandro Astarita aleast@capri.it
I am not sure who the code that remains after our changes should be
attributed to. Maybe you could check the commits history to figure this
out, if it matters.
One thing I'm sure of is that it's OK to drop RSA's copyright and
license notice, because we're replacing that code in its entirety - and
you already did.
However new version breaks ext/hash md4. (ext/hash/tests/md4.phpt is
broken).That's weird. Maybe you meant ext/hash/tests/md5.phpt, not md4? That
would be somewhat weird, too, as ext/hash/hash_md.c contains its own
copy of the MD5 code (to be replaced later) - but I suppose there could
be a symbol clash.
I've recalled that the MD5 code in hash_md.c is within:
#ifdef PHP_HASH_MD5_NOT_IN_CORE
...
#endif /* PHP_HASH_MD5_NOT_IN_CORE */
So if our changes broke the MD5 implementation somehow, it is no
surprise that ext/hash/tests/md5.phpt would fail.
However, I've diff'ed the latest md5.[ch] that you posted against those
I had submitted (and tested) and I failed to notice any obvious bugs
(worse than the php_uint32 vs. size_t issue in PHP_MD5Final(), which
should not cause those tests on short strings to fail).
So this needs further testing on a clean build and, if necessary,
debugging... maybe there's some issue specific to the program at large
that is not obvious when looking just at the changes.
Of course, I've been assuming that php_uint32 is in fact an unsigned
32-bit integer data type.
Alexander
However new version breaks ext/hash md4. (ext/hash/tests/md4.phpt is
broken).
Oh, I am able to reproduce this with my original patch:
PASS hmac-md5 algorithm [ext/hash/tests/hmac-md5.phpt]
PASS md2 algorithm [ext/hash/tests/md2.phpt]
FAIL md4 algorithm [ext/hash/tests/md4.phpt]
PASS md5 algorithm [ext/hash/tests/md5.phpt]
PASS ripemd128 algorithm [ext/hash/tests/ripemd128.phpt]
It didn't occur to me to pay attention to the MD4 test after patching
MD5. Really weird. I'll look into it.
Alexander
I wrote:
Oh, I am able to reproduce this with my original patch:
PASS hmac-md5 algorithm [ext/hash/tests/hmac-md5.phpt]
PASS md2 algorithm [ext/hash/tests/md2.phpt]
FAIL md4 algorithm [ext/hash/tests/md4.phpt]
PASS md5 algorithm [ext/hash/tests/md5.phpt]
PASS ripemd128 algorithm [ext/hash/tests/ripemd128.phpt]It didn't occur to me to pay attention to the MD4 test after patching
MD5. Really weird. I'll look into it.
OK, I think I found it. ext/hash/php_hash_md.h has this:
#define PHP_MD4Init PHP_MD5Init
which breaks when the two implementations are not that similar anymore.
Replacing the MD4 implementation with mine as well would fix this (or
hide the bug, depending on your point of view), but for now I think the
right fix would be to define a PHP_MD4Init() function explicitly.
Alexander
I wrote:
OK, I think I found it. ext/hash/php_hash_md.h has this:
#define PHP_MD4Init PHP_MD5Init
which breaks when the two implementations are not that similar anymore.
Replacing the MD4 implementation with mine as well would fix this (or
hide the bug, depending on your point of view), but for now I think the
right fix would be to define a PHP_MD4Init() function explicitly.
The patch is attached. It contains two other tiny changes:
-
Replaces two of MD4's basic functions with more optimal versions
(faster and smaller code). -
Corrects a typo in a nearby comment.
This patch may be applied independently and before the MD5 replacement
patch - but it is required for the MD5 replacement patch.
Alexander
Thank you for fix.
I assume now the patch is ready to commit.
I'll commit it tomorrow in case of no objections.
Thanks. Dmitry.
Solar Designer wrote:
I wrote:
OK, I think I found it. ext/hash/php_hash_md.h has this:
#define PHP_MD4Init PHP_MD5Init
which breaks when the two implementations are not that similar anymore.
Replacing the MD4 implementation with mine as well would fix this (or
hide the bug, depending on your point of view), but for now I think the
right fix would be to define a PHP_MD4Init() function explicitly.The patch is attached. It contains two other tiny changes:
Replaces two of MD4's basic functions with more optimal versions
(faster and smaller code).Corrects a typo in a nearby comment.
This patch may be applied independently and before the MD5 replacement
patch - but it is required for the MD5 replacement patch.Alexander
I assume now the patch is ready to commit.
I'll commit it tomorrow in case of no objections.
Yes, it is. No objections from me.
Just two minor points:
-
You have not yet fixed the size_t vs. php_uint32 issue in
PHP_MD5Final(), leaving it inconsistent with PHP_MD5Update(). This
should not affect correctness of the code, but it's just weird. -
The way you have re-formatted my comments still looks weird to me.
Thanks,
Alexander
Hi!
Nice addition but can you please try to keep on thread for the
discussions? Thanks :-)
I assume now the patch is ready to commit.
I'll commit it tomorrow in case of no objections.Yes, it is. No objections from me.
Just two minor points:
You have not yet fixed the size_t vs. php_uint32 issue in
PHP_MD5Final(), leaving it inconsistent with PHP_MD5Update(). This
should not affect correctness of the code, but it's just weird.The way you have re-formatted my comments still looks weird to me.
Thanks,
Alexander
--
--
Pierre
http://blog.thepimp.net | http://www.libgd.org
Hi Pierre,
Nice addition but can you please try to keep on thread for the
discussions? Thanks :-)
Are you referring to the changing Subject? If so, I prefer Subjects
that reflect message content, whereas threading can, should, and usually
does rely on Message-ID, In-Reply-To, and References headers. If this
mailing list has a different rule - such as to keep the Subject fixed
even when discussion deviates - I am willing to follow it indeed... In
fact, this is what I did now (kept the Subject although this message is
not on the patch anymore), even though I dislike having to do it.
Alexander
Hi Alexander,
Hi Pierre,
Nice addition but can you please try to keep on thread for the
discussions? Thanks :-)Are you referring to the changing Subject?
Yes, three times in a row made me thought that it would be nice to
keep one thread to discuss this change :)
If so, I prefer Subjects
that reflect message content, whereas threading can, should, and usually
does rely on Message-ID, In-Reply-To, and References headers. If this
mailing list has a different rule - such as to keep the Subject fixed
even when discussion deviates - I am willing to follow it indeed... In
fact, this is what I did now (kept the Subject although this message is
not on the patch anymore), even though I dislike having to do it.
No big deal but I don't see a different topic, it is about your patch
and to replace the existing implementation :)
Anyway, great work!!
Cheers,
Hi,
Solar Designer wrote:
I assume now the patch is ready to commit.
I'll commit it tomorrow in case of no objections.Yes, it is. No objections from me.
Just two minor points:
- You have not yet fixed the size_t vs. php_uint32 issue in
PHP_MD5Final(), leaving it inconsistent with PHP_MD5Update(). This
should not affect correctness of the code, but it's just weird.
According to algorithm "used" and "free" in PHP_MD5Final cannot be more
than 64, so I don't see any reason for unnecessary conversions. Looking
more careful I think they must be changed into php_uint32 in
PHP_MD5Update too.
- The way you have re-formatted my comments still looks weird to me.
ops. sorry for that. I'll put them in proper place before commit.
Thanks. Dmitry.
Thanks,
Alexander
I'm wondered, how did I change the comment style, because I didn't do it
by hand. :)
Dmitry.
Dmitry Stogov wrote:
Hi,
Solar Designer wrote:
I assume now the patch is ready to commit.
I'll commit it tomorrow in case of no objections.Yes, it is. No objections from me.
Just two minor points:
- You have not yet fixed the size_t vs. php_uint32 issue in
PHP_MD5Final(), leaving it inconsistent with PHP_MD5Update(). This
should not affect correctness of the code, but it's just weird.According to algorithm "used" and "free" in PHP_MD5Final cannot be more
than 64, so I don't see any reason for unnecessary conversions. Looking
more careful I think they must be changed into php_uint32 in
PHP_MD5Update too.
- The way you have re-formatted my comments still looks weird to me.
ops. sorry for that. I'll put them in proper place before commit.
Thanks. Dmitry.
Thanks,
Alexander
I'm wondered, how did I change the comment style, because I didn't do it
by hand. :)
The comments in the patch look a bit odd (looks like they were pasted to vim
without :set paste).
Regards,
Stefan
=)
--
Wbr,
Antony Dovgal
According to algorithm "used" and "free" in PHP_MD5Final cannot be more
than 64, so I don't see any reason for unnecessary conversions. Looking
more careful I think they must be changed into php_uint32 in
PHP_MD5Update too.
You're correct, except that it is non-obvious which incurs more type
conversions and which of those conversions have a runtime cost and which
don't. For example, "&ctx->buffer[used]" on a 64-bit system may have to
expand "used" to 64 bits for effective address calculation, which may or
may not cost (depends on properties of the architecture and the compiler).
Similarly, passing "free" as the size argument to memcpy() may have to
expand it to 64 bits to match the ABI and/or the prototype for memcpy().
OK, I've tried to count those likely conversions - and my conclusion is
that having "used" and "free" at size_t in both functions results in
fewer conversions (or even none) when size_t matches the pointer size
and php_uint32 doesn't. Specifically, the only places where anything
(that touches these two variables) would need to be zero-extended are:
used = saved_lo & 0x3f;
...
used = ctx->lo & 0x3f;
In both cases, we have a bitwise AND anyway - so the compiler might
instead expand the constant to 64-bit (or use an instruction that
expands its immediate argument to 64 bits instead of to 32 bits).
Anyway, this is obviously not important, so I am fine with your latest
patch the way it is, if you prefer to have it that way.
Thanks,
Alexander
On Thursday 07 February 2008 14:21:15 Dmitry Stogov wrote:
Thank you for fix.
I assume now the patch is ready to commit.
I'll commit it tomorrow in case of no objections.Thanks. Dmitry.
The comments in the patch look a bit odd (looks like they were pasted to vim
without :set paste).
Regards,
Stefan
On Thursday 07 February 2008 14:21:15 Dmitry Stogov wrote:
Thank you for fix.
I assume now the patch is ready to commit.
I'll commit it tomorrow in case of no objections.Thanks. Dmitry.
The comments in the patch look a bit odd (looks like they were pasted to vim
without :set paste).
Yeah, that's what caused it, I believe.
--
Wbr,
Antony Dovgal