Newsgroups: php.internals
Message-ID: <47AAD95A.8010109@zend.com>
Date: Thu, 07 Feb 2008 13:11:38 +0300
To: Solar Designer, Sara Golemon
CC: Stanislav Malyshev, Andi Gutmans, PHP Internals List
References: <20071209010552.GA12561@openwall.com> <47A849D0.8050508@zend.com> <20080205235055.GA19309@openwall.com>
In-Reply-To: <20080205235055.GA19309@openwall.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------020309080801090609010402"
Subject: Re: [PHP-DEV] faster & public domain MD5 implementation
From: dmitry@zend.com (Dmitry Stogov)

--------------020309080801090609010402
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Hi Alexander,

I've updated the files according to your notes, except
php_uint32/MD5_u32plus.

However, the new version breaks ext/hash md4
(ext/hash/tests/md4.phpt fails).

Sara, did you have a solution for this issue?
Or could you look into it, please?

Thanks. Dmitry.

Solar Designer wrote:
> Hi Dmitry and all,
>
> First of all, please accept my apologies for failing to find the time to
> participate in the licensing issues discussion in December. It is a
> topic that I would like to discuss and arrive at a conclusion on, as I often
> happen to write code that I'd like to release to the public under the most
> relaxed terms possible. I thought that not claiming copyright (or even
> disclaiming copyright) and placing the code into the public domain would
> be it, but apparently in many (most? all?) jurisdictions there's no
> explicitly specified way for someone to place their works into the
> public domain (although the concept of public domain does exist - and
> stuff "falls" there as old copyrights expire) and now it has also been
> mentioned that some jurisdictions don't even recognize public domain at
> all (I have not yet seen/heard a lawyer state that, though).
>
> A possible solution could be to simultaneously try to place stuff in the
> public domain with a statement to that effect and license it to the
> public under very liberal terms. One issue with it is that I have to
> not claim or disclaim copyright in order to place a work of mine into
> the public domain, yet I have to be the copyright holder in order to
> license that work. Maybe this can be taken care of with a severability
> clause, making either the public domain or the license work in any given
> jurisdiction. But I'd rather see/hear a lawyer comment on that before I
> possibly go that route.
>
> That said, a lot of software that we use has been placed in the public
> domain by its authors. This includes some software by D. J. Bernstein,
> perhaps best known as the author of qmail, who is also known for the
> Bernstein vs. United States litigation - http://cr.yp.to/export.html -
> so perhaps he should know the law. Then, public domain is officially
> recognized as being compatible with GNU GPL by the FSF -
> http://www.fsf.org/licensing/licenses/ - and is apparently recognized by
> the OSI - http://opensource.org/node/239
>
> On Tue, Feb 05, 2008 at 02:34:40PM +0300, Dmitry Stogov wrote:
>> We are going to include your md5() implementation into php-5.3.0.
>
> Great!
>
>> I confirm at least 25% md5() speedup on my Core2 3GHz, however license
>> issues are not clear.
>> We are going to distribute files under standard PHP license including
>> your original copyright notes.
>> The files which are going to be committed are attached.
>>
>> Please confirm your agreement.
>
> Confirmed. Please note, however, that there were no "copyright notes"
> on my original files; instead, there was an authorship note and a public
> domain statement.
>
> I also have some comments on the modified files:
>
>> | Copyright (c) 1997-2008 The PHP Group |
> ...
>> | Author: Solar Designer |
>
> So you claim copyright to a modified version of my code, that I had
> placed in the public domain. This is fine by me.
>
> I do not formally require it (in fact, I can't), but maybe the "Author"
> line could be changed to either:
>
> | Original author: Solar Designer |
>
> or:
>
> | Authors: Solar Designer with further |
> | modifications by others. |
>
> (or you can make it more explicit, e.g. "... by The PHP Group" if that
> is appropriate - or whatever).
>
>> /* MD5 context. */
>> typedef struct {
>> 	php_uint32 lo, hi;
>> 	php_uint32 a, b, c, d;
>> 	unsigned char buffer[64];
>> 	php_uint32 block[16];
>> } PHP_MD5_CTX;
>
> Maybe it would be better to do:
>
> typedef php_uint32 MD5_u32plus;
>
> and use the latter type. This would reduce the number of changes
> between my version of the code and yours, making it easier for you to
> sync to any newer versions of the code that I might make.
>
>> | Author: Solar Designer |
>
> If you do choose to change this in the .h file, then do the same in the
> .c, obviously.
>
>> #if (defined(__APPLE__) || defined(__APPLE_CC__)) && (defined(__BIG_ENDIAN__) || defined(__LITTLE_ENDIAN__))
>> # if defined(__LITTLE_ENDIAN__)
>> # undef WORDS_BIGENDIAN
>> # else
>> # if defined(__BIG_ENDIAN__)
>> # define WORDS_BIGENDIAN
>> # endif
>> # endif
>> #endif
>
> This looks wrong to me. One of the specific properties of my
> implementation is that it does not strictly depend on the endianness
> being correctly specified at compile-time (and at all, for that matter).
> However, if you do happen to use the (little-endian and unaligned-OK)
> optimized code on a system that is not in fact little-endian or does not
> in fact tolerate unaligned accesses, then problems will arise! So any
> #if's you use must assume (might-be-big-endian and might-disallow-unaligned)
> by default.
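
To illustrate the point above, here is a minimal sketch (not taken from the attached files) of why a byte-wise load needs no endianness or alignment assumptions. The name load_le32 is hypothetical, and uint32_t stands in for php_uint32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of md5.c: assemble a 32-bit word from four
 * bytes stored in little-endian order. Only byte reads, shifts and ORs are
 * used, so the result is the same on big- and little-endian hosts, and no
 * misaligned word access is ever performed.
 */
static uint32_t load_le32(const unsigned char *p)
{
	assert(p != NULL);
	return (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[3] << 24);
}
```

This is the same computation the non-optimized SET(n) branch performs. The pointer-cast branch returns the same value only where the host really is little-endian and tolerates unaligned loads, which is why it should be enabled for known-safe architectures only.
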
>
> In fact, I am only aware of three widespread and general-purpose
> architectures that satisfy the criteria for the optimized code:
>
> #if defined(__i386__) || defined(__x86_64__) || defined(__vax__)
>
> Thus, I suggest that you leave the above #if intact, the way it was in
> the patch that I submitted. Do not explicitly check for any endianness
> macros - this is bound to cause problems.
>
>> /*
>>  * SET reads 4 input bytes in little-endian byte order and stores them
>>  * in a properly aligned word in host byte order.
>>  *
>>  * The check for little-endian architectures that tolerate unaligned
>>  * memory accesses is just an optimization. Nothing will break if it
>>  * doesn't work.
>>  */
>> #ifndef WORDS_BIGENDIAN
>> # define SET(n) \
>> 	(*(php_uint32 *)&ptr[(n) * 4])
>> # define GET(n) \
>> 	SET(n)
>> #else
> ...
>
> As explained above, I strongly recommend that you revert your "#ifndef
> WORDS_BIGENDIAN" to my "#if ..."
>
> What if an architecture is big-endian, but WORDS_BIGENDIAN just happens
> to not be specified? You'll have incorrect results (not MD5), whereas
> with my version of the code, everything will be just fine.
>
> Similarly, regardless of endianness, if WORDS_BIGENDIAN is not specified
> (maybe because the architecture is in fact little-endian), but the
> architecture does not tolerate unaligned accesses (at all or supports
> them with kernel emulation), things will go wrong (SIGBUS or very poor
> performance and a flood of kernel messages). This issue can't occur
> with my original #if that only lists specific known-safe architectures.
>
>> data = body(ctx, data, size & ~(unsigned long)0x3f);
>
> If you change all of my unsigned long's to size_t, you should change
> this one as well.
>
> When on a 64-bit system (userland pointer size), your size_t better be
> 64-bit as well (I have not checked whether this is necessarily the case;
> I hope so).
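
The mask issue can be seen in isolation in the following sketch (hypothetical helper names, not code from the patch). It assumes a platform where the mask type may be narrower than size_t, for example a 64-bit target with a 32-bit unsigned long; uint32_t is used here to model that narrower type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: round a byte count down to a multiple of the
 * 64-byte MD5 block size. The mask is built in size_t, so every high
 * bit of size is preserved.
 */
static size_t whole_blocks(size_t size)
{
	return size & ~(size_t)0x3f;
}

/*
 * Same idea, but the mask is built in a 32-bit type (standing in for an
 * unsigned long that is narrower than size_t). The mask is zero-extended
 * when it is converted to size_t, so bits 32 and above of size are lost.
 */
static size_t whole_blocks_narrow_mask(size_t size)
{
	return size & ~(uint32_t)0x3f;
}
```

For small sizes the two helpers agree. They differ only once size has bits set above bit 31, which is the case the size_t cast in the mask protects against.
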
>
>> PHPAPI void PHP_MD5Final(unsigned char *result, PHP_MD5_CTX *ctx)
>> {
>> 	unsigned long used, free;
>
> Here's another one.
>
> Thanks,
>
> Alexander Peslyak
> GPG key ID: 5B341F15 fp: B3FB 63F4 D7A3 BCCC 6F6E FC55 A2FC 027C 5B34 1F15
> http://www.openwall.com - bringing security into open computing environments

--------------020309080801090609010402
Content-Type: text/plain; name="md5.c"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="md5.c"

/*
   +----------------------------------------------------------------------+
   | PHP Version 5                                                        |
   +----------------------------------------------------------------------+
   | Copyright (c) 1997-2008 The PHP Group                                |
   +----------------------------------------------------------------------+
   | This source file is subject to version 3.01 of the PHP license,      |
   | that is bundled with this package in the file LICENSE, and is        |
   | available through the world-wide-web at the following url:           |
   | http://www.php.net/license/3_01.txt                                  |
   | If you did not receive a copy of the PHP license and are unable to   |
   | obtain it through the world-wide-web, please send a note to          |
   | license@php.net so we can mail you a copy immediately.               |
   +----------------------------------------------------------------------+
   | Author: Alexander Peslyak (Solar Designer)                           |
   +----------------------------------------------------------------------+
*/

/* $Id:$ */

#include "php.h"
#include "md5.h"

PHPAPI void make_digest(char *md5str, const unsigned char *digest) /* {{{ */
{
	make_digest_ex(md5str, digest, 16);
}
/* }}} */

PHPAPI void make_digest_ex(char *md5str, const unsigned char *digest, int len) /* {{{ */
{
	static const char hexits[17] = "0123456789abcdef";
	int i;

	for (i = 0; i < len; i++) {
		md5str[i * 2] = hexits[digest[i] >> 4];
		md5str[(i * 2) + 1] = hexits[digest[i] & 0x0F];
	}
	md5str[len * 2] = '\0';
}
/* }}} */

/* {{{ proto string md5(string str, [ bool raw_output])
   Calculate the md5 hash of a string */
PHP_NAMED_FUNCTION(php_if_md5)
{
	char *arg;
	int arg_len;
	zend_bool raw_output = 0;
	char md5str[33];
	PHP_MD5_CTX context;
	unsigned char digest[16];

	if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s|b", &arg, &arg_len, &raw_output) == FAILURE) {
		return;
	}

	md5str[0] = '\0';
	PHP_MD5Init(&context);
	PHP_MD5Update(&context, arg, arg_len);
	PHP_MD5Final(digest, &context);
	if (raw_output) {
		RETURN_STRINGL(digest, 16, 1);
	} else {
		make_digest_ex(md5str, digest, 16);
		RETVAL_STRING(md5str, 1);
	}
}
/* }}} */

/* {{{ proto string md5_file(string filename [, bool raw_output])
   Calculate the md5 hash of given filename */
PHP_NAMED_FUNCTION(php_if_md5_file)
{
	char *arg;
	int arg_len;
	zend_bool raw_output = 0;
	char md5str[33];
	unsigned char buf[1024];
	unsigned char digest[16];
	PHP_MD5_CTX context;
	int n;
	php_stream *stream;

	if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s|b", &arg, &arg_len, &raw_output) == FAILURE) {
		return;
	}

	stream = php_stream_open_wrapper(arg, "rb", REPORT_ERRORS | ENFORCE_SAFE_MODE, NULL);
	if (!stream) {
		RETURN_FALSE;
	}

	PHP_MD5Init(&context);

	while ((n = php_stream_read(stream, buf, sizeof(buf))) > 0) {
		PHP_MD5Update(&context, buf, n);
	}

	PHP_MD5Final(digest, &context);

	php_stream_close(stream);

	if (n<0) {
		RETURN_FALSE;
	}

	if (raw_output) {
		RETURN_STRINGL(digest, 16, 1);
	} else {
		make_digest_ex(md5str, digest, 16);
		RETVAL_STRING(md5str, 1);
	}
}
/* }}} */

/*
 * This is an OpenSSL-compatible implementation of the RSA Data Security,
 * Inc. MD5 Message-Digest Algorithm (RFC 1321).
 *
 * Written by Solar Designer in 2001, and placed
 * in the public domain. There's absolutely no warranty.
 *
 * This differs from Colin Plumb's older public domain implementation in
 * that no 32-bit integer data type is required, there's no compile-time
 * endianness configuration, and the function prototypes match OpenSSL's.
 * The primary goals are portability and ease of use.
 *
 * This implementation is meant to be fast, but not as fast as possible.
 * Some known optimizations are not included to reduce source code size
 * and avoid compile-time configuration.
 */

#include <string.h>

/*
 * The basic MD5 functions.
 *
 * F and G are optimized compared to their RFC 1321 definitions for
 * architectures that lack an AND-NOT instruction, just like in Colin Plumb's
 * implementation.
 */
#define F(x, y, z)			((z) ^ ((x) & ((y) ^ (z))))
#define G(x, y, z)			((y) ^ ((z) & ((x) ^ (y))))
#define H(x, y, z)			((x) ^ (y) ^ (z))
#define I(x, y, z)			((y) ^ ((x) | ~(z)))

/*
 * The MD5 transformation for all four rounds.
 */
#define STEP(f, a, b, c, d, x, t, s) \
	(a) += f((b), (c), (d)) + (x) + (t); \
	(a) = (((a) << (s)) | (((a) & 0xffffffff) >> (32 - (s)))); \
	(a) += (b);

/*
 * SET reads 4 input bytes in little-endian byte order and stores them
 * in a properly aligned word in host byte order.
 *
 * The check for little-endian architectures that tolerate unaligned
 * memory accesses is just an optimization. Nothing will break if it
 * doesn't work.
 */
#if defined(__i386__) || defined(__x86_64__) || defined(__vax__)
# define SET(n) \
	(*(php_uint32 *)&ptr[(n) * 4])
# define GET(n) \
	SET(n)
#else
# define SET(n) \
	(ctx->block[(n)] = \
	(php_uint32)ptr[(n) * 4] | \
	((php_uint32)ptr[(n) * 4 + 1] << 8) | \
	((php_uint32)ptr[(n) * 4 + 2] << 16) | \
	((php_uint32)ptr[(n) * 4 + 3] << 24))
# define GET(n) \
	(ctx->block[(n)])
#endif

/*
 * This processes one or more 64-byte data blocks, but does NOT update
 * the bit counters. There are no alignment requirements.
 */
static const void *body(PHP_MD5_CTX *ctx, const void *data, size_t size)
{
	const unsigned char *ptr;
	php_uint32 a, b, c, d;
	php_uint32 saved_a, saved_b, saved_c, saved_d;

	ptr = data;

	a = ctx->a;
	b = ctx->b;
	c = ctx->c;
	d = ctx->d;

	do {
		saved_a = a;
		saved_b = b;
		saved_c = c;
		saved_d = d;

/* Round 1 */
		STEP(F, a, b, c, d, SET(0), 0xd76aa478, 7)
		STEP(F, d, a, b, c, SET(1), 0xe8c7b756, 12)
		STEP(F, c, d, a, b, SET(2), 0x242070db, 17)
		STEP(F, b, c, d, a, SET(3), 0xc1bdceee, 22)
		STEP(F, a, b, c, d, SET(4), 0xf57c0faf, 7)
		STEP(F, d, a, b, c, SET(5), 0x4787c62a, 12)
		STEP(F, c, d, a, b, SET(6), 0xa8304613, 17)
		STEP(F, b, c, d, a, SET(7), 0xfd469501, 22)
		STEP(F, a, b, c, d, SET(8), 0x698098d8, 7)
		STEP(F, d, a, b, c, SET(9), 0x8b44f7af, 12)
		STEP(F, c, d, a, b, SET(10), 0xffff5bb1, 17)
		STEP(F, b, c, d, a, SET(11), 0x895cd7be, 22)
		STEP(F, a, b, c, d, SET(12), 0x6b901122, 7)
		STEP(F, d, a, b, c, SET(13), 0xfd987193, 12)
		STEP(F, c, d, a, b, SET(14), 0xa679438e, 17)
		STEP(F, b, c, d, a, SET(15), 0x49b40821, 22)

/* Round 2 */
		STEP(G, a, b, c, d, GET(1), 0xf61e2562, 5)
		STEP(G, d, a, b, c, GET(6), 0xc040b340, 9)
		STEP(G, c, d, a, b, GET(11), 0x265e5a51, 14)
		STEP(G, b, c, d, a, GET(0), 0xe9b6c7aa, 20)
		STEP(G, a, b, c, d, GET(5), 0xd62f105d, 5)
		STEP(G, d, a, b, c, GET(10), 0x02441453, 9)
		STEP(G, c, d, a, b, GET(15), 0xd8a1e681, 14)
		STEP(G, b, c, d, a, GET(4), 0xe7d3fbc8, 20)
		STEP(G, a, b, c, d, GET(9), 0x21e1cde6, 5)
		STEP(G, d, a, b, c, GET(14), 0xc33707d6, 9)
		STEP(G, c, d, a, b, GET(3), 0xf4d50d87, 14)
		STEP(G, b, c, d, a, GET(8), 0x455a14ed, 20)
		STEP(G, a, b, c, d, GET(13), 0xa9e3e905, 5)
		STEP(G, d, a, b, c, GET(2), 0xfcefa3f8, 9)
		STEP(G, c, d, a, b, GET(7), 0x676f02d9, 14)
		STEP(G, b, c, d, a, GET(12), 0x8d2a4c8a, 20)

/* Round 3 */
		STEP(H, a, b, c, d, GET(5), 0xfffa3942, 4)
		STEP(H, d, a, b, c, GET(8), 0x8771f681, 11)
		STEP(H, c, d, a, b, GET(11), 0x6d9d6122, 16)
		STEP(H, b, c, d, a, GET(14), 0xfde5380c, 23)
		STEP(H, a, b, c, d, GET(1), 0xa4beea44, 4)
		STEP(H, d, a, b, c, GET(4), 0x4bdecfa9, 11)
		STEP(H, c, d, a, b, GET(7), 0xf6bb4b60, 16)
		STEP(H, b, c, d, a, GET(10), 0xbebfbc70, 23)
		STEP(H, a, b, c, d, GET(13), 0x289b7ec6, 4)
		STEP(H, d, a, b, c, GET(0), 0xeaa127fa, 11)
		STEP(H, c, d, a, b, GET(3), 0xd4ef3085, 16)
		STEP(H, b, c, d, a, GET(6), 0x04881d05, 23)
		STEP(H, a, b, c, d, GET(9), 0xd9d4d039, 4)
		STEP(H, d, a, b, c, GET(12), 0xe6db99e5, 11)
		STEP(H, c, d, a, b, GET(15), 0x1fa27cf8, 16)
		STEP(H, b, c, d, a, GET(2), 0xc4ac5665, 23)

/* Round 4 */
		STEP(I, a, b, c, d, GET(0), 0xf4292244, 6)
		STEP(I, d, a, b, c, GET(7), 0x432aff97, 10)
		STEP(I, c, d, a, b, GET(14), 0xab9423a7, 15)
		STEP(I, b, c, d, a, GET(5), 0xfc93a039, 21)
		STEP(I, a, b, c, d, GET(12), 0x655b59c3, 6)
		STEP(I, d, a, b, c, GET(3), 0x8f0ccc92, 10)
		STEP(I, c, d, a, b, GET(10), 0xffeff47d, 15)
		STEP(I, b, c, d, a, GET(1), 0x85845dd1, 21)
		STEP(I, a, b, c, d, GET(8), 0x6fa87e4f, 6)
		STEP(I, d, a, b, c, GET(15), 0xfe2ce6e0, 10)
		STEP(I, c, d, a, b, GET(6), 0xa3014314, 15)
		STEP(I, b, c, d, a, GET(13), 0x4e0811a1, 21)
		STEP(I, a, b, c, d, GET(4), 0xf7537e82, 6)
		STEP(I, d, a, b, c, GET(11), 0xbd3af235, 10)
		STEP(I, c, d, a, b, GET(2), 0x2ad7d2bb, 15)
		STEP(I, b, c, d, a, GET(9), 0xeb86d391, 21)

		a += saved_a;
		b += saved_b;
		c += saved_c;
		d += saved_d;

		ptr += 64;
	} while (size -= 64);

	ctx->a = a;
	ctx->b = b;
	ctx->c = c;
	ctx->d = d;

	return ptr;
}

PHPAPI void PHP_MD5Init(PHP_MD5_CTX *ctx)
{
	ctx->a = 0x67452301;
	ctx->b = 0xefcdab89;
	ctx->c = 0x98badcfe;
	ctx->d = 0x10325476;

	ctx->lo = 0;
	ctx->hi = 0;
}

PHPAPI void PHP_MD5Update(PHP_MD5_CTX *ctx, const void *data, size_t size)
{
	php_uint32 saved_lo;
	size_t used, free;

	saved_lo = ctx->lo;
	if ((ctx->lo = (saved_lo + size) & 0x1fffffff) < saved_lo) {
		ctx->hi++;
	}
	ctx->hi += size >> 29;

	used = saved_lo & 0x3f;

	if (used) {
		free = 64 - used;

		if (size < free) {
			memcpy(&ctx->buffer[used], data, size);
			return;
		}

		memcpy(&ctx->buffer[used], data, free);
		data = (unsigned char *)data + free;
		size -= free;
		body(ctx, ctx->buffer, 64);
	}

	if (size >= 64) {
		data = body(ctx, data, size & ~(size_t)0x3f);
		size &= 0x3f;
	}

	memcpy(ctx->buffer, data, size);
}

PHPAPI void PHP_MD5Final(unsigned char *result, PHP_MD5_CTX *ctx)
{
	php_uint32 used, free;

	used = ctx->lo & 0x3f;

	ctx->buffer[used++] = 0x80;

	free = 64 - used;

	if (free < 8) {
		memset(&ctx->buffer[used], 0, free);
		body(ctx, ctx->buffer, 64);
		used = 0;
		free = 64;
	}

	memset(&ctx->buffer[used], 0, free - 8);

	ctx->lo <<= 3;
	ctx->buffer[56] = ctx->lo;
	ctx->buffer[57] = ctx->lo >> 8;
	ctx->buffer[58] = ctx->lo >> 16;
	ctx->buffer[59] = ctx->lo >> 24;
	ctx->buffer[60] = ctx->hi;
	ctx->buffer[61] = ctx->hi >> 8;
	ctx->buffer[62] = ctx->hi >> 16;
	ctx->buffer[63] = ctx->hi >> 24;

	body(ctx, ctx->buffer, 64);

	result[0] = ctx->a;
	result[1] = ctx->a >> 8;
	result[2] = ctx->a >> 16;
	result[3] = ctx->a >> 24;
	result[4] = ctx->b;
	result[5] = ctx->b >> 8;
	result[6] = ctx->b >> 16;
	result[7] = ctx->b >> 24;
	result[8] = ctx->c;
	result[9] = ctx->c >> 8;
	result[10] = ctx->c >> 16;
	result[11] = ctx->c >> 24;
	result[12] = ctx->d;
	result[13] = ctx->d >> 8;
	result[14] = ctx->d >> 16;
	result[15] = ctx->d >> 24;

	memset(ctx, 0, sizeof(*ctx));
}

--------------020309080801090609010402
Content-Type: text/plain; name="md5.h"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="md5.h"

/*
   +----------------------------------------------------------------------+
   | PHP Version 5                                                        |
   +----------------------------------------------------------------------+
   | Copyright (c) 1997-2008 The PHP Group                                |
   +----------------------------------------------------------------------+
   | This source file is subject to version 3.01 of the PHP license,      |
   | that is bundled with this package in the file LICENSE, and is        |
   | available through the world-wide-web at the following url:           |
   | http://www.php.net/license/3_01.txt                                  |
   | If you did not receive a copy of the PHP license and are unable to   |
   | obtain it through the world-wide-web, please send a note to          |
   | license@php.net so we can mail you a copy immediately.               |
   +----------------------------------------------------------------------+
   | Author: Alexander Peslyak (Solar Designer)                           |
   +----------------------------------------------------------------------+
*/

/* $Id:$ */

#ifndef MD5_H
#define MD5_H

PHPAPI void make_digest(char *md5str, const unsigned char *digest);
PHPAPI void make_digest_ex(char *md5str, const unsigned char *digest, int len);
PHP_NAMED_FUNCTION(php_if_md5);
PHP_NAMED_FUNCTION(php_if_md5_file);

#include "ext/standard/basic_functions.h"

/*
 * This is an OpenSSL-compatible implementation of the RSA Data Security,
 * Inc. MD5 Message-Digest Algorithm (RFC 1321).
 *
 * Written by Solar Designer in 2001, and placed
 * in the public domain. There's absolutely no warranty.
 *
 * See md5.c for more information.
 */

/* MD5 context. */
typedef struct {
	php_uint32 lo, hi;
	php_uint32 a, b, c, d;
	unsigned char buffer[64];
	php_uint32 block[16];
} PHP_MD5_CTX;

PHPAPI void PHP_MD5Init(PHP_MD5_CTX *ctx);
PHPAPI void PHP_MD5Update(PHP_MD5_CTX *ctx, const void *data, size_t size);
PHPAPI void PHP_MD5Final(unsigned char *result, PHP_MD5_CTX *ctx);

#endif

--------------020309080801090609010402--
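
The lo and hi fields of PHP_MD5_CTX carry the message length, and the bookkeeping is easy to misread. The sketch below isolates it as I understand it from md5.c: lo keeps the low 29 bits of the byte count and hi keeps the rest, so that lo << 3 and hi are the low and high 32-bit words of the length in bits. The type md5_length and both function names are hypothetical, and uint32_t stands in for php_uint32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the lo/hi fields of PHP_MD5_CTX. */
typedef struct {
	uint32_t lo; /* low 29 bits of the byte count */
	uint32_t hi; /* byte count divided by 2^29 (wraps modulo 2^32) */
} md5_length;

/*
 * Add size bytes to the running length, mirroring the counter update at
 * the top of PHP_MD5Update.
 */
static void md5_length_add(md5_length *len, size_t size)
{
	uint32_t saved_lo;

	assert(len != NULL);
	saved_lo = len->lo;
	len->lo = (uint32_t)((saved_lo + size) & 0x1fffffff);
	if (len->lo < saved_lo) {
		len->hi++; /* the low 29 bits wrapped around: carry into hi */
	}
	len->hi += (uint32_t)(size >> 29);
}

/*
 * Message length in bits, combining the two 32-bit words that
 * PHP_MD5Final writes to buffer[56..63]: lo << 3 is the low word and
 * hi is the high word.
 */
static uint64_t md5_length_bits(const md5_length *len)
{
	assert(len != NULL);
	return ((uint64_t)len->hi << 32) | ((uint64_t)len->lo << 3);
}
```

Keeping only 29 bits in lo means the later shift by 3 can never overflow a 32-bit word, which is why the code masks with 0x1fffffff rather than letting lo use all 32 bits.
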