Newsgroups: php.internals
Message-ID: <537539AC.8080906@sugarcrm.com>
Date: Thu, 15 May 2014 15:03:24 -0700
From: smalyshev@sugarcrm.com (Stas Malyshev)
Organization: SugarCRM
To: Pierre Joye, PHP internals
Subject: Re: [PHP-DEV] on memory usage with the 64bit patch, and interpretation of various numbers

Hi!

> ### It's The Correct Data Type
>
> The C89 spec indicates in 3.3.3.4 (
> http://port70.net/~nsz/c/c89/rationale/c3.html#size-95t-3-3-3-4 ) that
> the size_t type was created specifically for usage in this context. It
> is always, 100% guaranteed to be able to hold the bounds of every
> possible array element. Strings in C are simply char arrays.

Here is my problem with it: we don't need a type that can hold the bounds
of every possible array element. It's like buying a house big enough for
all your relatives, acquaintances, friends and everyone you have ever met,
in case they all decide to come and stay with you at once. Too expensive
and very impractical. We're using unified string sizes now, and 99% of the
strings we use never even reach the limits of 16 bits, let alone come
close to the limits of int. Carrying around a 64-bit value to store that
is just waste: we'd be storing megabytes of zeroes to no purpose. Whatever
theoretical reasons there are for a generic application to use it, they
hardly apply to a case as specialized, and as highly optimized, as a
language engine is supposed to be. It's a fine argument in the generic
case, but we do not have the generic case here.

> ### It's The Secure Data Type

"Security" is quickly becoming a thought-terminating cliche, both in
programming and outside. "It's necessary for security", ergo we should pay
whatever it takes for it and not question it. That's not right, and we
should not fall into this trap. We can and should question it, and
evaluate the costs carefully. In most cases where we deal with possible
overflows, a 64-bit value can overflow just as a 32-bit one can. Most
integer overflows we've had in PHP were demonstrable on both 32-bit and
64-bit platforms.

> size_t (and ptrdiff_t) are the only C89 types that are 100% guaranteed
> to be able to hold the size of any possible object that the compiler
> will support.
> Other types will vary depending on the data model that
> the compiler supports, as the spec only defines minimum widths.

Again, as I pointed out, it is fine in theory, but in practice a) we don't
need that, and b) it doesn't actually add much security-wise, as these
values can still be overflowed with the same ease in most cases (e.g., see
the recently fixed array size bug: the same overflow regardless of the
size of the length variable). Yes, we should be careful about mixing int
with size types, and that's why I welcome the introduction of size types
to emphasize this. However, making them 64-bit is a different question for
me.

> This is so important that CERT issued a coding standard for it:
> INT01-C ( https://www.securecoding.cert.org/confluence/display/seccode/INT01-C.+Use+rsize_t+or+size_t+for+all+integer+values+representing+the+size+of+an+object
> ).

This is very good generic advice for writing generic functions like the
copy() example there. However, again, we don't have the generic case here.
We can have our copy()'s and emalloc()'s work with size_t. However, when
we're talking about zval, it's a somewhat different story.

> One of the reasons is that it's difficult to do overflow checks in a
> portable way. See VU#162289: https://www.kb.cert.org/vuls/id/162289 .

I agree, doing it right may be tricky. However, we already have primitives
for doing it that work and are optimized for common platforms. It is a
solved problem, so bringing it up as an argument does not make much sense:
we don't need additional effort to do this difficult thing, because we've
already done it. Of course, these checks have to actually be used, but
they would have to be used in the 64-bit case too!

> ### About Long Strings
>
> The fact that changing to size_t allows strings (and arrays) to be > 4gb
> is a side-effect. A welcome one, but a side effect none the less.
> The primary reason to use it is that it's the correct data type, and
> gives you the most safety and security.

Here I must disagree again, as I see inflating the string length variable
as the most unwelcome side effect. We may yet find a way to tolerate it
and work around it, but by itself it is nothing but a drag for everybody
except the 0.0001% of PHP developers who actually find it necessary to
stuff 4G strings into PHP.

> But that's at the structure level. Let's look at what actually happens
> in practice. Dmitry himself also provides these answers. The average
> memory increase is 8% for Wordpress, and 6% for ZF1.
>
> Let's put that 8% in context. Wordpress used 12MB, and now it uses
> 13MB. 1MB more. That's not overly significant. ZF used 29MB. Now it
> uses 31MB. Still not overly significant.

I think it is pretty significant. If we could reduce memory usage by
6-8%, would we consider it a win? I think we would. Thus, we should
consider the same increase a loss. However, the bigger loss may be in
inflating the sizes of frequently used structures like zend_string. I
think we should look very closely at how we can reduce the memory impact,
and not just dismiss it as insignificant.

I like the idea of the patch, and the cleanup of the types and 64-bit
support has been long overdue. However, I would hate to pay for that by
dragging literally megabytes of zeroes around for no purpose other than
satisfying an abstract requirement written for the generic case.

-- 
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227