Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:74240
Mailing-List: contact internals-help@lists.php.net; run by ezmlm
Received-SPF: pass (pb1.pair.com: domain sugarcrm.com designates 108.166.43.123 as permitted sender)
Message-ID: <537539AC.8080906@sugarcrm.com>
Date: Thu, 15 May 2014 15:03:24 -0700
Organization: SugarCRM
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:24.0) Gecko/20100101 Thunderbird/24.5.0
MIME-Version: 1.0
To: Pierre Joye <pierre.php@gmail.com>, 
 PHP internals <internals@lists.php.net>
References: <CAEZPtU6OfLM7M31TeoO587aeVgskEy9ZrUaMv-t7sq0g2spSdw@mail.gmail.com>
In-Reply-To: <CAEZPtU6OfLM7M31TeoO587aeVgskEy9ZrUaMv-t7sq0g2spSdw@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Subject: Re: [PHP-DEV] on memory usage with the 64bit patch, and interpretation
 of various numbers
From: smalyshev@sugarcrm.com (Stas Malyshev)

Hi!

> ### It's The Correct Data Type
> 
> The C89 spec indicates in 3.3.3.4 (
> http://port70.net/~nsz/c/c89/rationale/c3.html#size-95t-3-3-3-4 ) that
> the size_t type was created specifically for usage in this context. It
> is always, 100% guaranteed to be able to hold the bounds of every
> possible array element. Strings in C are simply char arrays.

Here is my problem with it: we don't need a type that can hold the
bounds of every possible array element. It's like buying a house big
enough for all your relatives, acquaintances, friends and everyone you
have ever met, in case they all decide to come and stay at once. Too
expensive and very impractical. We're using unified string sizes now,
and 99% of the strings we use never even reach the limits of 16 bits,
let alone come close to the limits of int. Carrying around a 64-bit
value to store that is just waste. We'd be storing megabytes of zeroes
to no use. Whatever theoretical reasons there are for a generic
application to use it, they can hardly be applied to a case as
specialized, and supposedly as highly optimized, as a language engine.
It's a fine argument in the generic case, but we do not have the
generic case here.
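
To make the per-string cost concrete, here is a minimal sketch in C. The
two structs are hypothetical string headers invented for illustration
(they are not the engine's actual layout), and the field names are
assumptions. The only difference between them is the width of the length
field. The exact numbers depend on the compiler's padding rules, though
on common ABIs I would expect the difference to be a few bytes per
string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical string header with a 32-bit length field. */
struct str32 {
    uint32_t refcount;
    uint32_t type_info;
    uint64_t hash;
    uint32_t len;
    char     val[];   /* character data follows the header */
};

/* Same hypothetical header, but with a 64-bit length field. */
struct str64 {
    uint32_t refcount;
    uint32_t type_info;
    uint64_t hash;
    uint64_t len;
    char     val[];   /* character data follows the header */
};

/* Number of header bytes that precede the character data. */
size_t header_bytes_32(void) { return offsetof(struct str32, val); }
size_t header_bytes_64(void) { return offsetof(struct str64, val); }

/* Extra header bytes paid across `count` live strings when the
 * length field is 64 bits wide instead of 32. */
size_t extra_bytes(size_t count)
{
    assert(header_bytes_64() >= header_bytes_32());
    return count * (header_bytes_64() - header_bytes_32());
}
```

Calling extra_bytes() with the number of live strings in a request gives
a rough lower bound on the overhead; allocator rounding can hide or
amplify it.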

> ### It's The Secure Data Type

"Security" is quickly becoming a thought-terminating cliche, both in
programming and outside. "It's necessary for security", ergo we should
pay whatever it takes for it and not question it. It's not right and we
should not fall into this trap. We can and should question it, and
evaluate the costs carefully.

In most cases where we deal with possible overflows, a 64-bit value can
overflow just as a 32-bit one can. Most integer overflows we've had in
PHP were demonstrable on both 32-bit and 64-bit platforms.

> size_t (and ptrdiff_t) are the only C89 types that are 100% guaranteed
> to be able to hold the size of any possible object that the compiler
> will support. Other types will vary depending on the data model that
> the compiler supports, as the spec only defines minimum widths.

Again, as I pointed out, it is fine in theory, but in practice a) we
don't need that and b) it doesn't actually add much security-wise, as
these values can still overflow just as easily in most cases (e.g., see
the recent fixed array size bug: same overflow regardless of the length
variable's size). Yes, we should be careful about mixing int with size
types, and that's why I welcome the introduction of size types to
emphasize this. However, making them 64-bit is a different question for
me.
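
A small illustration of the point in C, assuming the usual pattern of
multiplying an element count by an element size. The function names are
made up for this sketch. Unsigned arithmetic wraps silently at either
width; the wider type only moves the count at which it happens.

```c
#include <assert.h>
#include <stdint.h>

/* count * elem_size computed in a 32-bit size; wraps modulo 2^32. */
uint32_t bytes_needed_32(uint32_t count, uint32_t elem_size)
{
    return (uint32_t)(count * elem_size);
}

/* count * elem_size computed in a 64-bit size; wraps modulo 2^64. */
uint64_t bytes_needed_64(uint64_t count, uint64_t elem_size)
{
    return (uint64_t)(count * elem_size);
}

/* Returns 1 if both widths can be driven to a wrapped result of 0
 * by a suitably large count of 8-byte elements. */
int both_widths_wrap(void)
{
    uint32_t r32 = bytes_needed_32(UINT32_C(0x20000000), 8);         /* 2^29 * 8 */
    uint64_t r64 = bytes_needed_64(UINT64_C(0x2000000000000000), 8); /* 2^61 * 8 */
    return r32 == 0 && r64 == 0;
}
```

If the count comes from untrusted input, an unchecked multiplication is
a bug in both versions.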

> This is so important that CERT issued a coding standard for it:
> INT01-C ( https://www.securecoding.cert.org/confluence/display/seccode/INT01-C.+Use+rsize_t+or+size_t+for+all+integer+values+representing+the+size+of+an+object
> ).

This is very good generic advice for writing generic functions like the
copy() example there. However, again, we don't have the generic case
here. We can have our copy()'s and emalloc()'s work with size_t.
However, when we are talking about zval, it's a bit of a different
story.

> One of the reasons is that it's difficult to do overflow checks in a
> portable way. See VU#162289: https://www.kb.cert.org/vuls/id/162289 .

I agree, doing it right may be tricky. However, we already have
primitives for doing it that work and are optimized for common
platforms. It is a solved problem. So bringing it up as an argument
does not make much sense: we don't need additional effort to do this
difficult thing, because we've already done it. Of course, these checks
have to be actually used, but they would have to be used in the 64-bit
case too!
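
For reference, the kind of check meant here can be sketched in portable
C as below. This is a hypothetical standalone helper written for this
message, not the engine's code (I believe safe_emalloc() plays this role
in PHP, but the sketch does not reproduce it). It computes
count * elem_size + extra and reports failure instead of wrapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stores count * elem_size + extra in *out.
 * Returns 1 on success, 0 if the result would not fit in size_t. */
int checked_alloc_size(size_t count, size_t elem_size, size_t extra,
                       size_t *out)
{
    assert(out != NULL);

    /* Multiplication would wrap if count exceeds SIZE_MAX / elem_size. */
    if (elem_size != 0 && count > SIZE_MAX / elem_size) {
        return 0;
    }
    size_t product = count * elem_size;

    /* Addition would wrap if extra exceeds the remaining headroom. */
    if (extra > SIZE_MAX - product) {
        return 0;
    }
    *out = product + extra;
    return 1;
}
```

The same two comparisons are needed whether the size type is 32 or 64
bits wide, which is why the width alone does not remove the need for
the check.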

> ### About Long Strings
> 
> The fact that changing to size_t allows strings (and arrays) to be >
> 4gb is a side-effect. A welcome one, but a side effect none the less.
> The primary reason to use it is that it's the correct data type, and
> gives you the most safety and security.

Here I must disagree again, as I see inflating the string length
variable as a most unwelcome side effect. We may yet find a way to
tolerate it and work around it, but by itself it is nothing but a drag
for anybody but the 0.0001% of PHP developers who actually find it
necessary to stuff 4G strings into PHP.

> But that's at the structure level. Let's look at what actually happens
> in practice. Dmitry himself also provides these answers. The average
> memory increase is 8% for Wordpress, and 6% for ZF1.
> 
> Let's put that 8% in context. Wordpress used 12MB, and now it uses
> 13MB. 1MB more. That's not overly significant. ZF used 29MB. Now it
> uses 31MB. Still not overly significant.

I think it is pretty significant. If we could reduce memory usage by
6-8%, would we consider it a win? I think we would. Thus, we should
consider the same increase a loss. However, the bigger loss may be in
inflating the sizes of frequently-used structures like zend_string.

I think we should look very closely at how we can reduce the memory
impact and not just dismiss it as insignificant. I like the idea of the
patch, and the cleanup of the types and 64-bit support has been long
overdue. However, I would hate to pay for that by dragging literally
megabytes of zeroes around for no purpose but to satisfy an abstract
requirement written for the generic case.
-- 
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227