Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:71435
Mailing-List: contact internals-help@lists.php.net; run by ezmlm
Received-SPF: error (pb1.pair.com: domain zend.com from 74.125.82.169 cause and error)
MIME-Version: 1.0
In-Reply-To: <a292b56d7384cd89f5a0070323864059.squirrel@webmail.klapt.com>
References: <fa05c43bc2fff5420759e05dee19c99f.squirrel@webmail.klapt.com>
	<CAF+90c8mDBqe+E=cWxpD-U48Ho0QV3iHiwS9pOSEbUn_uZTVnQ@mail.gmail.com>
	<CA+9eiLvhRqFAunM8GCpqE5ja2jXKCzFMKdtF0NkBT11nu=36MQ@mail.gmail.com>
	<a292b56d7384cd89f5a0070323864059.squirrel@webmail.klapt.com>
Date: Thu, 23 Jan 2014 16:37:30 +0400
Message-ID: <CA+9eiLsERkgpqmC+NV5YFSwNQ3EgO4J6_km9vjrb_4wCm3T=Gg@mail.gmail.com>
To: Anatol Belski <ab@php.net>
Cc: Nikita Popov <nikita.ppv@gmail.com>, 
	PHP Developers Mailing List <internals@lists.php.net>
Content-Type: multipart/alternative; boundary=e89a8f6433783c0f2504f0a2818a
Subject: Re: [PHP-DEV] [RFC] 64 bit platform improvements for string length
 and integer
From: dmitry@zend.com (Dmitry Stogov)

--e89a8f6433783c0f2504f0a2818a
Content-Type: text/plain; charset=UTF-8

Hi Anatol,


On Thu, Jan 23, 2014 at 3:18 PM, Anatol Belski <ab@php.net> wrote:

> Hi Dmitry,
>
> greateful thanks for taking a look at this patch.
>
> On Thu, January 23, 2014 09:42, Dmitry Stogov wrote:
> > I completely agree with Nikita.
> > Why to rename LONG->INT STRLEN->STRSIZE in thousands places?
> > Why not just define zend_long and zend_ulong to be 64-bit on 64-bit
> > platforms and use them instead of int, ulint, zend_int, zend_uint, long,
> > ulong where it's necessary.
> >
> > Anatol, I understood your point about catching incompatibility code at
> > compile-time, but I'm not sure if the new features cost such huge code
> > base changes.
> Firstly it's the historical reason as while porting it "had" to break so
> one could easy see the relevant places. After that - it's also a
> transformation in the mind, as that int placeholders do not depend on a
> fixed datatype anymore.
>

historical reason of the patch itself doesn't matter :)
If there's no significant reason to introduce new names lets use old ones.

Even with new names anyone is able to change macros but not types.
It's better to use some smarter compile-time protection.
I would try to play with "compile time assert" (in GCC it must be even
possible to use __builtin_types_compatible_b()).
For example:

#define CONCAT_TOKENS(a, b) a ## b
#define EXPAND_THEN_CONCAT(a, b) CONCAT_TOKENS(a, b)
#define COMPILE_TIME_ASSERT(expr) do { \
                enum { EXPAND_THEN_CONCAT(ASSERT_line_,__LINE__) = 1 /
!!(expr) }; \
        } while(0)

int main()
{
        int a;

        COMPILE_TIME_ASSERT(__builtin_types_compatible_p(typeof(a), long));
        COMPILE_TIME_ASSERT(sizeof(a) == 8);
        return 0;
}


>
> > 1) 64-bit integers on Windows (they are already 64-bit on other systems)
> > 2) 64-bit string length. I don't think many people are interested in
> that.
> >  Fortunately, the patch doesn't change the zval size, so it shouldn't
> > make a lot of harm. However, usage of "zend_size_t" instead of "int" is a
> > bit annoying. I would change it into the same "zend_long" or
> "zend_ulong".
> >
> The original patch was for size_t only. With only that it were Linux/Unix
> only improvement, as size_t is 64 bit on Windows so it'd have to stay int
> or become just unsigned. Omitting the size_t change and doing int64 were
> only improvement on Windows.  Adding both is the three-way improvement -
> Linux and Windows with biggest possible strings, Windows with 64 bit
> integers. So in fact, doing only one of those wouldn't IMHO justify all
> the effort.
>

After some thoughts I think that usage of "size_t" is a good thing for the
future support of X32 ABI.

X86:                  sizeof(int)=4 sizeof(long)=4 sizeof(sizet_t)=4
X86-64 (Linux):   sizeof(int)=4 sizeof(long)=8 sizeof(sizet_t)=8
X86-64 (Win64):  sizeof(int)=4 sizeof(long)=4 sizeof(sizet_t)=8
X32:                   sizeof(int)=4 sizeof(long)=8 sizeof(sizet_t)=4


>
> Additional Windows improvement "for free" is the whole file API
> exhausting, so large file objects and offsets.
>
> That's true, the possibility to process gigabytes of data in memory will
> not be needed every day, however the presence of it is something else.
> Like, why should I be interested on something not available anyway?
>
> Keeping size_t separated semantically is good for several reasons. It's
> clean with the specification. Should it come to 64 bit integer on 32 bit
> platform, it's easier to continue (merging size_t and ulong would break
> this option). And, one day it can come to 128 bit integers (for what
> reasons ever). I know, it's science fiction now, but wasn't a RAM size of
> 1Gb so 10 years ago? You never know. So size_t separated from unsigned int
> is a good thing imho and keeps some interesting options open for the
> future.
>

Agreed.


> At the end line, the new vs. old names is the last thing I personally
> would ultimately hang on, given the essential modification is in place.
> However the code clearly expressing what happens is something I'd call
> more appropriate with such a big substantial change.
>

It's questionable. I' mot sure what names are better but big changes are
always annoying.
May be it makes sense to extend votiing with options ("with renaming" and
"without renaming")

Thanks. Dmitry.


>
> Best regards
>
> Anatol
>
>
>
>

--e89a8f6433783c0f2504f0a2818a--