Newsgroups: php.internals
Date: Fri, 31 May 2013 15:11:34 -0400
To: internals@lists.php.net
Subject: 5.NEXT Integer and String type modifications
From: ircmaxell@gmail.com (Anthony Ferrara)

Hello all,

I want to start an idea thread (or at least get a conversation going) about cleaning up the core integer data type and string lengths. Here are my ideas:

1. Change the string length in the ZVAL from int to size_t - http://lxr.php.net/xref/PHP_5_5/Zend/zend.h#321
2. Change the long in the ZVAL (lval) to a system-determined 64-bit fixed-size type

There are two reasons for this. First, on VS compiles (Windows), the current long size is always 32 bits, so even a 64-bit compile may or may not have 64-bit ints. Second, right now PHP can't really handle strings of 2^31 or more characters, even on 64-bit compiles. The problem gets pretty comical:

    $ php -d memory_limit=499g -r "\$string = str_repeat('x',pow(2, 32)) . str_repeat('x', pow(2,4)); var_dump(strlen(\$string));"
    int(16)

The string is actually 2^32 + 16 bytes long, but the length is stored in a 32-bit int, so it wraps around to 16.

Obviously there's a pretty significant ABI break here. I propose a "tweak" of the Z_* macros to "fix" that. Basically, Z_STRLEN() will cast the result to an int. This is the same behavior as today, and it means that existing extensions continue to function exactly as they do now. But new extensions (and other parts of core) can use a new macro, Z_STRSIZE(), which will return the native size_t.

Likewise, we can do the same for the long data type: Z_LVAL() returns a long, and Z_PHPLVAL() returns a php_long (which is a typedef of a compiler-specific 64-bit type). It'll also require two new zend_parse_parameters types (one for php_long and one for the string length using size_t instead of int)...

Additionally, I'd propose a set of central helpers to cast back and forth between php_long and long, as well as between int and size_t (with overflow checks, allowing us to raise errors on detected overflows instead of silently ignoring them as we do today).
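To make the compatibility idea a bit more concrete, here is a rough, standalone sketch of how the macro split and the checked casts could look. It is not a patch against Zend: the sketch_zval struct, the sketch_* helper names, and the use of int64_t for php_long are all assumptions made purely for illustration, and the Z_* macros are redefined here against the stand-in struct rather than the real zval.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the proposed 64-bit integer typedef. */
typedef int64_t php_long;

/* Simplified stand-in for a zval; NOT the real Zend layout. */
typedef struct {
    union {
        php_long lval;      /* proposed: fixed 64-bit instead of long */
        struct {
            char  *val;
            size_t len;     /* proposed: size_t instead of int */
        } str;
    } value;
} sketch_zval;

/* New-style accessors: return the native, full-width types. */
#define Z_STRSIZE(z) ((z).value.str.len)
#define Z_PHPLVAL(z) ((z).value.lval)

/* Legacy accessors: narrow to the old types so existing code sees what it
 * sees today. Out-of-range values are narrowed in an implementation-defined
 * way (typically truncated), with no error reported. */
#define Z_STRLEN(z) ((int)(z).value.str.len)
#define Z_LVAL(z)   ((long)(z).value.lval)

/* Checked casts (hypothetical names): each returns 1 and stores the result
 * on success, or returns 0 and leaves *out untouched on overflow. */
static int sketch_size_to_int(size_t in, int *out)
{
    if (in > (size_t)INT_MAX) {
        return 0;
    }
    *out = (int)in;
    return 1;
}

static int sketch_int_to_size(int in, size_t *out)
{
    if (in < 0) {
        return 0;
    }
    *out = (size_t)in;
    return 1;
}

static int sketch_php_long_to_long(php_long in, long *out)
{
    /* Can only fail where long is narrower than 64 bits (e.g. VS builds). */
    if (in > LONG_MAX || in < LONG_MIN) {
        return 0;
    }
    *out = (long)in;
    return 1;
}

/* Exercises the accessors and helpers; returns 1 if every check passes. */
static int sketch_self_check(void)
{
    sketch_zval s, n;
    int len = 0;
    size_t size = 0;
    long l = 0;

    s.value.str.val = NULL;
    s.value.str.len = 16;
    assert(Z_STRLEN(s) == 16);
    assert(sketch_size_to_int(Z_STRSIZE(s), &len) && len == 16);

    /* One past INT_MAX: the checked cast reports the overflow. */
    s.value.str.len = (size_t)INT_MAX + 1u;
    assert(!sketch_size_to_int(Z_STRSIZE(s), &len));

    assert(sketch_int_to_size(7, &size) && size == 7);
    assert(!sketch_int_to_size(-1, &size));

    n.value.lval = 42;
    assert(Z_LVAL(n) == 42L);
    assert(sketch_php_long_to_long(Z_PHPLVAL(n), &l) && l == 42L);
    return 1;
}
```

The point of the split is that legacy callers keep the narrowing cast (and its silent truncation), while anything that wants the full range either uses the new accessors directly or goes through a checked helper and handles the failure case explicitly.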

It would be a *gigantic* patch, but the userland effects should be minimal (the only changes would be support for longer strings and consistent 64-bit int support). The performance impact should be minimal for non-legacy code (as both would still be using native data types)...

What do you think? What am I missing from this? Or is this just a horrific idea (given the current implementation details)...?

Anthony