Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:68042 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 43787 invoked from network); 3 Jul 2013 14:12:20 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 3 Jul 2013 14:12:20 -0000 Authentication-Results: pb1.pair.com header.from=ircmaxell@gmail.com; sender-id=pass Authentication-Results: pb1.pair.com smtp.mail=ircmaxell@gmail.com; spf=pass; sender-id=pass Received-SPF: pass (pb1.pair.com: domain gmail.com designates 209.85.212.52 as permitted sender) X-PHP-List-Original-Sender: ircmaxell@gmail.com X-Host-Fingerprint: 209.85.212.52 mail-vb0-f52.google.com Received: from [209.85.212.52] ([209.85.212.52:53948] helo=mail-vb0-f52.google.com) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id 99/D1-30639-34134D15 for ; Wed, 03 Jul 2013 10:12:19 -0400 Received: by mail-vb0-f52.google.com with SMTP id f12so144224vbg.25 for ; Wed, 03 Jul 2013 07:12:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=B4YJVlgkVDIFAJrXRNqLNtcER306U6VE9ClkAfwTffo=; b=A8U4MziieAc9xWS082XhCRIXrk3Sp23ajVEwEuyyIPEPJ6C837p3EGGoe9LvUU+sln eF2V7O9Kp6MQvNoYBeoPPzDMQTv2w5Ne2bpYShe1/QfJR4yD5v3d+EcDYfmNOoalXrHU hFinOE6zHLu6wjNZ+q66Uau+GV0BTgQuRMXziHNh8dPLS3ft/P4xO+ZV9YkrHqkuNH7W nYmCdWY40ixYpbjtYtYQS/zlmsWO1CpccJFzYGGcd5Pzip0owhDHNt8RzbN0Knkc9YXX p3A82jzLD0RaIkcMSyzTJLp8kWu3N19gZcGbU4ONe5nE8oKJsCLS+8jf8X7Mjb9s2Dje oizQ== MIME-Version: 1.0 X-Received: by 10.52.236.199 with SMTP id uw7mr307652vdc.18.1372860736100; Wed, 03 Jul 2013 07:12:16 -0700 (PDT) Received: by 10.58.94.201 with HTTP; Wed, 3 Jul 2013 07:12:15 -0700 (PDT) Date: Wed, 3 Jul 2013 10:12:15 -0400 Message-ID: To: "internals@lists.php.net" Content-Type: multipart/alternative; boundary=089e0111d9248061a404e09c0c02 Subject: String Size Refactor Progress From: ircmaxell@gmail.com (Anthony Ferrara) --089e0111d9248061a404e09c0c02 Content-Type: text/plain; charset=ISO-8859-1 Hey all, So I've started the refactor to change the stored string size from int to size_t. I've got it compiling and the tests mostly passing (not all), when run with --disable-all and --disable-cgi. There are definitely still issues with the patch (there are some weird segfaults in certain times, which are caught by the tests), but it's progressing really nicely. Here's what I did: I created a new build option: --enable-zstrlen. This turns off the new match, and type-defs and defines everything back to how it was before. This is really useful for testing changes to ensure that they still work. Then, I defined a series of new types: #ifdef ZEND_USE_LEGACY_STRING_TYPES #define zend_str_size_int int #define zend_str_size_uint unsigned int #define zend_str_size_size_t size_t #define zend_str_size_long long typedef int zend_str_size; #else #define zend_str_size_int zend_str_size #define zend_str_size_uint zend_str_size #define zend_str_size_size_t zend_str_size #define zend_str_size_long zend_str_size typedef size_t zend_str_size; #endif Any API that accepted a string size parameter, I replace with one of the zend_str_size_* definitions. I chose to do this instead of just changing it directly to zend_str_size, as it should make extension developer's lives easier by supporting the intermediate types (with their own define lines for older versions of the API). These are intended to be removed after 1 or 2 releases, replacing everything with just zend_str_size. Due to a problem with zend_parse_parameters, I added two new identifiers: S and P. They act just like s and p, except that they return zend_str_size instead of int. When `--enable-zstrlen` is not enabled, I disable s and p, and changed ZPP to rase an E_ERROR on unknown parameter types. The E_ERROR change is not intended to go into production, but instead just makes life A LOT easier refactoring modules one at a time. Here's what's left to do: I've only really got the basic system working (for limited definitions of working). There's a ton of extensions that need migrating, and tons of parts of the core that i haven't fully migrated yet. I've migrated php_pcre.c over, but pcrelib still uses int for string sizes. This is going to be a much larger refactor, and I wanted to see people's thoughts prior to digging into it. Substr needs to be refactored to use size_t. Right now, I just raise an error if Z_STRSIZE > INT_MAX (or an overflow would happen). I'd love to see that cleaned up more. My general process has been to enable an extension, fix the compile errors (typically due to removing Z_STRLEN*). Then run through the extension, searching for int and replacing where appropriate with zend_str_size (within a function) or zend_str_size_* in an API. Then run the tests for that extension, and fix the issues as they come up. Finally, recompile with -Werror and fix all of the warnings (yay!)... Lessons Learned So Far How this system is working today, I have no idea. There are SOOO many issues in string handling just due to types. I've seen int, unsigned int, size_t, long, unsigned long and others, silently cast back and forth (implicit casts too). Some really weird things going on... Here's the branch: https://github.com/ircmaxell/php-src/tree/string_size_refactor_take_2 And the diff: https://github.com/ircmaxell/php-src/compare/string_size_refactor_take_2 If you want to help out, please let me know and let's try to coordinate so we don't step on each other's toes... Thanks! Anthony --089e0111d9248061a404e09c0c02--