Newsgroups: php.internals
Date: Sun, 28 Jan 2018 12:12:23 +0000
From: cw@daverandom.com (Chris Wright)
To: PHP Internals List
Subject: unpack() offset and consumed data measurement

Morning all

Since PHP 7.1 the unpack() function has a (still undocumented) optional
3rd argument that allows the caller to specify the offset in the input
data where parsing should start. While this is a useful feature, unpack()
currently gives no direct way to know how many bytes of the input were
consumed for some format specifiers, such as Z*, f, d and anything else
that does not consume a universally constant amount of data.

It is typically possible to determine this externally, but not without
some clumsy measurements, either of the returned value or (in the case of
system-dependent numeric types) of the length of the string returned by
pack() for those specifiers. It can also get complicated when using
things like x and X, which adjust the offset without producing data in
the returned value.
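
To illustrate the x/X point, here is a minimal sketch. The sample bytes and the element names 'first', 'second' and 'tag' are made up for illustration and do not come from any real format. The returned array holds two elements, but three bytes have been read, and nothing in the result says so:

```php
<?php
// Sample input: one byte, a padding byte, one byte, then a 4-byte tag.
// Layout and element names are assumptions made for this example only.
$data = "\x01\x00\x02ABCD";
$offset = 0;

// 'x' skips one byte forward but adds nothing to the result array.
$fields = unpack('Cfirst/x/Csecond', $data, $offset);

// Two elements came back, yet three bytes were read. The caller has to
// work that out from the format string by hand.
$offset += 1   // C
         + 1   // x
         + 1;  // C

// Continue unpacking from the manually computed position.
$rest = unpack('a4tag', $data, $offset);
```

If the format string later gains or loses a code, the hand-written sum has to change with it.
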
Additionally, computing the new position in the input buffer separately
from the format string risks the two diverging if one is modified and the
other is either not updated, or updated incorrectly.

Many binary data formats are sufficiently complex that unpacking a large
structure requires multiple calls to unpack(), as often there are nuances
that cannot be directly expressed with the current specifier format, such
as strings prefixed with a length indicator.

Here is some code that demonstrates the problem:

    /* This is the only way to know for certain how big float is on the
       local system */
    define('FLOAT_WIDTH', strlen(pack('f', 0.0)));

    /* an exaggerated example using two variable width codes and a code
       that does not produce output but modifies the input buffer offset;
       the elements are named because unnamed elements are all numbered
       from 1 and would overwrite each other */
    $pieces = unpack('fnum/X/Z*str', $data, $offset);

    /* we now have to modify the offset before we can continue to unpack
       data */
    $offset += FLOAT_WIDTH              // f
             - 1                        // X
             + strlen($pieces['str']);  // Z*

I would like to look at adding a 4th optional argument, taken by-ref,
which will be populated with the number of buffer bytes consumed by the
unpack() operation. This would enable the above code to be rewritten like
so:

    $pieces = unpack('fnum/X/Z*str', $data, $offset, $consumed);
    $offset += $consumed;

Not only is this code much simpler and less susceptible to breakage, it
is (IMHO) clearer to read as well.

Does anyone have any objections to/thoughts about this? If not I will
work up a patch in the coming week.

Thanks, Chris
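
For the length-prefixed string case mentioned above, a minimal sketch of the multiple-call pattern with today's three-argument unpack() might look like the following. The helper name readLengthPrefixedString() and the 16-bit big-endian length prefix are assumptions chosen for illustration, not part of any existing API or of the proposal:

```php
<?php
// Hypothetical helper: reads one string prefixed by a 16-bit big-endian
// length, starting at $offset, and advances $offset past it.
function readLengthPrefixedString(string $data, int &$offset): string
{
    // First call: read the length prefix. 'n' is always 2 bytes wide.
    $header = unpack('nlength', $data, $offset);
    if ($header === false) {
        throw new RuntimeException('Not enough data for length prefix');
    }
    $offset += 2;

    $length = $header['length'];
    if ($length === 0) {
        return '';
    }

    // Second call: the format depends on the value just read, so it
    // cannot be folded into the first format string.
    $body = unpack("a{$length}value", $data, $offset);
    if ($body === false) {
        throw new RuntimeException('Not enough data for string body');
    }
    $offset += $length;

    return $body['value'];
}

// Two length-prefixed strings back to back.
$data = pack('n', 5) . 'hello' . pack('n', 3) . 'php';
$offset = 0;
$first = readLengthPrefixedString($data, $offset);
$second = readLengthPrefixedString($data, $offset);
```

Both codes here have a known width, so the bookkeeping is easy to get right. The same pattern gets harder to maintain once the format mixes in system-dependent or variable-width codes.
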