Morning all
Since PHP 7.1 the unpack() function has a (still undocumented) optional 3rd
argument that allows the caller to specify the offset in the input data
where parsing should start. While this is a useful feature, it is currently
impossible to know how many bytes of the input were consumed for some
format specifiers, such as Z*, f, d and anything else that does not consume
a universally constant amount of data.
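A small sketch of the current situation, using a hypothetical buffer built with pack() (the layout and the element names num/str/val are made up for illustration). The offset argument lets each call start where the previous one stopped, but working out that position is left to the caller:

```php
<?php
// Hypothetical buffer: a 16-bit big-endian number, a NUL-terminated
// string, then a float in machine byte order.
$data = pack('n', 3) . "abc\0" . pack('f', 2.5);

// The third argument (PHP 7.1+) is the byte offset to start parsing at.
$header = unpack('nnum', $data, 0);

// 'n' is always 2 bytes wide, so the next offset is easy to compute.
$text = unpack('Z*str', $data, 2);

// For Z* the returned array holds only the string, not how many bytes it
// occupied, so the caller adds the string length and the NUL byte by hand.
$afterText = 2 + strlen($text['str']) + 1;
$value = unpack('fval', $data, $afterText);
```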
It is typically possible to determine this externally, but not without some
clumsy measurements, either of the returned value or (in the case of
system-dependent numeric types) of the length of the string returned by
pack() for those specifiers. It can also get complicated when
using things like x and X, which adjust the offset without producing data
in the returned value.
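For illustration, the measurements described above might look like the following; the byte values are arbitrary test data. Note that x and X move the read position without adding anything to the returned array, so nothing in the result shows that they ran:

```php
<?php
// Widths of the system-dependent float types, measured via pack().
$floatWidth  = strlen(pack('f', 0.0));
$doubleWidth = strlen(pack('d', 0.0));

// 'x' skips one input byte: this reads bytes 0 and 2 (3 bytes consumed).
$skipped = unpack('Ca/x/Cb', "\x01\x02\x03");

// 'X' backs up one byte: this reads byte 0 twice, then byte 1
// (1 - 1 + 1 + 1 = 2 bytes consumed), yet the result has three entries.
$reread = unpack('Ca/X/Cb/Cc', "\x05\x06");
```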
Additionally, computing the new position in the input buffer separately
from the format string risks the two diverging if one is modified and the
other is either not updated, or updated incorrectly.
Many binary data formats are sufficiently complex that unpacking a large
structure requires multiple calls to unpack(), as often there are nuances
that cannot be directly expressed with the current specifier format, such
as strings prefixed with a length indicator.
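As an example of such a nuance, a string preceded by a 16-bit length cannot be described by a single format string. A sketch under that assumed layout (read_prefixed_string() is a hypothetical helper, not part of PHP) needs two calls, with the offset kept in step with the format codes by hand:

```php
<?php
// Hypothetical helper: reads a 16-bit big-endian length followed by that
// many bytes, advancing $offset past both.
function read_prefixed_string(string $data, int &$offset): string
{
    $len = unpack('nlen', $data, $offset)['len'];
    $offset += 2;                 // must match the width of 'n'

    if ($len === 0) {
        return '';
    }
    $str = unpack("a{$len}str", $data, $offset)['str'];
    $offset += $len;              // must match the count given to 'a'
    return $str;
}

$buf = pack('n', 5) . 'hello' . pack('n', 2) . 'hi';
$offset = 0;
$first  = read_prefixed_string($buf, $offset);   // "hello", offset now 7
$second = read_prefixed_string($buf, $offset);   // "hi", offset now 11
```

Both `$offset +=` lines duplicate knowledge already present in the format strings, which is where the divergence described earlier creeps in.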
Here is some code that demonstrates the problem:
/* This is the only way to know for certain how big a float is on the
   local system */
define('FLOAT_WIDTH', strlen(pack('f', 0.0)));
/* an exaggerated example using two variable width codes and a code that
   does not produce output but modifies the input buffer offset */
$pieces = unpack('ffloat/X/Z*str', $data, $offset);
/* we now have to modify the offset before we can continue to unpack
   data */
$offset += FLOAT_WIDTH             // f
         - 1                       // X
         + strlen($pieces['str']); // Z*
I would like to look at adding a 4th optional argument, taken by-ref, which
will be populated with the number of buffer bytes consumed by the unpack()
operation. This would enable the above code to be rewritten like so:
$pieces = unpack('ffloat/X/Z*str', $data, $offset, $consumed);
$offset += $consumed;
Not only is this code much simpler and less susceptible to breakage, it is
(IMHO) clearer to read as well.
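Until something like this exists, a userland approximation is possible for a restricted set of codes. The sketch below is purely illustrative: unpack_consumed() is a hypothetical name, it understands only C, c, n, v, N, V, f, d, x, X and Z, and for Z* it assumes "consumed" means the string plus its terminating NUL (how the proposed argument would count Z* is an open question here). It also has to re-parse the format string, which is exactly the duplication the proposal would remove.

```php
<?php
// Hypothetical helper approximating the proposed $consumed argument.
// Returns [unpacked values, number of bytes consumed from $offset].
function unpack_consumed(string $format, string $data, int $offset = 0): array
{
    // Fixed widths, plus the platform-dependent float types measured via pack().
    $widths = [
        'C' => 1, 'c' => 1, 'n' => 2, 'v' => 2, 'N' => 4, 'V' => 4,
        'f' => strlen(pack('f', 0.0)), 'd' => strlen(pack('d', 0.0)),
    ];

    $values = unpack($format, $data, $offset);
    if ($values === false) {
        throw new RuntimeException('unpack() failed');
    }
    $pos = $offset;

    foreach (explode('/', $format) as $element) {
        $code = $element[0];
        // Optional repeater after the code: a decimal count or '*'.
        preg_match('/^(\d+|\*)?/', substr($element, 1), $m);
        $rep = $m[1] ?? '';
        if ($rep === '*' && $code !== 'Z') {
            throw new InvalidArgumentException("Unsupported element: $element");
        }
        $count = ($rep === '' || $rep === '*') ? 1 : (int) $rep;

        switch ($code) {
            case 'x':
                $pos += $count;   // skip forward
                break;
            case 'X':
                $pos -= $count;   // back up
                break;
            case 'Z':
                if ($rep === '*') {
                    // Assumption: consumed = string + terminating NUL.
                    $nul = strpos($data, "\0", $pos);
                    $pos = ($nul === false) ? strlen($data) : $nul + 1;
                } else {
                    $pos += $count;
                }
                break;
            default:
                if (!isset($widths[$code])) {
                    throw new InvalidArgumentException("Unsupported element: $element");
                }
                $pos += $widths[$code] * $count;
        }
    }

    return [$values, $pos - $offset];
}
```

With it, the sample above could be written as `[$pieces, $consumed] = unpack_consumed('ffloat/X/Z*str', $data, $offset); $offset += $consumed;`.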
Does anyone have any objections to/thoughts about this? If not I will work
up a patch in the coming week.
Thanks, Chris
> Since PHP 7.1 the unpack() function has a (still undocumented) optional 3rd
> argument […]
JFTR: documented with
http://svn.php.net/viewvc?view=revision&revision=344003.
--
Christoph M. Becker
>> Since PHP 7.1 the unpack() function has a (still undocumented) optional
>> 3rd argument […]
> JFTR: documented with
> http://svn.php.net/viewvc?view=revision&revision=344003.
Thanks!
> Here is some code that demonstrates the problem:
> /* This is the only way to know for certain how big a float is on the
>    local system */
> define('FLOAT_WIDTH', strlen(pack('f', 0.0)));
> /* an exaggerated example using two variable width codes and a code that
>    does not produce output but modifies the input buffer offset */
> $pieces = unpack('ffloat/X/Z*str', $data, $offset);
> /* we now have to modify the offset before we can continue to unpack
>    data */
> $offset += FLOAT_WIDTH             // f
>          - 1                       // X
>          + strlen($pieces['str']); // Z*
Re-reading this mail, I noticed a small mistake in the code sample: I forgot
to include the terminating null byte for the Z* data.
This (unintentionally) demonstrates the exact reason I would like to add
this, as it's very easy to accidentally write subtle bugs.
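For illustration, here is the manual calculation with the terminating NUL counted, run against hypothetical input: a float of 0.0 (all zero bytes, so X followed by Z* yields an empty string on either byte order) followed by one extra byte. Even in this tiny case the final +1 is the easy one to forget:

```php
<?php
// Width of 'f' on this system, measured as in the original sample.
define('FLOAT_WIDTH', strlen(pack('f', 0.0)));

// Hypothetical input: an all-zero float, then one more byte.
$data = pack('f', 0.0) . pack('C', 42);
$offset = 0;

$pieces = unpack('ffloat/X/Z*str', $data, $offset);
$offset += FLOAT_WIDTH             // f
         - 1                       // X
         + strlen($pieces['str'])  // Z*: the string itself...
         + 1;                      // ...plus its terminating NUL byte

// The next read now starts right after the NUL.
$next = unpack('Cbyte', $data, $offset);
```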