Hi:
Enjoyed Andrei's talk at the NYPHP Conference last week about unicode in
PHP 6. He mentioned that when unicode.semantics is on, strlen() will
return the number of characters rather than the number of bytes, like
mb_strlen() does, or like strlen() does if mbstring.func_overload is on.
The hitch here is there are situations where one needs to know how many
bytes are in a string. Is there a function I've overlooked that does
this or will do this, please?
Thanks,
--Dan
--
T H E A N A L Y S I S A N D S O L U T I O N S C O M P A N Y
data intensive web and database programming
http://www.AnalysisAndSolutions.com/
4015 7th Ave #4, Brooklyn NY 11232 v: 718-854-0335 f: 718-854-0409
> Enjoyed Andrei's talk at the NYPHP Conference last week about unicode in
> PHP 6. He mentioned that when unicode.semantics is on, strlen() will
> return the number of characters rather than the number of bytes, like
> mb_strlen() does, or like strlen() does if mbstring.func_overload is on.
> The hitch here is there are situations where one needs to know how many
> bytes are in a string. Is there a function I've overlooked that does
> this or will do this, please?
My first question is: why do you need to know the number of bytes occupied
by a textual string? Is it because you want to work with binary strings?
That's still very possible. Even with unicode.semantics=on, the binary
string type can be used explicitly in a few ways:
$a = b"This string contains an 0xF0 byte: \xF0";
$alen = strlen($a);
This is the simplest way: a lowercase b (or u) prefix explicitly marks a
string literal as binary (or unicode). Leaving the prefix off yields
whatever type is appropriate to the unicode.semantics setting.
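As a rough illustration of the byte/character distinction, here is a sketch
that runs on released PHP versions, where strlen() counts bytes and the b
prefix is accepted but has no effect. It does not exercise
unicode.semantics. The UTF-8 sample string and the use of mb_strlen() with
the '8bit' encoding are assumptions added for illustration, and the
mbstring extension is assumed to be loaded:

```php
<?php
// Assumption: mbstring is loaded; these sample strings are illustrative only.
$bin  = b"This string contains an 0xF0 byte: \xF0"; // explicit binary literal
$text = "na\xC3\xAFve"; // "naive" with a diaeresis, UTF-8: 6 bytes, 5 characters

$binBytes  = strlen($bin);               // bytes in the binary string
$textBytes = mb_strlen($text, '8bit');   // byte count via the 8bit encoding
$textChars = mb_strlen($text, 'UTF-8');  // character count

var_dump($binBytes, $textBytes, $textChars);
```

Asking mb_strlen() for the '8bit' encoding is one way to get a byte count
that does not depend on what strlen() has been overloaded to do.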
In other cases, such as reading from a binary mode file:
$fp = fopen('foo.bin', 'rb');
$str = fread($fp, 100);
The string returned is always a binary string, regardless of
unicode.semantics. Conversely, when reading from a text-mode file:
$fp = fopen('foo.txt', 'rt');
$str = fread($fp, 100);
The type of string returned depends on the unicode.semantics switch. This
is to ensure maximum backward compatibility, since scripts written for
Windows already use text mode to handle line break translation.
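To make the binary-mode case concrete, here is a small sketch that writes a
few bytes to a temporary file, reads them back with 'rb', and counts them.
The temporary file and its contents are hypothetical, and this only shows
the behavior of released PHP versions, where fread() returns raw bytes; it
is not a test of unicode.semantics:

```php
<?php
// Assumption: a throwaway temp file stands in for foo.bin.
$path = tempnam(sys_get_temp_dir(), 'bin');
file_put_contents($path, "caf\xC3\xA9\n"); // "cafe" with an acute accent, UTF-8, plus newline

$fp  = fopen($path, 'rb');   // binary mode: no line break translation
$str = fread($fp, 100);      // reads up to 100 bytes
fclose($fp);
unlink($path);

$bytes = strlen($str);       // number of bytes actually read
var_dump($bytes);
```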
-Sara