From: pollita@php.net ("Sara Golemon")
To: "Daniel Convissor"
Newsgroups: php.internals
Subject: Re: strlen() under unicode.semantics
Date: Wed, 21 Jun 2006 12:50:03 -0700
Message-ID: <005a01c6956b$ea50a1f0$5c8be5a9@ohr.berkeley.edu>
References: <20060621171450.GA2632@panix.com>

> Enjoyed Andrei's talk at the NYPHP Conference last week about unicode in
> PHP 6. He mentioned that when unicode.semantics is on, strlen() will
> return the number of characters rather than the number of bytes, like
> mb_string() does or strlen() if mbstring.func_overload is on.
>
> The hitch here is there are situations where one needs to know how many
> bytes are in a string.
> Is there a function I've overlooked that does
> this or will do this, please?
>

My first question is: Why do you need to know the number of bytes occupied
by a textual string? Is it because you want to work with binary strings?
Because that's still very possible.

Even with unicode.semantics=on, the binary string type may be used
explicitly in a few ways:

    $a = b"This string contains an 0xF0 byte: \xF0";
    $alen = strlen($a);

This is the simplest: a lowercase b (or u) prefix explicitly marks a string
literal as binary (or unicode). Leaving the prefix off yields whatever type
is appropriate to the unicode.semantics setting.

In other cases, such as reading from a binary-mode file:

    $fp = fopen('foo.bin', 'rb');
    $str = fread($fp, 100);

The string returned is always a binary string, regardless of
unicode.semantics.

Conversely, when reading a text-mode file:

    $fp = fopen('foo.txt', 'rt');
    $str = fread($fp, 100);

The type of string returned depends on the unicode.semantics switch (in
order to ensure maximum BC, since scripts designed for Windows already use
text mode to handle linebreak transformation).

-Sara
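
P.S. To make the character/byte distinction concrete, here is a minimal
sketch. It does not use the unicode.semantics switch discussed above; it
assumes a stock PHP build (5.2.1 or later) with the mbstring extension
loaded and a UTF-8 encoded string. The variable names are only
illustrative.

```php
<?php
// Assumption: the mbstring extension is loaded and $text holds UTF-8 bytes.
$text = "na\xC3\xAFve";              // "naive" with a diaeresis: 5 characters, 6 bytes

$chars = mb_strlen($text, 'UTF-8');  // counts characters
$bytes = strlen($text);              // counts bytes on a build without unicode semantics

// On such a build the b prefix is accepted by the parser and has no effect.
$bin    = b"\xF0";
$binLen = strlen($bin);              // a single byte

printf("chars=%d bytes=%d binary=%d\n", $chars, $bytes, $binLen);
```

The two counts differ only because one character in $text takes two bytes
in UTF-8; for pure ASCII text they would be the same.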