Newsgroups: php.internals
Message-ID: <7A58222C-FA15-47D5-ACFC-0B14CF4D12A7@typo3.org>
Date: Fri, 15 Jun 2007 12:25:07 +0200
To: internals@lists.php.net
Subject: PHP6 an unicode in practice - wanted behaviour for array indices?
From: robert@typo3.org (Robert Lemke)

Hi folks,

I am currently developing the next major version of TYPO3, a PHP-based content management system. As we take advantage of the new Unicode features (among other things), it will be completely based on, and rely on, PHP6 with unicode.semantics turned on. (Side note: my vote would be for unicode.semantics to be turned on by default, but I trust that you will make the right decision, and we'll adapt to it.) We bypassed all backward compatibility issues by just starting from scratch again ...

Anyway, using the latest CVS version, I stumbled over a behaviour I didn't expect, and I'd like to ask you whether it is wanted behaviour or a bug. Consider the following code (unicode semantics = on):

    preg_match('/(?P<character>\w),/', 'a,b,c,d', $matches);
    echo (isset($matches['character']) ? 'yes ' : 'no ');

The expected output is "yes", but it prints "no". The reason is that the index "character" in the array returned by preg_match seems to be a binary string, not unicode as I expected:

    preg_match('/(?P<character>\w),/', 'a,b,c,d', $matches);
    echo (isset($matches[(binary)'character']) ? 'yes ' : 'no ');
    echo (isset($matches[(unicode)'character']) ? 'yes ' : 'no ');

This code outputs "yes no". A bit worse even:

    preg_match('/(?P<character>\w),/', 'a,b,c,d', $matches);
    echo (isset($matches[(unicode)'character']) ? 'yes ' : 'no ');
    extract($matches);
    echo(gettype($character) . ' ');
    $matches = compact($character);
    echo (isset($matches[(unicode)'character']) ? 'yes ' : 'no ');

The output is "no unicode no".

Sorry if I understood something wrong here, but to me it looks like a little inconsistency.

Cheers,
Robert