Newsgroups: php.internals
Date: Sat, 16 Jun 2007 18:08:52 -0700
To: Robert Lemke
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] PHP6 an unicode in practice - wanted behaviour for array
indices?
From: andrei@gravitonic.com (Andrei Zmievski)

Yeah, this should be fixed. I'll try to do it over the weekend.

-Andrei

On Jun 15, 2007, at 3:25 AM, Robert Lemke wrote:

> Hi folks,
>
> I am currently developing the next major version of TYPO3, a PHP-based
> content management system. As we take advantage of the new Unicode
> features (among other things), it will be completely based on, and rely
> on, PHP6 with unicode.semantics turned on. (Side note: my vote would be
> for unicode.semantics to be turned on by default, but I trust that you
> will make the right decision, and then we'll adapt to it.) We bypassed
> all backward compatibility issues by just starting from scratch again ...
>
> Anyway, using the latest CVS version, I stumbled over a behaviour I
> didn't expect, and I'd like to ask you whether it is wanted behaviour
> or a bug.
>
> Consider the following code (unicode semantics = on):
>
>   preg_match('/(?P<character>\w),/', 'a,b,c,d', $matches);
>   echo (isset($matches['character']) ? 'yes ' : 'no ');
>
> The expected output is "yes", but it returns "no". The reason is that
> the index "character" in the array returned by preg_match seems to be
> a binary string, not unicode as I expected:
>
>   preg_match('/(?P<character>\w),/', 'a,b,c,d', $matches);
>   echo (isset($matches[(binary)'character']) ? 'yes ' : 'no ');
>   echo (isset($matches[(unicode)'character']) ? 'yes ' : 'no ');
>
> This code outputs "yes no".
>
> A bit worse even:
>
>   preg_match('/(?P<character>\w),/', 'a,b,c,d', $matches);
>   echo (isset($matches[(unicode)'character']) ? 'yes ' : 'no ');
>   extract($matches);
>   echo(gettype($character) . ' ');
>   $matches = compact($character);
>   echo (isset($matches[(unicode)'character']) ? 'yes ' : 'no ');
>
> The output is "no unicode no".
>
> Sorry if I have misunderstood something here, but to me it looks like
> a small inconsistency.
>
> Cheers,
> Robert
>
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: http://www.php.net/unsub.php