Newsgroups: php.internals
Message-ID: <5097E376.6040709@lerdorf.com>
In-Reply-To: <5fce29a0cb5467c00eeb267dd38fd788@localhost>
Date: Mon, 05 Nov 2012 08:04:06 -0800
From: rasmus@lerdorf.com (Rasmus Lerdorf)
To: Jean-Sébastien Hedde
CC: internals
Subject: Re: [PHP-DEV] Incomprehension with preg_match and utf8

On 11/05/2012 01:57 AM, Jean-Sébastien Hedde wrote:
> Hi,
>
> I'm facing an issue with preg_match and an UTF8 string.
>
> The pattern is : /^[[:alnum:]\s\-\'%]+$/u
> The string : Régis
>
> If I read the manual preg_match should return 0 ("In UTF-8 mode,
> characters with values greater than 128 do not match any of the POSIX
> character classes.") but I've got 1 in some case :
>
> On a Windows host
> php 5.2.12 - (PCRE 7.9 2009-04-11) : preg_match === 1
>
> On the same centos host :
> php 5.2.10 (Rémi's RPM) - (PCRE 6.6 06-Feb-2006) : preg_match === 0
> php 5.4.8 (my build) - (PCRE 8.12 2011-01-15) : preg_match === 1
>
> On an other Centos host :
> php 5.4.0 (Rémi's RPM) - (PCRE 7.8 2008-09-05)
>
> How this can be possible ?

I think the documentation is wrong on that. In Unicode mode
[[:alnum:]] actually becomes \p{Xan}, which should match Unicode chars
as well, but only if PCRE was compiled with Unicode support. So I
suspect you don't actually have a Unicode-capable PCRE build in some
cases there.

-Rasmus
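
A quick way to compare hosts is to have each one report its PCRE version and run the same pattern with and without the explicit \p{Xan} form. The following is only a rough sketch: probe_pcre() is a made-up helper name, not something PHP provides, and it assumes PHP 7 or later for the "\u{e9}" escape. A result of false (rather than 0) would suggest the pattern did not compile on that build.

```php
<?php
// probe_pcre() is a hypothetical helper for illustration, not part of PHP.
// It reports how the linked PCRE library treats the pattern from this thread.
function probe_pcre(string $subject): array
{
    return [
        // Version string of the PCRE library PHP was built against.
        'pcre_version' => PCRE_VERSION,
        // true if the /u modifier compiles and "." consumes one UTF-8 character.
        'utf8' => @preg_match('/^.$/u', "\u{e9}") === 1,
        // true if Unicode property escapes such as \pL are available.
        'props' => @preg_match('/^\pL$/u', "\u{e9}") === 1,
        // The original pattern: 1 = match, 0 = no match, false = compile error.
        'posix_alnum' => @preg_match('/^[[:alnum:]\s\-\'%]+$/u', $subject),
        // Same pattern with the POSIX class spelled as \p{Xan} explicitly.
        'xan' => @preg_match('/^[\p{Xan}\s\-\'%]+$/u', $subject),
    ];
}

// "R\u{e9}gis" is "Régis" encoded as UTF-8, independent of the file encoding.
var_export(probe_pcre("R\u{e9}gis"));
echo PHP_EOL;
```

If 'posix_alnum' and 'xan' disagree on a host, or 'props' comes back false, that would point at the PCRE build on that host rather than at the pattern itself.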