Newsgroups: php.internals
From: martin.keckeis1@gmail.com (Martin Keckeis)
To: Rouven Weßling
Cc: PHP internals
Date: Fri, 24 May 2013 08:34:25 +0200
Subject: Re: [PHP-DEV] Proposal for better UTF-8 handling
In-Reply-To: <61BC4F17-86D9-4CBD-B185-58A2D4AFAE5F@rouvenwessling.de>

Hello Rouven,

The lack of "good" UTF-8 support is a long-standing topic in PHP, and improvements are (at least I think) very welcome here!

> Before I write an RFC I'd like to get some feedback what you think about
> adding the following functions to PHP 5.6 (possibly more to follow):
> utf8_is_valid, utf8_strlen, utf8_substr, utf8_strpos, utf8_strrpos,
> utf8_str_split, utf8_strrev, utf8_recover, utf8_chr, utf8_ord,
> string_is_ascii.
>
> Most of them (exceptions are utf8_chr, utf8_is_valid, utf8_recover and
> string_is_ascii) are currently written in a way that they emit a warning
> when they encounter invalid UTF-8 and return with null. This should
> encourage applications to check their input with utf8_is_valid and either
> stop further processing or to fall back to utf8_recover to get a valid
> string. This should improve security since there are attack vectors when
> malformed sequences get interpreted as another encoding.

I'm currently using the multibyte "mb_" functions and I'm generally happy with them. Since I run my own web server, enabling the extension is no problem for me. The biggest problem I have had with the extension is that not every standard string function has an mb_ counterpart. The most famous missing one, I think, is mb_str_replace.

Something to think about: why not combine your work with the mb_ extension? For emitting a warning you could use a configuration setting, either in the ini file or set by calling a function. I would rather have one complete "mb/utf-8" library than yet another one.
As you have already written, there are already some out there, and for core I would currently prefer "mb_", because those functions have been available since PHP 4 and are stable.
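
To make the mb_str_replace gap concrete, here is a minimal userland sketch built only on existing mbstring calls (mb_check_encoding, mb_strlen, mb_strpos, mb_substr). The name mb_str_replace_sketch and its signature are made up for illustration; no such function exists in the extension. The "warn and return null on malformed input" behaviour is only an assumption modelled on the proposal quoted above, not something mbstring does today.

```php
<?php
// Hypothetical helper, NOT part of mbstring: replaces every occurrence of
// $search in $subject, working in characters of $encoding rather than bytes.
function mb_str_replace_sketch($search, $replace, $subject, $encoding = 'UTF-8')
{
    // Assumption modelled on the proposal: refuse malformed input with a
    // warning and return null instead of guessing.
    foreach (array($search, $replace, $subject) as $part) {
        if (!mb_check_encoding($part, $encoding)) {
            trigger_error('Malformed ' . $encoding . ' input', E_USER_WARNING);
            return null;
        }
    }

    if ($search === '') {
        return $subject; // nothing to look for
    }

    $searchLen  = mb_strlen($search, $encoding);
    $subjectLen = mb_strlen($subject, $encoding);
    $result     = '';
    $offset     = 0; // character offset, not byte offset

    // Copy the text before each match, then the replacement, then move on.
    while (($pos = mb_strpos($subject, $search, $offset, $encoding)) !== false) {
        $result .= mb_substr($subject, $offset, $pos - $offset, $encoding) . $replace;
        $offset  = $pos + $searchLen;
    }

    // Append whatever follows the last match.
    return $result . mb_substr($subject, $offset, $subjectLen - $offset, $encoding);
}

echo mb_str_replace_sketch('ß', 'ss', 'Weßling'), "\n"; // prints "Wessling"
```

For valid UTF-8 specifically, the plain str_replace already gives the same result, because UTF-8 is self-synchronizing and a byte-wise match can never start in the middle of a character. A character-aware variant like the one above matters mainly for other multibyte encodings, or when malformed input has to be rejected first.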