Newsgroups: php.internals
Xref: news.php.net php.internals:47811
Message-ID: <4BBB756E.6000905@lerdorf.com>
Date: Tue, 06 Apr 2010 10:54:54 -0700
To: Scott MacVicar
CC: Justin Dearing, internals@lists.php.net, scottmac@php.net
References: <4BBB70B4.9050503@lerdorf.com> <79B651DA-34A4-4596-9204-47A47211BB27@macvicar.net>
In-Reply-To: <79B651DA-34A4-4596-9204-47A47211BB27@macvicar.net>
Subject: Re: [PHP-DEV] What 
gruntwork needs to be done
From: rasmus@lerdorf.com (Rasmus Lerdorf)

On 04/06/2010 10:47 AM, Scott MacVicar wrote:
> On Apr 6, 2010, at 10:34 AM, Rasmus Lerdorf wrote:
>
>> On 04/06/2010 10:08 AM, Justin Dearing wrote:
>>> So pending review and acceptance by Dmitry, I've written my first patch for
>>> PHP. While there is a good chance I will need to make further revisions to
>>> the test or code, I don't know what those are yet.
>>>
>>> However, I've got some free time at the moment, and I'd like to make use of
>>> some of the sunk costs of figuring out how to hack PHP. So I know that in
>>> general there is a lot of work to be done. I also know that there are
>>> plenty of open bugs, tests to be written, etc. What I am looking for is
>>> someone to say "here are the next 10 bugs I will work on, can you write me
>>> tests" or "I wrote this patch on Linux, I need someone to make it work on
>>> Windows too" or "Party X complains of this but refuses to fill out a proper
>>> bug report."
>>
>> Here is a straightforward (but not easy) one:
>>
>> http://bugs.php.net/bug.php?id=47435
>>
>> The php_filter_validate_ip() function in ext/filter/logical_filters.c
>> needs those reserved IPv6 ranges added to the FORMAT_IPV6 case in the
>> switch statement there when FILTER_FLAG_NO_RES_RANGE is set. I say it
>> isn't super easy because we don't have much in the way of IPv6 parsing
>> in PHP yet, so it will probably involve finding some decent code that
>> can expand an IPv6 notation into something we can logically separate.
>> That might also mean a rewrite of the _php_filter_validate_ipv6()
>> function in the same file.
>>
>> Another one, if you are interested in encoding issues:
>>
>> http://bugs.php.net/bug.php?id=49687
>>
>> I don't necessarily agree with Scott that it is wrong to expect
>> addslashes() to validate the input string. It could call
>> get_next_char() the same way php_escape_html_entities_ex() in
>> ext/standard/html.c does. And we need that utf8_decode() fix mentioned
>> in the report reviewed/committed if it hasn't been already.
>>
>
> I fixed utf8_decode and I had a patch for adding utf8_validate which is
> probably suitable for 5.4.
>
> http://whisky.macvicar.net/patches/utf8-string.diff.txt
>
> It's not quite done; I had intentions of adding support for a choice of
> truncating, returning a simple true / false for valid, or substituting the
> Unicode replacement character.

My only issue with this is that it essentially duplicates the UTF-8 part
of get_next_char() from html.c. I'd like to see charset parsing in one
place instead of spread out all over the code tree. The get_next_char()
function also supports other charsets, so we could have a more generic
cs_validate() function along with utf8_validate().

-Rasmus
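
For the IPv6 item quoted above, here is a minimal sketch of the "expand the notation, then compare ranges" idea. It is not code from PHP or from the bug report. It assumes POSIX inet_pton() is available to expand the text form into 16 bytes. The names ipv6_range, reserved_ranges, prefix_match and ipv6_is_reserved are hypothetical, and the table holds only a few well-known special-purpose prefixes for illustration, not necessarily the list bug 47435 asks for.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical table entry: a network prefix and its length in bits. */
struct ipv6_range {
    const char *prefix;
    unsigned bits;
};

/* Illustrative list only (an assumption, not taken from the bug report). */
static const struct ipv6_range reserved_ranges[] = {
    { "::",         128 }, /* unspecified address */
    { "::1",        128 }, /* loopback */
    { "::ffff:0:0",  96 }, /* IPv4-mapped addresses */
    { "fe80::",      10 }, /* link-local unicast */
};

/* Return 1 if the first `bits` bits of the 16-byte addresses a and b match. */
static int prefix_match(const uint8_t *a, const uint8_t *b, unsigned bits)
{
    unsigned full = bits / 8; /* whole bytes covered by the prefix */
    unsigned rem = bits % 8;  /* leftover bits in the next byte */
    uint8_t mask;

    if (memcmp(a, b, full) != 0) {
        return 0;
    }
    if (rem == 0) {
        return 1;
    }
    mask = (uint8_t)(0xFF << (8 - rem));
    return (a[full] & mask) == (b[full] & mask);
}

/* Return 1 if text is in a listed range, 0 if not, -1 if it is not valid IPv6. */
int ipv6_is_reserved(const char *text)
{
    struct in6_addr addr, net;
    size_t i;

    /* inet_pton() expands "::" shorthand into the full 128-bit address. */
    if (inet_pton(AF_INET6, text, &addr) != 1) {
        return -1;
    }
    for (i = 0; i < sizeof(reserved_ranges) / sizeof(reserved_ranges[0]); i++) {
        if (inet_pton(AF_INET6, reserved_ranges[i].prefix, &net) != 1) {
            continue;
        }
        if (prefix_match(addr.s6_addr, net.s6_addr, reserved_ranges[i].bits)) {
            return 1;
        }
    }
    return 0;
}
```

Once the address is a flat 16-byte array, each range check is a masked byte comparison. Code inside ext/filter might need its own expansion routine instead of inet_pton() if portability to platforms without it matters.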
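
For the validation question discussed above, here is a minimal sketch of a strict, true / false UTF-8 check. It is a standalone illustration, not the utf8_validate() from the linked patch and not the logic of get_next_char(). The name utf8_is_valid is hypothetical, and the strictness rules (reject overlong forms, surrogates and anything above U+10FFFF) are assumptions about what "valid" should mean.

```c
#include <stddef.h>

/* Return 1 if s[0..len) is well-formed UTF-8, 0 otherwise. */
int utf8_is_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t need, k;        /* number of continuation bytes expected */
        unsigned long cp, min; /* decoded code point, smallest legal value */

        if (c < 0x80) {        /* plain ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            need = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            need = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            need = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;          /* stray continuation byte or invalid lead byte */
        }

        if (len - i - 1 < need) {
            return 0;          /* sequence truncated at end of input */
        }
        for (k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80) {
                return 0;      /* not a continuation byte */
            }
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        if (cp < min) {
            return 0;          /* overlong encoding */
        }
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return 0;          /* UTF-16 surrogate range */
        }
        if (cp > 0x10FFFF) {
            return 0;          /* beyond the Unicode range */
        }
        i += need + 1;
    }
    return 1;
}
```

A shared decoder that hands back one code point at a time, with this kind of check built in, is what would let callers such as addslashes() and a generic cs_validate() avoid carrying their own copies of the parsing loop.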