Newsgroups: php.internals
Date: Sun, 7 Jun 2015 20:37:34 +0100
To: PHP internals list
Subject: Re: JSON unicode escape issue and new constants
From: bukka@php.net (Jakub Zelenka)

Hi,

On Thu, May 28, 2015 at 7:53 PM, Jakub Zelenka wrote:

> Hi,
>
> There are two issues (reported bugs but not really bugs) in json_decode
> related to the \u escape.
>
> The first one is
> json_decode('{"\u0000": 1}');
> reported in https://bugs.php.net/bug.php?id=68546
>
> That code results in a fatal error due to using a malformed property name
> (private props start with \0). I don't think that anything parsed in
> json_decode should result in a fatal error. That's why I would like to
> introduce a new json error called JSON_ERROR_MANGLED_PROPERTY_NAME.
>

I have just created a PR for that: https://github.com/php/php-src/pull/1332 . So if there are any objections (e.g. to the error name), shout now before I merge it to master...

>
> The second one is
> json_decode('"\ud834"');
> which results in a non-UTF-8 string from the JSON decoder. This is
> conformant to the JSON RFC 7159 as noted in section 8.2:
>
>    However, the ABNF in this specification allows member names and
>    string values to contain bit sequences that cannot encode Unicode
>    characters; for example, "\uDEAD" (a single unpaired UTF-16
>    surrogate). Instances of this have been observed, for example, when
>    a library truncates a UTF-16 string without checking whether the
>    truncation split a surrogate pair. The behavior of software that
>    receives JSON texts containing such values is unpredictable; for
>    example, implementations might return different values for the length
>    of a string value or even suffer fatal runtime exceptions.
>
> As the behavior is unpredictable, the current default result seems
> reasonable because PHP strings are not Unicode-encoded internally.
> However, there might be cases when the user wants to make sure that
> he/she gets a Unicode string.
> In that case, I would like to add an option called
> JSON_VALID_ESCAPED_UNICODE, which will emit an error called
> JSON_ERROR_UTF16 when such an escape appears. I implemented this in jsond
> a long time ago and think that it would be useful for json as well.
>

I have been thinking about this a bit more, and I would like to make the error the default rather than introduce a new option for it. The RFC actually calls that behavior unpredictable and allows raising an error, so it's not against the RFC. It also makes sense because other parsers (e.g. Python 2 and 3) do the same. I can't imagine anyone relying on \uDEAD producing invalid Unicode. I think there are many more users who expect json_decode to always produce valid Unicode, which is not the case at the moment. So it really does not make sense to keep the current behavior in PHP 7.

If there are no objections, I will create a PR next week.

Cheers

Jakub
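
For anyone who wants to see which behavior their build has, here is a minimal sketch. The helper name decode_and_check is hypothetical (it is not part of ext/json), and it only inspects a top-level string result. It uses preg_match() with the /u modifier as a UTF-8 validity check; which branch is taken depends on the PHP version, so no particular output is assumed.

```php
<?php
// Hypothetical helper, for illustration only: decode a JSON text and report
// whether a top-level string result is valid UTF-8.
function decode_and_check($json)
{
    $value = json_decode($json);
    $error = json_last_error();

    // preg_match() with the /u modifier returns false when the subject
    // is not valid UTF-8, so it works as a simple validity check.
    $validUtf8 = is_string($value) ? (preg_match('//u', $value) === 1) : null;

    return array('value' => $value, 'error' => $error, 'valid_utf8' => $validUtf8);
}

// A lone high surrogate, as in the second issue described above.
$result = decode_and_check('"\ud834"');

if ($result['error'] !== JSON_ERROR_NONE) {
    // The decoder rejected the escape (the behavior proposed in this mail).
    echo "rejected, json_last_error() = ", $result['error'], "\n";
} elseif ($result['valid_utf8'] === false) {
    // The decoder accepted the escape and returned a non-UTF-8 string.
    echo "accepted, but the result is not valid UTF-8\n";
}
```

Either way, the point is the same: the caller never gets both "no error" and an invalid string without being able to notice it.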