Newsgroups: php.internals
Date: Mon, 1 Jun 2015 19:07:26 +0100
From: bukka@php.net (Jakub Zelenka)
To: Yasuo Ohgaki
Cc: PHP internals list
Subject: Re: [PHP-DEV] JSON unicode escape issue and new constants

Hi Yasuo,

On Mon, Jun 1, 2015 at 1:10 AM, Yasuo Ohgaki wrote:

>
> Any invalid chars as variable/property name should be handled as invalid.
>
> Valid variable name: '[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*'
> http://php.net/manual/en/language.variables.basics.php
>
> This violates JSON spec, but if user would like to allow invalid names. It
> should be an option rather than the default. IMO.
>
> [yohgaki@dev ~]$ php
> <?php
> $o = new StdClass;
> $o->{123} = 11;
>
> var_dump($o);
> ?>
>
> class stdClass#1 (1) {
>   public $123 =>
>   int(11)
> }
> [yohgaki@dev ~]$ php
> <?php
> $o = new StdClass;
> $o->123;
>
> var_dump($o);
> ?>
>
> PHP Parse error: syntax error, unexpected '123' (T_LNUMBER), expecting
> identifier (T_STRING) or variable (T_VARIABLE) or '{' or '$' in - on line 3
>

As you showed in your example, these names are not invalid; you just need
to enclose them in braces ( $o->{"123"} ). This is basic PHP, and the JSON
parser should not worry about users who don't know that.

>
> Since JSON string must be UTF-8/16/32, any invalid UTF sequence
> could be treated as invalid.
>
> 8.1. Character Encoding
>
>    JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32. The default
>    encoding is UTF-8, and JSON texts that are encoded in UTF-8 are
>    interoperable in the sense that they will be read successfully by the
>    maximum number of implementations; there are many implementations
>    that cannot successfully read texts in other encodings (such as
>    UTF-16 and UTF-32).
>
>    Implementations MUST NOT add a byte order mark to the beginning of a
>    JSON text.
>    In the interests of interoperability, implementations
>    that parse JSON texts MAY ignore the presence of a byte order mark
>    rather than treating it as an error.
> https://tools.ietf.org/html/rfc7159#section-8.1
>
> I prefer BOM as invalid sequence and raising error/return NULL.
>

The PHP JSON parser accepts only UTF-8, and this is already validated
correctly, so I don't see any issue here either.

>
> JSON_ERROR_UTF16 would be better defined as JSON_ERROR_UTF as
> JSON accepts valid UTF sequence.
>

The thing is that we already have the JSON_ERROR_UTF8 error, which is raised
when the input binary string is not valid UTF-8. JSON_ERROR_UTF16 was meant
to distinguish these two errors. I'm open to other ideas, but I'm not sure
about JSON_ERROR_UTF, as it might be confused with JSON_ERROR_UTF8.

> It's also better to reject any invalid UTF sequence, not limited to
> Unicode escaped
> (\uXXXX) string. If it does not validate Unicode sequence, I would add the
> validation.
>

A single surrogate is actually the only case where an escape can result in
an invalid Unicode string.

> JSON does not forbid object property begins with digits. I'm not sure how
> currently handled, but it should result in error like NULL. IMO.
>

As noted above: see http://3v4l.org/sJo8p

> Since OWASP starts advocating Unicode escape for all names and values in
> JSON, I would like to have ability to encode all chars as \uXXXX by
> default.
> i.e. Escape all \r, \n, a, b, c, 0, 1, 2, etc as \uXXXX by default,
> disable \uXXXX
> encoding as an option.
>

I think we are a bit late for such a change, as it is fairly big and also a
BC break, which would require an RFC.

>
> BTW, any progress on disabling automatic float conversion against float
> like
> values? This is mandatory, IMHO.
>

The RFC ( https://wiki.php.net/rfc/json_numeric_as_string ) is under
discussion:
https://www.mail-archive.com/internals@lists.php.net/msg78683.html

Cheers

Jakub
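
P.S. A minimal sketch of the two points discussed above, assuming PHP 7.0 or later (where JSON_ERROR_UTF16 is available). The variable names and sample inputs are only illustrative and are not taken from the thread or from the 3v4l link.

```php
<?php
// A property name that starts with a digit is valid; it only needs the
// brace syntax to be read ($o->123 would be a parse error).
$o = json_decode('{"123": 11}');
$value = $o->{'123'};

// Invalid UTF-8 bytes in the input string itself: JSON_ERROR_UTF8.
json_decode("\"\xff\"");
$utf8Error = json_last_error();

// A single (unpaired) surrogate in a \uXXXX escape: JSON_ERROR_UTF16.
// Single quotes keep the backslash literal, so the parser sees the escape.
json_decode('"\ud834"');
$utf16Error = json_last_error();

// A complete surrogate pair decodes to one 4-byte UTF-8 character (U+1D11E).
$clef = json_decode('"\ud834\udd1e"');

var_dump($value);
var_dump($utf8Error === JSON_ERROR_UTF8);
var_dump($utf16Error === JSON_ERROR_UTF16);
var_dump($clef === "\xF0\x9D\x84\x9E");
```

The first var_dump should show the decoded integer, and the three comparisons should each come out true if the parser behaves as described in this thread.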