From: dev.juan.morales@gmail.com (juan carlos morales)
To: PHP Internals List
Date: Sat, 27 Aug 2022 01:31:21 +0200
Subject: Re: [PHP-DEV] RFC json_validate() - status: Under Discussion

Here is an update on the discussion: the good, the bad, the open questions, etc.

All of this will also go into the RFC, as requested by the procedure in https://wiki.php.net/rfc/howto

"Listen to the feedback, and try to answer/resolve all questions. Update your RFC to document all the issues and discussions. Cover both the positive and negative arguments. Put the RFC URL into all your replies."

=== UPDATES ===

- Different users have tested the functionality and obtained the promised results. Their feedback about it was also positive.
- Most of the community on the mailing list expressed a positive opinion of this RFC and looks forward to its integration into PHP.
- Those who checked the code also agree that it is a small implementation, easy to maintain, and that it provides a big benefit for its size.
- The community got very actively involved in the discussion of the RFC, provided all kinds of useful feedback, and also took the time to test json_validate() themselves.

=== Arguments against json_validate() provided by the community ===

- One member of the mailing list expressed that:

1) Incorporating such a small implementation, which can be achieved with userland code, is not a good idea.

Quote: "If we keep the tendency to pollute already bloated standard library with an army of small functions that could have not exists and be replaced with normal PHP counterparts IMHO we'll end with frustration from developers as I believe DX slowly falls down here."

2) json_validate() would only be useful for edge cases.

Quote: "A `json_decode()` is a substitute that IMO solves 99% of use cases. If I'd follow your logic and accept every small addition that handles 1% of use cases, somebody will raise another RFC for simplexml_validate_string or yaml_validate and the next PhpToken::validate. All above can be valid if we trust that people normally validate 300MB payloads to do nothing if they DON'T fail and there is nothing strange about that."

3) The user also provided an implementation of a JSON parser written in pure PHP: https://gist.github.com/brzuchal/37e888d9b13937891c3e05fead5042bc

=== Arguments in favor of json_validate() provided by the community ===

@@@ Use cases provided by some members, I quote:

- "Yes well-formed JSON from a trusted source tends to be small-ish. But a validation function also needs to deal with non-well-formed JSON, otherwise you would not need to validate it."
- "If with a new function (json_validate()) it becomes much easier to defend against a Denial-of-Service attack for some parts of a JSON API, then this can be a good addition just for security reasons."
- "fast / efficient validation of a common communication format reduces the attack surface for Denial-of-Service attacks."

@@@ Memory usage

- Users who tested json_validate() were happy with its memory usage, which was zero in most cases (this is the main benefit of the feature). One user also tested a very large string (75 MB) and reported that only a few bytes were needed; the same user reported a 20-25% execution speed improvement over json_decode().

@@@ Reasons not to depend on userland JSON parsers

Even though it is possible to write an excellent JSON parser in PHP, as one of the members of the mailing list showed us, there are good reasons not to rely on userland solutions.

# 1 - Tim Düsterhus shared some good thoughts on this, in favor of json_validate(). I quote him:

- "Writing a JSON parser is non-trivial as evidenced by: https://github.com/nst/JSONTestSuite. I expect userland implementations to be subtly buggy in edge cases. The JSON parser in PHP 7.0+ is certainly more battle-tested and in fact it appears to pass all of the tests in the linked test suite."
- "Even if the userland implementation is written very carefully, it might behave differently than the native implementation used by json_decode() (e.g. because the latter is buggy for some reason or because the correct behavior is undefined). This would imply that an input string that was successfully validated by your userland parser might ultimately fail to parse when passed to json_decode(). This is exactly what you don't want to happen."

(Some other members, including me, share this opinion.)

# 2 - The JSON parser in PHP follows a special convention, noted in the PHP documentation.
# 3 - We already have a JSON parser in PHP, which is used by json_decode(); reusing the existing JSON parser provides 100% compatibility between validating a JSON string and decoding it.

# 4 - Larry Garfield also provided a good reason to integrate this implementation into PHP. I quote him:

"The heuristic I use is that an API should be "reasonably complete" in one location. Having a half-assed API in C and the rest left to inconsistent and redundant user-space implementations is a terrible API; the same would apply for a user-space library that is half-assed and leaves the rest to "someone else to write."

Naturally "reasonably complete" is a somewhat squishy term, which is why it's a heuristic.

By that metric, yes, str_starts_with() and friends absolutely belonged in core, because we already have a bunch of string functions and str_starts_with() is by a wide margin the most common usage of strpos(). By the same token, yes, json_validate() makes sense to include in the main API, which means in C. If there's a performance benefit to doing so as well, that makes it an easy sell for me."

=== Changes in the implementation ===

@@@ THROW EXCEPTION ON ERROR

- The ability to throw an exception on error will be removed from the implementation, as was pointed out not only by users on the mailing list but also during code review. There are totally valid arguments for removing this capability.

=== Changes in the RFC ===

- I removed 3 of the provided examples because they did not fit the RFC's purpose.
- I still need to add to the RFC the use cases provided by the community, where json_validate() will make a good impact.
- Updating the RFC takes time, which is why the mailing list will be updated before the RFC itself.
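To illustrate the "THROW EXCEPTION ON ERROR" change above, here is a minimal sketch of how a caller might check a payload without exceptions. It rests on assumptions that this message does not confirm: that json_validate() takes the JSON string as its first argument and returns a bool, and that the error can be read afterwards with json_last_error_msg(), as with json_decode(). The helper describe_json() is hypothetical, and the fallback definition is only a stand-in so the sketch runs on builds that do not have the function; it decodes the payload and so has none of the memory benefit discussed above.

```php
<?php
// Stand-in for PHP builds without json_validate(). Assumption: the real
// function returns a bool and records its error like json_decode() does.
// This fallback builds the decoded value, so it does NOT save memory.
if (!function_exists('json_validate')) {
    function json_validate(string $json, int $depth = 512, int $flags = 0): bool
    {
        json_decode($json, null, $depth, $flags);
        return json_last_error() === JSON_ERROR_NONE;
    }
}

// Hypothetical helper: report validity through the return value and the
// last-error message instead of through an exception.
function describe_json(string $payload): string
{
    if (json_validate($payload)) {
        return 'valid';
    }
    return 'invalid: ' . json_last_error_msg();
}

echo describe_json('{"a": [1, 2, 3]}'), PHP_EOL; // well-formed document
echo describe_json('{"a": [1, 2, 3'), PHP_EOL;   // truncated document
```

Because the validation and the decoding would share one parser, a payload accepted here should also be accepted by a later json_decode() call with the same depth and flags.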
=== Clarification about the name ===

- In the beginning I named the function "is_json()", but that did not follow the convention written in: https://github.com/php/php-src/blob/master/CODING_STANDARDS.md#user-functionsmethods-naming-conventions

That is why I changed the name in the implementation to "json_validate()", as suggested not only during code review but also by some of the people on the internals mailing list.

=== Open issues/concerns ===

@@@ Usage of JSON_INVALID_UTF8_IGNORE @@@

- I have my doubts now, because of this code:

```