Newsgroups: php.internals
Xref: news.php.net php.internals:118502
Message-ID: <4e9741c0-a338-f9af-4d78-705db6bcf5b4@bastelstu.be>
Date: Fri, 26 Aug 2022 12:15:47 +0200
From: tim@bastelstu.be (Tim Düsterhus)
To: Hans Henrik Bergan, Michał Marcin Brzuchalski
Cc: Dusk, David Gebler, juan carlos morales, PHP Internals List
References: <8D53AD5B-7CFC-4820-9EE4-FEB365D327A8@woofle.net>
Subject: Re: [PHP-DEV] RFC json_validate() - status: Under Discussion

Hi

On 8/26/22 11:14, Hans Henrik Bergan wrote:
>> you can't efficiently validate JSON in userland
>
> Has anyone actually put that claim to the test? Has anyone actually made a
> userland json validator (not just wrap json_decode()/json_last_error()) for
> performance comparison?
> ( if not, https://www.json.org/JSON_checker/JSON_checker.c would probably
> be a good start)
>

Worded like "you can't efficiently", the claim is false. Of course you
can memory-efficiently validate the input by traversing the string byte
by byte and keeping track of the nesting.

However, the points that make a userland implementation infeasible are:

1. Writing a JSON parser is non-trivial, as evidenced by:
https://github.com/nst/JSONTestSuite. I expect userland implementations
to be subtly buggy in edge cases. The JSON parser in PHP 7.0+ is
certainly more battle-tested and in fact it appears to pass all of the
tests in the linked test suite.

2. Even if the userland implementation is written very carefully, it
might behave differently than the native implementation used by
json_decode() (e.g. because the latter is buggy for some reason or
because the correct behavior is undefined). This would imply that an
input string that was successfully validated by your userland parser
might ultimately fail to parse when passed to json_decode(). This is
exactly what you don't want to happen.

Best regards
Tim Düsterhus
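As a rough illustration of the byte-by-byte approach mentioned above, the following sketch walks the input once and keeps only a stack of expected closing brackets, so its memory use grows with nesting depth rather than input size. The function names nesting_is_balanced() and native_accepts() are made up for this example. The scanner deliberately checks nothing but bracket nesting and string termination, so it is not a JSON validator; it is only meant to show how easily such a userland check can disagree with json_decode().

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: checks only that brackets and braces are balanced
 * and that strings are terminated. It does NOT check commas, colons,
 * numbers, literals or escape sequences.
 */
function nesting_is_balanced(string $input): bool
{
    $stack = [];        // expected closing bytes, innermost last
    $inString = false;
    $length = strlen($input);

    for ($i = 0; $i < $length; $i++) {
        $byte = $input[$i];

        if ($inString) {
            if ($byte === '\\') {
                $i++;   // skip the byte following the backslash
            } elseif ($byte === '"') {
                $inString = false;
            }
            continue;
        }

        switch ($byte) {
            case '"':
                $inString = true;
                break;
            case '[':
                $stack[] = ']';
                break;
            case '{':
                $stack[] = '}';
                break;
            case ']':
            case '}':
                // array_pop() returns null on an empty stack
                if (array_pop($stack) !== $byte) {
                    return false;
                }
                break;
        }
    }

    return !$inString && $stack === [];
}

/** Hypothetical helper: what the native parser says about the input. */
function native_accepts(string $input): bool
{
    json_decode($input);
    return json_last_error() === JSON_ERROR_NONE;
}
```

For the input `[1,]` the scanner reports balanced nesting, while json_decode() reports a syntax error. A real userland validator would close this particular gap, but the example shows the kind of disagreement described in point 2.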