Newsgroups: php.internals
Xref: news.php.net php.internals:118504
From: michal.brzuchalski@gmail.com (Michał Marcin Brzuchalski)
To: Tim Düsterhus
Cc: Hans Henrik Bergan, Dusk, David Gebler, juan carlos morales, PHP Internals List
Date: Fri, 26 Aug 2022 12:48:27 +0200
Subject: Re: [PHP-DEV] RFC json_validate() - status: Under Discussion
References: <8D53AD5B-7CFC-4820-9EE4-FEB365D327A8@woofle.net> <4e9741c0-a338-f9af-4d78-705db6bcf5b4@bastelstu.be>
In-Reply-To: <4e9741c0-a338-f9af-4d78-705db6bcf5b4@bastelstu.be>

Hi Tim,

On Fri, 26 Aug 2022 at 12:15, Tim Düsterhus wrote:

> Hi
>
> On 8/26/22 11:14, Hans Henrik Bergan wrote:
> >> you can't efficiently validate JSON in userland
> >
> > Has anyone actually put that claim to the test? Has anyone actually made a
> > userland json validator (not just wrap json_decode()/json_last_error()) for
> > performance comparison?
> > ( if not, https://www.json.org/JSON_checker/JSON_checker.c would probably
> > be a good start)
> >
>
> Worded like "you can't efficiently" the claim is false. Of course you
> can memory-efficiently validate the input by traversing the string byte
> by byte and keeping track of the nesting.
>
> However the points that make a userland implementation infeasible are:
>
> 1. Writing a JSON parser is non-trivial as evidenced by:
> https://github.com/nst/JSONTestSuite. I expect userland implementations
> to be subtly buggy in edge cases. The JSON parser in PHP 7.0+ is
> certainly more battle-tested and in fact it appears to pass all of the
> tests in the linked test suite.
>
> 2. Even if the userland implementation is written very carefully, it
> might behave differently than the native implementation used by
> json_decode() (e.g. because the latter is buggy for some reason or
> because the correct behavior is undefined). This would imply that an
> input string that was successfully validated by your userland parser
> might ultimately fail to parse when passed to json_decode(). This is
> exactly what you don't want to happen.
>

Now this is an argument I could think of, but it is not even mentioned in
the RFC.

Porting the JSON_checker.c example provided by json.org is probably not
impossible: it took around 1h of work. See the working implementation
here:
https://gist.github.com/brzuchal/37e888d9b13937891c3e05fead5042bc

Cheers,
Michał Marcin Brzuchalski
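
P.S. To make point 2 of the quoted message concrete, here is a minimal sketch of how one might check a userland validator against json_decode(). All function names are hypothetical and are not part of PHP, the RFC, or the linked gist. The "naive" validator is a deliberately incomplete stand-in that only tracks bracket nesting byte by byte, so it is expected to accept some inputs that json_decode() rejects.

```php
<?php
declare(strict_types=1);

// Baseline mentioned in the thread: validate by fully decoding the input.
// Hypothetical helper name; it builds the decoded value and discards it.
function validate_by_decoding(string $json): bool
{
    json_decode($json);
    return json_last_error() === JSON_ERROR_NONE;
}

// Deliberately naive stand-in for a userland validator (an assumption for
// illustration): it only checks that brackets outside of strings are balanced.
function naive_nesting_check(string $json): bool
{
    $stack = [];
    $inString = false;
    $length = strlen($json);
    for ($i = 0; $i < $length; $i++) {
        $byte = $json[$i];
        if ($inString) {
            if ($byte === '\\') {
                $i++; // skip the byte following a backslash
            } elseif ($byte === '"') {
                $inString = false;
            }
            continue;
        }
        if ($byte === '"') {
            $inString = true;
        } elseif ($byte === '{' || $byte === '[') {
            $stack[] = $byte;
        } elseif ($byte === '}' || $byte === ']') {
            // array_pop() returns null on an empty stack, which never matches.
            if (array_pop($stack) !== ($byte === '}' ? '{' : '[')) {
                return false;
            }
        }
    }
    return !$inString && $stack === [];
}

// Returns every input on which the given validator and json_decode() disagree.
function find_disagreements(callable $validator, array $inputs): array
{
    $disagreements = [];
    foreach ($inputs as $input) {
        if ($validator($input) !== validate_by_decoding($input)) {
            $disagreements[] = $input;
        }
    }
    return $disagreements;
}

$inputs = ['{"a":[1,2]}', '[1,]', '{"a":1', '"x]"'];
print_r(find_disagreements('naive_nesting_check', $inputs));
```

With these four inputs, only the trailing-comma case '[1,]' is reported: the nesting check accepts it while json_decode() rejects it. Running a harness like this over the files in JSONTestSuite would be one way to look for the kind of divergence described above.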