Newsgroups: php.internals
Date: Tue, 02 Aug 2022 10:08:36 -0700 (PDT)
From: dennis.snell@automattic.com (Dennis Snell)
Subject: Re: RFC Idea - is_json - looking for feedback

Still new here so what I say probably doesn't amount to much, but I too see this as encouraging the generally-less-appropriate sequence of parsing once to determine whether a string is valid JSON via `is_json()` and then parsing it again with `json_decode()`.

The name `is_valid_json()` rings truer to me because we are dealing with values that may or may not be "JSON" even if they can parse as such. For example, the string `"false"` is valid JSON _if interpreted as JSON_ but need not _be_ JSON in truth. The same is true for `"[1, 2, 3]"`, and I suppose it's even possible to have a string value that works as a JSON map _if interpreted as JSON_ even though that data was not created with the purpose of being JSON. Again, it's _valid JSON_ if interpreted that way, but may not in fact _be_ JSON.

That being said, putting this behind `filter_var()` seems like it would do less to encourage poor practice (as in, it would be a good thing). That one is likely harder to find, doesn't sound as appealing as `is_json()` or `is_valid_json()`, and would stick out more if someone were to use it casually where `json_decode()` alone is more appropriate. Also, that function already deals with validation and provides a way to pass the options we would need (max depth, JSON_INVALID_UTF8_SUBSTITUTE, and JSON_INVALID_UTF8_IGNORE) if we wanted this to match the behavior of `json_decode()`.

The examples you found are good motivations for a leaner validation check, but I wonder if they actually represent a great need. If a good user-space solution exists, those few frameworks could use that effectively and presumably the need for a core function would disappear. Personally, I would rather see this go through if the performance improved substantially more than it shows in the posted benchmarks. While I realize that what you are doing so far is reusing the existing JSON parser, and so we shouldn't expect an entirely different performance profile, I agree with the others that the implementation here is somewhat intrinsically bound with the idea itself.

In comparison I have created a naive version in PHP itself with PCRE calls advancing the JSON tokenization: https://pastebin.com/Cf8BZn1H

There are certainly glaring bugs in this because I tossed it together while on a train and only wanted to get a reasonable approximation of the performance characteristics for in-PHP code. In your benchmark it runs about twice as slow as `json_decode()` but uses essentially no memory (matching your results with `is_json()`). I'm pretty sure there are easy ways to eliminate some low-hanging performance bottlenecks, though it doesn't check for valid UTF-8 or valid escaped strings or check for a maximum depth. I feel like if we are going to add a new native function for this it should be way faster than four times as fast as a primitive function in PHP user-space.

Warmly,
Dennis Snell
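
For readers who want a feel for what "PCRE calls advancing the JSON tokenization" can look like, here is a minimal sketch. It is not the code from the pastebin link above, which is not reproduced here. Every function name (`sketch_is_valid_json()` and its helpers) is hypothetical, and the `$maxDepth` parameter is an illustrative limit on container nesting that is not claimed to match the depth semantics of `json_decode()`. The sketch checks string escapes but, like the version described above, does not validate UTF-8.

```php
<?php
declare(strict_types=1);

// Illustrative only: all names below are hypothetical, not an existing API.

/** Returns the byte at $at, or '' when $at is past the end of the input. */
function sketch_peek(string $json, int $at): string
{
    return $at < strlen($json) ? $json[$at] : '';
}

/** Returns the offset just past any JSON whitespace starting at $at. */
function sketch_skip_ws(string $json, int $at): int
{
    return $at + strspn($json, " \t\n\r", $at);
}

/** Consumes one JSON string at $at; returns the offset after it, or null. */
function sketch_consume_string(string $json, int $at): ?int
{
    // \G anchors the match at the given offset, so nothing is copied or sliced.
    $string = '~\G"(?:[^"\\\\\x00-\x1F]|\\\\(?:["\\\\/bfnrt]|u[0-9A-Fa-f]{4}))*+"~';
    return preg_match($string, $json, $m, 0, $at) === 1 ? $at + strlen($m[0]) : null;
}

/** Consumes an array or object whose opening bracket is at $at. */
function sketch_consume_container(string $json, int $at, int $depth): ?int
{
    if ($depth <= 0) {
        return null; // nested deeper than the caller allows
    }
    $isObject = $json[$at] === '{';
    $close = $isObject ? '}' : ']';
    $at = sketch_skip_ws($json, $at + 1);
    if (sketch_peek($json, $at) === $close) {
        return $at + 1; // empty container
    }
    while (true) {
        if ($isObject) {
            // Object members are: string key, colon, value.
            $at = sketch_consume_string($json, sketch_skip_ws($json, $at));
            if ($at === null) {
                return null;
            }
            $at = sketch_skip_ws($json, $at);
            if (sketch_peek($json, $at) !== ':') {
                return null;
            }
            $at++;
        }
        $at = sketch_consume_value($json, $at, $depth - 1);
        if ($at === null) {
            return null;
        }
        $at = sketch_skip_ws($json, $at);
        $next = sketch_peek($json, $at);
        if ($next === $close) {
            return $at + 1;
        }
        if ($next !== ',') {
            return null;
        }
        $at++; // step over the comma and expect another element
    }
}

/** Consumes one JSON value at $at; returns the offset after it, or null. */
function sketch_consume_value(string $json, int $at, int $depth): ?int
{
    $at = sketch_skip_ws($json, $at);
    $char = sketch_peek($json, $at);
    if ($char === '"') {
        return sketch_consume_string($json, $at);
    }
    if ($char === '[' || $char === '{') {
        return sketch_consume_container($json, $at, $depth);
    }
    // Literals and numbers, matched in place.
    $scalar = '~\G(?:true|false|null|-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?)~';
    return preg_match($scalar, $json, $m, 0, $at) === 1 ? $at + strlen($m[0]) : null;
}

/** True when the whole input is exactly one JSON value plus optional whitespace. */
function sketch_is_valid_json(string $json, int $maxDepth = 512): bool
{
    $end = sketch_consume_value($json, 0, $maxDepth);
    return $end !== null && sketch_skip_ws($json, $end) === strlen($json);
}
```

Because each `preg_match()` call only advances an integer offset and nothing is decoded into PHP values, a validator of this shape needs very little memory, which is the property discussed above. Its speed relative to `json_decode()` would have to be measured rather than assumed.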