Newsgroups: php.internals
Xref: news.php.net php.internals:118321
Date: Sat, 30 Jul 2022 12:07:00 +0200
From: dev.juan.morales@gmail.com (juan carlos morales)
To: Oleksii Bulba
Cc: PHP Internals List
Subject: Re: [PHP-DEV] RFC Idea - is_json - looking for feedback

On Sat, Jul 30, 2022 at 11:58, Oleksii Bulba wrote:

> > $memoryAfter = memory_get_usage(true) / 1024 / 1024;
>
> I see that you used `memory_get_usage`, which shows memory usage at the time
> of the function call.
> As far as I understand, your function does not return any value,
> so I suspect it is unsurprising that the memory usage after the function
> call is the same as before.
> But what is the actual memory usage during the function call?
> Can you run the same benchmark with the `memory_get_peak_usage` function
> to see how your function uses memory?
>
> # $memoryAfter = memory_get_peak_usage(true) / 1024 / 1024;
>
> Also, I wonder whether it would be better to name the function
> `is_json_valid`?
>
> On Sat, Jul 30, 2022 at 10:37 AM juan carlos morales <
> dev.juan.morales@gmail.com> wrote:
>
>> Just want to clarify that when I mentioned the use of memory, I wrote down
>> the function "memory_get_usage()", which basically gives us the memory
>> handled by PHP that is related to the memory_limit INI setting.
>>
>> Now I will provide a benchmark I have done really quickly:
>>
>> # Code used (I have the implementation of is_json() done already)
>>
>> <?php
>>
>> // make sure you set your memory limit to -1 before running this code
>> // Here we create a very very very big json string, really big
>> $limit = 1000000;
>> $jsonString = '{ "test": { "foo": "bar" },';
>>
>> for ($i=0; $i < $limit; $i++) {
>>     $jsonString .= " \"test$i\": { \"foo\": { \"test\" : { \"foo\" : { \"test\" : { \"foo\" : \"bar\" }}}}},";
>> }
>>
>> $jsonString .= ' "testXXXX": { "foo": "replaceme" } }';
>> //{ "test" : { "foo" : "bar" }}}
>>
>> $memoryBefore = memory_get_usage(true) / 1024 / 1024;
>> echo "Megas used before call: " . $memoryBefore . PHP_EOL;
>>
>> $start = microtime(true);
>>
>> json_decode($jsonString, null, $limit, 0); //<------------------ un/comment to show/hide results for json_decode()
>> //is_json($jsonString); //<------------------ un/comment to show/hide results for is_json()
>>
>> $memoryAfter = memory_get_usage(true) / 1024 / 1024;
>> echo "Megas used after call: " . $memoryAfter . PHP_EOL;
>>
>> echo "Difference: " . ($memoryAfter - $memoryBefore) . PHP_EOL;
>>
>> echo "Time: " . (microtime(true) - $start) . " seconds" . PHP_EOL;
>> return;
>>
>> # Results
>> ## json_decode()
>> Megas used before call: 79.23828125
>> Megas used after call: 3269.23828125
>> Difference: 3190
>> Time: 12.091101884842 seconds
>>
>> ## is_json()
>> Megas used before call: 79.23828125
>> Megas used after call: 79.23828125
>> Difference: 0
>> Time: 5.4537169933319 seconds
>>
>>
>> And yes, I am open to sharing the implementation, but after I write the RFC.
>>
>> Thanks for taking the time to give me feedback.
>>
>> On Sat, Jul 30, 2022 at 3:50, Jordan LeDoux wrote:
>>
>> >
>> >
>> > On Fri, Jul 29, 2022 at 7:27 AM juan carlos morales <
>> > dev.juan.morales@gmail.com> wrote:
>> >
>> >> # Why this function ?
>> >>
>> >> At the moment, the only way to determine whether a JSON string is valid
>> >> is to execute the json_decode() function.
>> >>
>> >> The drawback is that json_decode() generates an in-memory
>> >> object/array (depending on parameters) while parsing the string; this
>> >> leads to memory usage that is not needed (because we use memory for
>> >> creating the object/array) and can also cause an error by reaching the
>> >> memory limit of the PHP process.
>> >>
>> >> Sometimes we just need to know if the string is valid JSON or not,
>> >> and nothing else.
>> >>
>> >
>> > You say that you have a first-pass at the implementation done. I'd be
>> > curious to see it. My initial thought was that in order to validate the
>> > string, you likely need to allocate extra memory as part of the validation
>> > that depends on the string size. You'd definitely save the overhead of a
>> > ZVAL, but for such an object that overhead is likely negligible.
>> >
>> > So I guess my question would be: in the actual implementation that lands,
>> > how much memory would this actually save compared to json_decode()?
>> > This seems like it would make the RFC tricky, as the purported benefit of the
>> > RFC depends very tightly on the implementation that lands.
>> >
>> > Jordan
>> >

Hello Jordan, thanks for the feedback.

I think the benchmark speaks for itself, including on the memory-savings
question. Note also that in order to run it with json_decode(), the memory
limit needs to be very high or -1 (no limit, which is not a good idea in
production, right?).

The advantage here is being able to parse huge strings without reaching
the memory limit set in the INI settings.

I take it as an "IF THIS IS AS GOOD AS IT SEEMS THEN YES" :D

Once again... Thanks
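
On the peak-memory question raised in the quoted message: a minimal sketch of measuring the rise in peak memory during a call, instead of the before/after difference, could look like the following. The helper name measure_peak() and the reduced $limit are illustrative assumptions and not part of the benchmark above; json_decode() is used as the workload because is_json() is only a proposal at this point.

```php
<?php
// Hypothetical helper (not from the original benchmark): runs $fn and reports
// how far the peak memory counter rose while it ran, plus the elapsed time.
function measure_peak(callable $fn): array
{
    // The peak counter only grows, so its rise equals the extra peak memory
    // needed during the call, provided nothing earlier peaked higher.
    $peakBefore = memory_get_peak_usage(true);
    $start = microtime(true);

    $fn();

    return [
        'peak_delta_mb' => (memory_get_peak_usage(true) - $peakBefore) / 1024 / 1024,
        'seconds'       => microtime(true) - $start,
    ];
}

// Assumption: much smaller than the original 1000000 so that the script
// runs under a default memory_limit.
$limit = 20000;
$jsonString = '{ "test": { "foo": "bar" },';
for ($i = 0; $i < $limit; $i++) {
    $jsonString .= " \"test$i\": { \"foo\": { \"test\": { \"foo\": \"bar\" }}},";
}
$jsonString .= ' "testXXXX": { "foo": "replaceme" } }';

// The decoded value is discarded, as in the original benchmark.
$result = measure_peak(function () use ($jsonString) {
    json_decode($jsonString, null, 512, 0);
});

echo "Peak delta (MB): " . $result['peak_delta_mb'] . PHP_EOL;
echo "Time: " . $result['seconds'] . " seconds" . PHP_EOL;
```

Because the decoded structure is freed as soon as json_decode() returns, a before/after comparison with memory_get_usage() can miss memory that was only held during the call; comparing the peak counter is one way to capture it.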