Newsgroups: php.internals
Xref: news.php.net php.internals:118339
Date: Mon, 1 Aug 2022 18:35:54 +0100
From: davidgebler@gmail.com (David Gebler)
To: php internals
Subject: Re: [PHP-DEV] RFC Idea - is_json - looking for feedback

On Sun, Jul 31, 2022 at 4:41 PM Larry Garfield wrote:

> So the core argument, it seems, is "there's lots of user-space
> implementations already, hence demand, and it would be
> better/faster/stronger/we-have-the-technology to do it in C."

There are innumerable features implemented in userland that would be faster/stronger/better done in C. I'm not convinced this alone is a sufficient basis for introducing a new core function. There are also memory-efficient userland JSON streaming parsers; I've used the one I linked to parse JSON files over 1GB with no problem.

I'm not saying I'm against this proposal; I'm just playing devil's advocate here. While I can accept that the number of userland implementations of "validate string as JSON" out there clearly shows some demand and use cases for this, are there equally numerous issues raised on those projects' repositories demonstrating that userland implementations have commonly been insufficient, hit OOM errors or otherwise caused problems? This might be an RFC to fix a problem very, very few people have.

> Thus another, arguably more important benchmark would be a C
> implementation compared to a userspace implementation of the same
> algorithm. Presumably your C code is doing some kind of stream-based
> validation with braces/quotes matching rather than a naive "try and parse
> and see if it breaks." We would need to see benchmarks of the same
> stream-based validation in C vs PHP, as that's the real distinction. That
> a stream validator would be more memory efficient than a full parser is not
> at all surprising, but that's also not a fair comparison.
>
> As for the benchmarks themselves, do not use memory_get_usage(); as noted,
> it shows the memory usage at that time, not ever. What you want is
> memory_get_peak_usage(), which gets the highest the memory usage has gotten
> in that script run. Or, even better, use PHPBench with separate sample
> methods to compare various different implementations. It will handle all
> the "run many times and average the results and throw out outliers" and
> such for you. It's quite a flexible tool once you get the hang of it.
>
> I'll also note that it would be to your benefit to share the working C
> code as a patch/PR already. If accepted it would be released open source
> anyway, so letting people see the proposed code now can only help your
> case; unless the code is awful, in which case showing it later would only
> waste your time and everyone else's discussing it in the abstract before
> the implementation could be reviewed.
>
> --Larry Garfield
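
For anyone following the memory_get_usage() versus memory_get_peak_usage() point in the quoted text, here is a minimal sketch of the difference. The measure() helper, its array keys and the 50,000-integer payload are all hypothetical, made up for illustration; none of this comes from the RFC or the proposed C code, and it only exercises the "parse it and see" approach with json_decode().

```php
<?php
// Hypothetical helper (not from the thread): runs $work and reports how much
// memory is still held afterwards versus how far the script's peak moved.
function measure(callable $work): array
{
    $usageBefore = memory_get_usage();
    $peakBefore  = memory_get_peak_usage();

    $work(); // the return value is discarded, so its memory can be freed

    return [
        // memory_get_usage() only sees what is allocated right now
        'retained'    => memory_get_usage() - $usageBefore,
        // memory_get_peak_usage() is the high-water mark for the whole run,
        // so this is 0 if $work never exceeded an earlier peak
        'peak_growth' => memory_get_peak_usage() - $peakBefore,
    ];
}

// Assumed test payload: a JSON array of 50,000 integers.
$json = json_encode(range(1, 50000));

$result = measure(function () use ($json): void {
    json_decode($json, true); // "try to parse and see if it breaks"
});

printf(
    "retained: %d bytes, peak growth: %d bytes\n",
    $result['retained'],
    $result['peak_growth']
);
```

The decoded array is freed before measure() takes its second reading, which is why a before/after memory_get_usage() comparison can report close to nothing for work that briefly needed a lot of memory. The peak figure has its own caveat: it covers the whole script run, so the setup code can mask the thing being measured. That is one reason a dedicated tool such as PHPBench is the safer choice for real comparisons.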