Newsgroups: php.internals
Date: Thu, 27 May 2021 08:21:56 -0500
From: pollita@php.net (Sara Golemon)
To: Nikita Popov
Cc: PHP internals
Subject: Re: [PHP-DEV] Escape \0 in var_dump() output

On Thu, May 27, 2021 at 5:44 AM Nikita Popov wrote:

> https://github.com/php/php-src/pull/7059 does a surgical change to replace
> null bytes with \0.

Before delving into any other observations, I first want to state emphatically that we should not ONLY escape chr(0). We either apply full on addcslashes() (or similar) or do nothing at all. Half-escaped is worse than not escaped.

Sadly, this presents more BC issues than just chr(0) or '\\0' do alone since there are many more potential places where output will change for anyone using it programmatically.

So should we?

On the one hand: The intent of the var_dump() function is for human readable debugging. Anyone using this in a programmatic way is Doing It Wrong™, so I'm not too fussed about changing the output of this function.

On the other hand: WE use this function programmatically in our tests across many thousands of occurrences. Others probably rightly do as well.

Yes, the dissonance is real.

The most prudent and BC-safe thing would be to add another function `var_dump_escaped()` or even to prefer/advise json_encode() when content safety is relevant, but additional type information is not (most of the uses of var_dump we have).
Unfortunately, this doesn't actually fix the initial problem you stated, which is just getting useful data out of CI failures.

The PR you offered is the lightest touch and fixes the issue you cited without causing any likely damage to current users, but I don't think we should ignore the ambiguity of the output. Additionally, I think I could make an easy argument in favor of escaping CR and NL at the very least. At that point the urge to escape backslash is extra-real, since with Windows paths an innocent \n is far more likely.

------

Actually. Maybe we're thinking of this wrong.

Rather than change the output at all, why not just have a post-process step in our test runner that transforms the output in the test report? Then we could be as aggressive as we want, going as far as escaping all non-printables plus backslash, since at that point we're in it for the human readability (and knowing specific byte sequences is essential), and we break zero BC for anyone else?

-Sara
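
A rough PHP sketch of the two points above, for illustration only. It is not code from the PR or from the test runner: the function names `escape_nul_only()` and `escape_for_report()` are hypothetical, and the `\xNN` notation for non-printable bytes is an assumed choice (addcslashes() would emit octal instead). The first function mimics escaping chr(0) alone to show the ambiguity; the second escapes every non-printable byte plus backslash, which is what a report-only post-process step could afford to do.

```php
<?php
declare(strict_types=1);

// Hypothetical: escape only NUL, leaving backslash untouched.
function escape_nul_only(string $raw): string
{
    return str_replace("\0", '\\0', $raw);
}

// Hypothetical: escape backslash and every byte outside printable ASCII.
// Works byte-wise (no /u modifier), so multi-byte UTF-8 is shown as raw bytes.
function escape_for_report(string $raw): string
{
    $escaped = preg_replace_callback(
        '/[^\x20-\x5B\x5D-\x7E]/',
        static function (array $m): string {
            switch ($m[0]) {
                case "\\": return '\\\\';
                case "\n": return '\\n';
                case "\r": return '\\r';
                case "\t": return '\\t';
                default:   return sprintf('\\x%02X', ord($m[0]));
            }
        },
        $raw
    );

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $escaped ?? $raw;
}

$withNul = "a\0b"; // three bytes: 'a', NUL, 'b'
$literal = 'a\0b'; // four bytes: 'a', backslash, '0', 'b'

// Half-escaped: both inputs produce the same text, so the reader cannot tell them apart.
var_dump(escape_nul_only($withNul) === escape_nul_only($literal)); // bool(true)

// Fully escaped: the two inputs stay distinguishable.
echo escape_for_report($withNul), "\n"; // a\x00b
echo escape_for_report($literal), "\n"; // a\\0b
```

Note that this sketch also escapes line feeds, which would flatten multi-line output into one line; a real runner would more likely apply it per line of the expected/actual diff. Because it only touches the report, the output of var_dump() itself stays unchanged.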