Newsgroups: php.internals
Date: Mon, 30 May 2016 20:52:45 -0700
Subject: Re: [PHP-DEV] PHP's handling of BOM (byte order mark)
From: pollita@php.net (Sara Golemon)
To: Stanislav Malyshev
Cc: Sammy Kaye Powers, PHP Internals
In-Reply-To: <7b80be21-b397-c40c-8e80-ff4da4f97634@gmail.com>

On Mon, May 30, 2016 at 7:18 PM, Stanislav Malyshev wrote:
>> In fact, the idea of stripping content from a script file isn't
>> without precedent. Shebang lines are routinely removed from
>> cli/cgi/fpm, and if you want to properly output it, you need to do so
>
> True, because in the context of CLI we know what is expected - a CLI
> script which can start with #!. It is very unlikely that we'd have a
> template run directly as CLI script and we would have this template
> starting with #! which we want to output. But we lack such context in a
> generic script - namely, the context that would tell us if it's safe to
> drop the BOM.
>
That was the idea of the declare(), to provide that context, since it
can't be reliably inferred.

>> So can we apply the same to the BOM?
>> There's the obvious BC danger of
>> files which might depend on this behavior (declaring their encoding
>> via BOM, which happens to be the same as the script encoding).
>
> Given that BOM in script files is mostly useless, and BOM in UTF-8 is
> useless and not recommended for use either, I don't see why we need to.
>
> In general, I don't think BOM is a real issue worth messing with the
> lexer. Surely, from time to time somebody would use weird editor which
> produces BOMs, like editing PHP scripts in Word. Surely, they'd have
> weird effects that would force them to spend 5 minutes googling and
> fixing it. I don't think it is the reason to spend day-persons of our
> collective time to find a fix to this very niche problem and risk
> potential BC issues.
>
Agreed it's niche, and agreed that it's mostly the editor's fault for
putting the BOM in place to begin with. Disagree on the value of the
time that would be needed to provide some sort of benefit. I will say
though, that you're almost certainly right that it's not a significant
problem (if it's one at all), and I'd want to hear from people who
encounter this on a regular basis for which there isn't a much simpler
fix available (such as disabling BOM emission in their editor of
choice).

> If it is really becoming an issue, we could probably make the lexer
> treat BOM+ enough issue.
>
That's probably a reasonable compromise on the context issue. It
provides a clean escape hatch for intentional BOMs by echoing those
bytes from script, even if it is magic behavior which is generally to
be avoided.

> That presumes you know there's BOM in the beginning of your file. If so,
> why don't you just delete it instead of typing a long declare directive?
>
Dunno. I just like to argue.

-Sara
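
For reference, the "just delete it" fix discussed above can be done outside PHP entirely. What follows is a minimal sketch in Python, not something proposed in this thread: it assumes the BOM in question is the three-byte UTF-8 sequence EF BB BF, and the function names has_utf8_bom and strip_utf8_bom are hypothetical, chosen only for illustration.

```python
import codecs

# codecs.BOM_UTF8 is the three-byte UTF-8 byte order mark: EF BB BF.
UTF8_BOM = codecs.BOM_UTF8


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the raw file contents start with a UTF-8 BOM."""
    return data.startswith(UTF8_BOM)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the contents without one leading UTF-8 BOM, if present.

    Only a BOM at the very start is removed; the same byte sequence
    appearing later in the file is left untouched.
    """
    if has_utf8_bom(data):
        return data[len(UTF8_BOM):]
    return data
```

To use it on a script, read the file in binary mode, pass the bytes through strip_utf8_bom, and write the result back. Working on raw bytes rather than decoded text keeps the rest of the file byte-for-byte identical.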