Newsgroups: php.internals
Xref: news.php.net php.internals:93644
Date: Mon, 30 May 2016 18:57:19 -0700
To: Stanislav Malyshev
Cc: Sammy Kaye Powers, PHP Internals
Subject: Re: [PHP-DEV] PHP's handling of BOM (byte order mark)
From: pollita@php.net (Sara Golemon)

On Mon, May 30, 2016 at 5:40 PM, Stanislav Malyshev wrote:
>> BOM's should not be treated as characters and should not be sent to
>> the output. Is there any reason this should be considered the expected
>> behavior?
>
> The reason would be PHP does not know where surrounding output ends and
> the code starts, beyond the <?php tag. Anything in the file before <?php
> is output, and so will happen with BOM too. Particular sequence of bytes
> being BOM and whether it is desired or not depends on context, but PHP
> engine does not have this context. Remember that pure HTML page is also
> a valid PHP file.
>

I'm with Sammy on the principle that being able to have a BOM in a given
file is important to any non-ASCII code development. We can argue whether
that's good or even necessary; I honestly don't know how prevalent
non-English coding is among PHP developers.

In fact, the idea of stripping content from a script file isn't without
precedent.
Shebang lines are routinely stripped by cli/cgi/fpm, and if you want to
actually output one, you need to do so in a coded echo statement. (The
stripping only applies to a literal, non-scripting line in the file, not
to dynamic output.)

So can we apply the same to the BOM? There's the obvious BC danger of
files which might depend on this behavior (declaring their encoding via
BOM, which happens to be the same as the script encoding).

So how about a declare statement?

{U+FEFF}
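
To make the shebang comparison concrete, here is a minimal sketch of the idea in Python. It is not engine code and not a proposed implementation: the helper name strip_leading_markers is hypothetical, and handling only the UTF-8 form of the BOM is an assumption made to keep the example short. The point is that only bytes at the very start of the file are touched, the same way shebang stripping only applies to a literal first line.

```python
UTF8_BOM = b"\xef\xbb\xbf"  # U+FEFF encoded as UTF-8


def strip_leading_markers(source: bytes) -> bytes:
    """Drop a leading UTF-8 BOM, then a leading shebang line, if present.

    Hypothetical helper for illustration only. Bytes anywhere else in the
    file are left alone, so a BOM or "#!" written out later by an echo
    statement would be unaffected.
    """
    if source.startswith(UTF8_BOM):
        source = source[len(UTF8_BOM):]
    if source.startswith(b"#!"):
        newline = source.find(b"\n")
        # A file that is nothing but a shebang line has no content left.
        source = b"" if newline == -1 else source[newline + 1:]
    return source


script = UTF8_BOM + b"#!/usr/bin/env php\n<?php echo 'hi';\n"
print(strip_leading_markers(script))
```

A file that opens with plain HTML and happens to contain the same three bytes further down passes through unchanged, which is the BC-sensitive case: whether a leading BOM is content or metadata still depends on context the sketch does not have.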