Newsgroups: php.internals
From: ajf@ajf.me (Andrea Faulds)
To: internals@lists.php.net
Subject: Re: PHP's handling of BOM (byte order mark)
Date: Tue, 31 May 2016 12:52:08 +0100

Hi Sammy,

Sammy Kaye Powers wrote:
> If you create a php file with the following:
>
> header("X-foo: Bar");
> echo "Foo!".PHP_EOL;
>
> And save it as UTF-8 with BOM, interesting things happen depending on
> the SAPI & configuration.
>
> If you run it from the CLI you get an error:
>
>> PHP Warning: Cannot modify header information - headers already sent by (output started at %s:1) in %s on line %d
>
> But it doesn't seem to return the BOM to std out (but I could be doing
> this part wrong). If you run it from `php -S`, and load it in a
> browser, the web server returns a code point \u{feff} as the first
> code point of the response body.
>
> BOM's should not be treated as characters and should not be sent to
> the output. Is there any reason this should be considered the expected
> behavior? If not, I'd like to create an RFC to change it.

:) I suspect that this part of the Zend Engine is much-neglected, but PHP can actually detect the BOM and strip it from the output if you have zend.multibyte turned on:

https://github.com/php/php-src/blob/3b0a6dfeb2896fb204db48d11364c09942b1ad01/Zend/zend_language_scanner.l#L292

I haven't tried this myself, though.

Thanks.

--
Andrea Faulds
https://ajf.me/
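
For anyone wanting to check whether a given script was saved with a BOM, a rough userland sketch follows. It only illustrates the idea of detecting and dropping the three UTF-8 BOM bytes (EF BB BF) at the start of a string; it is not what the scanner linked above does internally, and the constant and function names are hypothetical, made up for this example.

```php
<?php
// Hypothetical helpers for illustration only; these names are not part of
// PHP itself or of the Zend scanner code linked above.

// The UTF-8 encoding of U+FEFF, i.e. the byte order mark.
const UTF8_BOM = "\xEF\xBB\xBF";

/** Returns true when $bytes begins with the UTF-8 byte order mark. */
function starts_with_utf8_bom(string $bytes): bool
{
    return strncmp($bytes, UTF8_BOM, 3) === 0;
}

/** Returns $bytes without a leading UTF-8 BOM, if one is present. */
function strip_utf8_bom(string $bytes): string
{
    return starts_with_utf8_bom($bytes) ? substr($bytes, 3) : $bytes;
}
```

Running `starts_with_utf8_bom(file_get_contents($path))` over a source tree (with `$path` being whatever file you want to inspect) should flag the files that would trigger the "headers already sent" warning quoted above, since the BOM sits before the opening PHP tag and is emitted as ordinary output.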