Newsgroups: php.internals
From: smalyshev@gmail.com (Stanislav Malyshev)
To: Sara Golemon
Cc: Sammy Kaye Powers, PHP Internals
Date: Mon, 30 May 2016 19:18:29 -0700
Message-ID: <7b80be21-b397-c40c-8e80-ff4da4f97634@gmail.com>
Subject: Re: [PHP-DEV] PHP's handling of BOM (byte order mark)

Hi!

> In fact, the idea of stripping content from a script file isn't
> without precedent. Shebang lines are routinely removed from
> cli/cgi/fpm, and if you want to properly output it, you need to do so

True, because in the context of CLI we know what is expected: a CLI
script which can start with #!. It is very unlikely that we'd have a
template run directly as a CLI script and that this template would
start with a #! which we want to output. But we lack such context in a
generic script, namely the context that would tell us whether it's safe
to drop the BOM.

> So can we apply the same to the BOM?
> There's the obvious BC danger of
> files which might depend on this behavior (declaring their encoding
> via BOM, which happens to be the same as the script encoding).

Given that a BOM in script files is mostly useless, and a BOM in UTF-8
is useless and not recommended either, I don't see why we need to.

In general, I don't think the BOM is a real issue worth messing with
the lexer for. Sure, from time to time somebody will use a weird editor
that produces BOMs, like editing PHP scripts in Word. Sure, they'll get
weird effects that force them to spend 5 minutes googling and fixing
it. I don't think that is a reason to spend person-days of our
collective time finding a fix for this very niche problem and risking
potential BC issues. If it is really becoming an issue, we could
probably make the lexer treat BOM+

> So how about declare statement?
>
> {U+FEFF} declare(strip_bom=true);

That presumes you know there's a BOM at the beginning of your file. If
so, why don't you just delete it instead of typing a long declare
directive? If you don't know it, you'd be forced to add it to every
(non-template) file in your codebase, which sounds a bit excessive.

-- 
Stas Malyshev
smalyshev@gmail.com