Newsgroups: php.doc,php.internals
Date: Wed, 19 Mar 2014 16:08:35 -0700
Subject: default_charset and friends
From: aharvey@php.net (Adam Harvey)
To: Yasuo Ohgaki, PHP internals, phpdoc

Yasuo, internals, doc-friends,

I'm working on the 5.6 migration guide, and I'm a little confused about what state the default encoding RFC actually ended up in after it was accepted and merged. UPGRADING says:

> Changes were made to character set handling in:
> - the iconv and mbstring extensions,
> - and htmlentities(), htmlspecialchars(), html_entity_decode() functions
>
> The precedence for these is now:
>
> default_charset < internal/input/output_encoding < (mbstring.* || iconv.*) < function parameter
>
> For example, the easiest way to use the UTF-8 encoding is to set
> default_charset=UTF-8 and leave the following php.ini parameters

The way this reads to me, assuming I don't have any other encoding settings set, I should be able to set default_charset to (for example) "cp1252" and get Windows-1252 handling as the default in htmlentities(), htmlspecialchars() and html_entity_decode() if I don't specify the encoding parameter.

In practice, though, that doesn't seem to be the case. I created this script:

And ran it using the -n option, so that anything I had set in my php.ini would be ignored.
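
A test of this kind can be sketched roughly as follows. This is a hypothetical reconstruction under stated assumptions, not the script from this message: the helper name charset_probe(), the use of ini_set() to select cp1252, and the explicit ENT_COMPAT flag are all illustrative choices. It calls htmlentities() and html_entity_decode() once without and once with an explicit charset argument, so the two results in each pair can be compared.

```php
<?php
// Hypothetical reconstruction; NOT the script from the original message.
// Assumption: default_charset is set at runtime rather than via php.ini or -d.
ini_set('default_charset', 'cp1252');

// U+00A3 POUND SIGN encoded as a single Windows-1252 byte.
$pound = "\xA3";

// Runs each function with the implicit (default) charset and with an
// explicit 'cp1252' charset, and returns the four results keyed by label.
function charset_probe($pound) {
    return array(
        'encode_default'  => htmlentities($pound, ENT_COMPAT),
        'encode_explicit' => htmlentities($pound, ENT_COMPAT, 'cp1252'),
        'decode_default'  => html_entity_decode('&pound;', ENT_COMPAT),
        'decode_explicit' => html_entity_decode('&pound;', ENT_COMPAT, 'cp1252'),
    );
}

foreach (charset_probe($pound) as $label => $value) {
    echo $label, ': ';
    var_dump($value);
}
```

Saved under a placeholder name such as test.php, it could be run with `php -n test.php`. If default_charset is honoured, each "default" result should match its "explicit" counterpart; what the "default" calls actually return depends on the PHP version in use.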
My expected output would have been for the two htmlentities() and html_entity_decode() calls to return the same strings (ignore the unknown character glyphs; my terminal is UTF-8, so I expect Windows-1252 output to be broken): string(7) "£" string(7) "£" string(1) "=EF=BF=BD" string(1) "=EF=BF=BD" But instead, I got this output, suggesting that the calls without explicit charset parameters were treated as UTF-8: string(0) "" string(7) "£" string(2) "=C2=A3" string(1) "=EF=BF=BD" What am I missing here? Do htmlentities(), htmlspecialchars() and html_entity_decode() actually respect default_charset? (Also, if I set internal_encoding, input_encoding and output_encoding, which one should get used for each?) Thanks, Adam