Newsgroups: php.internals
From: ilia@prohost.org (Ilia Alshanetsky)
To: Sara Golemon
Cc: kingwez@gmail.com, andrei@gravitonic.com, internals@lists.php.net
Date: Mon, 9 Oct 2006 13:04:00 -0400
Subject: Re: PDO/Unicode Migration Strategies

On 9-Oct-06, at 12:28 PM, Sara Golemon wrote:

> (C) Add a UConverter *encoding_conv; element to pdo_dbh and
> pdo_stmt objects, and an INI setting: pdo.default_encoding. When
> passing data to/from a stmt object, the statement objects encoder
> is used if available (set during prepare), if not available the
> driver's converter is used (set by factory), otherwise
> pdo.default_encoding is used as a fallback. Data exchanges
> between the dbh object are similarly handled though (obviously)
> skipping the stmt step.
>
> Pros: Keeps character set conversion work out of the driver layer.
> Reduces the amount of #ifdef work for multiple version
> support.
> Recognizes that some drivers (SQLITE) use a single encoding
> universally, while others allow different tables to use different
> encodings.
> Cons: Doesn't solve the "do()" problem of encoding to different
> charsets when inserting to tables of a driver which allows
> different charsets per table.
> Doesn't provide an indicator which says "This came from a
> unicode string and was converter by ICU so is reliably in the
> correct encoding" versus "This was handed to me by the user as a
> binary string and may contain anything".
> Though this is also
> "fixable" by either changing the handler proto or by burying a
> state flag in the dbh/stmt objects.

Of the options you propose, I think option C is the most reasonable, but I'd like to offer a few revisions. PDO already has an API for setting attributes: setAttribute(). An attribute can be set on a connection (as the default) and overridden per statement through the same method. Attributes can also be passed as parameters, which lets the user decide what charset to send to the database. In some cases there is a neat cheat: set the connection charset to utf-8 or even utf-16 and let the database (assuming it supports this) do the up/down conversion of the data as needed.

Ilia Alshanetsky
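
For readers following the thread, the lookup order described in option C (statement setting first, then the connection's, then pdo.default_encoding) can be sketched roughly as below. This is only an illustration under stated assumptions, not PDO code: the sketch_dbh and sketch_stmt structs and both function names are hypothetical, and a plain charset name stands in for the UConverter * the proposal would actually store, so the example needs no ICU.

```c
#include <stddef.h>

/* Hypothetical stand-ins for pdo_dbh / pdo_stmt. The proposal stores a
 * UConverter * in each; a charset name is used here to avoid ICU. */
typedef struct {
    const char *encoding;   /* set by the driver factory, or NULL */
} sketch_dbh;

typedef struct {
    sketch_dbh *dbh;        /* owning connection */
    const char *encoding;   /* set during prepare, or NULL */
} sketch_stmt;

/* Connection-level exchange: use the connection's encoding if one was
 * set, otherwise fall back to the INI default. */
const char *sketch_dbh_encoding(const sketch_dbh *dbh, const char *ini_default)
{
    if (dbh != NULL && dbh->encoding != NULL) {
        return dbh->encoding;
    }
    return ini_default;
}

/* Statement-level exchange: the statement's own encoding wins; if it is
 * unset, fall through to the connection and then to the INI default. */
const char *sketch_stmt_encoding(const sketch_stmt *stmt, const char *ini_default)
{
    if (stmt != NULL && stmt->encoding != NULL) {
        return stmt->encoding;
    }
    return sketch_dbh_encoding(stmt != NULL ? stmt->dbh : NULL, ini_default);
}
```

The same shape fits the setAttribute() revision: a connection-level attribute would fill the dbh slot and a per-statement attribute would fill the stmt slot, so the INI value is consulted only when neither has been set.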