Newsgroups: php.internals
Date: Tue, 7 Aug 2012 23:35:19 +0200
Subject: Re: [PHP-DEV] domdocument loadhtml and encoding
From: tyra3l@gmail.com (Ferenc Kovacs)
To: Tjerk Meesters
Cc: PHP Internals

On Fri, Jun 1, 2012 at 5:57 PM, Tjerk Meesters wrote:

> Gentlemen,
>
> Regarding this bug report: https://bugs.php.net/bug.php?id=49705
>
> As more developers move away from using regular expressions to parse
> HTML and start using DOMDocument, I've noticed that quite a few
> stumble over encoding "issues". They're not bugs, because it's
> documented (I think) that if a document is loaded using
> ::loadHTMLFile() or if it contains a "content-type" meta tag which
> specifies the character encoding it will work as expected.
>
> So far I've suggested a hack that involves adding the meta-tag in
> front of the string that contains the HTML. As horrible as it seems,
> that does the job!
>
> That said, I'm hoping to get enough internals support to add a
> parameter to ::loadHTML() that sets / overrides the default character
> set when processing the document; when given, any <meta> tags
> pertaining to character set encoding should be ignored (AFAIK that's
> also the browser's behavior).
>
> Btw, there's another patch that also introduces a new parameter to
> ::parseHTML() which has gone into 5.4 branch
> (https://bugs.php.net/bug.php?id=54037), so it looks like this would
> be the second (optional) parameter then.
>
> Thoughts?
>

would be nice.
bump.

-- 
Ferenc Kovács
@Tyr43l - http://tyrael.hu
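
For readers following the thread, the meta-tag workaround described in the quoted message can be sketched roughly as below. This is only an illustration, not the patch under discussion: the helper name loadHtmlAsUtf8() is hypothetical, the input is assumed to be UTF-8, and the syntax assumes PHP 7 or later.

```php
<?php
// Hypothetical helper (not part of PHP or of the proposed patch):
// loads a UTF-8 HTML string by prepending a content-type meta tag,
// which is the workaround described in the quoted message.
function loadHtmlAsUtf8(string $html): DOMDocument
{
    // Assumption: $html is UTF-8 encoded.
    $meta = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';

    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting them,
    // then restore the previous error-handling setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($meta . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc;
}

// "\u{E1}" is U+00E1 (a with acute), written as an escape so the example
// does not depend on the encoding of this source file.
$doc  = loadHtmlAsUtf8("<p>Kov\u{E1}cs</p>");
$text = $doc->getElementsByTagName('p')->item(0)->textContent;
```

Note that the prepended meta element stays in the parsed document, so code that serializes the result may want to remove it again. An explicit encoding parameter, as proposed in the thread, would make this step unnecessary.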