Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:121924 Return-Path: Delivered-To: mailing list internals@lists.php.net Received: (qmail 28214 invoked from network); 5 Dec 2023 08:43:40 -0000 Received: from unknown (HELO php-smtp4.php.net) (45.112.84.5) by pb1.pair.com with SMTP; 5 Dec 2023 08:43:40 -0000 Received: from php-smtp4.php.net (localhost [127.0.0.1]) by php-smtp4.php.net (Postfix) with ESMTP id 7FE88180052 for ; Tue, 5 Dec 2023 00:43:48 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on php-smtp4.php.net X-Spam-Level: X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,DMARC_PASS,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,T_SPF_TEMPERROR autolearn=no autolearn_force=no version=4.0.0 X-Spam-Virus: No X-Envelope-From: Received: from mail-pg1-f179.google.com (mail-pg1-f179.google.com [209.85.215.179]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by php-smtp4.php.net (Postfix) with ESMTPS for ; Tue, 5 Dec 2023 00:43:47 -0800 (PST) Received: by mail-pg1-f179.google.com with SMTP id 41be03b00d2f7-5c2066accc5so2160648a12.3 for ; Tue, 05 Dec 2023 00:43:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sonarsource.com; s=google; t=1701765815; x=1702370615; darn=lists.php.net; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=Befw72x/z9LJlMDk9EuVT1gaeJ5zvoK9Pa7FCoXZNDs=; b=AebFm7u0kW7CDNe/xEFc4YxleF1WytQmon+ywQqcAG22+L/8Bri0WabXv0W04GSXAO JFZyGwsjW7ym3ihO7SdWFppl/5x+xXWGyrdVhY8EbPBe9EN7HD9t+g5jqnBMMucKy5ly pOi/xRuzcHqlJwBGWifpUc813P8lzR7Dja2COVf7oIlWAcPE+WRVPf2FhpY9C670C9/h dDIK611BRcv7/1Vh5ccoJfgj5mny0bRWoPALVJya2D/DWb4nGPrvEPcxNwb65pDICWNf Ucp/08Bg5lviOunsFHjTsic7HjApno8C9avVNVU3nJrSThdwAxj+X5XyFLctGOog8IoA 1QNQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701765815; x=1702370615; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Befw72x/z9LJlMDk9EuVT1gaeJ5zvoK9Pa7FCoXZNDs=; b=UKH8bBfANRKH9sNR001qaOFJimfMpEVdbOH1xTsY01A8lgehldPZ/pLAmyjikUoxFh PI7oJ27C20A00PjvYQVEZ1Iro3daWeaQ3kTffZ1rCPJboQPQyEjT5ehFzgJJIoW5Dpum JIA3ysvoQQP4p8mZ+qf4YLXGORGvNCCkGIq6Y1DhVA8Y7AsmAO2lMOYtSq1WEI47bX1y 2tD/ycRzZBvsJCrE54iO7pTP4c9OsTD9r1gYsaE0U0evyrpDq/eLZ0nYoMgtb8q+ifBS INrDupJGZ18SXjshw1I0HtOJfQuhDKCMG7MrC9mjrc6PgptjKZPSejnmbUC0jWD1YWya F7rA== X-Gm-Message-State: AOJu0YziYtMFhdcoqIqQTgrnUa7Ijl1liTnc1wFRcywtBnzJiju3Wa5O Sf/4/Xa0dtIUmoJhncwAKw0N0pvq1vrPiLRtU7LKbw== X-Google-Smtp-Source: AGHT+IGbce29fynjdwXsrVb9MrsYhw/cZQS52F55LOQv3Jpb8WehnQ/o+Cxk3IHd3mCBUCfVlY+4V/7/Do/VqIE0Utg= X-Received: by 2002:a05:6a20:2586:b0:18f:97c:4f3c with SMTP id k6-20020a056a20258600b0018f097c4f3cmr3359388pzd.72.1701765814867; Tue, 05 Dec 2023 00:43:34 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: Reply-To: Stefan Schiller Date: Tue, 5 Dec 2023 09:43:23 +0100 Message-ID: To: Alex Cc: "G. P. B." , internals@lists.php.net, youkidearitai@gmail.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Subject: Re: [PHP-DEV] Inconsistency mbstring functions From: internals@lists.php.net ("Stefan Schiller via internals") On Mon, Dec 4, 2023 at 8:45=E2=80=AFPM Alex wrote= : > > Stefan, > >> >> My biggest concern is that this quirk can cause security issues in >> user code. I came across this in the first place when discovering an >> exploitable security vulnerability in an application. From my point of >> view, this is not only about inconsistent behavior but also violates >> the documentation for specific functions like mb_strstr. I agree that >> a lot of mbstring operations should not be used on invalid strings, >> and an exception seems to be an appropriate answer despite the huge BC >> impact. > > > 1) I'm sure the vulnerable application is proprietary, but can you at lea= st tell us which mbstring functions were directly involved in this vulnerab= ility, give an overview of the mechanism by which the exploit worked (scrub= bed of any specific details about the application), and let us know the gen= eral nature of the resulting compromise? (i.e. denial-of-service, informati= on disclosure, allowing users to impersonate other users, etc) > > Real-life examples of actual security impact provide far more compelling = grounds for a BC break than hypothetical security impact. The case I am referring to involves mb_strpos, which is used to determine the index of a specific character, and mb_substr, which is used to extract the substring until the first occurrence of this character. For this specific case, this results in a Cross-Site Scripting (XSS) vulnerability, which would allow an attacker to perform actions on behalf of an authenticated user. The actual code is more complex, but the following lines should be a suitable representation of the issue: $data =3D $_GET['data']; $pos =3D mb_strpos($data, "<"); $out =3D $pos ? mb_substr($data, 0, $pos) : $data; echo $out; The developer's assumption here is that $out should never contain an opening HTML tag, and it is thus safe to echo it back in the response. Despite the fact that e.g. htmlspecialchars should be used to encode the output, this assumption seems reasonable to me. However, due to the inconsistent behavior, an attacker can break this assumption. If $data, e.g., contains the sequence "\xf0AAA", str_pos will consider the index of "<" to be 4. However, mb_substr assumes the full sequence to have a length of 4 (1 x 4-byte Unicode character + 3 x 1-byte Unicode characters). Accordingly, the full sequence, including the "" tag, is reflected in the response. > > 2) Your point about documentation is appreciated. However, you should con= sider the context. We are dealing with a 20+-year old library, with a large= installed base, which has (historically) always had inconsistent, poorly d= efined semantics in edge cases, and which (historically) has always had poo= r documentation. Don't forget Hyrum's Law, Stefan. Never forget Hyrum's Law= . > > In view of all this, we are usually much more likely to amend the mbstrin= g documentation to match behavior than to modify behavior to match document= ation. Now, that's not to say we can never modify mbstring behavior to matc= h documentation (I have done it myself several times), but there should alw= ays be good reasons to prefer the new behavior (not just "because that's wh= at the documentation says"). I can fully understand your concerns, especially taking into account the long-standing history of the library and Hyrum's Law. While the above-mentioned example might even be arguable since two different functions are involved, the behavior of, e.g., mb_strstr just feels wrong to me: $data =3D "\xf0start"; var_dump(mb_strstr($data, "start")); // string(2) "rt" > > 3) Modifying mbstring behavior to make the handling of invalid strings mo= re consistent is a smaller BC break than raising an exception. However, com= pletely refusing to operate on invalid strings (by raising an exception) wi= ll be more effective in guarding against potential security vulnerabilities= . Considering BC, completely refusing to operate on invalid strings seems like a huge break. From a preventing-security-issues-in-user-code point of view, either of these solutions seems appropriate. > > 4) Hamada-san is absolutely correct that all mbstring users should check = strings derived from user input using mb_check_encoding() before operating = on them. Unfortunately, "should" doesn't count for much. Most users will no= t do what they "should". > > 5) Amending the documentation to call out these dangers more clearly woul= d be a positive step, though probably not enough. > In general, I like the idea of warning against pitfalls like this in the documentation, although I estimate the effect, especially on existing applications to be low. Best Regards, Stefan Schiller