Newsgroups: php.internals
Xref: news.php.net php.internals:121925
Date: Tue, 5 Dec 2023 10:04:46 +0100
From: landers.robert@gmail.com (Robert Landers)
To: Stefan Schiller
Cc: Alex, "G. P. B.", internals@lists.php.net, youkidearitai@gmail.com
Subject: Re: [PHP-DEV] Inconsistency mbstring functions

On Tue, Dec 5, 2023 at 9:43 AM Stefan Schiller via internals wrote:
>
> On Mon, Dec 4, 2023 at 8:45 PM Alex wrote:
> >
> > Stefan,
> >
> >>
> >> My biggest concern is that this quirk can cause security issues in
> >> user code. I came across this in the first place when discovering an
> >> exploitable security vulnerability in an application.
> >> From my point of view, this is not only about inconsistent behavior
> >> but also violates the documentation for specific functions like
> >> mb_strstr. I agree that a lot of mbstring operations should not be
> >> used on invalid strings, and an exception seems to be an appropriate
> >> answer despite the huge BC impact.
> >
> >
> > 1) I'm sure the vulnerable application is proprietary, but can you at
> > least tell us which mbstring functions were directly involved in this
> > vulnerability, give an overview of the mechanism by which the exploit
> > worked (scrubbed of any specific details about the application), and
> > let us know the general nature of the resulting compromise? (i.e.
> > denial-of-service, information disclosure, allowing users to
> > impersonate other users, etc)
> >
> > Real-life examples of actual security impact provide far more
> > compelling grounds for a BC break than hypothetical security impact.
>
> The case I am referring to involves mb_strpos, which is used to
> determine the index of a specific character, and mb_substr, which is
> used to extract the substring until the first occurrence of this
> character. For this specific case, this results in a Cross-Site
> Scripting (XSS) vulnerability, which would allow an attacker to
> perform actions on behalf of an authenticated user. The actual code is
> more complex, but the following lines should be a suitable
> representation of the issue:
>
> $data = $_GET['data'];
> $pos = mb_strpos($data, "<");
> $out = $pos ? mb_substr($data, 0, $pos) : $data;
> echo $out;
>
> The developer's assumption here is that $out should never contain an
> opening HTML tag, and it is thus safe to echo it back in the response.
> Despite the fact that e.g. htmlspecialchars should be used to encode
> the output, this assumption seems reasonable to me. However, due to
> the inconsistent behavior, an attacker can break this assumption.
> If $data, e.g., contains the sequence "\xf0AAA", mb_strpos will
> consider the index of "<" to be 4. However, mb_substr assumes the full
> sequence to have a length of 4 (1 x 4-byte Unicode character + 3 x
> 1-byte Unicode characters). Accordingly, the full sequence, including
> the "" tag, is reflected in the response.

This feels like exactly the kind of case where mb_* functions shouldn't
be used. The HTML spec very clearly states that only "<" (\x3C LESS-THAN
SIGN) is considered for HTML opening tags. I don't think there is a
reason to consider multibyte strings here.
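
To illustrate, here is a minimal sketch of the byte-oriented version, assuming the input is meant to be UTF-8 (where the byte 0x3C never occurs inside a multibyte sequence). The helper name cut_before_tag and the sample input are made up for illustration and are not taken from the application in question:

```php
<?php
// Hypothetical helper: cut the input at the first "<" using byte offsets only.
// strpos() and substr() both count bytes, so they cannot disagree with each
// other about where the "<" is, even on invalid UTF-8.
function cut_before_tag(string $data): string
{
    $pos = strpos($data, "<");
    // strpos() returns false when nothing is found and 0 for a leading "<",
    // so the comparison has to be strict.
    return $pos === false ? $data : substr($data, 0, $pos);
}

// Assumed example input: a stray 4-byte lead byte followed by a tag.
$data = "\xf0AAA<b>";
$out  = cut_before_tag($data);

// Escaping is still the actual defence before echoing anything back.
echo htmlspecialchars($out, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8'), "\n";
```

The strict `=== false` check also covers the case where "<" is the very first byte, which the `$pos ? ... : ...` form in the quoted snippet treats as "not found".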