Date: Tue, 05 Nov 2024 11:29:02 -0600
To: "php internals"
Subject: Re: [PHP-DEV] [RFC] PHP.net analytics
From: larry@garfieldtech.com ("Larry Garfield")

On Fri, Nov 1, 2024, at 6:10 PM, Bob Weinand wrote:
> On 1.11.2024 22:41:29, Larry Garfield wrote:
>> In a similar vein to approving the use of software, Roman Pronskiy asked for my help putting together an RFC on collecting analytics for PHP.net.
>>
>> https://wiki.php.net/rfc/phpnet-analytics
>>
>> Of particular note:
>>
>> * This is self-hosted, first-party only. No third parties get data, so no third parties can do evil things with it.
>> * There is no plan to collect any PII.
>> * The goal is to figure out how to most efficiently spend Foundation money improving php.net, something that is sorely needed.
>>
>> Ideally we'd have this in place by the 8.4 release or shortly thereafter, though I realize that's a little tight on the timeline.
>
> Hey Larry,
>
> I have a couple concerns and questions:
>
> Is there a way to track analytics with only transient data? As in, data actually stored is always already anonymized enough that it would be unproblematic to share it with everyone?
> Or possibly, is there a retention period for the raw data after which only anonymized data remains?

The plan is to configure Matomo to not collect anything non-anonymous to begin with, to the extent possible. We're absolutely not talking about user-stalking like ad companies do, or anything even remotely close to that.

I'm not convinced that publishing raw data, even anonymized, is valuable or responsible. I don't know of any other sites offhand that publish their raw analytics, and I don't know what purpose that would serve other than just a principled "radical transparency" stance, which I generally don't agree with.

However, the goal is to have an automated aggregate dashboard, similar to https://analytics.bookstackapp.com/bookstackapp.com (made by a different tool, but same idea), that we could make public. We don't want to do that until it's been running a while and we're sure that nothing personally identifiable could leak through that way.

> Do you actually have a plan what to use that data for? The RFC mostly talks about "high traffic". But does that mean anything? I do look at a documentation page, because I need to look something specific up (what was the order of arguments of strpos again?). I may only look shortly at it. Maybe even often. But it has absolutely zero signal on whether the documentation page is good enough. In that case I don't look at the comments either. Comments are something you rarely look at, mostly the first time you want to even use a function.

Right now, the key problem is that there's a lot of "we don't know what we don't know."
We want to improve the site and docs, the Foundation wants to spend money on doing so, but other than "fill in the empty pages" we have no definition of "improve" to work from. The intent is that better data will give us a better sense of what "improve" even means.

It would also be useful for marketing campaigns, even on-site. E.g., if we spend the time to write a "How to convince your boss to use PHP" page... how useful is it? From logs, all we could get is page count. That's it. Or the PHP-new-release landing page that we've put up for the last several releases. Do people actually get value out of that? Do they bother to scroll down through each section, or do they just look at the first one or two and leave, meaning the time we spent on any other items is wasted? Right now, we have no idea if the time spent on those is even useful.

Another example off the top of my head: Right now, the enum documentation is spread across a dozen sub-pages. I don't know exactly why I did that in the first place rather than one huge page, other than "huge pages bad." But are they bad? Would it be better to combine enums back into fewer pages, or to split the long visibility page up into smaller ones? I have no idea. We need data to answer that.

It's also absolutely true that analytics are not the end of data collection. User surveys, usability tests, etc. are also highly valuable, and can get you a different kind of data. We should likely do those at some point, but that doesn't make automated analytics any less useful.

Another concern with just using raw logs is that it would be more work to set up, and have more moving parts to break. Let's be honest, PHP has an absolutely terrible track record when it comes to keeping our moving parts working, and the Infra Team right now is tiny. The bus factor there is a concern.
Using a client-side tracker is the more-supported and fewer-custom-scripts approach, which makes it easier for someone new to pick it up when needed.

Logs also will fold anyone behind a NAT together into a single IP, and thus a single "user." IP address is in general a pretty poor way of uniquely identifying people, given the number of translation layers on the Internet these days.

> Overall I feel like the signal we can get from using a JS tracker specifically is comparatively low to the point it's not actually worth it.

Some more questions a client-side tracker could answer that logs cannot:

* How many people are accessing the site from a desktop vs mobile?
* What speed connection do people have?
* How many people are using the in-browser Wasm code runner that is currently being worked on? cf: https://github.com/php/web-php/pull/1097

Also, for reference, most language sites do have some kind of analytics, usually Google:

https://www.python.org – Plausible.io, Google Analytics
https://go.dev/ – Google Analytics
https://www.rust-lang.org/ – N/A
https://nodejs.org/ – Google Analytics
https://www.typescriptlang.org/ – N/A
https://kotlinlang.org/ – Google Analytics
https://www.swift.org/ – Adobe Analytics
https://www.ruby-lang.org/ – Google Analytics

We'd be the only one with a self-hosted option, making it the most privacy-conscious of the bunch.

As far as blocking the analytics goes, Matomo uses a cookieless approach, so it's rarely blocked (and would not need a GDPR-compliance banner). Even if someone wanted to block it, meh. We'd still be getting enough signal to make informed decisions.

--Larry Garfield
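
As an illustration of the NAT point above, here is a minimal Python sketch. It is not from the RFC, and everything in it is made up for the example: the record layout, the IP addresses (taken from the reserved documentation ranges), and the visitor labels. A real access log would not contain the visitor column at all, which is exactly the problem.

```python
from collections import defaultdict

# Hypothetical access-log records as (client_ip, visitor) pairs.
# The visitor label is ground truth that a real log does NOT have;
# it is here only to show what IP-based counting loses.
requests = [
    ("203.0.113.7", "alice"),   # office NAT: three people, one public IP
    ("203.0.113.7", "bob"),
    ("203.0.113.7", "carol"),
    ("198.51.100.4", "dave"),   # home connection: one person, one IP
    ("198.51.100.4", "dave"),
]


def count_by_ip(records):
    """Count 'users' the way a naive log analysis would: one per distinct IP."""
    return len({ip for ip, _ in records})


def count_actual(records):
    """Count distinct visitors using the ground-truth label."""
    return len({visitor for _, visitor in records})


def visitors_per_ip(records):
    """Map each IP to the number of distinct visitors hiding behind it."""
    shared = defaultdict(set)
    for ip, visitor in records:
        shared[ip].add(visitor)
    return {ip: len(visitors) for ip, visitors in shared.items()}


if __name__ == "__main__":
    print("users according to IPs:", count_by_ip(requests))
    print("actual visitors:       ", count_actual(requests))
    print("visitors per IP:       ", visitors_per_ip(requests))
```

With this sample data, counting by IP reports two "users" where there are four people, because the three visitors behind the shared address collapse into one.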