Newsgroups: php.internals
List-Id: internals.lists.php.net
References: <2a6b92eb-d5e9-4a1a-9548-a068ac42ebd2@app.fastmail.com>
In-Reply-To: <2a6b92eb-d5e9-4a1a-9548-a068ac42ebd2@app.fastmail.com>
Date: Mon, 24 Jun 2024 10:48:47
+0700
Subject: Re: [PHP-DEV] [Early Feedback] Pattern matching
To: php internals
From: the.liquid.metal@gmail.com (Hendra Gunawan)

> Ilija and I have been working on and off on an RFC for pattern matching since the early work on Enumerations. A number of people have noticed and said they're looking forward to it.

Hello Larry. Thanks for this juicy proposal.

I agree that this proposal should be put forward as a whole. If we break it into smaller parts, there is a chance that some features will not pass. I will show that the range pattern and the regex pattern have greater value than you think, and that they belong alongside the literal pattern (in the "Core patterns" section, not the "Possible patterns" section). I don't want these patterns to fail in the poll.

I don't know where the line between "high level" and beyond lies. I hope that everything I discuss here is still "high level".

1. Is there any chance of ```is not``` or ```isnot```? I am tired of looking at this ugly code: ```!($foo instanceof Bar)```. If so, it should accept only a restricted form of pattern: no variable binding. It makes no sense to bind variables when the expression succeeds only if the pattern fails.

2. ```..=``` and ```..<``` force people to remember whether the left bound is exclusive (greater-than) or inclusive (greater-than-or-equal). What if those operators are replaced with ```=..=``` and ```<..=```? We don't have to align with other languages if they have weaknesses.

3. The "between" and ```>``` operators are shown explicitly in the proposal (in the "Range pattern" section). I assume that ```>=```, ```<```, and ```<=``` are implicitly included. What about the ```%``` operator: is it also included? ```%``` does not directly produce a boolean, but it can simplify patterns. Note that we would need an additional rule: if the result is ```0``` (zero), it is converted to ```true```; otherwise, to ```false```.

```
$foo = 2024; // leap year
$foo is %4; // true
$bar = 2025; // not a leap year
$bar is %4; // false

// furthermore:
$baz is array<%4>; // true if every member is a multiple of 4
```

4. Numeric range patterns are shown explicitly in the proposal. Hopefully there are also string range patterns.

```
$birthdate is "2000-01-01 00:00:00" <..< "2019-12-31 23:59:59";
$dateOfDeath is <"1980-01-01";
$name is "larry " =..< "larry!"; // true if first name is "larry"
```

I have no idea how string range patterns would work for arrays, since that causes an error. If there is currently no solution, I hope we don't abandon string range patterns entirely. The same issue arises with numeric range patterns for arrays.

5. It is great that native regular expression syntax is now a first-class citizen in PHP (at least for pattern matching). I see two drawbacks: loss of flexibility and repetition of identical patterns. Neither occurs with string-based regular expressions.

```
// native syntax
$foo is /^https:\/\/(?[^\/]*)/;

// string based: we can use any valid char as delimiter,
// as long as it is not used in the pattern.
$pattern = "/^https:\/\/(?[^\/]*)/";
$pattern = "|^https://(?[^/]*)/|"; // valid!
$pattern = "@^https://(?[^/]*)/@"; // valid!

// ----------------------

// native syntax
$foo is /^https:\/\/(?[^\/]*)/ | StringBox{value: /^https:\/\/(?[^\/]*)/};

// string based: an RE stored in a variable/constant can be reused as many times as we need
$pattern = "|^https://(?[^/]*)/|";
$foo is @RE($pattern) | StringBox{value: @RE($pattern)};
// 1st pattern is a scalar string, 2nd is a string encapsulated in a class

// furthermore
use GenericPattern as GP;
class Person {
    public string $firstName is @RE(GP::NAME_PTRN);
    public string $lastName is @RE(GP::NAME_PTRN);
    // ...
}
```

Hopefully, string-based regular expressions are also supported.
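For comparison, here is a minimal sketch of how the string-based approach behaves in current PHP with preg_match(). The matchesPattern() helper and the group name "host" are illustrative assumptions and are not part of the RFC; the helper only approximates what a hypothetical ```@RE($pattern)``` check might do.

```php
<?php
// Hypothetical helper (not part of the RFC): approximates `$value is @RE($pattern)`
// using today's PHP. Non-string values never match.
function matchesPattern(mixed $value, string $pattern): bool
{
    return is_string($value) && preg_match($pattern, $value) === 1;
}

// The same expression written with three different delimiters.
// The capture group name "host" is an assumption made for this example.
$patterns = [
    '/^https:\/\/(?<host>[^\/]*)/',
    '|^https://(?<host>[^/]*)|',
    '@^https://(?<host>[^/]*)@',
];

$url = 'https://www.php.net/manual';
foreach ($patterns as $pattern) {
    preg_match($pattern, $url, $m);
    echo $m['host'], PHP_EOL; // www.php.net
}
```

Because the pattern is an ordinary string, it can be stored once in a variable or class constant and reused, which is the flexibility argued for above.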
Honestly, if I had to choose between native and string-based syntax, I would choose string-based, as long as native regex syntax is not yet supported in the language in general. ```@RE()``` is just an illustration of how to use a variable as a regular expression in a pattern. The proposal states that ```@()``` will be used for arbitrary expressions. Correct me if I'm wrong, but a regexp is one of those things in programming that cannot be combined with other operands in an expression, not even with another regexp. A regexp must be used alone. That is why we need a dedicated syntax for regexps.

6. The proposal shows that the type pattern, literal pattern, class constant pattern, and expression pattern can be used to form a compound pattern. Hopefully, the range pattern and regex pattern enjoy the same luxury. Furthermore, hopefully they can be mixed.

```
$foo is 2000 =..= 2100 & %4; // multiple of 4 in 2000..2100 (leap years, except 2100)

class Person {
    public string $firstName is /LENGTH_PTRN/ & /FORBIDDEN_CHARS_PTRN/ & /FORBIDDEN_WORDS_PTRN/;
    public string $lastName is @RE(GP::LENGTH_PTRN) & @RE(GP::FORBIDDEN_CHARS_PTRN) & @RE(GP::FORBIDDEN_WORDS_PTRN);

    // these are hard to maintain
    public string $firstName is @RE(GP::COMPLEX_NAME_PTRN);
    public string $lastName is @RE(GP::COMPLEX_NAME_PTRN);
}
```

7. I noticed that ```as``` is tightly coupled with exceptions. Can we suppress the exception with ```??```?

```
// it is weird to see this statement (as shown in the proposal)
$value = $foo as Foo {$username, $password};

// these statements make more sense
$foo as Foo {$username, $password};
$foo is Foo {$username, $password} ?: throw new Exception();

// if it can be suppressed, "as" is more valuable than people think.
$newRect = $rect as Rectangle{width: <=10, height: <=5} ?? new Rectangle(width: 10, height: 5);
```

8. Is there any type checking for object property patterns?

```
class Circle {
    public int $radius;
}

// this statement will always fail unless there is type checking
$circle is Circle{radius: "10"};
```

~~~~~

Four of the eight points above relate to the range pattern and the regex pattern. Both are used daily. From my point of view, the literal pattern is no more special than the range and regex patterns. IMO, both should be placed in the "Core patterns" section, not the "Possible patterns" section.

Regards,
Hendra Gunawan.