Wikipedia:Bots/Requests for approval/MusikBot II 3
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.
Operator: MusikAnimal (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 03:02, Saturday, February 2, 2019 (UTC)
Function overview: Automatically protect high-risk templates and modules
Automatic, Supervised, or Manual: Automatic
Source code available: GitHub
Links to relevant discussions (where appropriate): Special:Permalink/881367182#Bot proposal: automatically protect high-risk templates and modules
Edit period(s): Daily
Estimated number of pages affected: ~500 on the first run. Variable for future runs, perhaps 0 to 5 pages daily.
Namespace(s): Template, Module, Wikipedia, User
Exclusion compliant (Yes/No): Yes, going by the exclusions hash in the bot configuration.
Adminbot (Yes/No): Yes
Function details: Every day, a query is run to identify templates and modules that have N number of transclusions, and it will protect them accordingly based on the bot configuration. Here is an explanation of each option and the initial values (per the WP:AN discussion):
- The thresholds option specifies what protection level should be applied for what transclusion count. For now this will be set to 500 transclusions for semi-protection (autoconfirmed), and 5000 for template protection. extendedconfirmed and sysop are available as options but for now will be left null (unused).
- The exclusions (and regex_exclusions for regular expressions) option is a list of pages that the bot will ignore entirely. The keys are the full page titles (including namespace), and the values are an optional space to leave a comment summarizing why the page was excluded.
- The ignore_offset option specifies the number of days the bot should wait after a previous protection change (by another admin) before taking further action. The initial value for now will be 7 days.
- The namespaces option specifies which namespaces to process. For now this includes Template, Module, Wikipedia, and User.
For now, the bot will not lower the protection level to conform to the settings. Move protection is applied using the same level as the edit protection, but again it will never lower the existing protection. The bot will also ignore any page specified at MediaWiki:Titleblacklist which includes the noedit flag, provided the protection level is the same as the one the bot wants to apply.
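As a rough illustration of how the thresholds option could map a transclusion count to a protection level, here is a minimal Ruby sketch. The names THRESHOLDS and level_for are hypothetical, and the key names follow the configuration example quoted later in the discussion; this is not taken from the bot's source.

```ruby
# Hypothetical thresholds mirroring the initial values described above.
# A nil value means the level is unused.
THRESHOLDS = {
  'sysop' => nil,
  'template' => 5000,
  'extendedconfirmed' => nil,
  'autoconfirmed' => 500
}.freeze

# Returns the level with the highest threshold that the count meets,
# or nil if the page does not qualify for any protection.
def level_for(count, thresholds = THRESHOLDS)
  qualifying = thresholds.reject { |_level, min| min.nil? || count < min }
  return nil if qualifying.empty?

  qualifying.max_by { |_level, min| min }.first
end

puts level_for(120).inspect # nil
puts level_for(750)         # autoconfirmed
puts level_for(80_000)      # template
```

Picking the highest threshold that is met assumes stricter levels always carry larger thresholds, which holds for the initial values but is not guaranteed by the configuration format.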
Discussion
- If you're running a query like
SELECT tl_namespace, tl_title, COUNT(*) AS ct FROM templatelinks GROUP BY tl_namespace, tl_title HAVING COUNT(*) >= 500
to find the templates to protect, I note that'll be a pretty expensive query and I wonder whether it can be run less often than daily. Anomie⚔ 03:26, 2 February 2019 (UTC)[reply]
- Anomie, I imagine (as you have suggested to me in the past) that this could be batched, like:
SELECT tl_namespace, tl_title, Count(*) AS ct FROM templatelinks WHERE tl_from BETWEEN 1 AND 32500 GROUP BY tl_namespace, tl_title HAVING Count(*) >= 500;
- The above takes under 2s to complete, and the size of the batches could in theory be adjusted on the fly. I know there isn't a tl_id field, which would be ideal, but this would make the overall query much less expensive. SQLQuery me! 05:37, 2 February 2019 (UTC)[reply]
- I've been doing something similar to:
SELECT page_title FROM page JOIN templatelinks ON page_title = tl_title AND page_namespace = tl_namespace LEFT JOIN page_restrictions ON pr_page = page_id AND pr_level IN (...) AND pr_type = 'edit' WHERE tl_namespace = 10 AND pr_page IS NULL GROUP BY page_namespace, page_title HAVING COUNT(*) >= 500;
- Which usually takes around 30 seconds, and only a few seconds for the Module namespace. If you've any suggestions to improve it, please enlighten :) I don't think it's crazy long for a task of this nature. Do I need to check other namespaces, too?
- Another thing I should bring up: There are subpages of Template:POTD for each individual day, and the current day always has a lot of transclusions. The following day it's removed from whichever template and the count goes back down again. Should I exclude these templates? Or add some special code to protect/unprotect accordingly, every day? — MusikAnimal talk 07:43, 2 February 2019 (UTC)[reply]
- POTD only has ~600 transclusions on low-visibility and low-vandalism-target userpages, so I think those templates should be excluded. Galobtter (pingó mió) 07:58, 2 February 2019 (UTC)[reply]
- Per Wikipedia:Administrator's noticeboard/Incidents#The Signpost vandalized, Could the bot run the query on Wikipedia space/other namespaces? (perhaps only weekly if the query is too slow?)
- And I note that Wikipedia:Database reports/Unprotected templates with many transclusions appears to run a very similar query. Galobtter (pingó mió) 10:00, 2 February 2019 (UTC)[reply]
- @Galobtter: Sure, we can include the Wikipedia namespace. It after all seems like the only other namespace that would contain highly-transcluded pages. — MusikAnimal talk 10:04, 2 February 2019 (UTC)[reply]
- I can see some userboxes and user templates having high transclusions - e.g there's User:Resoru/UBX/VG/ with 1700 transclusions etc. Galobtter (pingó mió) 10:17, 2 February 2019 (UTC)[reply]
- Userboxes... of course. Sure, we can check the userspace too. — MusikAnimal talk 19:10, 2 February 2019 (UTC)[reply]
- @MusikAnimal: Wow, that's surprisingly fast. Looks like it touches about 1e7 rows, finding the possible titles first then diving into the templatelinks tl_namespace index for each one. ... The ~4% of templates that are already protected account for ~98% of all template transclusions, so the query only has to look through the remaining ~2% of transclusions. I withdraw my concern, but you might have it give you some sort of warning if that query starts taking significantly more time. Anomie⚔ 13:42, 2 February 2019 (UTC)[reply]
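The batching idea raised above can be sketched in Ruby against toy in-memory data. As far as can be told from the batched query, a HAVING filter inside each batch would only see that batch's rows, so this sketch sums per-batch counts and applies the threshold once at the end. ROWS, batch_counts, and titles_over_threshold are hypothetical names, and the array stands in for the templatelinks table.

```ruby
# Toy stand-in for the templatelinks table: [tl_from, tl_title] pairs.
ROWS = [
  [1, 'Clear'], [2, 'Clear'], [3, 'Infobox'],
  [4, 'Clear'], [5, 'Infobox'], [6, 'Clear']
].freeze

# Counts per title for one tl_from range, comparable to
# "WHERE tl_from BETWEEN low AND high GROUP BY tl_title" with no HAVING clause.
def batch_counts(rows, low, high)
  rows.each_with_object(Hash.new(0)) do |(from, title), counts|
    counts[title] += 1 if from.between?(low, high)
  end
end

# Walk the tl_from range in fixed-size batches, sum the per-batch counts,
# and apply the threshold only after every batch has been merged.
def titles_over_threshold(rows, max_from:, batch_size:, threshold:)
  totals = Hash.new(0)
  (1..max_from).step(batch_size) do |low|
    batch_counts(rows, low, low + batch_size - 1).each do |title, count|
      totals[title] += count
    end
  end
  totals.select { |_title, count| count >= threshold }
end

# Only "Clear" reaches the threshold here, with 4 transclusions in total.
p titles_over_threshold(ROWS, max_from: 6, batch_size: 2, threshold: 4)
```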
Now that you've posted the source code, I've given it a quick review. Note I don't actually know python Ruby, so I mainly looked at the general logic.
- L70-L81: The query you have here seems significantly slower than the one you posted earlier. Among other things, there should be no need for "DISTINCT(page_title)" nor for ordering the results.
SELECT page_title AS title, COUNT(*) AS count FROM page JOIN templatelinks ON page_title = tl_title AND page_namespace = tl_namespace LEFT JOIN page_restrictions ON pr_page = page_id AND pr_level IN ('autoconfirmed', 'templateeditor', 'extendedconfirmed', 'sysop') AND pr_type = 'edit' WHERE page_namespace = #{ns} AND pr_page IS NULL GROUP BY page_title HAVING COUNT(*) >= #{threshold}
- L77, L80, L93: I don't see anything obvious that prevents SQL injection if #{ns}, #{threshold}, or #{@mb.config[:ignore_offset]} are set to unexpected values. Yes, MediaWiki's restriction of editing .json pages helps, but it doesn't hurt to double check it. Simply casting them to integers before interpolating would be good.
- L88-L93: Seems like you could add LIMIT 1 to the query to avoid fetching extra rows when all you care about is whether any rows exist.
- L99: Does the tbnooverride parameter to action=titleblacklist not work here?
HTH. Anomie⚔ 14:06, 4 February 2019 (UTC)[reply]
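One reading of the "cast to integers before interpolating" suggestion, as a minimal Ruby sketch. The helper names strict_int and count_query are hypothetical, and the SQL is abbreviated from the query quoted in the review; it is not the bot's actual code.

```ruby
# Convert a configuration value to an Integer, raising ArgumentError if it
# does not parse as a base-10 integer (e.g. "10; DROP TABLE page").
def strict_int(value)
  Integer(value.to_s, 10)
end

# Hypothetical query builder: only validated integers are interpolated.
def count_query(namespace, threshold)
  ns = strict_int(namespace)
  min = strict_int(threshold)
  'SELECT page_title AS title, COUNT(*) AS count FROM page ' \
    'JOIN templatelinks ON page_title = tl_title AND page_namespace = tl_namespace ' \
    "WHERE page_namespace = #{ns} GROUP BY page_title HAVING COUNT(*) >= #{min}"
end

puts count_query(10, '500')

begin
  count_query('10; DROP TABLE page', 500)
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```

Failing loudly on a bad value, rather than silently coercing it, means a malformed configuration stops the run instead of producing an unexpected query.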
- Ha, it is clear that you "don't actually know python", because the code is in Ruby :) Galobtter (pingó mió) 14:30, 4 February 2019 (UTC)[reply]
- And I don't know Ruby either! ;) Anomie⚔ 13:01, 5 February 2019 (UTC)[reply]
- @Anomie: Thanks for the code review! I have made some changes based on your feedback. I am using prepared statements now, but am not doing any type casting. I think it's better for it to fail entirely in this case (and logged to User:MusikBot II/TemplateProtector/Error log). You were right that the main query is a little slower, apparently due to the COUNT in the SELECT clause? It still maxes out at around 1 to 2 minutes run time, which I don't think is terrible. The whole task takes about 5 minutes to complete.
"Does the tbnooverride parameter to action=titleblacklist not work here?" -- it does not appear to. I always get "ok" when logged in as the bot. Regards, — MusikAnimal talk 20:34, 4 February 2019 (UTC)[reply]
- The selecting of COUNT(*) isn't the problem, the problems were the ORDER BY (which you fixed) and GROUPing BY tl_title instead of page_title (which you didn't fix yet). Sometimes MySQL can figure out things are equivalent based on join or where clauses and sometimes it can't, and this seems to be one where it can't.
- Switching to a parameterized query for self.recently_protected? should be sufficient, as it should result in an SQL error being thrown on bad input rather than an SQL injection.
- What's the exact query you're trying with tbnooverride? It works when I try something like this, both with this account and with AnomieBOT's account. Anomie⚔ 13:15, 5 February 2019 (UTC)[reply]
- @Anomie: An example would be for Template:Taxonomy/Doridina, e.g. [1]. I get "ok" while logged in and "blacklisted" while logged out. I guess it's just an issue for titles restricted to autoconfirmed? — MusikAnimal talk 17:25, 5 February 2019 (UTC)[reply]
- Yeah, it looks like there's no way to override the "autoconfirmed" restriction. Anomie⚔ 21:58, 5 February 2019 (UTC)[reply]
- Another issue I've encountered: Sometimes there is a highly visible Wikipedia page that is managed by a bot, for instance Wikipedia:Good article nominations/Topic lists/Video games. If MusikBot were to template-protect, the bot can no longer edit it. In my opinion, we should just add template editor rights to such bots. If the transclusion count is really that high, I don't think it's safe to leave it under mere semi-protection. Another option is to check the revision history and try to deduce if it is bot-maintained. That seems error-prone and would be rather expensive, so I'm going to advise against this strategy. Finally, we could just ignore the Wikipedia namespace altogether. I have not encountered a bot-maintained Template or Module, and I suspect such bots would be handed template editor rights anyway. Thoughts? — MusikAnimal talk 01:09, 6 February 2019 (UTC)[reply]
- Ugh, there's also WikiProject to-do templates, e.g. Wikipedia:WikiProject Bangladesh/to do. Many include constructive edits from unconfirmed users. I can exclude these using the regex_exclusions option, since all seem to end in "to do", "ToDo", or "to-do", etc. But again... some have an awfully high transclusion count. What to do? — MusikAnimal talk 01:18, 6 February 2019 (UTC)[reply]
- IANA BAG member or BOTP expert, but if you were to generate a one-time list of guesses for such bots, I'd be more than comfortable granting template-editor to such bots assuming their operators having the equivalent. As you say, high risk transclusions should be protected, and bots should be made to work within that system not the other way around. (As an aside, this would be/have been a good argument for allowing ECP in this task.) ~ Amory (u • t • c) 01:23, 6 February 2019 (UTC)[reply]
- @Amorymeltzer: There's Legobot for Wikipedia:Good article nominations/Topic lists/Video games and WugBot for Wikipedia:Good article nominations/backlog/items. That's the only two I've encountered thus far. — MusikAnimal talk 01:34, 6 February 2019 (UTC)[reply]
- Sweet! WP:GAN/backlog/items has only ~700, so safely far from the TE level, and I think we can trust Legoktm ~ Amory (u • t • c) 01:40, 6 February 2019 (UTC)[reply]
- It might make more sense to just put such pages on your exclude list than to give random bots templateeditor. Anomie⚔ 01:57, 6 February 2019 (UTC)[reply]
- @Anomie: I've already semi'd both. But Wikipedia:Good article nominations/Topic lists/Video games has nearly 80,000 transclusions. That's a lot! At some point we have to draw the line... or use extended-confirmed protection? — MusikAnimal talk 02:03, 6 February 2019 (UTC)[reply]
- Given the circumstances (Legobot isn't a template editor), I went ahead and broke the rules by adding ECP to Wikipedia:Good article nominations/Topic lists/Video games. The issue of what MusikBot should do in this scenario still stands. I guess we'll just handle it on a case by case basis. — MusikAnimal talk 04:16, 6 February 2019 (UTC)[reply]
- I think the easiest thing would be to, after the initial run, wait ~week for the next run, so that people can point out these edge cases to be added to the exclusion. Another thing you'd want to do is prepopulate the exclusion list with templates that have been ECP protected, because they will almost all be ones where people like Primefac have lowered the protection after (batch) template-protecting templates, and the bot shouldn't annoy people by again template-protecting the templates.
- You'd also definitely want to exclude WikiProject banner templates from template-protection - Primefac batch protected all templates with ~2000+ transclusions nearly a year ago but reduced to semiprotection WikiProject templates, as they don't really need TPE. Galobtter (pingó mió) 06:31, 6 February 2019 (UTC)[reply]
- Would it be better to just flat-out exclude all WikiProject banner templates, since they're likely all semi'd by now? I assume new WikiProjects aren't created that often. I'd prefer to leave this special handling to humans. Otherwise we'd need to further complicate the configuration by allowing you to specify protection levels for each of the exclusions and regex_exclusions.
- Right now the bot only targets unprotected templates/modules, so we wouldn't be template-protecting anything that Primefac had lowered to ECP. Again if we want options like "exclude this page from template-protection, but do include it for semi-protection", it will complicate the configuration, which I'm hoping to avoid. — MusikAnimal talk 18:14, 6 February 2019 (UTC)[reply]
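For illustration, the regex_exclusions idea for to-do pages mentioned in this thread might look like the following Ruby sketch. The pattern, the comment text, and the excluded? helper are hypothetical, and the assumption that the to-do pages are always subpages (preceded by a slash) is an assumption for this example rather than something stated in the discussion.

```ruby
# Hypothetical regex_exclusions entries: pattern => comment on why it is excluded.
REGEX_EXCLUSIONS = {
  '\/to[ -]?do\z' => 'WikiProject to-do lists, often edited by unconfirmed users'
}.freeze

# True if the full page title matches any exclusion pattern (case-insensitive).
def excluded?(title, regex_exclusions = REGEX_EXCLUSIONS)
  regex_exclusions.keys.any? do |pattern|
    Regexp.new(pattern, Regexp::IGNORECASE).match?(title)
  end
end

puts excluded?('Wikipedia:WikiProject Bangladesh/to do') # true
puts excluded?('Template:Clear')                         # false
```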
- My general thoughts is that counting transclusions isn't a very good metric for "highly visible template/module". I think page views is a significantly better metric. Legoktm (talk) 05:32, 6 February 2019 (UTC)[reply]
- Page views would be extremely slow to query, but I suppose the bot could set the threshold for template-protection as: either 2000+ article space transclusions - which are very disproportionately viewed - or 10000+ non-article space transclusions, because vandalism or disruption on templates transcluded on talk pages is lower, and semi-protection would stop most of it. Though I think the blanket 5000 threshold works fine enough and not sure if it should be complicated. Galobtter (pingó mió) 06:31, 6 February 2019 (UTC)[reply]
- I thought about going by pageviews. It would be interesting to see the results, to say the least! Though I question how feasible it is to go through every Template/Module/User/Wikipedia page and get the pageviews of all the transclusions :( Depending on the circumstances, it could take days to finish and be error-prone. Pageviews anomalies happen a lot too: e.g. false traffic from an undeclared bot, or recent deaths/incidents that can overnight send a generally unpopular page to the top of the charts. Defining the conditions for pageviews and working out all the edge cases is going to be a nightmare, let alone how slow it would be and challenging to implement. I'd love to rope in some pageviews logic, but hopefully we can save that for version 2 :)
- But I do like Galobtter's compromise of going by the namespaces of the transclusions. That is a simple tweak to the query, and might even make the task as a whole faster (or slower... ;). It will complicate the configuration, though. I guess it would look something like:
"thresholds": { "sysop": null, "template": { "mainspace": 2000, "non_mainspace": 10000 }, "extendedconfirmed": null, "autoconfirmed": { "mainspace": 500, "non_mainspace": 500 } }
- a little ugly :/ I'm a bit hesitant to change the thresholds at this time. Shouldn't we go back to WP:AN for further input? I'd argue we should go with the current consensus, and see how people react after the first round of protections. I really like how simple the system is right now. — MusikAnimal talk 17:54, 6 February 2019 (UTC)[reply]
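A minimal Ruby sketch of how the namespace-split thresholds floated above could be evaluated, following the "either article-space or non-article-space count" reading of the proposal. SPLIT_THRESHOLDS, LEVEL_ORDER, and split_level_for are hypothetical names, and the strictest-first ordering of levels is an assumption.

```ruby
# Nested thresholds copied from the configuration example above.
SPLIT_THRESHOLDS = {
  'sysop' => nil,
  'template' => { 'mainspace' => 2000, 'non_mainspace' => 10_000 },
  'extendedconfirmed' => nil,
  'autoconfirmed' => { 'mainspace' => 500, 'non_mainspace' => 500 }
}.freeze

# Assumed ordering from strictest to weakest, so the first match wins.
LEVEL_ORDER = %w[sysop template extendedconfirmed autoconfirmed].freeze

# A level applies if either the mainspace count or the non-mainspace count
# reaches its own threshold. Returns nil if no level applies.
def split_level_for(mainspace_count, other_count, thresholds = SPLIT_THRESHOLDS)
  LEVEL_ORDER.find do |level|
    limits = thresholds[level]
    next false if limits.nil?

    mainspace_count >= limits['mainspace'] || other_count >= limits['non_mainspace']
  end
end

puts split_level_for(2500, 0)     # template
puts split_level_for(600, 3000)   # autoconfirmed
puts split_level_for(100, 12_000) # template
```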
- Regarding "The bot will also ignore any page specified at MediaWiki:Titleblacklist which includes the noedit flag.", the bot should still template-protect taxonomy templates like Template:Taxonomy/Embryophytes, being used on 10000+ articles and regularly getting disruption from people not getting a consensus for their changes (and as autotaxobox gets more widely used over manual taxobox, the transclusions of these templates are rising pretty quickly). Galobtter (pingó mió) 06:31, 6 February 2019 (UTC)[reply]
- That seems reasonable. Maybe we should compare against the Titleblacklist protection level. So for taxonomy templates, if there are less than 5000 transclusions, we don't protect at all since it is already done by the Titleblacklist (as specified with autoconfirmed). If the template has >= 5000 transclusions, we template-protect as we would any template. That keeps it simple; basically checking the Titleblacklist is only done to avoid redundant protections. I think this is what Od Mishehu was going for when they commented on the WP:AN discussion. — MusikAnimal talk 18:04, 6 February 2019 (UTC)[reply]
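The comparison against the Titleblacklist level described above could be sketched in Ruby as follows. LEVEL_RANK and needs_protection? are hypothetical names, and the ordering of levels is an assumption made for this example.

```ruby
# Assumed ordering of protection levels from weakest to strictest.
LEVEL_RANK = {
  'autoconfirmed' => 1,
  'extendedconfirmed' => 2,
  'template' => 3,
  'sysop' => 4
}.freeze

# Returns true when the bot would still need to protect the page, i.e. when
# the Titleblacklist restriction is weaker than the level the bot wants.
# blacklist_level is nil when no noedit entry matches the title;
# desired_level is nil when the page does not meet any threshold.
def needs_protection?(desired_level, blacklist_level)
  return false if desired_level.nil?
  return true if blacklist_level.nil?

  LEVEL_RANK.fetch(desired_level) > LEVEL_RANK.fetch(blacklist_level)
end

# A taxonomy template under 5000 transclusions is already covered:
puts needs_protection?('autoconfirmed', 'autoconfirmed') # false
# At 5000 or more it would still be template-protected:
puts needs_protection?('template', 'autoconfirmed')      # true
```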
- Regarding multiple protection types - how will you handle these pages? (e.g. if the page has different move protections and edit protections) — xaosflux Talk 15:44, 7 February 2019 (UTC)[reply]
- Only edit protection is applied, though we could do move as well if you think it makes sense to do so? Note also we're only looking for templates/modules that are completely unprotected (for editing, not moving). — MusikAnimal talk 17:05, 8 February 2019 (UTC)[reply]
- Redirect handling? How are you going to handle redirects? (e.g. {{CLEAR}} vs {{Clear}}) ? — xaosflux Talk 15:44, 7 February 2019 (UTC)[reply]
- Redirects are not followed. — MusikAnimal talk 17:06, 8 February 2019 (UTC)[reply]
- When a redirect is transcluded, MediaWiki includes both the redirect and the target page in the templatelinks table. So if 700 pages transclude {{CLEAR}} and 900 transclude {{Clear}} directly (and no pages transclude any other redirect), the bot would see 700 for Template:CLEAR and 1600 for Template:Clear. And, I presume, it would protect each page accordingly? Anomie⚔ 21:12, 8 February 2019 (UTC)[reply]
- Yep! The bot goes by whatever the count is in templatelinks, regardless if the page is a redirect. That is the intended behaviour, I hope? — MusikAnimal talk 21:22, 8 February 2019 (UTC)[reply]
- Sounds like good behavior to me. Anomie⚔ 12:31, 9 February 2019 (UTC)[reply]
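The redirect arithmetic in this exchange can be checked with a small Ruby sketch. The data reuses the numbers from the example above; the method name templatelinks_counts and the toy hashes are hypothetical and only model the counting, not MediaWiki itself.

```ruby
# Build the per-title counts as they would appear in templatelinks: a page
# that transcludes a redirect contributes one row for the redirect and
# another row for the redirect's target.
def templatelinks_counts(direct_uses, redirects)
  counts = Hash.new(0)
  direct_uses.each do |title, pages|
    counts[title] += pages
    target = redirects[title]
    counts[target] += pages if target
  end
  counts
end

# 700 pages transclude {{CLEAR}}, 900 transclude {{Clear}} directly,
# and Template:CLEAR redirects to Template:Clear.
counts = templatelinks_counts({ 'CLEAR' => 700, 'Clear' => 900 }, { 'CLEAR' => 'Clear' })
puts counts['CLEAR'] # 700
puts counts['Clear'] # 1600
```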
- Are you implementing downgrade prevention? Under what circumstances would you downgrade protection? — xaosflux Talk 15:50, 7 February 2019 (UTC)[reply]
- Nope. Protection levels are never lowered by the bot. Future iterations of the bot may do this, pending discussion. For now, I'd like to get a simple solution deployed and see how the community reacts. — MusikAnimal talk 17:13, 8 February 2019 (UTC)[reply]
- Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. OK to trial, 25 SPP's, 25 TEP's. — xaosflux Talk 00:46, 18 February 2019 (UTC)[reply]
- Also, MOVE protection should also be applied along with EDIT protections (never lowering any existing still even if they are of different levels) - reasoning is that this is the 'default' behavior for human admins. — xaosflux Talk 00:48, 18 February 2019 (UTC)[reply]
Trial results
Trial complete. See [2]. There were only 23 pages that qualified for template protection. Hopefully that's sufficient for the trial. As is always the case with my bot trials, the edits were semi-automated, hence the gaps between timestamps. I carefully reviewed each template before it applied protection, and as far as I can tell the bot performed as it should. — MusikAnimal talk 00:15, 19 February 2019 (UTC)[reply]
- This shouldn't hang things up, but would it be too much extra effort to skip adding move protection if the bot is semiprotecting? (auto)confirmed is required to move pages anyway, so it's extraneous. ~ Amory (u • t • c) 01:40, 19 February 2019 (UTC)[reply]
- Can do — MusikAnimal talk 01:44, 19 February 2019 (UTC)[reply]
- Though, I was sort of designing this to be wiki-agnostic. Not sure if this default move protection exists on other wikis — MusikAnimal talk 01:45, 19 February 2019 (UTC)[reply]
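A minimal Ruby sketch of the move-protection rules discussed here: move protection follows the edit level, is never lowered, and can optionally be skipped when semi-protecting. RANK, planned_protection, and the skip_semi_move flag are hypothetical; the flag reflects the concern that other wikis may not restrict moves to autoconfirmed users by default.

```ruby
# Assumed ordering of levels; nil stands for "no protection".
RANK = {
  nil => 0,
  'autoconfirmed' => 1,
  'extendedconfirmed' => 2,
  'template' => 3,
  'sysop' => 4
}.freeze

# Returns the intended levels per protection type. Move protection is the
# stricter of the existing move level and the desired edit level, so an
# existing level is never lowered.
def planned_protection(desired, existing_move = nil, skip_semi_move: true)
  plan = { 'edit' => desired }
  move = [existing_move, desired].max_by { |level| RANK.fetch(level) }
  # Leave move protection out only when it would add nothing new.
  redundant = skip_semi_move && existing_move.nil? && move == 'autoconfirmed'
  plan['move'] = move unless redundant
  plan
end

p planned_protection('autoconfirmed')     # edit only
p planned_protection('template')          # edit and move at template level
p planned_protection('template', 'sysop') # move stays at sysop
```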
- I might prefer it to say "at-risk" for the autoconfirmed templates/modules, but probably not worth it. To my non-BAG eyes, these protections all look good — got a couple of redirects in there, which is nice. The TWA badges are a great example of why this is a great idea, and the GAC criteria are another example of why extended confirmed would be good. ~ Amory (u • t • c) 11:21, 19 February 2019 (UTC)[reply]
Approved. Primefac (talk) 20:44, 24 February 2019 (UTC)[reply]
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.