Wikipedia:Bots/Requests for approval/CobraBot
- The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Approved.
Operator: Cybercobra
Automatic or Manually assisted: Automatic (originally filed as Manual, "at least for now; may file another BRfA for Automatic if this goes off without any hitches")
Programming language(s): Python (pywikipedia)
Source code available: User:CobraBot/Code (Will post shortly)
Function overview: Add OCLC# parameter to pages that use Template:Infobox Book based on ISBN in the infobox, if it is given.
Edit period(s): Several runs as my time permits until either the task is complete or a BRfA for automatic running is filed; periodic re-runs (e.g. quarterly) as new pages and |isbn= values are added
Estimated number of pages affected: However many pages transclude Template:Infobox Book, and specify an ISBN (55% maybe?) (see [1]); Several thousand over the first week
Exclusion compliant (Y/N): Y (via pywikipedia defaults; originally listed as N, "Don't know how to easily implement it")
Already has a bot flag (Y/N): N (needs one)
Function details:
1. Bot chooses an article that transcludes Template:Infobox Book
2. Bot locates the template in the article
3. Bot checks if |oclc= parameter is present
   - If yes, and value is non-whitespace, page is skipped (OCLC# already present). GOTO step 1.
   - If yes, and value is whitespace, parameter is removed.
   - If no, continue.
4. Bot grabs |isbn= parameter
   - If parameter not present, page is skipped (No ISBN to use for OCLC# lookup). GOTO step 1.
   - The value of the parameter is obtained, extra preceding "ISBN" text or dashes are stripped from the obtained value
   - If the value is "N/A" or similar, page is skipped (No useful ISBN to use for OCLC# lookup). GOTO step 1.
5. Using a proprietary process, the corresponding OCLC# is found for the given ISBN. The title of the work corresponding to the OCLC# is also obtained.
6. The OCLC# is added to the infobox body using the |oclc= parameter
   - (In assisted editing mode only) The bot operator is presented with the title of the WP page, ISBN, OCLC#, and OCLC title and asked to confirm the change.
7. Page changes are saved.
8. GOTO 1 until all pages either processed or skipped.
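The skip/continue decisions in the steps above can be sketched roughly as follows. This is an illustration only, not the code at User:CobraBot/Code: the function names `get_param`, `normalize_isbn`, and `decide` are hypothetical, the regex-based parameter parsing is a simplification that assumes no nested templates or piped links inside the values, and the ISBN check only tests for a plausible length without validating the check digit. The OCLC lookup itself and the removal of a blank |oclc= parameter are left out.

```python
import re


def get_param(template_text, name):
    """Return the raw value of |name= in the template text, or None if absent.

    Simplification: the value is assumed to contain no '|' or '}' characters.
    """
    match = re.search(r'\|\s*' + re.escape(name) + r'\s*=([^|}]*)', template_text)
    return match.group(1) if match else None


def normalize_isbn(raw):
    """Strip a leading 'ISBN' label, dashes, and whitespace.

    Returns the bare ISBN if it has a plausible ISBN-10 or ISBN-13 shape,
    otherwise None (this also rejects values such as 'N/A').
    """
    value = re.sub(r'^\s*ISBN[\s:]*', '', raw, flags=re.IGNORECASE)
    value = re.sub(r'[\s\-]', '', value).upper()
    if re.fullmatch(r'\d{9}[\dX]|\d{13}', value):
        return value
    return None


def decide(template_text):
    """Return ('skip', reason) or ('lookup', isbn) for one infobox."""
    oclc = get_param(template_text, 'oclc')
    if oclc is not None and oclc.strip():
        # Non-whitespace value: leave the page alone.
        return ('skip', 'oclc parameter already has a value')
    isbn_raw = get_param(template_text, 'isbn')
    if isbn_raw is None:
        return ('skip', 'no isbn parameter')
    isbn = normalize_isbn(isbn_raw)
    if isbn is None:
        return ('skip', 'no usable ISBN')
    return ('lookup', isbn)
```

Called on the text of a single infobox, `decide` either names the reason for skipping the page or returns the cleaned ISBN that would be handed to the lookup step.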
Discussion
- I've run the bot for testing without having it actually modify the pages on a decent number of articles and think all the bugs are worked out. The bot is more conservative than necessary in that the code for finding where the infobox ends is rather dumb and will think the template ends early if its body has another template within it, thus it might end up skipping some pages it otherwise could help. --Cybercobra (talk) 00:19, 24 September 2009 (UTC)
- Bot also skips cases where |oclc= value empty except for a comment. Also, bot has been much refactored (see updated code page) and is being re-tested. --Cybercobra (talk) 06:28, 24 September 2009 (UTC)
- Changed type to Automatic after not observing problems after significant testing. --Cybercobra (talk) 07:42, 24 September 2009 (UTC)
- Currently running code in assisted editing mode for demonstration/testing. --Cybercobra (talk) 17:29, 24 September 2009 (UTC)
- Assisted editing run with human oversight complete. 50 edits for examination. Only issues were an attempt to edit a talkpage (code now ensures pages are in article namespace) and one apparent deficiency in WorldCat's database (Heretics of Dune's ISBN maps to the OCLC# of its French translation). --Cybercobra (talk) 18:22, 24 September 2009 (UTC)
- After even more testing, code now changed to ensure ISBN is of plausible length --Cybercobra (talk) 23:48, 24 September 2009 (UTC)
- And with lots of further testing, a couple of rare corner cases were found and now handled. I am confident any remaining issues must be ridiculously obscure & infrequent. --Cybercobra (talk) 05:30, 25 September 2009 (UTC)
- Great, the edits you've done so far look pretty good, I'll look at them more thoroughly later. I think making the bot exclusion compliant is fairly easy in pywikipedia, although I don't actually know how the language works; I have this from a previous BRfA: pywikipedia default setting/ignore_bot_templates = False. Do you think you could use this to make this bot exclusion compliant please? Thanks - Kingpin13 (talk) 08:26, 25 September 2009 (UTC)
- After grepping for ignore_bot_templates, pywikipedia appears to be exclusion-compliant by default; I confirmed by testing on Three Men in a Boat. --Cybercobra (talk) 08:49, 25 September 2009 (UTC)
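The first comment above notes that the bot's infobox end-finding stops early when the infobox body contains another template. One common way around that, shown here purely as a sketch and not as what CobraBot does, is to count brace depth while scanning. The function name `find_template_end` is hypothetical, and the sketch ignores triple-brace parameters, nowiki sections, and HTML comments.

```python
def find_template_end(text, start):
    """Return the index just past the '}}' that closes the template
    opening at text[start:start + 2], or -1 if it is never closed."""
    if text[start:start + 2] != '{{':
        raise ValueError('start must point at "{{"')
    depth = 0
    i = start
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == '{{':
            depth += 1      # entering a (possibly nested) template
            i += 2
        elif pair == '}}':
            depth -= 1      # leaving the innermost open template
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    return -1
```

For an infobox containing a nested template such as a date template, a plain search for the first `}}` would stop at the end of the inner template, whereas the depth counter continues to the brace pair that actually closes the infobox.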
Approved. After some further review of the trial edits, this bot seems to be doing just fine. No concerns, and good bot task - Kingpin13 (talk) 10:09, 25 September 2009 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.