Help talk:Archiving a source

From Wikipedia, the free encyclopedia

Archive.today

It looks like this website has moved to archive.ph. Please verify and make the relevant changes. Susheel c (talk) 18:31, 20 September 2022 (UTC)[reply]
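
For anyone tidying citations after a domain change like this, here is a minimal sketch of rewriting archive.today mirror links to a single host. The mirror list and the choice of archive.today as the canonical host are assumptions, not something this page specifies, and `normalize_archive_today` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed set of archive.today mirror domains; adjust to whatever the
# help page actually documents.
ARCHIVE_TODAY_MIRRORS = {
    "archive.today", "archive.is", "archive.ph", "archive.li",
    "archive.vn", "archive.fo", "archive.md",
}


def normalize_archive_today(url: str, canonical: str = "archive.today") -> str:
    """Return url with a known mirror host replaced by the canonical host.

    URLs on any other host are returned unchanged.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in ARCHIVE_TODAY_MIRRORS:
        return url
    # Keep scheme, path, query and fragment; swap only the host.
    return urlunsplit(
        (parts.scheme or "https", canonical, parts.path, parts.query, parts.fragment)
    )
```

Links to other archives, such as web.archive.org, pass through untouched, so the helper can be run over a whole reference list.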

little problem

The last sentence of the paragraph on the IA Wayback Machine does not show up on this page. I suppose it has something to do with the unusual code in the original page. I do not know how to solve this. Can anyone with more "technical" experience please fix this? Thanks, --Dick Bos (talk) 15:07, 9 March 2020 (UTC)[reply]

@Dick Bos: I've switched to a different method for displaying the text and example code here. -- John of Reading (talk) 15:40, 9 March 2020 (UTC)[reply]

Should live links be mass archived in all articles preemptively? Please see also this discussion at VP/T. Your input is appreciated. Thank you. Dr. K. 19:06, 13 May 2020 (UTC)[reply]

"New URLs added to Wikipedia articles (but not other pages) are usually automatically archived by a bot."

Under the "Internet Archive Wayback Machine" section of this article it says: "New URLs added to Wikipedia articles (but not other pages) are usually automatically archived by a bot." This is empirically untrue. I have created and significantly rewritten and improved over 15 Wikipedia pages, and every single time only some, half, or, if I'm lucky, most of the links are archived when I use IABot. Every single time I find that I have to make custom archive URLs myself, either through the Wayback Machine or Archive.is. So what is the deal? Is this statement ("New URLs ... usually automatically archived by a bot.") inaccurate, am I doing something wrong, or both? Does the Wikipedia automatic archiving need to be fixed? And if so, how would I go about correcting the archiving, or notifying the person in charge? Factfanatic1 (talk) 10:51, 18 August 2020 (UTC)[reply]

Discussion about Ghostarchive at WP:ELN

Hi, I've started a discussion at WP:ELN about Ghostarchive. Direct link: WP:ELN § Ghostarchive. JBchrch talk 11:33, 11 January 2022 (UTC)[reply]

Archives of webpages that update daily

Unsure if this is the wrong place to ask, but is it necessary to add archives to references of webpages whose info. changes daily? E.g. artist Spotify profiles referenced on list articles re: various streaming stats. This is the edit that spurred my question. To me, it doesn't make sense adding profile archives since those wouldn't support the latest stats contained in the profiles i.e. the follower count, which is the whole point of the article. -- Carlobunnie (talk) 01:38, 11 March 2022 (UTC)[reply]

This is known as 'content drift': stock prices, weather, YouTube page views, music rankings, etc. Find an archive page that verifies the cited fact. If the cited fact keeps changing frequently, it might not be a good fit for Wikipedia. For example, there was an RfC that removed infobox entries for website rankings since they changed frequently. Follower counts sound like a similar problem. -- GreenC 03:26, 11 March 2022 (UTC)[reply]
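
As a rough illustration of finding a snapshot near the date a fact was cited, here is a sketch that builds a query for the Wayback Machine availability endpoint and reads the "closest" entry from its JSON reply. The helper names are hypothetical and the code does not make the network request itself. A returned snapshot still has to be checked by hand to confirm it shows the cited figure.

```python
import json
from urllib.parse import urlencode

# Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url: str, cited_date: str) -> str:
    """Build the query URL; cited_date is a YYYYMMDD string."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode(
        {"url": url, "timestamp": cited_date}
    )


def closest_snapshot(response_text: str):
    """Return (snapshot_url, timestamp) from a JSON reply, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["url"], closest["timestamp"]
```

If `closest_snapshot` returns None, or the timestamp is far from the cited date, the archive probably cannot verify the fact as cited.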

Preferences

The Wayback Machine appears to be WP users' preferred source for archived URLs, but when it doesn't save the desired page, is archive.today or Ghostarchive to be preferred? Mcljlm (talk) 08:15, 17 February 2024 (UTC)[reply]

Do any of you Jc86035, Smasongarrison, WhisperToMe, Micler, Rhain, RCraig09, Alalch E., GreenC, Danbloch, Jc86035 have suggestions? Mcljlm (talk) 16:16, 23 February 2024 (UTC)[reply]
I have used archive.today when archive.org didn't work as intended or was too busy. I'm worried about any archive service discontinuing in the way that webcitation.org did. Does anyone know what assurances we have re these services? —RCraig09 (talk) 16:21, 23 February 2024 (UTC)[reply]
What kind of assurance would even be meaningful in this context? They can't predict the future, they could be legally mandated (by the DMCA or other law) to take down content, and none of them are going to last forever.
I'm fairly certain that archive.today is run by a single person. Jc86035 (talk) 16:33, 23 February 2024 (UTC)[reply]
These are good questions. Obviously they can fail at any time for any reason, with no recourse to getting the snapshots back, à la WebCite. IMO we should be making private backups using tools at https://webrecorder.net/ as a first step, an offline archive-archive. But there are tens of millions of snapshots, so it's a major undertaking, even if you can get permission to scrape that much content from the provider. The most vulnerable providers are the smallest and youngest, or those exhibiting signs of trouble like lack of maintenance. -- GreenC 17:02, 23 February 2024 (UTC)[reply]
There doesn't appear to be any policy or guideline that mandates the use of a particular web archive service. archive.today is probably the 2nd-most-common one, but most of the services listed on Help:Archiving a source would work. Jc86035 (talk) 16:22, 23 February 2024 (UTC)[reply]
@Mcljlm: I'm late here, but replying since you pinged me. As someone else said, there doesn't appear to be any policy or guideline on which archive site to use, though there are officially approved bots that go around adding Wayback Machine links, so that's tacitly preferred, as you noted. Archive.today is run by someone who prefers to remain anonymous, doesn't respect takedown requests (despite claims to the contrary), has had to change TLDs, and is commonly used as a paywall bypass as much as an archive. As such, I wouldn't expect it to last forever, although it has already survived since 2012. Since it runs paywall bypass code, it's useful in that way for making citations accessible, but I doubt Wikipedia would officially acknowledge that. It doesn't archive videos at all, and rarely if ever successfully archives Instagram or Facebook. Ghostarchive is newer and less known (doesn't have a Wikipedia article) and I would tend to put even less trust in its longevity (it appears to also be run anonymously). But it appears to be more capable of archiving videos and social media posts, if that is needed. Micler (talk) 14:12, 22 April 2024 (UTC)[reply]
Thanks Micler for your detailed reply. I've noticed archive.today's ability to bypass paywalls when the WM doesn't. That's led me to add its archive URLs to many citations. I've used Ghostarchive once so far, when adding a video after noticing someone else had used it. Mcljlm (talk) 21:15, 22 April 2024 (UTC)[reply]

Wayback Machine is down...maybe forever???

Apparently rights holders have started to go after the Internet Archive--the website's been one giant 404 the last few days. What happens to all our archived citations? Is there a backup plan??? Takinzinnia (talk · contribs) 03:49, 14 October 2024 (UTC)[reply]

Not sure where you are getting your information. The site is not even 404, it kindly lands on a 200 page, which says "Sorry" with a link to Twitter that explains what is happening. Check that, or the dozens of news reports. For a deeper dive, these three links are useful background pieces (in that order): [1][2][3] -- GreenC 04:25, 14 October 2024 (UTC)[reply]

Cloudflare

Tried to archive a page by the British Museum (britishmuseum.org) with Archive Today and Ghost Archive but Cloudflare seems to block it from completing the archival process in both cases.

Archive Today goes into a loop, here's a snippet:

  • 200 text/javascript 13 GET blob:h t t p s://challenges.cloudflare.com/
  • 200 text/plain 481 POST h t t p s://challenges.cloudflare.com/cdn-cgi/

Ghost Archive shows a page with: "Verify you are human by completing the action below. [...] Performance & security by Cloudflare" But there's no usual box where you click to verify your humanity. Mika1h (talk) 22:48, 26 October 2024 (UTC)[reply]