Sunday, April 1, 2018

Archiving Hertfordshire Local History web sites and the Wayback Machine


I have just been asked about saving the "Genealogy in Hertfordshire" web site on the Wayback Machine - and I have recently been involved in discussions within the Tring Local History Society about consolidating various Tring local history digital records to ensure that they remain accessible when the current owners are no longer in a position to support them. This suggests that it might be appropriate to try to develop a county-wide policy for archiving the scattered web sites of local historians and societies. However, for the moment I will just say a little about the Wayback Machine and how it relates to my web site.

www.hertfordshire-genealogy.co.uk has been automatically monitored by the Wayback Machine since 2000 and pages have been captured on 248 occasions. The first pages captured are from 2000 - when it was a bulletin board - but on retrieving them you only get the page framework; the actual bulletin posts, which contain the historical information, are missing.
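For anyone who would rather count captures than click through the Wayback calendar, the following is a minimal Python sketch. It assumes the Wayback Machine's public CDX search endpoint and its JSON output, in which the first row is a header naming the fields. The function names are illustrative only, and the count it gives depends on which URL is queried, so it will not necessarily match the figure of 248 quoted above.

```python
import json
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the public CDX search endpoint of the Wayback Machine.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(site, from_year=None, to_year=None):
    """Build a CDX query URL listing the captures recorded for one URL."""
    params = {
        "url": site,
        "output": "json",
        "fl": "timestamp,original,statuscode",
    }
    if from_year is not None:
        params["from"] = str(from_year)
    if to_year is not None:
        params["to"] = str(to_year)
    return CDX_ENDPOINT + "?" + urlencode(params)


def captures_per_year(rows):
    """Count captures per year from CDX JSON rows (first row is the header)."""
    if not rows:
        return Counter()
    header, data = rows[0], rows[1:]
    ts = header.index("timestamp")
    # Timestamps look like YYYYMMDDhhmmss, so the first four characters are the year.
    return Counter(row[ts][:4] for row in data)


def fetch_rows(site, **kwargs):
    """Fetch the capture list over the network (not called automatically)."""
    with urlopen(cdx_query_url(site, **kwargs), timeout=30) as response:
        return json.load(response)
```

Calling captures_per_year(fetch_rows("www.hertfordshire-genealogy.co.uk")) would then give a year-by-year tally for the home page. It says nothing about how many of the pages below the home page were captured on each visit.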

The site "Genealogy in Hertfordshire" that most people are familiar with started in April 2001, and a check shows that several extensive snapshots of the site were taken then. I have looked at the later April archive and clicked through a number of pages - and it is possible that the whole site has been recorded as it was originally created. A similar, very limited spot check in November 2001 failed to reveal any missing pages.

A check at the end of 2005 showed that while many pages were present, sections such as the pages containing a copy of the booklet "Tring in 1947" were missing. On the other hand my WW1 talk on "The Terriers in West Herts" appears to be present, and possibly complete. It is clear that Wayback was only archiving a sample of the pages at this date.

By 2010 Wayback was only visiting the site about 4 times a year - and a check shows the Home page with the picture missing. However a quick check showed it was possible to follow some of the menus and, interestingly, the "Tring in 1947" pages were now accessible, so they may have been archived between 2005 and 2010.

There has only been 1 snapshot in 2018, up to the end of March. The main entry page has two pictures missing but it is definitely possible to navigate through parts of the site. However it looks as if there is a limit to the depth of nesting of links, and most of the pages and pictures relating to my recent research on the St Albans post card artist signing his name Karaktus had not been archived. I also suspect that many other pages, which were archived years ago, may not show recent updates. Two links on the home page are interesting. The link to the "Hertfordshire News" blog takes you to an archive copy of the blog (sampled 17 times since 2013). The link to "A Guide to Old Hertfordshire" took ages to load the Google map, and when I clicked on a flag on the map it appeared to take me to the correct page - but it was NOT an archived page but the current live page!

The above observations fit in with what I have found, over the years, when looking for local history sites which had suddenly gone offline for some reason. In some cases this was because the author/owner had died and the ISP subscription had lapsed, and in other cases the local history pages had been part of a bigger site, perhaps run by a parish council or a church, where the site had been "brought up to date" and the pages of historic information were lost when the new version of the site was introduced.

In fact the Wayback Machine can be asked to archive single web pages, and in a test this afternoon I submitted one page where only the text and some of the pictures had been archived in March. The page is now on the Wayback Machine - with all the pictures - but none of the links to supporting pages have been followed up and archived. This seems a good way to archive single stand-alone pages - each of which will have a unique permanent URL. However over the next few weeks I will do some tests, followed up by retrieval requests a few days later, on some of the missing parts of my web site and report on the results.
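The "submit a page, then check a few days later" routine could be scripted along the following lines. This is only a sketch: it assumes the public "Save Page Now" address form (https://web.archive.org/save/ followed by the page URL) and the Wayback availability API at archive.org/wayback/available, and the function names are made up for illustration. Bulk or automated saving may be rate limited or need a logged-in account, so it is not a substitute for the manual tests described above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def save_page_url(page_url):
    """Address that asks the Wayback Machine to capture one page (assumed form)."""
    return "https://web.archive.org/save/" + page_url


def availability_url(page_url, timestamp=None):
    """Address of the availability API query for the capture closest to a date.

    timestamp is an optional string such as "20180401" (YYYYMMDD).
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)


def closest_snapshot(response):
    """Return (archive_url, timestamp) from a decoded API reply, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"], closest["timestamp"]
    return None


def check_page(page_url, timestamp=None):
    """Ask the API over the network whether a page has an archived copy."""
    with urlopen(availability_url(page_url, timestamp), timeout=30) as reply:
        return closest_snapshot(json.load(reply))
```

As in the test described above, saving one page this way does not follow its links, so each supporting page would need its own request and its own follow-up check.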

In addition Wayback offer a subscription archiving service which will regularly scan and update the archives of selected web sites. This seems to be aimed at major libraries and universities - for instance to archive web sites linked to particular research projects - and would not appear to be suitable for use by large numbers of individual local historians. However it might be possible to co-ordinate web activities across the county - with one organisation (perhaps HALS?) being responsible for selecting which local sites to archive.

Any ideas anyone???? - Comments would be helpful.

3 comments:

  1. Hi Chris,

    You could also suggest your site be included in the British Library UK Web Archive project. Just go to the 'nominate a site' option, link below.

    https://www.webarchive.org.uk/ukwa/info/nominate

    I was recently worried about what would happen to all the history material on my site, the Brookmans Park Newsletter (www.brookmans.com) when I fall off my perch.

    So I set up a new site, the North Mymms History Project http://www.northmymmshistory.uk, on the Google Blogger platform, and moved all the material over.

    The reason I did this is that Google doesn't delete anything, so my thinking was that it would all stay live long after I become history.

    It's taken about two months to transfer the material - reworking and enhancing it as I did (with the help of a small group of local historians), but it's now done and we are continuing to add to it.

    I see your news updates site is on Blogger (Blogspot), but I am not sure what your main site is on. Perhaps downloading all your data and uploading it to Blogger might act as another backup of all the valuable work you have done over the years.

    David

    Replies
    1. Thanks for your comment - as a result I have just nominated www.hertfordshire-genealogy.co.uk and will see what happens. However on reading through the small print on the technical side there may be a problem. While the site contains no clever HTML, JavaScript, databases, etc., it is very large and the web crawler may find it too big - because it takes so long to scan that the crawler assumes it has got into a loop. I have therefore sent in a technical query as half an archive is not good enough ...

      Will report on what happens.

  2. The "Genealogy in Hertfordshire" web site is to be archived as part of the British Library scheme to archive important sites and should be available for public use should anything happen to the original - I will post more details as a separate blog post later.


This is the newsletter for the Genealogy in Hertfordshire Web site. Comments on this blog are moderated and may be transferred to the web site where appropriate. If you have a local or family history query you want answered you must use "Ask Chris" - See box in right hand column. Anonymous comments cannot be answered.