Web archiving

Web archiving is the process of creating reliable copies of web-based content for long-term preservation.

Why do we need to archive websites?

Paper records may seem delicate, but in average conditions they can survive for decades stashed away in a box or filing cabinet. Digital content is often at a higher risk of loss due to fragile physical carriers and rapid technological obsolescence, and can quickly become inaccessible without appropriate action. This is particularly true for websites, as content is changed, updated, and removed frequently – the average lifespan of a website is only around two and a half years.

Heritage Collections manage records that document over 400 years in the life of the University, including records of institutions, organisations, and departments that have merged with the University or are no longer operating. Like many other institutions, the University’s communications are increasingly disseminated via online channels. As such, web archiving is an integral component of the University’s vision for a library and university collection that is at the heart of education, research and engagement at Edinburgh.

How does the University archive its websites?

The University’s web content is predominantly captured through the UK Web Archive (UKWA). The UKWA collects under legal deposit legislation, which entitles the British Library and other legal deposit libraries to make a copy of all UK print and digital publications, including websites. 

Some web content cannot be captured through the UKWA due to legal or technical restrictions. In these cases, we try to capture sites using alternative approaches such as open-source tools for manual capture like the WebRecorder tool suite. 

What web content does the University archive?

Our web archiving activity is guided by the University Archives Collections Development Policy (Appendix G of the Collections Management Policy 2020-2030). Any content that is on the main ed.ac.uk domain is considered in scope for collection, and we make an effort to identify and capture any sites that are owned, managed, or hosted by the University on other domains. 

Where appropriate, we also use web archiving to preserve non-University owned web content that is acquired by Heritage Collections through donations, purchases, or commissions.

What web content does the University NOT archive?

Our web archiving activity focuses on content that has been made publicly available on the web. This means we don’t automatically capture content that is private or restricted, such as SharePoint pages or private wikis. Site owners with this type of content should take steps to ensure any University records are transferred to the University Archives as part of their normal records management activity – see the University Archives pages for more information on how to do this.

How can web archives be accessed?

Most of the University’s archived web content can be accessed through the UKWA’s public interface. By default, access to content captured by the UKWA is restricted to computer terminals onsite in Legal Deposit Libraries, but the University has granted a license which enables access to captures from any computer terminal. This license automatically applies to all content on the main web domain (https://www.ed.ac.uk) and associated subdomains (with the exception of content on the blogs.ed.ac.uk domain), and is manually applied to other University-managed sites.
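For site owners who want a quick way to reason about the automatic license rule described above, the following is a minimal sketch in Python. It is illustrative only and is not a tool used by the University or the UKWA. The function name is hypothetical, and the sketch assumes that "associated subdomains" means any host ending in ed.ac.uk, and that the blogs.ed.ac.uk exclusion also covers hosts beneath it.

```python
from urllib.parse import urlsplit

# Domains taken from the text above; how they are matched is an assumption.
MAIN_DOMAIN = "ed.ac.uk"
EXCLUDED_DOMAIN = "blogs.ed.ac.uk"


def _within(host: str, domain: str) -> bool:
    """Return True if host is the domain itself or a subdomain of it."""
    return host == domain or host.endswith("." + domain)


def license_applies_automatically(url: str) -> bool:
    """Hypothetical helper: is this URL covered by the automatic license?

    Returns False for other University-managed domains, where the text
    says the license is applied manually instead.
    """
    host = (urlsplit(url).hostname or "").lower()
    if _within(host, EXCLUDED_DOMAIN):
        return False  # blogs.ed.ac.uk is the stated exception
    return _within(host, MAIN_DOMAIN)


if __name__ == "__main__":
    for example in ("https://www.ed.ac.uk/about", "https://blogs.ed.ac.uk/example"):
        print(example, license_applies_automatically(example))
```

A result of False does not mean a site is closed to remote access; it only means the license would need to be applied manually, so the Web Archivist remains the authoritative source for any individual site.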

Sites and pages that have been captured manually are available to view in the Reading Room at the Centre for Research Collections. To find out what captures we hold and arrange to view them, please contact the Research Services team.

Please note:

Following a cyber-attack on the British Library, access to the UK Web Archive service is temporarily unavailable. While captures are not accessible, crawls continue to run and University web pages continue to be captured. For more information please visit the UK Web Archive blog or contact the Web Archivist.

Where can I find more information about the University's web archiving programme?

For further information, please see the FAQs linked below or consult our Web Archiving Strategy document. 

Some commonly asked questions about web archiving and the University of Edinburgh's web archiving programme.

There are a few simple steps that you can follow during the design process that can make your website more crawler-friendly and improve any copy that is made for preservation.