Research the Past Web using Web archives

Daniel Gomes (daniel.gomes@fccn.pt), Daniel Bicho (daniel.bicho@fccn.pt) and Fernando Melo, Arquivo.pt

The Web is the largest source of public information ever built. However, 80% of web pages disappear or have their content changed within one year. Web archives provide services and tools that preserve information published online since 1996 and enable access to it. The main objectives of this tutorial, provided by the Arquivo.pt team, are to:

Explain why web archiving matters, present use cases and share recommendations for creating websites that can be preserved for future access;
Introduce tools to create and explore web archives, such as oldweb.today, the Memento Time Travel Portal, Arquivo.pt, robustify.js, ArchiveReady.com, webrecorder.io and brozzler;
Present methods and technologies to develop web applications that automatically access and process information preserved in web archives, for instance using the Wayback Machine, the Memento Time Travel protocol or the Arquivo.pt API.
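
As an illustration of the third objective, the following is a minimal sketch of how an application might address archived content programmatically. It is not the tutorial's own material. It assumes the common Wayback-style replay URL pattern (archive prefix, 14-digit UTC timestamp, original URL), the `Accept-Datetime` request header defined by the Memento protocol (RFC 7089), and the Internet Archive availability endpoint at `https://archive.org/wayback/available`. The two archive prefixes and all function names are assumptions chosen for the example.

```python
import json
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.parse import urlencode
from urllib.request import urlopen

# Replay prefixes assumed for illustration; check each archive's documentation.
INTERNET_ARCHIVE_PREFIX = "https://web.archive.org/web"
ARQUIVO_PT_PREFIX = "https://arquivo.pt/wayback"


def wayback_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as a 14-digit UTC timestamp."""
    return moment.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")


def replay_url(archive_prefix: str, moment: datetime, original_url: str) -> str:
    """Build a Wayback-style URL asking for the capture closest to `moment`."""
    return f"{archive_prefix}/{wayback_timestamp(moment)}/{original_url}"


def accept_datetime_header(moment: datetime) -> dict:
    """Build the Memento Accept-Datetime header to send to a TimeGate."""
    utc_moment = moment.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(utc_moment, usegmt=True)}


def availability_query(original_url: str, moment: datetime) -> str:
    """Build a query URL for the Internet Archive availability endpoint."""
    params = {"url": original_url, "timestamp": wayback_timestamp(moment)}
    return "https://archive.org/wayback/available?" + urlencode(params)


def fetch_json(url: str) -> dict:
    """Fetch a URL and decode its JSON body (performs a network request)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    when = datetime(2007, 1, 1, tzinfo=timezone.utc)
    print(replay_url(ARQUIVO_PT_PREFIX, when, "http://www.fccn.pt/"))
    print(accept_datetime_header(when))
    print(availability_query("http://www.fccn.pt/", when))
```

Only `fetch_json` touches the network; the other helpers are pure functions, so the addressing logic can be checked offline before any archive is queried.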

Target audience

Researchers interested in temporal web data and digital preservation.
Computer Science students and professionals.
Information science professionals (e.g. digital librarians).
Website authors and managers.

Requirements

A laptop with an Internet connection and a Web browser, preferably Google Chrome.

Short biographies

Daniel Gomes

Daniel Gomes started Arquivo.pt (the Portuguese web archive) and currently leads this public service. He obtained his Ph.D. in Computer Science in 2007 with a thesis focused on the design of large-scale systems for the processing of web data. He has been a researcher in web archiving and web-based information systems since 2001.

Daniel Bicho

Daniel Bicho has 8 years of experience in computer engineering and holds a degree in Telecommunications and Computers Engineering. He is currently finishing his Master's thesis in the field of Computer Vision, focusing on image classification using Deep Neural Network techniques. He is responsible for operating the crawling system of Arquivo.pt.

Fernando Melo

Fernando Melo is a software developer and researcher at Arquivo.pt. He obtained his Master's degree in Computer Science with a thesis on the automatic georeferencing of textual documents. He is currently applying and developing Big Data techniques to enable large-scale processing of web-archived content. He participated in the development of the Application Programming Interfaces provided by Arquivo.pt.
