Research the Past Web using Web archives

Daniel Gomes (daniel.gomes@fccn.pt), Daniel Bicho (daniel.bicho@fccn.pt) and Fernando Melo, Arquivo.pt

The Web is the largest source of public information ever built. However, 80% of web pages disappear or have their content changed within one year. Web archives provide services and tools that preserve, and enable access to, information published online since 1996. The main objectives of this tutorial, provided by the Arquivo.pt team, are to:

Target audience

  • Researchers interested in temporal web data and digital preservation.
  • Computer Science students and professionals.
  • Information science professionals (e.g. digital librarians).
  • Website authors and managers.

Requirements

A laptop with an Internet connection and a web browser, preferably Google Chrome.

Short biographies

Daniel Gomes

Daniel Gomes started Arquivo.pt (the Portuguese web archive) and currently leads this public service. He obtained his Ph.D. in Computer Science in 2007 with a thesis focused on the design of large-scale systems for processing web data. He has been a researcher in web archiving and web-based information systems since 2001.

Daniel Bicho

Daniel Bicho has 8 years of experience in computer engineering and holds a degree in Telecommunications and Computers Engineering. He is currently finishing his Master's thesis in the field of Computer Vision, focusing on image classification using deep neural network techniques. He is responsible for operating the crawling system of Arquivo.pt.

Fernando Melo

Fernando Melo is a software developer and researcher at Arquivo.pt. He obtained his Master's degree in Computer Science with a thesis on automatically georeferencing textual documents. He is currently applying and developing Big Data techniques to enable large-scale processing of web-archived content. He also participated in the development of the Application Programming Interfaces provided by Arquivo.pt.
