Parallel Sessions 2.1 to 2.6
Marcel Ras, Hilde van Wijngaarden (National Library of the Netherlands): Digital preservation from niche to core
In 2003 the e-Depot, the first digital archiving system of the National Library of the Netherlands (KB), became operational. The system was developed together with IBM and, at the time of implementation, it was the first operational long-term preservation system in the world. Today, seven years later, the system has processed and stored over 15 million digital objects, mainly e-journal articles.
When the e-Depot was developed, it was not at all common knowledge that the long-term preservation of digital publications was a challenge. Although the general public still does not realise the risks, most library professionals today are aware of the vulnerability of digital publications.
New software and hardware technologies supersede each other at ever-increasing speed, leaving older formats unreadable. Research and development has focussed on how to ensure permanent access to digital objects, and digital archiving systems like the e-Depot have been implemented at several libraries and archives.
At KB, seven years after the implementation of the e-Depot system, we have started projects to build a new system with new requirements. Apart from the fact that our contract with IBM runs out in 2012, several changes call for a new system and a new approach. These changes are not specific to KB; they reflect a general development in libraries' digital collection management and are a consequence of digital library developments. In short, these changes are:
- scale: digital publishing, web archiving and digitisation have led to enormous growth of digital collections
- requirements for digital collection management: while preservation was first focussed on special parts of our collections, with the growth of digital collections preservation has become a core requirement of libraries' collection management
- progress in digital preservation R&D: new tools have become available that allow us to better process and manage digital collections (e.g. tools for identification, characterisation, migration and emulation)
- diversity of digital collections: digital publications (including websites) have become container formats with all types of multimedia components embedded. These formats are a challenge for permanent access.
At KB, we have set requirements for a next generation e-Depot that reflect these changes. Our paper will present the set-up of the new system, including new policies for collection management that will be developed.
The development of the next generation e-Depot system is organized as a group of projects. These projects focus on:
- workflow for ingest and quality control
- data modelling and metadata
- migration from 1st to 2nd generation system
The projects are now at full speed and will deliver a new e-Depot infrastructure in 2011. The new system will be modular, combining different "off-the-shelf" and tailor-made components. The operational experience gained over the past seven years and the knowledge gained from R&D activities are being fed into the requirements for the new system.
The new e-Depot will process and store all digital collections at different preservation levels, based on a digital collection management plan. For every digital collection, a level will be set at the time of selection. This level will determine the way the collection is processed (with all possible checks or with basic processing), the way the collection is described (top-level description such as national bibliography entries with manual checks, or automatic generation and processing of minimal metadata) and the preservation actions to be developed and applied (format migration and/or emulation). This approach is based on the realisation that not all digital collections require the same investment, and that top-level care for millions of objects comes with enormous, unaffordable costs.
KB is currently in the process of laying down this preservation level policy. The policy is also one of the basic principles for our current requirements setting and tender procedure for a next-generation long-term preservation (LTP) system.
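To make the preservation-level idea concrete, here is a minimal sketch of a policy table in which a level, fixed once at selection time, determines processing, description and preservation actions. The level names (`FULL`, `BASIC`), the `Treatment` and `Collection` types, the `ingest_plan` helper and every value in the table are hypothetical illustrations, not KB's actual policy, which the abstract says is still being drafted.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    """Hypothetical preservation levels (KB's real policy was still in draft)."""
    FULL = "full"
    BASIC = "basic"


@dataclass(frozen=True)
class Treatment:
    processing: str                        # depth of ingest checks
    description: str                       # how the collection is described
    preservation_actions: tuple[str, ...]  # actions to develop and apply


# Illustrative policy table: each level fixes the three dimensions named in
# the abstract. All values here are assumptions for the sake of the example.
POLICY: dict[Level, Treatment] = {
    Level.FULL: Treatment(
        processing="all checks",
        description="manual top-level record (e.g. national bibliography entry)",
        preservation_actions=("migration", "emulation"),
    ),
    Level.BASIC: Treatment(
        processing="basic processing",
        description="automatically generated minimal metadata",
        preservation_actions=("migration",),
    ),
}


@dataclass
class Collection:
    name: str
    level: Level  # set once, at the time of selection


def ingest_plan(collection: Collection) -> Treatment:
    """Look up how a collection is processed, described and preserved."""
    return POLICY[collection.level]


if __name__ == "__main__":
    # Hypothetical collections; names and level assignments are invented.
    for c in (Collection("collection-a", Level.FULL), Collection("collection-b", Level.BASIC)):
        plan = ingest_plan(c)
        print(c.name, "->", plan.processing, "|", plan.description, "|", plan.preservation_actions)
```

Keeping the level-to-treatment mapping in a single table means that changing the policy for a whole class of collections is one edit, which is the cost-control argument the abstract makes.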
Marcel Ras is Head of the e-Depot Department of the National Library of the Netherlands (KB). He received his M.A. degree from Nijmegen University in the fields of Ancient History and Archaeology in 1992. After some years of archaeological field survey in different countries, he joined the Post-Graduate training on Historical Information Processing at Leiden University as head and teacher of the training school. From 1999 to 2005 he worked as a consultant for the Digital Heritage Association and was involved in many digitisation and standardisation projects in the Netherlands. Since 2005 Marcel has worked for the National Library of the Netherlands, first as project manager for Web Archiving and, since 2007, as manager of the e-Depot department. Marcel is still involved in training and teaching at Leiden University in the field of digitisation and digital preservation.
Hilde van Wijngaarden [bio + photo missing]
Sara Aubry (National Library, France): Introducing web archives as a new library service: the experience of the National Library of France
Web sites and web pages emerge and disappear from the World Wide Web every day. Like many other heritage institutions, the National Library of France (BnF) has developed a strategy to collect and keep track of born-digital material using very large-scale tools such as web crawlers. Today, for legal deposit purposes, BnF has collected more than 15 billion files (web pages, images, animations, video and sound recordings...), which amount to more than 160 terabytes of data and share storage facilities with other digital resources.
When this new collection was opened to the public in April 2008, developed as a new service and released as a new application, the Web archives of the French national domain stopped being the sole interest and object of study of the BnF web archiving team and of a dedicated, trained group of librarians in charge of acquisition.
Developing the service was first a technical challenge: how to install and adapt open source tools (the Wayback Machine and Nutch) when the IT staff was used to developing its own applications internally? How to integrate these tools into a secure, Internet-proof "public computer" alongside other digital resources (catalogs, digitized collections, e-journals and databases...) that had historically been piled up one after another? How to keep this collection safe?
But introducing Web archives was also an organisational and human challenge: how to involve colleagues and pass on to them the content and collection development policy, and what works and what does not, for a contemporary and disruptive medium? In particular, how to reach those who authorise and guide researchers in the reading rooms and answer their questions at the help desk? Web archives are artefacts, incomplete documents, which are more familiar to archivists than to librarians.
A new icon on a screen is not enough to reach and retain a public. BnF has created information and communication tools to encourage readers to use this new service. After almost two years the audience is still small (about 80 visits a month), but it exists here and now, and it gives us feedback.
Beyond traditional researchers, who may search and browse Web archive content for scientific, academic or even personal purposes, "Web researchers", who study the medium itself, have to face issues of quantity, temporality and time consistency. They also have to deal with gaps and noise at a scale of which the live Web offers only a small sample. And because the Web has many authors, many Web archive users and uses are still unknown.
This paper will discuss:
- Web archives as a new and challenging collection for the library end user,
- BnF marketing and service strategies to introduce and promote the web archives,
- current usage surveys and counts, and lessons learnt that are worth sharing,
- plans for future developments.
Sara Aubry is a digital curator and a computer analyst. She has been working at the National Library of France (BnF) since 2002. She is part of the project team which introduced web archiving into the Library's missions. She also ran the "Access to Web Archives" project.
Sara has a master's degree in Languages, Civilizations and Computer Science. She previously taught information sciences at the University of Caen and, between 1998 and 2009, was the moderator of biblio-fr, the main French mailing list for librarians and information science professionals, which had about 18,000 subscribers.
Maria Cassella (University of Turin, Italy): Institutional repositories assessment: an internal and external perspective of the value of IRs for the researchers’ communities
Institutional repositories (IRs) are one of the most innovative and creative components of digital libraries. They are a central service for the research communities and the institutions they serve, and a showcase of the scientific output of a research institution. However, for many reasons, institutional repositories often lack commitment from institutional leadership and engagement from research communities. Except in very few cases, it is difficult to reach a critical mass of content, and fund raising may also become a problem for repository administrators in times of economic crisis.
To date, there are no standard performance indicators to assess repository activity and demonstrate its value for research communities. This article will examine the qualitative and quantitative measures that repository administrators should gather in order to design a successful repository.
The idea is to present repository assessment as a combination of internal (quantitative) and external (qualitative) measures. The former relate to the collections: total full-text items deposited, the level of routine deposit activity, the percentage of faculty participating in deposit, and the value-added services the repository provides to researchers in different disciplines. These measures are often, but not exclusively, generated from OAI harvesting information.
The latter relate to faculty satisfaction with the repository (how well it fulfils researchers' needs), to the internal and external level of funding, and to the policies adopted to support the repository (institutional mandates or other non-mandatory supporting policies). All these measures are based on qualitative surveys of researchers and institutional leadership.
In conclusion, I argue that an intelligent combination of the two perspectives (internal and external) should help repository administrators to advocate the ideal profile of a successful, economically sustainable repository.
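As an illustration of how some internal measures can be derived from OAI harvesting, the sketch below parses a small, hypothetical OAI-PMH `ListRecords` response (Dublin Core, `oai_dc`) with Python's standard `xml.etree.ElementTree` and computes item counts, a full-text share and a faculty participation percentage. The sample XML, the faculty roster, the `internal_indicators` helper, and the choice of `application/pdf` in `dc:format` as a proxy for "full text" are all assumptions; real repositories would need authority-controlled author names rather than string matching on `dc:creator`.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical, trimmed ListRecords response (metadataPrefix=oai_dc).
SAMPLE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:creator>Rossi, Anna</dc:creator>
          <dc:format>application/pdf</dc:format>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:creator>Bianchi, Luca</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header status="deleted"><identifier>oai:example:3</identifier></header>
    </record>
  </ListRecords>
</OAI-PMH>
"""

# Hypothetical faculty roster, used as the denominator for participation.
FACULTY = {"Rossi, Anna", "Bianchi, Luca", "Verdi, Marco", "Neri, Sara"}


def internal_indicators(xml_text: str, faculty: set[str]) -> dict[str, float]:
    root = ET.fromstring(xml_text.strip())
    records = root.findall("oai:ListRecords/oai:record", NS)
    # Skip records whose header carries status="deleted" (they have no metadata).
    active = [r for r in records if r.find("oai:header", NS).get("status") != "deleted"]

    full_text = 0
    creators: set[str] = set()
    for rec in active:
        formats = {f.text for f in rec.iterfind(".//dc:format", NS)}
        if "application/pdf" in formats:  # crude full-text proxy (assumption)
            full_text += 1
        creators.update(c.text.strip() for c in rec.iterfind(".//dc:creator", NS) if c.text)

    participating = creators & faculty
    return {
        "items": len(active),
        "full_text_items": full_text,
        "full_text_pct": 100 * full_text / len(active) if active else 0.0,
        "faculty_participation_pct": 100 * len(participating) / len(faculty) if faculty else 0.0,
    }


if __name__ == "__main__":
    print(internal_indicators(SAMPLE, FACULTY))
```

Such counts cover only the internal side; the external measures the abstract describes (satisfaction, funding, policies) come from surveys and cannot be harvested this way.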
Maria Cassella is library coordinator of seven humanities libraries at the University of Turin.
She is author or co-author of numerous papers published in Italian and in English on digital libraries.
Her current research interests are in the fields of digital libraries, Open Access, scholarly communication, statistics and evaluation, and mobile applications.
Since 2008 Maria Cassella has been a member of the working group of Wiki OA Italia, the Italian wiki on Open Access http://wiki.openarchives.it/index.php/Pagina_principale
Since 2009 she has been a member of the IFLA Standing Committee on Statistics and Evaluation. She is on the editorial board of the Italian Journal of Library and Information Science (JLIS) and on the editorial team of two Italian e-newsletters.
All of Maria Cassella's conference presentations and some of her papers are self-archived in E-LIS http://eprints.rclis.org/.
Giuseppina Vullo (University of Glasgow, UK): A global approach to digital library evaluation
This paper will describe the key research advances in digital library evaluation models.
Digital library evaluation has a vital role to play in building DLs, and in understanding and enhancing their role in society. The paper will cover the theoretical approach, providing an integrated evaluation model which overcomes the fragmentation of quality assessments, and will propose some examples of DL evaluation methodologies, undertaking a comparative analysis of them.
Digital library evaluation is a growing interdisciplinary area. Researchers and practitioners have specific viewpoints of what DLs are, and they use different approaches to evaluate them. Each evaluation approach corresponds to a DL model, and there is no common agreement on how a DL should be defined. Despite this, efforts to evaluate DLs are increasing. However, a methodology that encompasses all the approaches does not yet exist. There are two main reasons for this:
- digital libraries are complex entities which need interdisciplinary approaches
- digital libraries are synchronic entities: the speed of evolution of DLs coupled with their lack of historical traces makes a longitudinal analysis difficult if not impossible.
Nevertheless, DLs and DL research have reached a level of maturity such that a global approach to their evaluation is needed. It would encourage exchange of qualitative data and evaluation studies, allowing comparisons and communication between research and professional communities.
This paper will present the research advances in the field and a theoretical framework for digital library evaluation models.
Giuseppina Vullo is Co-Principal Investigator in the EU-funded Digital Library Interoperability, Best Practices & Modelling Foundations project (DL.org), where she coordinates the Working Group on Quality for Digital Libraries, and a researcher at HATII at the University of Glasgow. She was a DPEX fellow at HATII in 2008, where she worked on digital collections assessment, applying DRAMBORA and InterPARES 3. Her research interests range from quality to contextualisation in digital libraries and the enhancement of special collections within digital environments. In 2009 Giuseppina completed her Ph.D. in Library Science at the University of Udine, Italy, focusing on digital library evaluation. Before joining HATII, Giuseppina worked as a librarian for international organisations in Italy and Switzerland.
J Max Wilkinson, Adam Farquhar (British Library, UK): British Library Dataset Programme: supporting research in the library of the 21st century
Advances in computational science and its application have reshaped the social landscape and the practice of research. Researchers are increasingly exploiting technology for collaborative, experimental and observational research in all disciplines. Digital data and datasets are the fuel that drives these trends; increasingly, datasets are being recognised as a national asset that requires preservation and access in much the same way as text-based communication. In response, UK research councils, funding bodies, institutions and publishers are mandating data management plans or accepting supplementary data alongside articles. To date, research libraries have been largely absent from the discussion.
The British Library is in a unique position to enhance UK and international research by extending its presence from the physical collection to the digital dataset domain. To meet this challenge and be a responsible steward of the scholarly record, the Library has defined a programme of activity to support the data that underlie modern research and to promote them as a national asset.
Awareness of the impact of the digital data age on research is growing. The British Library’s Chief Executive, Dame Lynne Brindley DBE, observed that the biggest challenge facing the British Library is presented by “the data deluge and the increasing need to integrate datasets that underlie published research with the more traditional formats and preserve these digital formats into the long term”. This view is supported by three studies conducted by the Library in 2007 and 2008.
Given the researcher's expectation in a digital environment, namely to locate and access data regardless of physical location, the scope of our activity is to facilitate and streamline appropriate access to datasets by addressing the scholarly record as a whole and by promoting a joined-up infrastructure between other libraries, data providers and data consumers.
The Library already contains many datasets. Artefacts from ‘on-demand’ or mass digitisation programmes are datasets based on specific requests or operational needs. Externally, national data centres provide for the preservation and persistence of data generated by UK funded research and government departments.
To begin, we are designing a mixed model of activity in which specific, service-level projects with clear goals will support collaborative work that aims to reveal and clarify issues related to datasets. For example, there is a clear community need for stable, scalable and agreed data citation mechanisms. To address this, the British Library is a founding member of DataCite, the International Data Citation Initiative. In addition, we are actively partnering with a number of external stakeholders in work aimed at investigating value-added services in dataset attribution and impact, and at further supporting the persistence of datasets of international importance (e.g. the datasets of environmental observation and measurement that will underpin the IPCC report in 2013).
The British Library datasets programme will guide activities across the Library and stakeholder communities to address the challenge of integrating datasets into its researcher services and of ensuring that the scholarly record remains intact, usable and vital for future generations.
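To show what a stable, agreed data citation mechanism produces in practice, the sketch below formats a dataset citation following the pattern "Creator (PublicationYear): Title. Publisher. Identifier", modelled on the form recommended in DataCite's metadata schema documentation. The `DatasetRecord` type, the `cite` helper, the sample metadata and the DOI are hypothetical; real DataCite metadata has many more properties and its own XML/JSON serialisations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Minimal, hypothetical subset of DataCite-style dataset metadata."""
    creators: tuple[str, ...]
    publication_year: int
    title: str
    publisher: str
    doi: str

    def __post_init__(self) -> None:
        # Every DOI name starts with the "10." directory indicator.
        if not self.doi.startswith("10."):
            raise ValueError(f"not a DOI: {self.doi!r}")


def cite(rec: DatasetRecord) -> str:
    """Format a citation as: Creator (PublicationYear): Title. Publisher. Identifier"""
    creators = "; ".join(rec.creators)
    return (
        f"{creators} ({rec.publication_year}): {rec.title}. "
        f"{rec.publisher}. https://doi.org/{rec.doi}"
    )


if __name__ == "__main__":
    # All values below are invented for illustration.
    example = DatasetRecord(
        creators=("Smith, J.", "Jones, K."),
        publication_year=2010,
        title="Example sea-surface temperature series",
        publisher="Example Data Centre",
        doi="10.5072/example-sst",
    )
    print(cite(example))
```

Because the identifier resolves through the DOI system rather than pointing at a server location, the citation stays valid if the dataset moves between data centres, which is the persistence property the programme is after.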
Dr Max Wilkinson is the Programme Manager for the British Library's dataset programme. Prior to joining the Library, he was the scientific analyst with the National Cancer Research Institute's Informatics Initiative, where he delivered a comprehensive training review of informatics in the UK, and designed and managed a project aimed at incorporating semantic technology and change management in the cancer biomedical domain. For four years he focused on bridging the divide between those who build informatics technology solutions and those who 'use' such technology in research and clinical environments. Max received his PhD from University College London in 2004 for his thesis on molecular nephropathologies. For the previous twenty years he had been a research scientist in diverse disciplines, from cyanogenic bacteria through viruses, immunity and transplantation. Throughout his career he has been concerned with the application of technology to understanding the chemistry of biology. At the British Library he is building a programme of work that will define the Library's roles in the dataset domain.
Dr. Adam Farquhar is Head of Digital Library Technology at the BL, where he was a lead architect on the BL's Digital Library System, co-founded the Digital Preservation Team, and initiated the BL's Dataset Programme. He is Co-ordinator and Scientific Director of the EU co-funded Planets project and founder of the Open Planets Foundation. He is President of DataCite, the global data citation initiative, and serves on the board of the Digital Preservation Coalition. Prior to joining the Library, Adam was the principal knowledge management architect for Schlumberger (1998-2003) and a research scientist at the Stanford Knowledge Systems Laboratory (1993-1998). He completed his PhD in Computer Sciences at the University of Texas at Austin (1993). Over the past twenty years, his work has focused on improving the ways in which people can represent, find, share, use, exploit, and preserve digitally encoded knowledge.
Raymond Bérard (Bibliographic Agency for Higher Education, France): Free library data?
The issue of library records has long been confined to librarians' circles. Nowadays it goes far beyond librarians: at the last Berlin 7 conference in Paris, data produced by libraries were placed on the same level as scientific papers themselves, as the open access movement leapt to library catalogs. Several initiatives have led the way, such as Open Library and Biblios.net, the latest being the CERN Library's announcement that it publishes its book catalog as Open Data, allowing any library to download the catalog freely, with the records provided under the Public Domain Data License.
It is a hot topic: OCLC is now drafting a new record policy after withdrawing the controversial policy it published in 2008. OCLC has set up a working group to produce a draft policy that will be submitted to its member councils this spring.
As library materials are catalogued by public organisations and librarians are active promoters of the principles of Open Access, one would expect library data to be freely available to all. Yet this is not the case. Why, then, do so few libraries make their data free? One reason is that many libraries download records from data providers such as OCLC, national libraries or other organisations that impose their own, often diverging, policies, some of them very restrictive. What are the restrictions? What interests (commercial and strategic) are at stake?
How can the data be made freely available to public, not-for-profit organisations without putting at risk the collective networks built by libraries over the years?
This paper will present a panorama of the current situation and of the actors and interests involved. It will address the legal aspects and the obstacles, and show how it is possible to make data produced by libraries freely available to other knowledge organisations while retaining and developing the collective organisations and services built by library networks over the years. The aim of the "free the data" movement is to share and reuse bibliographic data in a new ecosystem in which all the actors, users and providers alike, are involved, not only librarians.
Raymond Bérard is currently Director of the Bibliographic Agency for Higher Education (ABES), which is in charge of the union catalog for French academic libraries. ABES is also involved in e-theses and develops new services to meet the expectations and information research practices of its users. He previously held the positions of Director of the French academic repository library at Marne-La-Vallée (2004-2005) and Dean of Studies at the French National Library School (ENSSIB, 2001-2004). Among his professional activities, Raymond Bérard chairs IFLA's Management and Marketing Section and the Information Section of the French standards authority (AFNOR).