Thursday, June 08, 2017

Videotapes Are Becoming Unwatchable As Archivists Work To Save Them

Videotapes Are Becoming Unwatchable As Archivists Work To Save Them. NPR: All Things Considered. Scott Greenstone. June 3, 2017.
     Research suggests that magnetic tapes, such as videotapes, will not last beyond 15 to 20 years, a problem sometimes called the "magnetic media crisis." Magnetic information on tapes slowly fades, and when it diminishes too much, the information on the tape is lost. There are groups trying to migrate the tapes before the content is unrecoverable. Part of this process is to identify what is on the tapes and which tapes need to be preserved long term.


Friday, June 02, 2017

Ex Libris joins the Open Preservation Foundation

Ex Libris joins the Open Preservation Foundation. Becky McGuinness. Press Release. Open Preservation Foundation. June 1, 2017.
     The Open Preservation Foundation announced that Ex Libris is its newest charter member. "Ex Libris’ Rosetta is an end-to-end digital asset management and preservation solution for libraries, archives, museums and other institutions, enabling institutions to safely and securely collect, manage, publish, deliver, and ensure longevity for digital information of many different types. With Rosetta’s unique content preservation planning module and its Format Library knowledge base, shared by the entire Rosetta community, institutions can identify format risks, evaluate mitigation alternatives, and select the best preservation actions."  "Rosetta reflects Ex Libris involvement in industry standards and commitment to extensibility and open architecture."  "Rosetta itself is based on an open architecture that allows customers to easily use Rosetta with external tools and plugins such as JHOVE and other open-source software. By supporting OPF, we can further improve open-source tools for the benefit of all."
 

Saturday, May 13, 2017

Design Requirements for Better Open Source Tools

OSS4Pres 2.0: Design Requirements for Better Open Source Tools. Heidi Elaine Kelly. bloggERS! April 25, 2017.
     Free and open source software needs to "integrate easily with digital preservation institutional systems and processes." The FOSS Development Requirements Group created a design guide to ensure easier adoption of open-source tools and their integration with other software and tools.

Minimum Necessary Requirements for FOSS Digital Preservation Tool Development. The premise is that "digital preservation is an operating system-agnostic field."

Necessities
  • Provide publicly accessible documentation and an issue tracker
  • Have a documented process so people can contribute to development, report bugs, and suggest new documentation
  • Every tool should do the smallest possible task really well; if you are developing an end-to-end system, develop it in a modular way in keeping with this principle
  • Follow established standards and practices for development and use of the tool
  • Keep documentation up-to-date and versioned
  • Follow test-driven development philosophy
  • Don’t develop a tool without use cases, and stakeholders willing to validate those use cases
  • Use an open and permissive software license to allow for integrations and broader use
Recommendations
  • Have a mailing list or other means for community interaction
  • Establish community guidelines
  • Provide a well-documented mechanism for integration with other tools/systems
  • Provide functionality of tool as a library, separate UI from the actual functions
  • Package tool in an easy-to-use way, that supports any dependencies
  • Provide examples of functionality for potential users
  • Consider the long-term sustainability of the tool
  • Consider a way for internationalization of the tool  

Tuesday, May 09, 2017

Using Open-Source Tools to Fulfill Digital Preservation Requirements

OSS4EVA: Using Open-Source Tools to Fulfill Digital Preservation Requirements. Marty Gengenbach, et al. Code4Lib. 2016-10-25.
     Open-source software, such as LOCKSS, DSpace, and DROID, has played an increasingly prominent role in digital preservation. As the number and variety of such tools has increased, there has been a growing need among preservationists to assess how and when to adopt particular tools so that they can better support their institutions’ specific requirements and workflows.  Open-source projects allow the user community to contribute by developing and documenting tools.

There are some challenges with open source programming.
  • Perceptions of instability:  One challenge is the perception that these tools are "inherently unstable and therefore present a risk". 
  • Resources and funding: Administrators often are reluctant to commit resources to an open source project. Funding problems can threaten the long-term sustainability of open source tools.
  • System updates: Open source tools require regular patches, updates, and upkeep. Without this, the tool would be outdated, and open to security holes. "The choice to maintain an unsupported version of a particular open-source tool simply because it meets (or has been customized to meet) an organization’s needs is problematic. For what an institution may stand to gain from this tool in terms of functionality and local integration, it may stand to lose in terms of the stability of a mainstream code release, the risk to information security, and the likelihood that the tool in question will become increasingly less functional and reliable as it ages".
  • Integration: Integrating open-source tools into institutional workflows can be a challenge, taking into account software dependencies, system requirements, and local configuration to put the tools into a production environment. This can require considerable time and resources.
One of the possible benefits is that institutions can customize open source tools for use within a specific context, but that comes with its own hurdles, such as reducing the ability to draw on the user community.  The digital preservation open source landscape has evolved from a scattered set of standalone tools to complex software environments. "Nevertheless, these tools still are not watertight." There are real concerns about open-source tools that can pose serious risks to collections.

Thursday, May 04, 2017

Personal Digital Archiving Guide Part 1: Preservation Planning

Personal Digital Archiving Guide Part 1: Preservation Planning. Scott Witmer. Bits and Pieces. April 26, 2017.
      Digital materials require active intervention to be usable over time, since technology is constantly changing. "The more we use these files or transfer them from one technology to another, the greater the potential for data corruption. Digital files also run the risk of deletion due to accident or disaster. Having a preservation plan can mitigate the risks of obsolescence, erasure, or other forms of data loss." This post lists some simple suggestions for organizing digital files for long-term preservation, although everyone will have their own methods. Some digital preservation is better than none.

Preservation Steps for Personal Digital Collections:
  • Identify digital materials to save. Make a list or inventory
  • Gather the files you want to save into one place
  • Select what you really want to save; define the scope of your digital collection
  • Organize your digital files and add descriptive information to the file name, or other important information
  • Give your files short, meaningful names, preferably when creating the files
  • Use a meaningful directory structure to organize the files 
  • Back-up the files and have multiple copies:  
    • 3 copies 
    • 2 of the copies on 2 different types of storage media 
    • 1 copy in a different location  
Digital preservation is an ongoing process, so files and storage technology should be checked periodically.
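
The backup rule in the list above (3 copies, 2 types of storage media, 1 copy in a different location) can be checked mechanically. The following is a minimal sketch, not something from the original post; the `Copy` class, the media and location labels, and the `meets_3_2_1` function are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of a collection (hypothetical record for this sketch)."""
    media_type: str  # e.g. "internal_ssd", "external_hdd", "cloud"
    location: str    # e.g. "home", "office", "offsite"


def meets_3_2_1(copies, primary_location):
    """Return True if the copies satisfy the 3-2-1 rule from the list above."""
    enough_copies = len(copies) >= 3
    # At least two distinct kinds of storage media.
    enough_media = len({c.media_type for c in copies}) >= 2
    # At least one copy kept somewhere other than the primary location.
    has_offsite = any(c.location != primary_location for c in copies)
    return enough_copies and enough_media and has_offsite


copies = [
    Copy("internal_ssd", "home"),
    Copy("external_hdd", "home"),
    Copy("cloud", "offsite"),
]
print(meets_3_2_1(copies, primary_location="home"))
```

A check like this only confirms that the copies exist on paper; the files themselves still need to be opened and verified periodically.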

Wednesday, April 26, 2017

Sustaining The Value: The British Library Digital Preservation Strategy 2017-2020

Sustaining The Value: The British Library Digital Preservation Strategy 2017-2020. British Library. January 2017.
     The strategy document is intended to guide the Library’s digital preservation activities for the next few years. It identifies strategic priorities as well as the roles and responsibilities of those who will deliver the strategy.  The digital preservation challenges include technological obsolescence, media integrity, bit rot, digital rights management, metadata and others. Also important are:
  • Proactive Lifecycle management
  • Integrity & validation
  • Fragility of storage media
"Digital Preservation is the combination of actions and interventions required throughout the digital content lifecycle to ensure continued and reliable access to authentic digital materials." Digital preservation is not just a technical challenge. "It necessitates an ongoing and typically recursive series of actions and interventions throughout the lifecycle to ensure continued & reliable access to authentic digital objects, for as long as they are deemed to be of value."

Their vision is to make sure that "end-to-end workflows are in place that deliver and preserve our digital collections in a trusted long term digital repository so that they may be accessed by future users." Other notes:
  • Control and consistency throughout the lifecycle are an essential aspect of large scale, sustainable preservation.
  • Priorities include: 
    • Changes to the existing technical repository infrastructure 
    • Ingest digital collections with metadata for long term preservation
    • Management and reporting will be documented and provide assurance and evidence of preservation 
    • Deliver content to users from the long term repository in a timely and reliable manner
  • Also important is to embed the skills and resources needed to sustain this approach into the future.


Monday, April 24, 2017

Three Keys to Digital Preservation: Management, Technology, and Content.

Three Keys to Digital Preservation: Management, Technology, and Content. Edward Corrado, Heather Moulaison Sandy. ACRL Webinar.  Apr 12, 2017.
     This is a webinar by Edward Corrado and Heather Moulaison Sandy that examines the basics of digital preservation, starting with what it is and what it is not. They then examine three fundamental and interrelated concerns in digital preservation: management, technology, and the content. The webinar also looks at:
  • The life cycle of digital objects
  • Things to know before starting digital preservation projects
  • Preservation techniques designed to endure changes in technology, as well as models and technical resources currently available
Some notes from the webinar:
  • Digital preservation is the active management of digital content over time to ensure ongoing access.
  • Digital objects are mediated by technology
  • It is not possible to leave the digital object alone and expect it to survive
  • By definition, digital preservation is a long-term activity. It requires policies to support this
  • A preservation plan must balance priorities over time
  • The greatest danger to digital materials is that we forget the meaning of them
  • Preservation metadata supports the long-term access and use of content
  • It is important to get content creators on board with preserving and describing the content, since they know the field and the content, and they will potentially be the content users
  • Important steps to take now:
    • Identify and organize content
    • Manage multiple copies of the content
    • Do a risk assessment of your digital operations
    • Document your processes and decisions
Digital preservation is an opportunity that can be both challenging and exciting.

Tuesday, April 18, 2017

Understanding PREMIS

Understanding PREMIS. Priscilla Caplan. Library of Congress Network Development and MARC Standards Office. 2017.
     PREMIS stands for "PREservation Metadata: Implementation Strategies". This document is a relatively brief overview of the PREMIS preservation metadata standard. It can also serve as a "gentle introduction" to the much larger document PREMIS Data Dictionary for Preservation Metadata. PREMIS defines preservation metadata as "the information a repository uses to support the digital preservation process."  Preservation metadata also supports activities "intended to ensure the long-term usability of a digital resource."

The Data Dictionary defines a core set of metadata elements needed in order to perform preservation functions, so that digital objects can be read from the digital media, and can be displayed or played. It includes a definition of the element; a reason why it is part of the metadata; also examples and notes about how the value might be obtained and used.  The elements address information needed to manage files properly, and to document any changes made. PREMIS only defines the metadata elements commonly needed to perform preservation functions on the materials to be preserved. The focus is on the repository and its management, not on the content authors or the associated staff, so it can be a guide or checklist for those developing or managing a repository or software applications. Some information needed is:
  • Provenance: The record of the chain of custody and change history of a digital object. 
  • Significant Properties: Characteristics of an object that should be maintained through preservation actions. 
  • Rights: knowing what you can do with an object while trying to preserve it.
The Data Model defines several kinds of Entities:
  • Objects (including Intellectual Entities)
  • Agents
  • Events
  • Rights
PREMIS provides an XML schema that "corresponds directly to the Data Dictionary to provide a straightforward description of Objects, Events, Agents and Rights."
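
To give a feel for what an Event looks like in XML, here is a minimal sketch using Python's standard library. It assumes the PREMIS 3 namespace and uses a handful of Event semantic units (eventIdentifier, eventType, eventDateTime); the `build_event` helper, the "local" identifier type, and the sample values are illustrative assumptions, and the output has not been validated against the official schema.

```python
import xml.etree.ElementTree as ET

# Assumption: PREMIS version 3 namespace; other versions use different URIs.
PREMIS_NS = "http://www.loc.gov/premis/v3"
ET.register_namespace("premis", PREMIS_NS)


def q(tag):
    """Qualify a tag name with the PREMIS namespace."""
    return f"{{{PREMIS_NS}}}{tag}"


def build_event(event_id, event_type, when):
    """Build a bare-bones Event element (hypothetical helper, not a full record)."""
    event = ET.Element(q("event"))
    identifier = ET.SubElement(event, q("eventIdentifier"))
    ET.SubElement(identifier, q("eventIdentifierType")).text = "local"
    ET.SubElement(identifier, q("eventIdentifierValue")).text = event_id
    ET.SubElement(event, q("eventType")).text = event_type
    ET.SubElement(event, q("eventDateTime")).text = when
    return event


event = build_event("evt-001", "fixity check", "2017-04-18T10:00:00Z")
print(ET.tostring(event, encoding="unicode"))
```

A real record would normally also link the Event to the Objects and Agents involved; the Data Dictionary is the place to check which units are mandatory.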

Monday, April 17, 2017

Rosetta Knowledge Center

Rosetta Knowledge Center. Ex Libris. April 17, 2017.
     One of the things that I like about Rosetta is the Ex Libris commitment to an open system. While the software may be proprietary, the essential content is open. The permanent objects and metadata are stored openly, so that they can be accessed or managed outside of the Rosetta software.

Another area that Ex Libris has opened is their Knowledge Center. This is very helpful in training new employees, learning new things about the software, or refreshing my memory. The open website includes:
  • Product Documentation
  • Training: Learn new skills with tutorials, recorded training and other materials
  • Release Notes about the features and capabilities of each product version
  • Implementation Guides that explain the methodology and requirements
  • Knowledge Articles that answer common questions

Saturday, April 15, 2017

ETD+ Toolkit

ETD+ Toolkit. Dr. Katherine Skinner, et al. Educopia Institute. April 10, 2017.
     Very helpful website for dealing with ETDs. The Toolkit is an open set of six modules to help students create, store, and maintain their research outputs. It was designed to:
  • Help administrators understand the digital research outputs students are creating
  • Help administrators assess what to collect and care for as part of the institutional memory
  • Help students make sure that research outputs are in durable formats and on durable devices
  • Help students make informed decisions about file formats, documentation, and rights.
The Modules, which include "Learning Objectives, a one-page Handout, a Guidance Brief, a Slideshow with full presenter notes, and an evaluation Survey", are:
  1. Copyright: How can students gain appropriate permissions and how can students signal copyright for their own works?
  2. Data Organization: How can students structure, describe, store, and deposit data and other research files for reuse and/or future access?
  3. File Formats: How will the formats students choose make future access to their research easier or more difficult?
  4. Metadata: How can students store information describing their files to make sure they can tell what they are in the future?
  5. Storage: How can students make well informed choices about where to store their research materials?
  6. Version Control: What mechanisms can students use to make it easier to see the history of a file with multiple versions?
"In a 2014 survey of nearly 800 students across nine universities, students reported that non-PDF files - including research data, video, digital art, and software code - are either as important or more important than the Electronic Thesis and Dissertation (ETD) PDF as research outputs and evidence. Fully 80% of these students are producing non-PDF research outputs, most commonly tabular data (43%), digital images (38%), software code (29%), and digital text (28%)."
The ETD+ Toolkit provides introductory training for data curation and digital longevity techniques. It helps students identify and offset risks and threats to their digital research.

Tuesday, April 11, 2017

It’s not just a word

It’s not just a word. Helen Hockx. Things I cannot say in 140 characters.  April 7, 2017.
     A post about her new job, which is to coordinate and develop a campus-wide strategy and to oversee its implementation. Digital assets are already managed, but the new role provides the opportunity to revisit the topic and address the gaps.  "A key finding is the strong focus on “now” – archiving and preservation are routinely overlooked. As a result, some digital assets have been lost and some are at risk."  A recommendation, considering "the 3 pillars of policy, process and technology", is to add “digital resources” to the university's goals where superb stewardship is required. Adding the word “digital”, or calling out “digital resources” specifically, may not seem necessary to some, but it emphasizes the need to "do a much better job with digital assets, if we applied the same rigor and coordinated approach." We still have a ways to go with digital archiving and preservation.

"So it is not just a word. Digital assets are a new class of resources which requires active care and management over time.  Adding it to the strategic mix is a recognition of their value, and of digital stewardship as a strategic priority. No. it is not just a word, it will have to come with commitment, ownership and resources." Some day we can remove the word “digital” from our strategic plan, "when preservation of digital assets is embedded in the organisational culture and operations, when there is no need to even mention it."

Monday, April 10, 2017

Encoding and Wrapper Decisions and Implementation for Video Preservation Master Files

Encoding and Wrapper Decisions and Implementation for Video Preservation Master Files. Mike Casey. Indiana University. March 27, 2017.
     "There is no consensus in the media preservation community on best practice for encoding and wrapping video preservation master files." Institutions preserving video files long term generally choose from three options:
  • 10-bit, uncompressed, v210 codec, usually with a QuickTime wrapper
  • JPEG 2000, mathematically lossless profile, usually with an MXF wrapper
  • FFV1, a mathematically lossless format, with an AVI or Matroska wrapper
The few institutions digitizing and preserving video for the long term are roughly evenly divided between the three options above. This report examines in detail a set of choices and an implementation that has worked well for their institution. Originally they chose the first option, but with recent advances of FFV1, they reopened this decision and initiated a research and review process:
  • Exit strategy research and testing
  • Capture research (use FFmpeg within their system to generate FFV1 files).
  • Comparison of issues
  • Consultation with an outside expert
Results: In researching exit strategies, they were able to move FFV1 files to another lossless codec with no loss of data. They decided to capture using FFmpeg, which requires developing a simple capture tool, and developed specifications for a minimal capture interface with FFmpeg for encoding and wrapping the video data.

Technical: They identified a number of key advantages to FFV1, including:
  • roughly 65% less data than a comparable file using the v210 codec
  • open source, non-proprietary, and hardware independent
  • largely designed for the requirements of digital preservation
  • employs CRCs for each frame allowing any corruption to be associated with a much smaller digital area than the entire file
FFV1 appears to be "trending upwards among developers and cultural heritage organizations engaged in preservation work". They also chose the Matroska wrapper, which is an audiovisual container or wrapper format in use since 2002, and which is a more flexible wrapper option.
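
As an illustration of the FFV1 and Matroska combination discussed above, the following sketch builds an FFmpeg command line that transcodes an existing file to FFV1 version 3 video in a Matroska wrapper with slice CRCs enabled. This is not Indiana University's capture tool or their settings; the `ffv1_matroska_command` function, the file names, and the choice of options are assumptions for illustration only.

```python
def ffv1_matroska_command(source, target):
    """Return an FFmpeg argument list for FFV1 video in a Matroska file.

    The option values are illustrative, not a recommended preservation profile.
    """
    return [
        "ffmpeg",
        "-i", source,       # input file
        "-c:v", "ffv1",     # encode video with the FFV1 codec
        "-level", "3",      # FFV1 version 3
        "-g", "1",          # every frame is a keyframe (intra-only)
        "-slicecrc", "1",   # store a CRC per slice so damage can be localized
        "-c:a", "copy",     # pass the audio through unchanged
        target,             # a .mkv extension selects the Matroska wrapper
    ]


cmd = ffv1_matroska_command("tape_0001.mov", "tape_0001.mkv")
print(" ".join(cmd))
```

To run it, the list can be passed to `subprocess.run(cmd, check=True)` on a machine with FFmpeg installed. The per-slice CRCs are what allow corruption to be narrowed down to a small region rather than the whole file.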

As more and more archives undertake video digitization, they will not accept older and limited formats (AVI or MOV), but will look for standards-based, open source options developed specifically for archival preservation. "Both FFV1 and Matroska are open source and are more aligned with preservation needs than some of the other choices and we believe they will see rapidly increasing adoption and further development."

Implementation: They developed a quality control program that checks the FFV1/Matroska preservation master files and validates that the output meets their specification for long-term preservation. These files are viewed using the VLC media player, a free, open source, cross-platform multimedia player that supports FFV1 and Matroska.

Currently, they have created over 38,000 video files using FFV1 and Matroska. "We have chosen two file formats that are open source, developed in part with preservation in mind, and on the road to standardization with tools in active development. We have aligned ourselves with the large and active FFmpeg community rather than a private company. While the future is ultimately unknowable, we believe that this positions us well for long-term preservation of video-based content."


Saturday, April 08, 2017

New Home and Features for Sustainability of Digital Formats Site

New Home and Features for Sustainability of Digital Formats Site.  Kate Murray, Jaime Mears. The Signal. April 6, 2017.
     The Library of Congress web site, Sustainability of Digital Formats, contains "the technical aspects of digital formats with a focus towards strategic planning regarding formats for digital content, especially collection policies." The formats are divided into the type of object, which includes:
  • still image, sound, textual, moving image, web archive, datasets, geospatial and generic formats
The website shows the relationships between formats, as well as the quality and functionality factors for each content category and the following sustainability factors:
  • Disclosure
  • Adoption
  • Transparency
  • Self-documentation
  • External dependencies
  • Impact of patents
  • Technical protection mechanisms
The new website is at loc.gov/preservation/digital/formats and it now includes
  • The PRONOM ID and the Wikidata Title ID, both of which help to document the formats, and
  • The Library of Congress Recommended Formats Statement
The digital formats site continues to evolve to meet the Library’s and the digital preservation community’s changing needs.

Friday, April 07, 2017

How a Browser Extension Could Shake Up Academic Publishing

How a Browser Extension Could Shake Up Academic Publishing. Lindsay McKenzie. The Chronicle of Higher Education. April 06, 2017
     There are several open-access initiatives. One initiative, called Unpaywall, is just a browser extension. Unpaywall is an open-source, nonprofit organization "dedicated to improving access to scholarly research". It has created a browser extension to hopefully do one thing really well: instantly deliver legal, open-access, full text as you browse. "When an Unpaywall user lands on the page of a research article, the software scours thousands of institutional repositories, preprint servers, and websites like PubMed Central to see if an open-access copy of the article is available. If it is, users can click a small green tab on the side of the screen to view a PDF." A legally uploaded open-access copy is delivered to users more than half the time.

"It’s the scientists who wrote the articles, it’s the scientists who uploaded them — we’re just doing that very small amount of work to connect what the scientists have done to the readers who need to read the science." Open-access papers have the information but don’t always look like the carefully formatted articles in academic journals. Some users might not feel comfortable citing preprints or open-access versions obtained through Unpaywall, "without the trappings and formatting of traditional paywalled publishing," even if the copy is credible.

Friday, March 31, 2017

Procuring Digital Preservation: A Briefing

Procuring Digital Preservation: A Briefing.  Digital Preservation Coalition. 21 March 2017.
     Selecting and deploying solutions is especially challenging where the processes are new or where the available resources are stretched; moving from project to ‘business as usual’ can be hard. This may be the case with digital preservation, but new digital preservation tools, services, and suppliers are emerging rapidly. This requires digital preservation staff to make confident choices between different products. The increasing number and type of choices can lead to ‘information overload’ and delay the already complicated process. Even organisations that "properly understand their digital preservation needs can be frustrated in solving them, while solution providers have to meet impractical and at times unfeasible expectations."

The Digital Preservation Coalition hosted a briefing day to clarify requirements and help find solutions. The presentations:
  • examine requirements from the perspective of the developer and the collection owner
  • discuss procedures for acquiring a preservation solution
  • discuss case studies and good practices for documenting requirements
  • examine current proprietary and open source solutions for digital preservation
  • allow vendors to explain their own requirements

Slides from several sessions are available.

Thursday, March 30, 2017

ACRL Closes with Carla Hayden

ACRL Closes with Carla Hayden. Amy Carlton. American Libraries. March 27, 2017.
     Some quotes from the article about libraries, collections, and information:
  • “When we seek information, we examine the privilege of the voices and sources of our information, and we learn to identify whose voices are present and whose voices are missing and how that impacts and influences our understanding of that information.” Margaret Brown-Salazar
  • "Hayden said her goal is to make the Library of Congress’s  (LC) priceless collections available to everybody—for LC to live up to its nickname of America’s Library. Obama told her that he went to an exhibit there and saw Lincoln’s reading copy of the Gettysburg Address and the contents of his pockets from the night he was assassinated, but he was pretty sure this access was because of his being president. He told her he wanted someone for the job who could make sure a kid in Baltimore, a person at public library, a student at a community college, and anyone would be able to see these treasures. “And that’s when I said yes,” she said."
  • “Our materials are nothing without the people and staff. That’s what makes it come alive”
  • “Librarians are having a moment! Trustworthiness is our strength. We should revel in it and be confident in it. If we’re having a moment, let’s seize the moment!”

Wednesday, March 29, 2017

Archives Unlocked vision launched at the Southbank Centre

Archives Unlocked vision launched at the Southbank Centre. Press release. The National Archives. 29 March 2017.
     The National Archives (UK) has launched a vision and action plan to help archives secure their future through digital transformation, investing in new workforce skills, and encouraging innovation. This vision and action plan offers a future where "businesses, creative industries, arts organisations, academia, and communities can fully exploit a more resilient archives sector, with the UK leading the world in digital transformation."  It is built on themes of Trust, Enrichment and Openness, that highlight "the importance of archives in holding authority to account through scrutiny, in driving innovation and creativity for businesses and across society, and in cultivating an open approach to knowledge accessible to all."

The rich, national collection of archives "are the nation’s collective memory." The updated vision is needed to sustain the Archives for the long term. "The Archives Unlocked action plan embodies this. It sets out what is required to release the power of the archives."

"Working with partners, stakeholders, investors and individuals, we will have greater potential and influence to accomplish what we need to do. The UK will be home to world-leading archives: both digital and physical."


Tuesday, March 28, 2017

Thumbs.db – what are they for and why should I care?

Thumbs.db – what are they for and why should I care? Jenny Mitcham. Digital Archiving at the University of York. 7 March 2017.
     Post about the thumbs.db system files and how to deal with them in an archival situation. Windows uses a file called Thumbs.db to create thumbnail images of any images within a directory, and the thumbs.db files are stored in each directory that contains images. They proliferate quickly. The Windows Explorer preferences must be set to display hidden files, and the "Hide protected operating system files" option also needs to be disabled, in order to see these and other hidden files.  IT can change account options to stop these thumbnail images from being created.

"Do I really want these in the digital archive? In my mind, what is in the ‘original’ folders within the digital archive should be what OAIS would call the Submission Information Package (SIP). Just those files that were given to us by a donor or depositor. Not files that were created subsequently by my own operating system."

[In our data ingest workflow, we use a utility that creates a csv file of items in directories for processing. The csv file is the ingest template which contains the file names and file metadata. This controls the files that are ingested. Unwanted files are removed from the csv file, which means that during ingest time, they are excluded from being ingested into Rosetta. - Chris]
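
The CSV-based filtering described in the note above can be sketched roughly as follows. This is not the utility actually used in that workflow; the `build_manifest` function, the `EXCLUDED_NAMES` set, and the column names are assumptions made for illustration.

```python
import csv
import os

# Assumption: file names (lower-cased) that should never reach the ingest list.
EXCLUDED_NAMES = {"thumbs.db"}


def build_manifest(root, csv_path):
    """Write a CSV of files under root, skipping unwanted system files.

    Returns the relative paths that were kept, in sorted order.
    """
    kept = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower() in EXCLUDED_NAMES:
                continue  # left out of the manifest, so never ingested
            full_path = os.path.join(dirpath, name)
            kept.append((os.path.relpath(full_path, root), os.path.getsize(full_path)))
    kept.sort()
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["relative_path", "size_bytes"])
        writer.writerows(kept)
    return [path for path, _size in kept]
```

Because the manifest, not the directory listing, drives the ingest, the Thumbs.db files can stay on disk untouched while still being excluded from the archive.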

Monday, March 27, 2017

Saving At-Risk Audiovisual Materials

Saving At-Risk Audiovisual Materials. Jeanne Drewes. American Libraries. March 1, 2017.
     Many audiovisual collections are considered at risk. Large amounts of content could be lost through deterioration of the original media unless it can be transferred to more durable digital formats. As libraries and other institutions rediscover the value of these collections they are taking steps to preserve the sounds and images they contain. Here are some steps to consider when planning your audiovisual preservation project.
  • Know what you have. This is an important first step.
  • Determine your priorities and set goals.
  • Develop an action plan based on your goals. 
"Preserving our own history as a profession by capturing the voices and stories of our colleagues is key toward ensuring our future."

Saturday, March 25, 2017

21st-Century Preservation Basics

21st-Century Preservation Basics. Brian J. Baird. Sidebar.  American Libraries. March 1, 2017.
    Since most scholarly information is now electronic, the basic elements of any digital library preservation policy in the 21st century include:

  • Cooperation. Every library has unique digital collections to preserve, but as the volume continues to grow exponentially, and as older material gets accessed less frequently, libraries may need to cooperate in order to collect and preserve materials long term. 
  • Environmental conditions. Optimal conditions for storing and preserving electronic information must continually be reexamined and improved. 
  • Disaster planning. A library disaster plan should build on an institution’s IT disaster plan to address specific needs.
  • Reformatting.  
  • Repositories. Ideally, repository collections should be well preserved, sharable, and cost-effective and could expand on the consortial efforts already in use.

"Preservation in the 21st century must be proactive, visionary, and cooperative. If it is not, vast amounts of cultural heritage are in danger of vanishing."

Wednesday, March 22, 2017

Collecting Digital Content at the Library of Congress

Collecting Digital Content at the Library of Congress. Joe Puccio, Kate Zwaard. The Signal. March 21, 2017.
     The Library of Congress has increased its digital collecting capacity in order to acquire as much selected digital content as technically possible, currently 12.5 petabytes, and make that content accessible to users. Expansion of the digital collecting program is "an essential part of the institution’s strategic goal to: Acquire, preserve, and provide access to a universal collection of knowledge and the record of America’s creativity." The newly-adopted strategy is directed at acquisitions and collecting, and is based on a vision in which the "Library’s universal collection will continue to be built by selectively acquiring materials in a wide range of formats" and via collaborative relationships with other entities.

The strategy is based on the assumptions that the amount of available digital content will continue to grow rapidly, that the Library will acquire content selectively, that the same content will be "available both in tangible and digital formats", and that intellectual rights will be respected.  Their plan for digital collecting over the next five years is categorized into six strategic objectives:
  1. Maximize collections of selected digital content submitted for copyright purposes
  2. Expand digital collecting through purchase, exchange and gifts
  3. Focus on purchased and leased electronic resources
  4. Expand use of web archiving to acquire digital content
  5. Acquire openly available content
  6. Collect appropriate datasets and other large units of content

Thursday, March 16, 2017

Creating the disruptive digital archive

Creating the disruptive digital archive. John Sheridan. Digital Preservation Coalition. 1 March 2017.
     The National Archives has been working on a new Digital Strategy. "Digital" is their biggest strategic challenge. Archives worldwide are "grappling with the issues of preserving digital records. We also need to be relevant to our audiences: public, government, academic researchers and the wider archives sector – to provide value to them at a time of change."

Traditional archives are built around the physical nature of the records, but digital records "change all our assumptions around the archive – from selection to preservation and access". Their new Digital Strategy is to move beyond the digital simulation of physical records and to become a ‘disruptive’ digital archive, to be "digital by design".

The National Archives is currently a "fully functioning digital archive with a Digital Records Infrastructure capable of safely, securely and actively preserving very large quantities of data with associated descriptive metadata" which is applying the paper records paradigm of selection, preservation and access to digital records. This is their first generation archive.  The second generation digital archive they are aiming for is to be "digital by instinct and design":

  • rich mixed media content (things like websites), datasets, computer programs, even neural networks, as records not just information in document formats
  • ability to select and preserve all these types of things 
  • recognizes that digital information has value in aggregate, and that it is not just individually important artefacts that have historical value
  • a relentless engineering effort to preserve digital objects that measures and manages the preservation risks
  • transparent in its practices
  • develops approaches for enabling access to the whole collection with regard to legal, ethical and public considerations. 
  • regards the archive as conceptually interconnected data.

"These are ambitious aims and there are many challenges we need to tackle along the way." Collaboration between archives and other institutions is essential in moving forward.


Wednesday, March 15, 2017

Developing a Digital Preservation Infrastructure at Georgetown University Library

Developing a Digital Preservation Infrastructure at Georgetown University Library. Joe Carrano, Mike Ashenfelder. The Signal. March 13, 2017.
     At the library of Georgetown University, half of the library IT department is focused on digital services such as digital publishing, digitization and digital preservation. These IT and library functions overlap and support each other, which creates a need for the librarians, archivists and IT to work together. Having them in one department improves communication and makes it easier to get things done. "Often it is invaluable to have people with a depth of knowledge from many different areas working together in the same department. For instance, it’s nice to have people around that really understand computer hardware when you’re trying to transfer data off of obsolete media."

While digital preservation and IT are centered in one department, the preservation files are in different systems and on different storage media throughout the library; the library is in the process of putting them into APTrust. Several strategies to improve their digital preservation management are:
  1. Implement preservation infrastructure, including a digital-preservation repository
  2. Develop and document digital-preservation workflows and procedures
  3. Develop a training program and documentation to help build skills for staff
  4. Explore and expand collaborations with both university and external partners to increase the library’s involvement in regional and national digital-preservation strategies.
These goals build upon each other to create a sustainable digital-preservation framework which includes APTrust and the creation of tools to manage and upload the content, particularly custom automated solutions that fit their needs. They are also developing documentation and workflows so any staff member can "upload materials into APTrust without much training".

Librarians and archivists need to be trained and integrated into the process to ensure the sustainability of the project’s outcome and to speed up the ingest rate. "Digital curation and preservation tasks are becoming more and more commonplace and we believe that these skills need to be dispersed throughout our institution rather than performed by only a few people". 

"By the end of this process we hope to have all our preservation copies transferred and the infrastructure in place to keep digital preservation sustainable at Georgetown."

Monday, March 13, 2017

What Makes A Digital Steward: A Competency Profile Based On The National Digital Stewardship Residencies

What Makes A Digital Steward: A Competency Profile Based On The National Digital Stewardship Residencies. Karl-Rainer Blumenthal, et al. Long paper, iPres 2016. (Proceedings p. 112-120 / PDF p. 57-61).
       Digital stewardship is the active and long-term management of digital objects with the intent to preserve them for long term access. Because the field is relatively young, there is not yet "sufficient scholarship performed to identify a competency profile for digital stewards". A profile details the specific skills, responsibilities, and knowledge areas required, and this study attempts to describe a competency profile for digital stewards by using a three-pronged approach:
  1. reviewing literature on the topics of digital stewardship roles, responsibilities, expected practices, and training needs
  2. qualitatively analyzing current and completed project descriptions
  3. quantitatively analyzing the results from a survey that identified the competencies needed to successfully complete projects
"This study had two main outputs: the results of the document analysis (qualitative), and the results of the survey (quantitative)."  Seven coded categories of competence emerged from the analysis:
  1. Technical skills;
  2. Knowledge of standards and best practices;
  3. Research responsibilities;
  4. Communication skills;
  5. Project management abilities;
  6. Professional output responsibilities; and
  7. Personality requirements.
Based on the responses for Very important and Essential, a competency statement representing this profile would suggest that "effective digital stewards leverage their technical skills, knowledge of standards and best practices, research opportunities, communication skills, and project management abilities to ensure the longterm viability of the digital record." They do this by:
  • developing and enhancing new and existing digital media workflows
  • managing digital assets
  • creating and manipulating asset metadata
  • committing to the successful implementation of these new workflows
  • managing both project resources and people
  • soliciting regular input from stakeholders
  • documenting standards and practices
  • creating policies, professional recommendations, and reports
  • maintaining current and expert knowledge of standards and best practices for metadata and data management
  • managing new forms of media
The study suggests that, in practice, technical skills are not always as essential in digital stewardship as job postings suggest. Hardware/software implementation and Qualitative data analysis skills were important to only half of the respondents. Workflow management is a universally important skill, deemed "Essential" by almost all respondents. Other categories appeared as Somewhat Important, or as areas that need further research.

The study suggests that "although specific technical skills are viewed as highly important in different settings, a much larger majority of projects required skills less bound to a particular technology or media, like documentation creation and workflow analysis."  Digital stewards should possess not only a deep understanding of their field but also the ability to "effectively disseminate their work to others."

Thursday, March 09, 2017

Top 10 Digital Archives Blogs

Top 10 Digital Archives Blogs. Jan Zastrow.  Information Today. July/August 2016.
A post about keeping up with reading on archival or historical topics. By sharing what we read, we can learn about new developments in the field without having to read all the current literature ourselves. Here is a list of selected sources to help sift through the noise and keep up with the quickly evolving world of digital archives, electronic records, digital preservation and curation, personal archiving, digital humanities, and more. Some are from institutions, others are more informal, and they are mostly U.S.-centric, English-language sources. [I learned about some new helpful sites here.]

Society of American Archivists
1. The Society of American Archivists’ semi-annual The American Archivist covers theoretical and practical developments in the archives profession in North America.

2. SAA Electronic Records Section runs the popular BloggERS! which aggregates news, information, and resources on electronic records, including case studies, reviews, and surveys.

U.S. Federal Agencies
3. The National Archives’ AOTUS Blog, and more at archives.gov/social-media/blogs.html.

4. The Library of Congress: The Signal: Digital Preservation with up-to-the-minute digital issues (such as web archiving, audiovisual preservation, digital forensics, data migration, and digital asset management).

Aggregated Sources to save you time.
5. ArchivesBlogs is a syndicated collection of blogs about archives, “by and for archivists,” taken from international RSS and Atom feeds every hour.

6. Digital Archiving Resources is an excellent annotated database of materials on digital archiving created by doctoral students at the University of Central Florida.

7. Digital Preservation Matters: For more than a decade, articles on digital preservation, long term access, digital archiving, digital curation, institutional repositories, and electronic records management. Search the blog’s archive, use the tag cloud interface, or subscribe via RSS or on Twitter.

Blogs: By and For Individuals
8. ArchivesNext, the brainchild of Kate Theimer, covers archives, technology, and professional issues.

9. Trevor Owens: User Centered Digital History blog with cutting edge essays on digitization, born-digital, primary sources, web archives, and digital art, etc. 

10. Jaime Mears, Notes From a Nascent Archivist is chock-full of great ideas, resources, projects, and more.


Wednesday, March 08, 2017

The Hidden Phenomenon That Could Ruin Your Old Discs

The Hidden Phenomenon That Could Ruin Your Old Discs. Ernie Smith. Motherboard. February 6, 2017.
     An article about regular CD and DVD optical discs and the problems that cause them to deteriorate.  "CDs and DVDs were sold to consumers as these virtually indestructible platters, but the truth, as exemplified by the “disc rot” phenomenon, is more complicated."  Early research showed that problems with the reflective layer could make a disc fail in 8 to 10 years, and the degradable dye used in recordable discs can also break down. The disc degradation sometimes looks like a stain or discoloration, or tiny pin pricks on the disc surface. "The eventual decay of optical media is a serious situation, whether you're a digital archivist or simply someone who wants to watch a movie on a weird format like a Laserdisc."

A Library of Congress preservation specialist said that the disc destruction showed up in three different forms: the "bronzing" of discs; small pin-hole specks located on the discs; or "edge-rot".
Five facts about disc rot, according to the Library of Congress:
  1. Discs with significant errors are often still at least partially readable. This depends on the type of disc and where the error occurs.
  2. A scratch at the top of a CD is more problematic than one on the bottom, because scratches to the top surface can penetrate through and damage the reflective layer.
  3. DVDs generally have better integrity than do CDs but layers can delaminate over time. Dual-layer discs tend not to hold up so well.
  4. Recordable discs, and particularly DVDs, don't last as long, due to the degradation of the organic dye used. A poorly recorded disc tends to wear out more quickly.
  5. Proper storage and handling helps. A well-made commercially pressed disc can last many decades if stored and handled properly. Discs stored in harsh environmental conditions with elevated temperature and/or humidity will have a shorter expected lifetime.

Tuesday, March 07, 2017

The role of archives

The role of archives. Helen Hockx. Things I cannot say in 140 characters.  January 20, 2017.
     The role of Archives, especially when it comes to digital records, is not commonly understood. An archivist should ask questions "about the file structure, the access system, who accessed it, and how was it used… Appraisal is based on context, or the entire record keeping system and the importance of individual items depends on how they relate to one another within a system". This is difficult to do after the fact. The heart of the problem is: who makes decisions on what records to keep? A perception is that Archives are "museums with artifacts, and have no authority over digital records".  Access to the digital files should be determined by the “data stewards” under the direction of the University’s Information Governance Committee. The role of Archives, data access, record lifecycles and retention schedules seem to be largely misunderstood.


Monday, March 06, 2017

Electric WAILs and Ham

Electric WAILs and Ham. John Berlin. Web Science and Digital Libraries Research Group. February 13, 2017.
     Web Archiving Integration Layer (WAIL) is a one-click configuration and utilization tool that fits between institutional and individual archiving tools from a user's personal computer. Changing the tool from a Python application into an Electron application has brought with it many improvements, especially the ability to update and package it for Linux, macOS, and Windows.

WAIL is now collection-centric and provides users with the ability to curate personalized web archive collections, similar to Archive-It, but on their local machines. It also adds the ability to monitor and archive Twitter content automatically. WAIL is now available from the project's release page on GitHub.  More information about WAIL is available on their wiki.

Saturday, March 04, 2017

What Do IT Decision Makers Want?

What Do IT Decision Makers Want? Tom Coughlin. Forbes. March 1, 2017.
     An article that looks at a study of over 1,200 senior IT decision makers in 11 countries. Some findings:

  • The vast majority of those surveyed have revised their storage strategy in the last 12 months because of frustrations with storage costs, performance, complexity and fragmentation of existing solutions. 
  • 60% say storage expenses are under increased scrutiny 
  • 95% are interested in the scalability and efficiency of software-defined storage. 
  • Digital storage is about 7% of the total IT budget.
  • Some concerns: 
    • High costs: 
      • 80% were concerned with the cost of their storage system
      • 92% worry about managing storage costs as capacity needs grow. 
      • On average, 7% of IT budgets are allocated to data storage 
    • Performance: 
      • 73% are concerned with the performance of their existing storage solution. 
    • Growing complexity and fragmentation: 
      • 71% of respondents said storage systems were complex and highly fragmented.  
  • Software-defined storage [which involves separating the storage capabilities and services from the storage hardware] plays a significant role in improving the utilization of storage resources and stretching storage budgets.

Thursday, March 02, 2017

A lifetime in the digital world

A lifetime in the digital world. Helen Hockx. Blog: Things I cannot say in 140 characters. February 15, 2017.
     A very interesting post about papers donated to the University of Notre Dame in 1996, and how the library has been dealing with the collection. The collection includes a survey that is possibly “the largest, single, data gathering event ever performed with regard to women religious”. The data was stored on “seven reels of 800 dpi tapes, lrecl 120, blocksize 12,000, approximately 810,000 records in all”, extracted from the original EBCDIC tapes and converted to newer formats in 1996, transferred to CDs then to computer hard disk in 1999. The 1967 survey data has fortunately survived the format migrations. Some other data in the collection had been lost: at least 3 tape reels could not be read during the 1996 migration exercise and at least one file could not be copied in 1999. The survey data has not been used for 18 years since 1996 – nicely and appropriately described by the colleague as “a lifetime in the digital world”.
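
To illustrate what extracting data from EBCDIC tapes involves, here is a minimal sketch that decodes fixed-length EBCDIC records into text. It is not the process Notre Dame used. It assumes that the 120 in the quoted tape description is a fixed record length in bytes and that the data uses the cp037 EBCDIC code page; the function name is hypothetical, and the real tapes may have used a different code page or layout.

```python
RECORD_LENGTH = 120  # assumed bytes per record, taken from the quoted tape description


def decode_ebcdic_records(raw, record_length=RECORD_LENGTH, encoding="cp037"):
    """Split raw bytes into fixed-length records and decode each one to text."""
    if len(raw) % record_length != 0:
        raise ValueError("data length is not a multiple of the record length")
    records = []
    for start in range(0, len(raw), record_length):
        chunk = raw[start:start + record_length]
        # cp037 is one common EBCDIC code page; trailing padding spaces are stripped.
        records.append(chunk.decode(encoding).rstrip())
    return records
```

Under these assumptions, a block of 12,000 bytes would hold 100 records of 120 bytes each.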

The dataset has now been reformatted and stored in .dta and .csv formats. We also recreated the “codebook” of all the questions and pre-defined responses and put them in one document. The dataset is in the best possible format for re-use. The post gives examples of digital collection items that require intervention or preservation actions. A few takeaways:
  • Active use seems to be the best way for monitoring and detecting digital obsolescence.
  • Metadata really is essential. Without the notes, finding aid and scanned codebook, we would not be able to make sense of the dataset.
  • Do not wait a lifetime to think about digital preservation. 
  • The longer you wait, the more difficult it gets.