Formats for long term preservation of information - Guest post from Cassie Findlay
After my presentation on preserving digital information at the recent Local Government Web Network conference, Reem told me that she had received lots of enquiries about PDF/A, which I mentioned in my talk; specifically: what is it, how do you use it, and where do you get it? PDF/A is the ‘archival’ version of the PDF format. It has fewer "bells and whistles" than traditional PDF, which minimises future migration requirements. PDF/A is also more open than traditional PDF because it is maintained by the International Organization for Standardization (ISO), not one specific vendor. The PDF/A standard is ISO 19005-1, Document management – Electronic document file format for long-term preservation – Part 1: Use of PDF 1.4 (PDF/A-1). You can create PDF/A documents by adding a PDF maker plug-in to MS Word. Adobe Acrobat Professional also has a validation feature that checks your document against the PDF/A-1 standard.
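For the curious, here is a rough Python sketch of how you might spot whether a file at least claims to be PDF/A. It relies on the fact that PDF/A files declare their conformance in embedded XMP metadata using the `pdfaid:part` and `pdfaid:conformance` properties. The function name `claimed_pdfa_level` is made up for this example, and the check is only a heuristic: it finds the claim, it does not validate the file the way a proper validator does.

```python
import re

# PDF/A files carry an XMP packet with pdfaid:part (e.g. 1) and
# pdfaid:conformance (e.g. A or B), written either as XML elements or
# as XML attributes. [\x22\x27] matches a double or single quote.
_PDFA_PART = re.compile(rb"pdfaid:part(?:>|=[\x22\x27])\s*(\d)")
_PDFA_CONF = re.compile(rb"pdfaid:conformance(?:>|=[\x22\x27])\s*([A-Za-z])")


def claimed_pdfa_level(data: bytes):
    """Return e.g. 'PDF/A-1b' if the bytes declare PDF/A conformance, else None.

    Assumption: the XMP metadata is stored uncompressed, so it can be found
    by scanning the raw bytes. This detects the declaration only; it is not
    a substitute for validating against the standard.
    """
    part = _PDFA_PART.search(data)
    if part is None:
        return None
    level = part.group(1).decode("ascii")
    conf = _PDFA_CONF.search(data)
    if conf is not None:
        level += conf.group(1).decode("ascii").lower()
    return "PDF/A-" + level
```

In practice you would call it with the contents of a file, for example `claimed_pdfa_level(open("minutes.pdf", "rb").read())`, where `minutes.pdf` is a placeholder name.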
PDF/A is a great option for a long-term preservation format (or LTPF, as digital preservation nerds like me like to say!) for documentary-style information. Ideally you should use PDF/A from the very start of the information’s life, particularly when you know it has long-term value (such as Board minutes). Alternatively, you can implement a conversion strategy to turn documents into PDF/A when they are moved off active directories, to minimise the risk of obsolescence and make future migrations easier. If the information is ultimately required to come to the State archives, PDF/A is a suitable format for that, too. Other formats that are suitable as LTPFs include Open Document Format (ODF), HTML, XHTML and XML. For digital images, JPEG, TIFF or PNG are recommended, and for digital audio, FLAC.
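If you are planning a conversion strategy, a first pass can be as simple as sorting files by extension. The Python sketch below is illustrative only: the extension lists and the `triage` function are my own assumptions based on the formats named above, not an official register, and a file extension is no guarantee of what is actually inside the file.

```python
from pathlib import PurePath

# Assumption: illustrative extensions for the formats named in this post
# (ODF, HTML, XHTML, XML, JPEG, TIFF, PNG, FLAC).
LTPF_EXTENSIONS = {
    ".odt", ".ods", ".odp",
    ".html", ".htm", ".xhtml", ".xml",
    ".jpg", ".jpeg", ".tif", ".tiff", ".png",
    ".flac",
}

# A .pdf file may or may not be PDF/A, so it needs a closer look.
AMBIGUOUS_EXTENSIONS = {".pdf"}


def triage(filename: str) -> str:
    """Classify a file as 'keep', 'check' or 'convert' by extension alone."""
    suffix = PurePath(filename).suffix.lower()
    if suffix in LTPF_EXTENSIONS:
        return "keep"
    if suffix in AMBIGUOUS_EXTENSIONS:
        return "check"
    return "convert"
```

Anything marked "convert" becomes a candidate for conversion when it moves off active directories, and anything marked "check" needs a proper look to see whether it is PDF/A.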
For those interested in capturing websites or portions of websites in high volumes, rather than saving individual pages, the International Internet Preservation Consortium recently announced the publication of the WARC file format as an international standard: ISO 28500:2009, Information and documentation — WARC file format. This is a container format that permits one file simply and safely to carry a very large number of constituent data objects (of unrestricted type, including many binary types) for the purpose of storage, management, and exchange.
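To give a feel for what "container format" means here, the Python sketch below builds a tiny WARC file in memory. Each object is wrapped in a record made of a version line, a few named header fields, a blank line, the object's bytes, and two trailing line breaks, and records are simply concatenated one after another. The function names and example URLs are invented for illustration, and it only writes simple `resource` records; real web crawls normally use dedicated harvesting tools and richer record types.

```python
import uuid
from datetime import datetime, timezone


def warc_resource_record(target_uri: str, payload: bytes, content_type: str) -> bytes:
    """Wrap one data object in a minimal WARC 'resource' record."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",                                    # version line
        "WARC-Type: resource",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",  # unique per record
        f"WARC-Date: {now}",
        f"WARC-Target-URI: {target_uri}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",             # size of the block in bytes
    ]
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # Every record ends with two CRLF pairs after its content block.
    return head + payload + b"\r\n\r\n"


def build_warc(items) -> bytes:
    """Concatenate records for (uri, payload, content_type) tuples into one file body."""
    return b"".join(warc_resource_record(*item) for item in items)
```

For example, `build_warc([("http://example.org/a", b"<p>hi</p>", "text/html"), ("http://example.org/b.png", b"\x89PNG", "image/png")])` returns the bytes of a single file carrying both an HTML page and a binary image fragment.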
For more information on choosing the right formats for long term preservation of digital information, check out State Records’ digital records strategy Future Proof, subscribe to our blog at http://futureproof.records.nsw.gov.au or follow us on Twitter @FutureProofNSW.
Cassie Findlay
Cassandra.Findlay@records.nsw.gov.au