
Metadata for Use and Preservation of Spreadsheets
Melissa Keenan
December 4, 2006
INF 389K

The most appropriate type of preservation for an Excel spreadsheet depends on its original use. For instance, some users treat a spreadsheet as a simplified database, which is the example I have encoded for this project (see Appendix I). Others rely heavily on the formula and chart features of a spreadsheet.

If a spreadsheet is used exclusively to store data, then saving it as an HTML file or a comma-separated values (.csv) file would be appropriate. The advantages of this approach are that the conversion process is simple (just use the Save As feature in Excel) and that access is not limited to a particular platform. In addition, the results are human readable. The main drawback is that this approach only works for spreadsheets that are used simply for data. It would not work for spreadsheets with formulas or graphics.

Another preservation approach is to save the spreadsheet as XML, which as of Excel 2003 is available as a “Save As” option. The advantages of this approach are that it is simple and that it allows for the preservation of formulas and graphics. The drawback is that Microsoft XML, while more open than Excel in that it is a published standard, is still a proprietary format that is not platform independent.

A different approach to saving in XML is to import the spreadsheet into the Open Office spreadsheet application, known as Calc, and then save it as an ODS document. Most of what is publicly known about the Excel format comes from Open Office’s efforts to reverse engineer it, and the project has taken pains to ensure compatibility with Microsoft. OASIS has based its standard, Open Document Format for Office Applications (OpenDocument) v1.0, on OpenOffice.org XML (see references). A few other programs can read this format as well.
Examples I found are GNumeric and KSpread (see references).

METADATA

EAD – Encoded Archival Description

The use case envisioned here is for inventory spreadsheets relating to an archival collection of the Michael Joyce hypertexts. The commonly accepted metadata tag set for archival encoding (at least in the United States) is EAD, the Encoded Archival Description. EAD is maintained by the Network Development and MARC Standards Office of the U.S. Library of Congress, together with the Society of American Archivists. Because of its widespread acceptance, this metadata set seemed to be the best choice for the archival metadata. The EAD website (see resources) states that the beta test period for the latest schema ended on September 22, 2006 and that the schema should officially go into place on October 2, 2006. However, since the website has not been updated, the SIP profile references the beta schema. If this SIP profile were to be widely disseminated, the final EAD schema would have to be confirmed first.

The following lists the archival elements used from EAD to enhance an archival collection inventory spreadsheet.

EAD METADATA ELEMENTS

<archdesc>
  Used for: Archival Description
  Attributes used: level (“class”, “collection”, “fonds”, “recordgrp”, etc.)

<repository>
  Used for: Name and address of collection repository
  Elements used: corpname, address

<descgrp>
  Used for: Brings together children of <archdesc>
  Elements used: accessrestrict, userestrict

EAD Example

<archdesc level="collection">
  <repository label="Repository">
    <corpname>Harry Ransom Humanities Research Center at the University of Texas at Austin</corpname>
    <address>
      P.O. Box 721D
      Austin, Texas 78713-721D
    </address>
  </repository>
  <descgrp>
    <head>Important information for users of the collection</head>
    <accessrestrict>
      <head>Access</head>
      <p>Collection is open for research</p>
    </accessrestrict>
  </descgrp>
</archdesc>

OPEN FORMULA

Open Formula describes how to exchange recalculated formulas between applications, primarily spreadsheet programs.
It is designed to let users choose which spreadsheet application to use and still exchange data with people who made different choices.

MATHML ELEMENTS

Here is an example of a simple notation. The use case in the Appendix does not use this notation, but I wanted to make sure this was available as markup until the Open Formula.

MATHML Example

Notation: x^2 + 4x + 4 = 0.

Markup:

<mrow>
  <mrow>
    <msup>
      <mi>x</mi>
      <mn>2</mn>
    </msup>
    <mo>+</mo>
    <mrow>
      <mn>4</mn>
      <mo>&InvisibleTimes;</mo>
      <mi>x</mi>
    </mrow>
    <mo>+</mo>
    <mn>4</mn>
  </mrow>
  <mo>=</mo>
  <mn>0</mn>
</mrow>

REFERENCES

Arts and Humanities Data Service. AHDS Preservation Handbook: Spreadsheets. Retrieved November 5, 2006 from http://ahds.ac.uk/preservation/spreadsheets-preservation-handbook.pdf.

Ballegooie, Marlene van, and Wendy Duff. (2006). DCC | Digital Curation Manual: Installment on “Archival Metadata”. Retrieved November 18, 2006 from http://www.dcc.ac.uk/resource/curation-manual/chapters/archival-metadata/archival-metadata.pdf.

Carpenter, Grace. (2005). Preservation Options for Excel 10.0/XP/2002 (BIFF8X). Retrieved November 12, 2006 from the DSpace Digital Preservation Tools and Strategies website at http://wiki.dspace.org/static_files/4/47/PresOps-excel.pdf.

Carpenter, Grace. (2005). Format Background Document: Microsoft Excel 10.0 (XLS). Retrieved November 12, 2006 from the DSpace Digital Preservation Tools and Strategies website at http://wiki.dspace.org/static_files/2/20/Backgrd-XLS.pdf.

DAITSS: Dark Archive in the Sunshine State. DAITSS METS Document Profile for Submission Information Packages (SIP). Retrieved November 28, 2006 from http://www.fcla.edu/digitalArchive/pdfs/DAITSS_METS_SIP_Profile.pdf.

GNumeric – The Gnome Office Spreadsheet. A component of the GNU GNOME desktop environment, found at http://www.gnome.org/projects/gnumeric/.

KSpread. A component of the KOffice Project. A free and open source office suite for Linux and Unix systems, found at http://www.koffice.org/kspread.

OASIS. (2005).
Open Document Format for Office Applications (OpenDocument) v1.0. Retrieved October 16, 2006 from http://www.oasis-open.org/committees/download.php/12572/OpenDocument-v1.0-os.pdf.

Open Office Calc. Open source software that provided the basis for the international standard for documents (see OASIS). A component of OpenOffice.org, found at http://www.openoffice.org/product/calc.html.

OpenDocument – Formula. Specification for recalculated formulas in office documents.
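
Hand-typed EAD fragments such as the EAD Example in the METADATA section are easy to leave with mismatched or unclosed tags. The following is a minimal sketch, not part of the original project, of how the same fragment could be generated with Python's standard xml.etree.ElementTree module so that every element is closed automatically. The function name build_archdesc is hypothetical, and placing each address line in its own <addressline> child is an assumption about EAD usage that does not appear in the example above. The sketch does not validate against the EAD schema; it only guarantees well-formed XML.

```python
import xml.etree.ElementTree as ET


def build_archdesc(repository_name, address_lines, access_note):
    """Build a small <archdesc> fragment (hypothetical helper, not an EAD tool)."""
    archdesc = ET.Element("archdesc", {"level": "collection"})

    # <repository> holds the name and address of the holding institution.
    repository = ET.SubElement(archdesc, "repository", {"label": "Repository"})
    ET.SubElement(repository, "corpname").text = repository_name
    address = ET.SubElement(repository, "address")
    for line in address_lines:
        # Assumption: one <addressline> child per line of the address.
        ET.SubElement(address, "addressline").text = line

    # <descgrp> groups the access information for users of the collection.
    descgrp = ET.SubElement(archdesc, "descgrp")
    ET.SubElement(descgrp, "head").text = (
        "Important information for users of the collection"
    )
    accessrestrict = ET.SubElement(descgrp, "accessrestrict")
    ET.SubElement(accessrestrict, "head").text = "Access"
    ET.SubElement(accessrestrict, "p").text = access_note
    return archdesc


# Values copied from the EAD Example in this document.
fragment = build_archdesc(
    "Harry Ransom Humanities Research Center at the University of Texas at Austin",
    ["P.O. Box 721D", "Austin, Texas 78713-721D"],
    "Collection is open for research",
)

# Serialize, then parse the result again as a basic well-formedness check.
xml_text = ET.tostring(fragment, encoding="unicode")
reparsed = ET.fromstring(xml_text)
print(xml_text)
```

Parsing the serialized text again with ET.fromstring raises an error if the markup is not well formed, which is the kind of mistake that is hard to spot by eye in a long inventory record.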

