Guidance on selecting file formats for long-term accessibility and interoperability

This page lists the file formats which are recommended for depositing in Edinburgh DataShare and Edinburgh DataVault. If you have a suggested update or a question about any of this advice, the Research Data Service team will be delighted to hear from you.

Introduction: why your choice of file format is important

Longevity of your research data

To keep your data accessible and usable by the broadest audience over the long term, the Research Data Support team encourages you to deposit files in standard preservation formats.

The digital preservation community recommends standard preservation formats because either:

- they encode the information in a way that is software-independent or allows interoperability between systems and applications. Often, these formats are a recognised standard or are published as an open format. Some of these file formats might be proprietary but can be opened in different operating systems and with different programs or applications;

or

- the information is encoded with a lossless algorithm, so no data are lost when files are 'saved as' and stored in these formats.

We recommend that any files you deposit in Edinburgh DataShare, DataVault or an external repository should, where possible, be in open, platform-independent or non-proprietary file formats.

Formats we recommend

File formats that the Research Data Support team supports and recommends are listed below.
Textual documents

- Adobe PDF/A (filename extension: ".pdf")
- Text (filename extensions: ".txt", ".asc", ".sts")
- OpenDocument - text (".fodt", ".odt")
- Microsoft Word XML (".docx")

Tabular data

- Comma separated values (CSV) (".csv")
- Tab separated values (".tsv", ".tab")
- OpenDocument - spreadsheet (".fods", ".ods")
- Microsoft Excel XML (".xlsx")

Images

- JPEG 2000 (".jpxml", ".jp3d", ".jpf", ".jpm", ".jpx", ".jp2")
- TIFF (".tiff", ".tif")
- PNG (".png")
- JPEG * (".jpg", ".jpeg")
- Scalable Vector Graphics (SVG) (".svg")

Audio / Video

- AIFF (".aiff", ".aif", ".aifc")
- WAV (".wav")
- Free Lossless Audio Codec (FLAC) (".flac")
- MPEG-4 (".m4v", ".m4r", ".m4b", ".m4p", ".m4a", ".mp4")
- Motion JPEG 2000 (".mjp2", ".mj2")

Geo-spatial data

- Shapefile (".shp", ".shx", ".dbf", ".prj")
- GeoTIFF (".tif")

Other

- PostScript (".ps")
- Structured Query Language (SQL) (".sql")
- OpenDocument - presentation (".fodp", ".odp")
- Microsoft PowerPoint XML (".pptx")
- SAS syntax (".sas")
- SPSS syntax (".sps")
- Stata syntax (".do", ".dct")
- Minitab syntax and output (".lis", ".tj")
- R (ASCII, as opposed to the .rdata saved workspace file) (".r")
- XML (Extensible Markup Language) (".xml", ".sgml")
- HTML (Hypertext Markup Language) (".htm", ".html")
- CSS (Cascading Style Sheets) (".css")
- NetCDF (Network Common Data Form) (".nc")

* N.B. While the JPEG format is supported, depositors should be aware that we consider JPEG 2000 and TIFF (both being standard preservation formats) to be more interoperable over the long term than JPEG. Depositors who value long-term sustainability may wish to add to their deposit copies of their images converted to JPEG 2000.

Other acceptable file formats

File formats such as the ones listed below have been deposited in the repository but are not considered standard preservation formats because they are proprietary; system-, software- or version-dependent; lossy (i.e. data are lost when compression is applied); or less commonly encountered than the ones mentioned above.
Most of these formats are widely used and it is likely we will be able to preserve them, but we cannot guarantee it. If you have files in these formats, you may deposit them in Edinburgh DataShare.

- BED (".bed")
- bedGraph (".bg")
- DBase, DBF (".dbf")
- EAF File (".eaf")
- Encapsulated PostScript (EPS) (".epsi", ".epsf", ".eps")
- FLT (".flt")
- HDF (Hierarchical Data Format) (".he4", ".h5", ".hdf4", ".h4", ".hdf", ".he5")
- LAB (".lab")
- Mathematica (".nbp", ".nb")
- MATLAB code (".m")
- ML source code file (".ml")
- MTRANS file (".mtr")
- Photo CD (".pcd")
- PSC (".psc")
- PFSX File (".pfsx")
- PITCH File (".pitch", ".PITCH")
- PitchTier File (".pitchtier", ".PITCHTIER")
- RESULTSMFC File (".RESULTMFC")
- TextGrid (".textgrid", ".TextGrid")
- VTK (Visualisation ToolKit) (".vtu")

Formats which should be converted

Converting research data files from proprietary or software-dependent formats to a standard preservation format will help to avoid difficulties opening these files in the future.

"By using standard preservation formats, you are maximising the likelihood that most future potential users will be able to open the files."
Robin Rice, Data Librarian & Head, Research Data Support

If your research data include any of the following file formats, we recommend you convert them to the suggested standard preservation format, where it is possible to do this without compromising (i.e. losing or altering) the data. The converted files should then be deposited along with the original files.

Textual documents

Format name (original file) | Convert to (recommended preservation format(s))
Encapsulated PostScript (EPS) (".epsi", ".epsf", ".eps") | TIFF
RTF (".rtf") | OpenDocument format, Microsoft Word XML, PDF or plain text
LaTeX (".ltx", ".latex") | Deposit .pdf files alongside these.
TeX (".tex") | Deposit .bib and .pdf files alongside this.
TeX dvi (".dvi") | PDF
WordPerfect (".w51", ".wp5", ".wp", ".wpd") | OpenDocument format, Microsoft Word, plain text or PDF

Tabular data

Format name (original file) | Convert to (recommended preservation format(s))
MATLAB binary data files (".mat") | CSV or plain text
Microsoft Access (".mdb") | If practicable, export to multiple tables, e.g. CSV, Excel and/or tab-delimited format.

SPSS: We recommend that SPSS users deposit both syntax files and data files. Syntax files should be deposited in the .sps format, as generated automatically by SPSS. We recommend that the following SPSS data, system and output files be handled as follows:

SPSS portable file (contains data) (".por") | Deposit .sps (syntax) and .csv (data) files alongside these.
SPSS binary data file (".sav", ".gsav", ".zsav") (aka system file) | Deposit .sps (syntax) and .csv (data) files alongside these.
SPSS output file (".spv", ".spo") | Convert to text, HTML or PDF, and deposit alongside these.

Images

Format name (original file) | Convert to (recommended preservation format(s))
BMP (".ddb", ".dib", ".bmp") | TIFF / JPEG 2000
NIfTI (".img", ".hdr", ".nii") | It may be worth exporting a selection of still 2-D images as TIFF files for accessibility.
Photoshop (".psd", ".pdd") | TIFF / JPEG 2000
GIF (".gif") | TIFF / JPEG 2000

Audio / Video

Format name (original file) | Convert to (recommended preservation format(s))
Audio (".au", ".snd") | FLAC
MPEG (".mpeg", ".mpg", ".mpe") | MPEG-4
Video QuickTime (".qtm", ".mov", ".qt") | MPEG-4
MPEG Audio (".m4a", ".mpa", ".abs", ".mpega") | FLAC
Flash Video (".f4b", ".f4a", ".f4p", ".f4v", ".flv") | MPEG-4
AVI Audio/Video Interleaved Format (".avi") | MPEG-4
Ogg Vorbis Codec Compressed Multimedia File (".ogg") | FLAC

Compression archives

WARNING for DataVault depositors: files deposited into the DataVault are encrypted. We strongly discourage compression of data destined for deposit in the DataVault since, in combination with the encryption, this adds a considerable risk of the data becoming irretrievable over the long term.
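One way to act on this warning is to scan a folder for compressed files before depositing it. The following is a minimal Python sketch, not a Research Data Service tool: the function name `find_compressed_files` and the folder name in the usage note are hypothetical, and the extension set is assumed from the archive formats named in this section. It checks filename extensions only and does not inspect file contents.

```python
from pathlib import Path

# Extensions assumed from the archive formats named in this section.
ARCHIVE_EXTENSIONS = {".zip", ".bz2", ".bz", ".gz", ".tar", ".tgz", ".rar"}


def find_compressed_files(root):
    """Return files under `root` whose final extension suggests a compressed archive."""
    root = Path(root)
    return sorted(
        path
        for path in root.rglob("*")
        # Compare case-insensitively so ".ZIP" is caught as well as ".zip".
        if path.is_file() and path.suffix.lower() in ARCHIVE_EXTENSIONS
    )
```

For example, `for path in find_compressed_files("my_deposit"): print(path)` would list candidate files to expand before a DataVault deposit.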
Format name (original file) | Convert to (recommended preservation format(s))
Compressed Archive File (".zip") | If practicable, expand the archive and submit individual files for ease of navigation, as long as the number of files is less than 200. N.B. Mac users: we have found that .zip files larger than 4 GB created using the Mac's built-in zip functionality cannot be opened on other platforms. We therefore ask Mac users to use an alternative such as GNU zip (gzip) for zipping archives of that size.
BZIP2 (".bz2", ".bz") | If practicable, expand the archive and submit individual files for ease of navigation, as long as the number of files is less than 200.
GZIP compressed archive file (".gz") | If practicable, expand the archive and submit individual files for ease of navigation, as long as the number of files is less than 200.
Tarball (".tar", ".tgz") | If practicable, expand the archive and submit individual files for ease of navigation, as long as the number of files is less than 200.
RAR compression archive (".rar") | Use zip or tarball instead.

Getting help

If you have research data in file formats that you are unsure about, need help converting your files to standard preservation formats, or simply want to discuss your needs with us, please contact us via the Contact box above.

This article was published on 2024-08-21
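As an illustration of the archive guidance in the "Compression archives" section, the Python sketch below counts the files inside an archive to help decide whether expanding it is practicable. It is an assumption-laden example rather than an official procedure: the names `count_archive_files`, `should_expand` and `MAX_FILES_TO_EXPAND` are hypothetical, and only zip and tar-based archives are handled (a single file compressed with gzip or bzip2, or a RAR archive, raises an error).

```python
import tarfile
import zipfile

# Threshold taken from the guidance on this page: expand archives
# that hold fewer than 200 files.
MAX_FILES_TO_EXPAND = 200


def count_archive_files(archive_path):
    """Count regular files (not directories) inside a zip or tar-based archive."""
    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path) as archive:
            return sum(1 for info in archive.infolist() if not info.is_dir())
    if tarfile.is_tarfile(archive_path):
        # The default read mode also handles gzip- and bzip2-compressed tarballs.
        with tarfile.open(archive_path) as archive:
            return sum(1 for member in archive.getmembers() if member.isfile())
    raise ValueError(f"Not a zip or tar archive: {archive_path}")


def should_expand(archive_path):
    """Return True when the archive is small enough to deposit as individual files."""
    return count_archive_files(archive_path) < MAX_FILES_TO_EXPAND
```

Calling `should_expand("results.zip")` on a hypothetical archive would return True when it holds fewer than 200 files; whether expansion is actually practicable still depends on the data and on the advice of the Research Data Support team.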