Choose the best file formats

Guidance on selecting file formats for long-term accessibility and interoperability

This page lists the file formats which are recommended for depositing in Edinburgh DataShare and Edinburgh DataVault. If you have a suggested update or a question about any of this advice, the Research Data Service team will be delighted to hear from you.

Introduction: why your choice of file format is important

Longevity of your research data

To keep your data accessible and usable by the broadest audience over the long term, the Research Data Support team encourages you to deposit your research data in standard preservation file formats.

The digital preservation community recommends standard preservation formats for either of two reasons:

  • they encode the information in a way that is software-independent or allows interoperability between systems and applications. Often, these formats are a recognised standard, or published in an open format. Some of these file formats might be proprietary but can be opened in different operating systems and with different programs or applications.

      or

  • the information is encoded with a lossless algorithm and for that reason there is no data loss when files are ‘saved as’ and stored in these formats.

We recommend that any files you deposit in Edinburgh DataShare, DataVault or an external repository should, where possible, be in open, platform-independent or non-proprietary file formats.

Formats we recommend

File formats that the Research Data Support team supports and recommends are listed below.

Textual documents

  • Adobe PDF/A    (filename extension: ".pdf")
  • Text    (filename extensions: ".txt", ".asc", ".sts")
  • OpenDocument - text    (".fodt", ".odt")
  • Microsoft Word XML    (".docx")

Tabular data

  • Comma separated values (CSV)    (".csv")
  • Tab separated values    (".tsv", ".tab")
  • OpenDocument - spreadsheet    (".fods", ".ods")
  • Microsoft Excel XML   (".xlsx")

Images

  • JPEG 2000   (".jpxml", ".jp3d", ".jpf", ".jpm", ".jpx", ".jp2")
  • TIFF    (".tiff", ".tif")
  • PNG    (".png")
  • JPEG *    (".jpg", ".jpeg")
  • Scalable Vector Graphics (SVG)    (".svg")

Audio / Video

  • AIFF    (".aiff", ".aif", ".aifc")
  • WAV    (".wav")
  • Free Lossless Audio Codec (FLAC)    (".flac")
  • MPEG-4    (".m4v", ".m4r", ".m4b", ".m4p", ".m4a", ".mp4")
  • Motion JPEG2000    (".mjp2", ".mj2")

Geo-spatial data

  • Shapefile    (".shp", ".shx", ".dbf", ".prj")
  • GeoTIFF    (".tif")

Other

  • Postscript    .ps
  • Structured Query Language (SQL)    .sql
  • OpenDocument - presentation    .fodp, .odp
  • Microsoft Powerpoint XML    .pptx
  • SAS syntax    .sas
  • SPSS syntax    .sps
  • Stata syntax    .do, .dct
  • Minitab syntax and output    .lis, .tj
  • R (ASCII script, as opposed to the .rdata saved workspace file)    .r
  • XML (Extensible Markup Language)    .xml, .sgml
  • HTML (Hypertext Markup Language)    .htm, .html
  • CSS (Cascading Style Sheets)    .css
  • NetCDF Network Common Data Form    .nc

* N.B. While the JPEG format is supported, depositors should be aware that we consider JPEG 2000 and TIFF (both standard preservation formats) to be more interoperable over the long term than JPEG. Depositors who value long-term sustainability may wish to add copies of their images, converted to JPEG 2000, to their deposit.
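As a rough illustration of how the lists above might be used before depositing, the following Python sketch sorts filenames by whether their extension appears among the recommended formats. The `RECOMMENDED` set is an illustrative subset of the extensions listed above, and the function name `check_formats` is hypothetical. An extension check is only a first pass: for example, it cannot tell a PDF/A file from an ordinary PDF.

```python
from pathlib import Path

# Illustrative subset of the recommended extensions listed above.
# Extend it to cover the formats relevant to your own deposit.
RECOMMENDED = {
    ".pdf", ".txt", ".odt", ".docx",            # textual documents
    ".csv", ".tsv", ".tab", ".ods", ".xlsx",    # tabular data
    ".jp2", ".tif", ".tiff", ".png", ".svg",    # images
    ".wav", ".flac", ".mp4",                    # audio / video
}


def check_formats(filenames):
    """Split filenames into (recommended, needs_review) by extension.

    The comparison is case-insensitive, so "scan.TIF" matches ".tif".
    Files without an extension always land in needs_review.
    """
    recommended, needs_review = [], []
    for name in filenames:
        ext = Path(name).suffix.lower()
        if ext in RECOMMENDED:
            recommended.append(name)
        else:
            needs_review.append(name)
    return recommended, needs_review


if __name__ == "__main__":
    ok, review = check_formats(["survey.csv", "notes.rtf", "scan.TIF", "README"])
    print("Recommended:", ok)
    print("Needs review:", review)
```

Files reported under "needs review" are not necessarily unsuitable; they may simply fall under the other acceptable formats or the formats to convert.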

Other acceptable file formats

File formats such as the ones listed below have been deposited in the repository but are not considered standard preservation formats. This is because they are proprietary; are system-, software- or version-dependent; are lossy (i.e. data are lost when compression is applied); or are not as commonly encountered as the formats listed above.

Most of these formats are widely used and it is likely we will be able to preserve them, but we cannot guarantee it. If you have files in these formats, you may deposit them in Edinburgh DataShare.

  • BED    .bed
  • bedGraph    .bg
  • DBase, DBF    .dbf
  • EAF File    .eaf
  • Encapsulated PostScript (EPS)    .epsi, .epsf, .eps
  • FLT    .flt
  • HDF (Hierarchical Data Format)    .he4, .h5, .hdf4, .h4, .hdf, .he5
  • LAB    .lab
  • Mathematica    .nbp, .nb
  • MatLab code    .m
  • ML source code file    .ml
  • MTRANS file    .mtr
  • Photo CD    .pcd
  • PSC    .psc
  • PFSX File    .pfsx
  • PITCH File    .pitch, .PITCH
  • PitchTier File    .pitchtier, .PITCHTIER
  • RESULTSMFC File    .RESULTMFC
  • TextGrid    .textgrid, .TextGrid
  • VTK (Visualisation ToolKit)    .vtu

 

Formats which should be converted

Converting research data files from proprietary or software-dependent formats to a standard preservation format will help to avoid difficulties opening these files in the future. By using standard preservation formats, you are maximising the likelihood that most future potential users will be able to open the files.

Robin Rice
Data Librarian & Head, Research Data Support

If your research data include any of the following file formats, we recommend you convert them to the suggested standard preservation format, where it is possible to do this without compromising (i.e. losing or altering) the data. The converted files should then be deposited along with the original files.
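For tabular data, one way to check that a conversion has not lost or altered anything is to parse both files and compare them cell by cell. The sketch below is a minimal example, assuming that both the original and the converted file are delimited text encoded as UTF-8; the function name `same_table` and its parameters are hypothetical. It does not apply to binary or word-processing formats, which need format-specific checks.

```python
import csv


def same_table(original_path, converted_path,
               original_delim=",", converted_delim="\t"):
    """Return True if both delimited files contain identical rows.

    Each file is parsed with its own delimiter, so differences in
    quoting or separators do not count as differences in the data.
    """
    with open(original_path, newline="", encoding="utf-8") as f:
        original = list(csv.reader(f, delimiter=original_delim))
    with open(converted_path, newline="", encoding="utf-8") as f:
        converted = list(csv.reader(f, delimiter=converted_delim))
    return original == converted
```

A result of `False` indicates that at least one cell or row differs, in which case the converted file should be inspected before it is deposited.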

Textual documents

Format name (original file) | Convert to (recommended preservation format(s))
Encapsulated PostScript (EPS) (".epsi", ".epsf", ".eps") | TIFF
RTF (".rtf") | OpenDocument format, Microsoft Word XML, PDF or plain text
LaTeX (".ltx", ".latex") | Deposit .pdf files alongside these.
TeX (".tex") | Deposit .bib and .pdf files alongside this.
TeX dvi (".dvi") | PDF
WordPerfect (".w51", ".wp5", ".wp", ".wpd") | OpenDocument format, Microsoft Word, plain text or PDF

 

 

Tabular data

Format name (original file) | Convert to (recommended preservation format(s))
MatLab binary data files (".mat") | CSV or plain text
Microsoft Access (".mdb") | If practicable, export the database as multiple tables, e.g. in CSV, Excel and/or tab-delimited format.
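As an illustration of exporting a database as multiple tables, the following Python sketch writes each table to its own CSV file. It assumes the rows have already been read out of the database (reading the .mdb file itself is not shown), and the function name `export_tables` and the structure of its `tables` argument are hypothetical.

```python
import csv
from pathlib import Path


def export_tables(tables, out_dir):
    """Write each table to <out_dir>/<table name>.csv and return the paths.

    `tables` maps a table name to a (header, rows) pair, where `header`
    is a list of column names and `rows` is a list of row values.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, (header, rows) in tables.items():
        path = out_dir / f"{name}.csv"
        # newline="" lets the csv module control line endings itself.
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(rows)
        written.append(path)
    return written
```

Passing `delimiter="\t"` to `csv.writer` and changing the file suffix would produce tab-delimited files instead.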

SPSS: we recommend that SPSS users deposit both syntax files and data files. Syntax files should be deposited in the .sps format, as generated automatically by SPSS. We recommend that SPSS data, system and output files be handled as follows:

SPSS portable file, contains data (".por") | Deposit .sps (syntax) and .csv (data) files alongside these.
SPSS binary data file, also known as system file (".sav", ".gsav", ".zsav") | Deposit .sps (syntax) and .csv (data) files alongside these.
SPSS output file (".spv", ".spo") | Convert to text, HTML or PDF, and deposit alongside these.

 

Images

Format name (original file) | Convert to (recommended preservation format(s))
BMP (".ddb", ".dib", ".bmp") | TIFF / JPEG 2000
NIfTI (".img", ".hdr", ".nii") | It may be worth exporting a selection of still 2-D images as TIFF files for accessibility.
Photoshop (".psd", ".pdd") | TIFF / JPEG 2000
GIF (".gif") | TIFF / JPEG 2000

Audio / Video

Format name (original file) | Convert to (recommended preservation format(s))
Audio (".au", ".snd") | FLAC
MPEG (".mpeg", ".mpg", ".mpe") | MPEG-4
QuickTime video (".qtm", ".mov", ".qt") | MPEG-4
MPEG Audio (".m4a", ".mpa", ".abs", ".mpega") | FLAC
Flash Video (".f4b", ".f4a", ".f4p", ".f4v", ".flv") | MPEG-4
AVI Audio/Video Interleaved Format (".avi") | MPEG-4
Ogg Vorbis Codec Compressed Multimedia File (".ogg") | FLAC

 

Compression archives

WARNING for DataVault depositors: files deposited into the DataVault are encrypted. We strongly discourage compression of data destined for deposit in the DataVault since, in combination with the encryption, this adds a considerable risk of the data becoming irretrievable over the long term. 

Format name (original file) | Convert to (recommended preservation format(s))
Compressed Archive File (".zip") | If practicable, expand the archive and submit individual files for ease of navigation, as long as the number of files is fewer than 200. N.B. Mac users: we have found that .zip files larger than 4 GB created using the Mac's built-in zip functionality cannot be opened on other platforms. We therefore ask Mac users to use an alternative such as GNU zip (gzip) for archives of that size.
BZIP2 (".bz2", ".bz") | If practicable, expand the archive and submit individual files for ease of navigation, as long as the number of files is fewer than 200.
GZIP compressed archive file (".gz") | If practicable, expand the archive and submit individual files for ease of navigation, as long as the number of files is fewer than 200.
Tarball (".tar", ".tgz") | If practicable, expand the archive and submit individual files for ease of navigation, as long as the number of files is fewer than 200.
RAR compression archive (".rar") | Use zip or tarball instead.
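To decide whether an archive is small enough to expand, the number of files it contains can be counted without extracting it. The following Python sketch is one way to do this for zip and tar archives (including compressed tarballs such as .tgz) using only the standard library. The names `count_files`, `should_expand` and `MAX_FILES` are hypothetical, and single-file .gz or .bz2 files that are not tarballs are not handled.

```python
import tarfile
import zipfile

MAX_FILES = 200  # threshold taken from the guidance above


def count_files(archive_path):
    """Return the number of regular files in a zip or tar archive.

    Directory entries are not counted. Raises ValueError if the file
    is neither a zip archive nor a tar archive.
    """
    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path) as zf:
            return sum(1 for info in zf.infolist() if not info.is_dir())
    if tarfile.is_tarfile(archive_path):
        # The default mode detects gzip or bzip2 compression automatically.
        with tarfile.open(archive_path) as tf:
            return sum(1 for member in tf.getmembers() if member.isfile())
    raise ValueError(f"Not a zip or tar archive: {archive_path}")


def should_expand(archive_path):
    """Return True if the archive holds fewer than MAX_FILES files."""
    return count_files(archive_path) < MAX_FILES
```

This only counts entries; whether expanding is practicable still depends on the structure of the data and how users are expected to navigate it.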

Getting help

If you have research data in file formats that you are unsure about, need help converting your files to standard preservation formats, or simply want to discuss your needs with us, please contact us via the Contact box above.