io: Manage file formats that can be opened and exported

The io module keeps track of the functions that can open, fetch, and export data in various formats.

I/O sources and destinations are specified as filenames, and the appropriate open or export function is found by deducing the format from the suffix of the filename. An additional compression suffix, e.g., .gz, indicates that the file is or should be compressed. In addition to reading data from files, data can be fetched from the Internet. In that case, instead of a filename, the data source is specified as prefix:identifier, e.g., pdb:1gcn, where the prefix identifies the data format, and the identifier selects the data.

All data I/O is in binary.

register_format(format_name, category, extensions, prefixes=(), mime=(), reference=None, dangerous=None, icon=None, **kw)

Register file format’s I/O functions and meta-data

Parameters:
  • format_name – format’s name
  • category – says what kind of data the format should be classified as.
  • extensions – is a sequence of filename suffixes starting with a period. If the format doesn’t open from a filename (e.g., PDB ID code), then extensions should be an empty sequence.
  • prefixes – is a sequence of filename prefixes (no ‘:’), possibly empty.
  • mime – is a sequence of mime types, possibly empty.
  • reference – a URL link to the specification.
  • dangerous – should be True for formats that can write/delete a user’s files. False by default except for the SCRIPT category.

Todo

Possibly break up into multiple functions.

register_open(format_name, open_function, requires_filename=False)

register a function that reads data from a stream

Parameters:
  • open_function – function that takes an I/O stream or filename and returns a 2-tuple with a list of models and a status message
  • requires_filename – True if first argument must be a filename

register_fetch(format_name, fetch_function)

register a function that fetches data from the Internet

Parameters:
  • fetch_function – function that takes an identifier and returns an I/O stream for reading data and an identifying name. Usually the name is the same as the identifier.

formats()

Return all known format names

open_data(session, filespec, as_a=None, label=None, **kw)

open a (compressed) file

Parameters:
  • filespec – ‘prefix:id’ or a (compressed) filename
  • as_a – treat the file as if it has the given format
  • label – optional name used to identify data source

If a file format requires a filename, then compressed files are uncompressed into a temporary file before calling the open function.
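
The temporary-file step described above could look roughly like the sketch below. It is an assumption about how such a step might be written, not the module's code: open_with_filename is a hypothetical helper, only the .gz suffix is handled, and the open function is assumed to take a single filename argument.

```python
import gzip
import os
import shutil
import tempfile


def open_with_filename(path, open_function):
    """Call open_function(filename), gunzipping to a temporary file first if needed."""
    if not path.endswith(".gz"):
        return open_function(path)
    # Keep the inner suffix (e.g. ".cif") so the temporary file keeps its format suffix.
    inner_suffix = os.path.splitext(path[:-len(".gz")])[1]
    with tempfile.NamedTemporaryFile(suffix=inner_suffix, delete=False) as tmp:
        with gzip.open(path, "rb") as compressed:
            shutil.copyfileobj(compressed, tmp)
        tmp_name = tmp.name
    # The temporary file is closed here, so the open function can reopen it by name.
    try:
        return open_function(tmp_name)
    finally:
        os.remove(tmp_name)
```

The temporary file is removed once the open function returns, so this sketch only suits open functions that finish reading before they return.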

open_multiple_data(session, filespecs, **kw)

Open one or more files, including handling formats where multiple files contribute to a single model, such as image stacks.

prefixes(format_name)

Return filename prefixes for named format.

prefixes(format_name) -> [filename-prefix(es)]

extensions(format_name)

Return filename extensions for named format.

extensions(format_name) -> [filename-extension(s)]

open_function(format_name)

Return open callback for named format.

open_function(format_name) -> function

fetch_function(format_name)

Return fetch callback for named format.

fetch_function(format_name) -> function

export_function(format_name)

Return export callback for named format.

export_function(format_name) -> function

mime_types(format_name)

Return mime types for named format.

requires_filename(format_name)

Return whether named format needs a seekable file

dangerous(format_name)

Return whether named format can write to files

category(format_name)

Return category of named format

format_names(open=True, export=False, source_is_file=False)

Returns list of known format names.

format_names() -> [format-name(s)]

By default only returns the names of openable formats.

categorized_formats(open=True, export=False)

Return known formats by category

categorized_formats() -> { category: formats() }
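
To make the shape of the returned mapping concrete, here is a small sketch of how format names might be filtered and grouped by category. The FORMATS table, its entries, and both function names are hypothetical. The filtering rule, where a format is skipped if a requested capability is missing, is an assumption rather than documented behavior.

```python
from collections import defaultdict

# Hypothetical registry rows: name -> (category, can_open, can_export)
FORMATS = {
    "mmCIF": ("Molecular structure", True, False),
    "PDB": ("Molecular structure", True, True),
    "PNG": ("Image", False, True),
}


def format_names_sketch(open=True, export=False):
    """Names of formats that support every requested capability.

    The parameter name `open` mirrors the documented signature and
    shadows the builtin inside this function only.
    """
    names = []
    for name, (_category, can_open, can_export) in FORMATS.items():
        if open and not can_open:
            continue
        if export and not can_export:
            continue
        names.append(name)
    return names


def categorized_formats_sketch(open=True, export=False):
    """Group the filtered format names as {category: [format names]}."""
    by_category = defaultdict(list)
    for name in format_names_sketch(open=open, export=export):
        by_category[FORMATS[name][0]].append(name)
    return dict(by_category)
```

With these hypothetical entries, the default call returns only the openable formats, grouped under "Molecular structure".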

deduce_format(filename, has_format=None, prefixable=True)

Figure out named format associated with filename

Return tuple of deduced format name, whether it was a prefix reference, the unmangled filename, and the compression format (if present). If it is a prefix reference, then it needs to be fetched.
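
The four-part return value can be illustrated with a simplified sketch. The lookup tables and the name deduce_format_sketch are hypothetical, the has_format parameter is omitted, and only the .gz compression suffix is recognized. What "unmangled filename" means for an ordinary file is also an assumption here: the sketch returns the filename unchanged.

```python
import os

# Hypothetical lookup tables; a real registry would be filled by register_format.
EXTENSIONS = {".cif": "mmCIF", ".pdb": "PDB"}
PREFIXES = {"mmcif": "mmCIF", "cif": "mmCIF", "pdb": "PDB"}
COMPRESSION = {".gz": "gzip"}


def deduce_format_sketch(filename, prefixable=True):
    """Return (format_name, is_prefix, unmangled_name, compression)."""
    if prefixable and ":" in filename:
        prefix, identifier = filename.split(":", 1)
        if prefix in PREFIXES:
            # A prefix reference: the caller must fetch the identifier.
            return PREFIXES[prefix], True, identifier, None
    stem, ext = os.path.splitext(filename)
    compression = COMPRESSION.get(ext)
    if compression is not None:
        # Look at the suffix underneath the compression suffix.
        ext = os.path.splitext(stem)[1]
    return EXTENSIONS.get(ext), False, filename, compression
```

For instance, "pdb:1gcn" is treated as a prefix reference with identifier "1gcn", while "model.cif.gz" is treated as a gzip-compressed file whose format comes from the inner .cif suffix.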

Example:

io.register_format("mmCIF", "Molecular structure",
    (".cif",), ("mmcif", "cif"),
    mime=("chemical/x-cif", "chemical/x-mmcif"),
    reference="http://mmcif.wwpdb.org/",
    requires_seeking=True, open_func=open_mmCIF)
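
The registration example passes an open callback by name. As a rough sketch of the callback shapes described under register_open and register_fetch, the hypothetical functions below read from a binary stream and return canned data. The names open_xyz and fetch_xyz, the record layout, and the use of token lists as stand-ins for model objects are all assumptions made for illustration.

```python
from io import BytesIO


def open_xyz(stream):
    """Hypothetical open callback: read a binary stream, return (models, status)."""
    lines = stream.read().decode("utf-8").splitlines()
    # Each non-empty line becomes one stand-in "model" (a list of tokens).
    models = [line.split() for line in lines if line.strip()]
    return models, "read %d records" % len(models)


def fetch_xyz(identifier):
    """Hypothetical fetch callback: return (stream, name) for an identifier."""
    # A real fetcher would download the data; canned bytes keep this runnable offline.
    data = b"C 0.0 0.0 0.0\nO 1.2 0.0 0.0\n"
    return BytesIO(data), identifier


stream, name = fetch_xyz("demo")
models, status = open_xyz(stream)
```

Because all data I/O is binary, the open callback decodes the bytes itself instead of expecting a text stream.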