Processing NMR Data

Sparky displays contoured 2, 3, and 4 dimensional frequency domain spectra. It does not have facilities for viewing 1-D spectra and it will not read FID (free induction decay) data. You must first fourier transform the time domain FID data using a processing program. Sparky can directly read Felix matrix format, Bruker format, or UCSF format. These formats divide the data into cells so that small regions of the spectrum can be read with few disk accesses. You can create Felix matrices using the Felix program from BioSym. NMRPipe data can be converted to UCSF format with a conversion program pipe2ucsf supplied with Sparky. Varian VNMR 2-D and 3-D processed data can be converted with vnmr2ucsf distributed with Sparky. You use the ucsfdata program to view the header, extract regions, project along an axis, and perform other simple operations on UCSF format nmr data. See the UCSF format specification if you want to write your own conversion program.

Converting Spectra from NMRPipe to UCSF format

If you process your data with NMRPipe you'll need to convert it to UCSF format with the program pipe2ucsf that comes with the Sparky distribution. NMRPipe is a processing program written by Frank Delaglio ( available from the NMRPipe web page.

Given NMRPipe data noe150.pipe you convert it to UCSF format with:

	% pipe2ucsf noe150.pipe noe150.ucsf

This creates the file noe150.ucsf. Only the real component is taken from the NMRPipe file. To see the header of a UCSF format file you can use the ucsfdata program:

	% ucsfdata noe150.ucsf
	axis                          w1          w2
	nucleus                       1H          1H
	matrix size                 2048        4096
	block size                    64         128
	upfield ppm               -0.888      -0.884
	downfield ppm             10.780      10.784
	spectral width Hz       7000.350    7000.350
	transmitter MHz          599.929     599.929

Make sure that the nucleus type for each axis (1H, 13C, 15N) is correct. If it is not right many Sparky features will not work properly (eg. crosshair synchronization, assignment guessing, ...). You can correct nucleus type names and other header parameters using ucsfdata.

Details of pipe2ucsf

	pipe2ucsf [-axisorder] pipefile ucsffile
	  where axisorder is a string of digits (eg. 312)

The pipe2ucsf program will convert 2D, 3D, or 4D data processed with NMRPipe to UCSF format. 3D or 4D nmrpipe data should be in a single file. A spectrum divided into separate files containing 2D planes (with pipe2xyz) cannot be converted by pipe2ucsf. Use xyz2pipe which comes with NMRPipe to convert sets of 2D planes into a single 3D pipe file. For a processed hnco spectrum saved as 2D planes with names hnco001.ft, hnco002.ft, ... the command to produce a single 3D NMRPipe file looks like:

        xyz2pipe -in hnco%03d.ft -x > hnco_3D.ft
Refer to the xyz2pipe man page for more details. You would then apply pipe2ucsf to hnco_3D.ft. The input and output files for pipe2ucsf must be explicit files, unix pipes cannot be used. This is because pipe2ucsf does random access reading and writing of data which is not possible using stdin (standard input) and stdout (standard output).

The axis order in the created UCSF spectrum file (ie. which spectrum axis is called w1, which is called w2, ...) can be controlled by the axisorder option. The w1, w2, ... axis names are displayed by Sparky but have no special significance (ie. there is no correct order). Conventionally the highest axis number is the directly detected dimension. Here is an explanation of how pipe2ucsf arrives at its default axis order. To put Bruker or Varian raw spectrometer data into NMRPipe format before processing you use the programs bruk2pipe and var2pipe that come with NMRPipe. The arguments to these commands define the XYZA axes for the data. For a 2D spectrum pipe2ucsf with no axisorder option will name the X and Y axes w2 and w1, for 3D data the XYZ axes become w3, w2, w1, and for 4D data the XYZA axes become w4, w3, w2, w1. This is true even if processing with NMRPipe transposed the axes in the NMRPipe file. To obtain a different order use a command like:

	pipe2ucsf -321 pipefile ucsffile

This will reverse the default ordering for a 3-D spectrum. Using -123 gives the default order, -213 swaps the first two axes, etc... For a 2-D spectrum -21 reverses the default axis order.

The pipe2ucsf program determines the byte order of the binary data using a parameter in the file. If you move the processed NMRPipe data to a machine whose native byte order is not the same as the data, and run pipe2ucsf it will recognize that the machine and file byte orders do not match and swap the order to produce a correct UCSF data file.

Data from Felix

Data processed with Felix that is in Felix matrix format can be used directly by Sparky. You use the Sparky open command under the file menu. The Felix matrix should contain correct transmitter frequencies, spectral widths, reference points, reference shifts and axis labels for each axis. These are set with the Felix rmx command.

	rmx dim freq width type refpt refval text

For example,

	rmx 1 600.123 5000.0 0 1 10.35 H1

sets axis 1, with transmitter frequency 600.123 MHz, sweep width of 5000.0 Hz, axis type 0 (this is not used by Sparky), index position 1 has a chemical shift of 10.35 ppm, and this is a proton axis. Axis labels C13 or N15 are used for heteronuclear spectra.

Converting VNMR processed spectra to UCSF format

If you process your data with Varian's VNMR you'll need to convert it to UCSF format using the program vnmr2ucsf supplied with Sparky. This program will convert 2D or 3D data.

To convert VNMR processed 2D data in directory exp7 to UCSF format use:

	% vnmr2ucsf exp7/procpar exp7/datdir/phasefile exp7.ucsf H H

This creates the file exp7.ucsf using parameters from exp7/procpar and data from exp7/datdir/phasefile. To make sure that VNMR produces the phasefile you need to set trace='f1', display the full spectrum, and use the VNMR flush command. If you don't use the above VNMR commands the phasefile may not be written or may contain just zero values. The above example is for a homonuclear experiment. The H H arguments specify the nucleus types for the f1 and f2 axes. You use C and N for carbon and nitrogen dimensions.

Here is an example that converts processed 3D data.

	% vnmr2ucsf extr/procpar3d extr/dataf1f3 hnca.ucsf C N H

Parameters are read from the procpar3d file and data comes from files dataf1f3.* which are 2D planes of the 3D spectrum. The 2D planes dataf1f3.n (where n is the plane number) are created by the VNMR getplane command. See the VNMR command reference manual for how to use getplane. The path to the data files on the vnmr2ucsf command line is specified without the plane number suffix. The vnmr2ucsf program appends the numeric suffixes to read all 2D planes. The final 3 arguments are the nucleus types for the f1, f2 and f3 axes.

Parameters fn, fn1, sfrq, dfrq, dfrq2, dfrq3, tn, dn, dn2, dn3, sw, sw1, rfl, rfl1, rfp, rfp1 from the procpar file are used by the conversion program to determine the size of the data set, transmitter frequencies, sweep widths and ppm scales. For 3D the additional parameters fn2, sw2, rfl2, rfp2 are used.

The vnmr2ucsf program assumes that the phasefile data was written with big endian byte order, the native byte order on Sun Sparc machines on which VNMR is usually run. If you move the phasefile and procpar file to a PC which has little endian byte order, you can run vnmr2ucsf on the PC and the conversion will work correctly.

Converting Bruker processed spectra to UCSF format

If you process your data with Bruker's XWinNMR or UXNMR software you can convert it to UCSF format using the program bruk2ucsf supplied with Sparky. This program has not been tested much. Bruker processed data can also be read directly by Sparky. Both the Bruker and UCSF file formats store the data matrix in blocks so that the whole file does not need to be brought into memory. The Bruker blocks are typically large (1Mb) while the UCSF blocks are small (32Kb). Because Sparky is designed to use the small blocks, you might run out of memory if you have many Bruker spectra loaded simultaneously. In this case you should convert the Bruker files to UCSF format with bruk2ucsf.

To convert Bruker processed data 1/pdata/1/2rr to UCSF format use:

	% bruk2ucsf 1/pdata/1/2rr noesy150.ucsf

This creates the file noesy150.ucsf using parameters from 1/acqus, 1/acqu2s, 1/pdata/1/procs, and 1/pdata/1/proc2s. There is a separate proc*s and acqu*s file for each spectrum axis. The parameters used from the acqu*s files are NUC1, NUCLEUS and SFO1. The parameters used from the proc*s files are SI, XDIM, SW_p, OFFSET, and BYTORDP. All the parameters needed by Sparky are put in the UCSF file. These include nucleus names for each axis, data size, block size, transmitter frequencies, sweep widths and ppm scales. The path to the Bruker data file should not be a symbolic link since paths to the acqus parameter files like symbolic-link/../../acqus will not be correct.

If the NUC1 or NUCLEUS parameters in the acqu*s files are not set to the correct nucleus (1H, 13C or 15N) for the spectrum axis, then the UCSF file will incorrectly identify the axis nuclei. You can check the UCSF file you get from bruk2ucsf using the ucsfdata command to see that the nucleus types are correct. If they are not correct Sparky will not properly synchronize crosshairs or guess assignments, and other features will not work. The ucsfdata program can be used to correct the nucleus types in the UCSF file.

The bruk2ucsf program determines the byte order of the binary data using the BYTORDP parameter. If you move the processed Bruker data to a machine whose native byte order is not the same as the Bruker data, and run bruk2ucsf on that machine it will still give a correct UCSF data file.

Details of ucsfdata

To extract information from a UCSF NMR file you can use the program ucsfdata that comes with Sparky. Invoking ucsfdata with no arguments gives the following message:

Usage: ucsfdata [-w1 i j] [-w2 i j] [-w3 i j] [-w4 i j]
                [-s1 r] [-s2 r] [-s3 r] [-s4 r]
                [-a1 name] [-a2 name] [-a3 name] [-a4 name]
                [-o1 ppm] [-o2 ppm] [-o3 ppm] [-o4 ppm]
                [-sw1 Hz] [-sw2 Hz] [-sw3 Hz] [-sw4 Hz]
                [-f1 MHz] [-f2 MHz] [-f3 MHz] [-f4 MHz]
                [-t negative-threshold positive-threshold]
                [-r] [-m] [-l] [-o output-file] input-file

Prints header information for UCSF format NMR data files.
Produces new UCSF data files for subregions, or with data below
  specified thresholds zeroed, or with reduced matrix size.
Produces matrix file with no header information.

  -wN low-index high-index
	specifies subregion for axis N
	project along axis N taking data value of largest magnitude.
  -sN cell-size
	replace cells by single data value of largest magnitude
  -aN axis-nucleus-name
	change the nucleus name (eg 1H, 15N, 13C) for axis N
  -oN origin-in-ppm
	change the chemical shift of the downfield edge of spectrum
  -swN spectral-width-in-hz
	change the spectral width
  -fN spectrometer-frequency-in-MHz
	change the spectrometer frequency (eg 600 Mhz for 1H)
  -t negative-threshold positive-threshold
	zeros all data points below threshold values
	reduce dimension by eliminating dimensions with only one plane
	outputs the data matrix (last axis varying fastest) to stdout
	print long format header info for data processed with Striker

With no -m or -o flag header information is printed to standard output. If the -m flag is present the data matrix is written in binary format as IEEE floats with your machine's native byte order. The matrix is written out with highest axis varying fastest. With the -o flag a new ucsf format file is written. The -w1, -w2, ... flags can be used to specify a subregion. The -s1, -s2, ... flags can be used to have cells replaced with the data value of largest magnitude (reducing the size of the matrix). A data matrix can be projected down one dimension (eg 3D to 2D) by using the -p1, -p2, ... flags. The projected value is the one of largest magnitude. Only one dimension can be projected in a single ucsfdata invocation. If more than one -pN flag is given, the last one is the only one performed. To change the axis labels in the input file header use the -a1, -a2, -a3, -a4 flags. The axis label should be one of 1H, 13C, or 15N. You can change the ppm shift of the downfield spectrum edge (ie change the referencing) with the -o1, -o2, -o3, and -o4 switches. The shift value is specified in ppm. You can change the spectral width for each axis with the -sw1, -sw2, -sw3 and -sw4 flags. You can also change the spectrometer frequency associated with an axis. The correct value depends on the nucleus. For example, for a 600 MHz magnet you should have approximately 600 MHz for a 1H axis, 150 MHz for a 13C axis, or 60 MHz for a 15N axis. The spectral width and spectrometer frequency are used to determine the ppm step between matrix data points. Changing the spectral width or spectrometer freqency keeps the ppm shift at the center of the spectrum the same. The -t flag can be used to zero small data values. The -t option can be used to conserve disk space. The resulting file must be compressed (eg. with gzip) to realize any space savings. Sparky cannot read the compressed file directly so it must be uncompressed before use. The -l flag outputs header information of use only if the data was processed with the obsolete UCSF Striker program.

Opening and Saving data

Use File Open (fo) to open UCSF format NMR data, Felix matrix files, or Sparky spectrum files. File Save (fs) writes the spectrum file for the current view. It does not modify Felix or UCSF NMR data files. If you want to modify a Felix matrix use Felix. You can extract a subregion from a UCSF NMR data file with the program ucsfdata. To save all the spectrum files you are currently working on use Project Save As (ja). In addition to updating all the spectra, a list of the spectra are recorded in a project file. Use Project Open (jo) to open all the spectra listed in a project file. Note that a project file just lists the spectra you are working on. So saving a project under a new name does not make a copy of your peak data. If you want to copy your peak data you need to copy the Sparky spectrum files.

Files that Sparky creates

Sparky reads UCSF format NMR data or Felix matrix files. It never creates or modifies NMR data files. The data you produce while using Sparky (peak positions, assignments, volumes, etc...) is saved by Sparky in spectrum files and project files. Spectrum files are usually placed in the ~/Sparky/Save/ directory and contain almost all the data relating to an individual spectrum. Peak positions, assignments, volumes, label positions, and what views you are currently looking at are all saved in these files. Project files are usually placed in the directory ~/Sparky/Projects/. A project file contains a list of spectrum files you are working on. It also contains information pertaining to more than one spectrum such as the list of resonances for each molecule and condition, view synchronizations, and view overlays.

Backup files

When you save a project file or spectrum file Sparky copies the previous contents of the file to a file with the suffix .BAK. If you accidently save bad data you can recover the previous file contents by copying the .BAK file.

You can have Sparky automatically backup your project every so many minutes. You set this using the preferences dialog under the file menu. Sparky then periodically saves copies of your current unsaved work in files with a .backup suffix. These files will be deleted when you quit Sparky. If Sparky crashes the files are not deleted. When you start Sparky again you will be asked if you want to use the automatic backup files, ignore them, or remove them. If you choose to use them Sparky will load the .backup files. If the auto backup data looks OK you can save your work bringing your files up to date. Sparky will not automatically backup files that you do not own. This is to prevent backup conflicts when you and the owner of the files are both looking at the data with separate Sparky sessions.

UCSF NMR data format

Here is a sketchy description of the UCSF NMR data format as used by Sparky. Sparky needs to know the dimension, data matrix size, data values, nucleus types, and parameters to convert data indices to ppm and Hz. The ppm and Hz scales are determined by knowing along each axis the transmitter frequency, spectral width, and ppm value for index 0. Given this information you can write a UCSF NMR data file for input to Sparky. The file format is somewhat complicated. The data matrix is broken down into small arrays so that small regions of the data can be read with few disk accesses.

A UCSF NMR data file is a binary file with a 180 byte header, followed by 128 byte headers for each axis of the spectrum, followed by the spectrum data. All integer and float values in the file have big endian byte order independent of the native byte order for the machine that wrote the file. An int has big endian order if the most significant byte appears first in the file (or at the lowest memory address). Sun Sparc and SGI Rx000 processors are big endian and DEC Alpha and Intel x86 are little endian. If you write a conversion program to produce UCSF format the code should check the native byte order at run time and reverse floats and ints if necessary before writing them out. All float values are in IEEE format. All header bytes not described below should be set to zero.

The 180 byte header contains:

positionbytescontentsrequired value
010file type = UCSF NMR (8 character null terminated string)
101dimension of spectrum
111number of data components = 1 for real data
131format version number = 2 for current format

The first byte in the file is position 0. A complex spectrum has two components. Sparky only reads real data and I will only describe below the layout for real data so set the number of components to 1. Use format version number 2.

For each axis of the spectrum write a 128 byte header of the following form:

06nucleus name (1H, 13C, 15N, 31P, ...) null terminated ASCII
84integer number of data points along this axis
164integer tile size along this axis
204float spectrometer frequency for this nucleus (MHz)
244float spectral width (Hz)
284float center of data (ppm)

Next comes the spectrum data floating point values. The full data matrix is divided into tiles. Each tile is written to the file as an array with the highest axis varies fastest. If a tile size does not evenly divide the data matrix size then there will be a partial tiles at the edge. Partial tiles are written out as a full tiles with zero values for the region outside the data matrix. The tiles themselves form an array and are written out in sequence with highest axis varying fastest.

Here is a detailed example of how the floating point values are written. Suppose we have a 2D spectrum with matrix that is 1024 by 4096 data points. Each value is a 4 byte IEEE floating point number. So the data takes up 16 MBytes (1024 x 4096 x 4 bytes). Sparky wants to be able to show you a small region of the spectrum without any delay, so the data is stored in small rectangular blocks (tiles). Only the tiles needed to display the region you are viewing need to be read from disk. The size of tiles in UCSF format files is typically 32 Kbytes or 16 Kbytes, although a little bit bigger sizes like 128 Kbytes would be reasonable with current disk drive seek times (circa 1999). The current conversion programs that produce UCSF format divide each axis of the data matrix by 2 in steps until the resulting tile size is less than or equal to 32 Kbytes. For purposes of example I will instead take tiles that are 128 by 256 floating point values. So each one is 128 Kbytes ( = 128 * 256 * 4). Then an 8 = (1024 / 128) by 16 = (4096 / 256) grid of tiles covers the 1024 by 4096 data matrix. Denote a point in the data matrix like (312,1587) and a tile {2,5}. Call the two axes w1 and w2. Tile {0,0} includes a rectangle of data values from 0 to 127 along axis w1, and 0 to 255 along axis w2. Tile {a,b} is the data rectangle a*128 to (a+1)*128 - 1 along axis w1 and b*256 to (b+1)*256 - 1 along axis w2.

You write the tiles in the following order.





This order is what was meant above by the phrase "highest axis varies fastest". In this case the w2 axis varies fastest. Each tile is output with the same rule. Tile {0,0} has its 128 by 256 floating point values output in the following order:





For 3D data the tiles are 3D data cubes. The order of writing the grid of tiles out has the w3 axis varying fastest, w2 second fastest, and w1 slowest. Likewise each tile's cube of floating point values are written out in order having the w3 axis vary fastest, w2 second fastest, and w1 slowest.