Blast Protein performs protein sequence searches using
a BLAST web service hosted by the
Resource for Biocomputing, Visualization, and Informatics (RBVI).
Corresponding structures can be retrieved and automatically superimposed,
and the pseudo-multiple alignment
from BLAST can be shown in
The search results are included in saved sessions.
There are several ways to start
Blast Protein, a tool in the Sequence category.
When it is started from the
Info menu in
any of the sequences in the alignment can be specified as the query.
Otherwise, the query sequence can be:
- From Structure
- a protein chain from a structure open in Chimera
- Plain Text
- a protein sequence entered or pasted in the Sequence field
as plain text
Clicking OK launches the search and dismisses the dialog.
Apply launches the search without dismissing the dialog,
Cancel simply dismisses the dialog,
and Help opens this manual page in a browser window.
- blast (default) - protein-protein search
- psiblast - iterated protein-protein search, where each pass
after the first uses a PSSM (position-specific scoring matrix)
based on the results of the previous pass
- pdb (default) - sequences of structures in the
Protein Data Bank (PDB).
Whereas completely identical sequences are collapsed to a single entry
in the pdb sequence database from NCBI, the
web service uses a customized version of this database in which the
identical sequences are separate entries. The List only... option
controls whether multiple hits with the same PDB identifier or only
the best-matching one should be included in the results.
- nr - all non-redundant GenBank CDS translations +
RefSeq Proteins + PDB + SwissProt + PIR + PRF. This database is much
larger than pdb and will take longer to search.
- E-value (1e-X) - significance cutoff;
only matches with E-values ≤ 10-X will be returned
(X=3 by default)
- Matrix (BLOSUM45/BLOSUM62/BLOSUM80/PAM30/PAM70)
- amino acid substitution matrix to use for alignment scoring
- Passes (default 1)
- number of psiblast iterations; 1 pass is equivalent to blast,
whereas multiple passes will find more distantly related sequences
- List only best-matching chain per PDB entry (default)
- whether multiple hits with the same PDB identifier
(matches to different chains in that PDB entry) or only
the best-matching one should be included in the results
The search is a background task.
Clicking the information icon in the Chimera
will bring up the
in which the job can be canceled if desired.
Results are returned as a table of sequences with several columns of
information. Which columns are displayed can be set in the
Column configuration panel.
Some columns of information are available for nr sequences
regardless of whether they are in the pdb subset:
- GI - sequence GI number
(for the query, model name is shown instead)
- Evalue - significance value
- Score - alignment score
- Description - text description (truncated)
Additional columns are available for pdb sequences:
- PDB - corresponding PDB identifier, including chain identifier
- Authors - structure authors
- Chain names
- chain identifiers and descriptions of polymer chains in the structure
- Copies - number of copies of the hit chain in the structure
- Date - structure deposition date
- Ligand formulas - chemical formulae of ligand chemical components
- Ligand names - names of ligand chemical components
- Ligand smiles -
SMILES strings of ligand chemical components
- Ligand symbols - residue names of ligand chemical components
- Ligand weights - molecular weights of ligand chemical components
- Method - method of structure determination
- Polymers - number of different polymer chains in the structure
(not counting multiple copies of the same sequence)
- PubMed - PubMed identifier of literature reference, if any
- Residues - number of residues in the hit chain
- Resolution - crystallographic resolution
- Species - scientific name of source organism
- Title - structure title
- Total atoms
- total number of atoms in the structure (all chains)
- Total residues
- total number of residues in the structure (all chains)
- UniProt identifier,
if any, for hit chain
- Weight - molecular weight of hit chain
Other than PDB identifier, the information in these columns is
retrieved using a web service provided by the
One or more of the hits (lines in the table) can be chosen
with the left mouse button.
A block of lines can be chosen by dragging, or by
clicking on the first (or last) line in the desired block
and then Shift-clicking on its last (or first) line.
Ctrl-click toggles the state (chosen or not) of single line.
Buttons at the bottom of the results dialog:
- Show in MAV - display the
of the chosen sequences in
- Load Structure
- retrieve and open the PDB structures corresponding
to the chosen sequences (as described for
Fetch by ID); they will be superimposed
automatically according to the pseudo-multiple alignment.
Entire PDB entries are retrieved, not just the matching chains.
- PDB Web Site
entries for the chosen sequences in a browser window
- show/hide the Column configuration panel, which contains checkboxes
to set which columns are displayed.
A maximum column width (default 40 characters) can be specified.
Buttons within the panel:
- All - show all columns in the table
- Default - restore the previously saved configuration
- Standard - use the factory default configuration
- Set Default - save the current configuration to the Chimera
- Export... save the chosen rows, in the current sort order
and including the currently displayed columns,
as a file in CSV (comma-separated values) format.
If no rows are chosen, all will be exported. Note that some columns
may contain commas or newline characters (multiple lines), but these
are handled according to CSV standard by enclosing the values in quotes.
Spreadsheet software and other programs built to work with
the CSV standard will interpret the exported files correctly,
but user scripts need to account for the potential complexities,
and viewing the file contents directly may be confusing.
- Hide - hide the table without deleting the results;
it can be shown again using the Raise option of its
instance in the Tools menu.
This is also useful when the table has become obscured by other windows.
- Quit - delete the results
- Help - show this manual page in a browser window
Basic Local Alignment Search Tool (BLAST).
software is provided by the
and described in the following:
Gapped BLAST and PSI-BLAST:
a new generation of protein database search programs.
Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ.
Nucleic Acids Res. 1997 Sep 1;25(17):3389-402.
Basic local alignment search tool.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ.
J Mol Biol. 1990 Oct 5;215(3):403-10.
The pseudo-multiple alignment from BLAST is not a true multiple alignment,
but a consolidation of the pairwise alignments of individual hits to the
query. This output corresponds to the BLAST formatting option
"flat query-anchored without identities,"
in which identities and conservative changes relative to the query
are shown in upper case, nonconservative changes in lower case.
UCSF Computer Graphics Laboratory / May 2012