Journal Club 08/13/07

fastSCOP: a fast web server for recognizing protein structural domains and SCOP superfamilies. Tung CH, Yang JM. Nucleic Acids Res. 2007 Jul 1;35(Web Server issue):W438-43.

Rationale:

Reducing 3D structures to 1D descriptions allows faster comparison against structure databases, and thus rapid SCOP superfamily classification and domain boundary prediction.

Summary:

The fastSCOP server is an alternative to fold comparison servers like Dali and CE. It is faster and also provides SCOP classifications, predicted domain boundaries, and multiple structure alignment information. Other classifications (CATH, etc.) may be added in the future. Being structure-based, it is expected to be better at finding distant relationships than sequence/HMM comparison servers such as SUPERFAMILY.

Method:

...they need spell-checking!

3D-BLAST:

Each 5-residue segment of a structure (sliding window) is represented by one of 23 characters in the structural alphabet.
A query's structural alphabet sequence is BLASTed against those previously determined for a database of structures (see substitution matrix), in this case SCOP 1.71 (~76,000 domains, 1589 superfamilies) filtered at 95% structural alphabet identity (~13,000 domains).
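To make the sliding-window idea concrete, here is a minimal Python sketch of turning a chain of C-alpha coordinates into a 1D string, one letter per 5-residue window. The 23 letters and the rule in `toy_letter` (binning the distance between the first and last C-alpha of the window) are my own placeholders, not the actual 3D-BLAST alphabet, which these notes do not describe.

```python
import math
import string

# Hypothetical 23-letter alphabet (A..W); the real 3D-BLAST letters and
# the rule that assigns them are NOT given in these notes.
ALPHABET = string.ascii_uppercase[:23]
WINDOW = 5  # residues per segment, as stated in the notes


def toy_letter(segment):
    """Placeholder rule: bin the first-to-last C-alpha distance (in angstroms)."""
    d = math.dist(segment[0], segment[-1])
    idx = int(d / 0.7)  # arbitrary bin width, chosen only for illustration
    return ALPHABET[min(idx, len(ALPHABET) - 1)]


def encode(ca_coords, letter_fn=toy_letter):
    """Slide a 5-residue window along the chain and emit one letter per window."""
    n_windows = len(ca_coords) - WINDOW + 1
    return "".join(letter_fn(ca_coords[i:i + WINDOW]) for i in range(n_windows))
```

A chain of N residues gives a string of N - 4 letters, and that string is what would be searched against the precomputed database strings. Any other segment-to-letter rule can be swapped in through `letter_fn`.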

3D-BLAST is superfast, but problems arise from multi-domain queries and alignment shifts along secondary structure elements. Subsequent structural superposition with MAMMOTH helps solve these problems. This is still much faster than all-by-all structure comparisons as only 10 hits from 3D-BLAST are superimposed with the query.
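The speed argument is the usual "cheap filter, then expensive refinement" pattern. Below is a rough Python sketch of it, assuming each hit carries a score from the fast 1D search and that `superpose` is some stand-in for an expensive structural superposition (it is a hypothetical callable here, not MAMMOTH's real interface). How fastSCOP actually picks its 10 hits is not clear to me from the paper, so taking the top 10 by score is just an assumption.

```python
from typing import Callable, List, NamedTuple, Tuple


class Hit(NamedTuple):
    domain_id: str
    score: float  # score from the fast 1D search; higher is better


def refine_top_hits(
    hits: List[Hit],
    superpose: Callable[[str], float],  # hypothetical expensive 3D scoring step
    top_n: int = 10,
) -> List[Tuple[str, float]]:
    """Run the expensive step only on the top_n hits from the fast search."""
    shortlist = sorted(hits, key=lambda h: h.score, reverse=True)[:top_n]
    rescored = [(h.domain_id, superpose(h.domain_id)) for h in shortlist]
    # Re-rank the shortlist by the structural score
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)
```

The cost of the slow step then depends on `top_n` rather than on the size of the database, which is why this stays much faster than superimposing the query on everything.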

Issues:

Do they mean that only one 3D-BLAST hit per superfamily is passed to MAMMOTH?
What if more than one superfamily meets the assignment criteria for a given segment, or does that never happen?
How does repeating the steps help if something ended up unassigned?

Their tests:

They searched SCOP 1.67 with the 586 structures (736 domains) that were new in SCOP 1.69 – results
They searched SCOP 1.71 with 8700 newer structures and got assignments for 84% of these in under 10 hours.

My test:

I entered 1rvk chain A, then waited about 10 seconds...

Then I clicked on the link for the N-terminal domain region, which gives a list of hits in the assigned superfamily. There must be some cutoff for what gets listed, because when I searched with two different enolase superfamily members, I got different numbers of hits listed for the SCOP "Enolase N-terminal domain-like" superfamily.

Clicking a structure icon gives an image and Chime-plugin view of a pairwise superposition and an option to download them in PDB format. Instead, I checked all the boxes (there was an option to do that at the bottom) and clicked the "Multiple_structural_alignment" button...

I cut and pasted the amino acid sequence alignment into a text file and edited it into an aligned fasta file. I opened the fasta alignment in Chimera, auto-loaded all the structures (since they were named by PDB and SCOP domain IDs), and matched them using the alignment.
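The hand-editing step could be scripted. Here is a small Python sketch that writes an aligned FASTA file from a mapping of names to gapped sequences. It assumes the pasted alignment has already been parsed into such a mapping (the server's output layout is not reproduced here), and `to_aligned_fasta` is just a name I made up.

```python
def to_aligned_fasta(aligned, width=60):
    """Format {name: gapped_sequence} as aligned FASTA text.

    All sequences must have the same length (gaps included), otherwise
    the result is not a valid alignment.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) > 1:
        raise ValueError("aligned sequences must all have the same length")
    lines = []
    for name, seq in aligned.items():
        lines.append(">" + name)
        # Slice by hand so that gap characters ('-') never affect line breaks
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

If the names match the structure identifiers, as in my case, the structures can be associated with the alignment rows by name when the file is opened.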

P.S. The SUPERFAMILY server took much longer, at least in real time. It found the "Enolase N-terminal domain-like" superfamily and family for 1rvk chain A, but nothing for the C-terminal domain.