﻿id	summary	reporter	owner	description	type	status	priority	milestone	component	version	resolution	keywords	cc	blockedby	blocking	notify_on_close	platform	project
7358	Update AlphaFold database sequence searches for July 2022 release of 200 million structures	Tom Goddard	Tom Goddard	"The EBI released an update to the AlphaFold database increasing from 1 million structures to 200 million structures.

https://www.embl.org/news/technology-and-innovation/alphafold-200-million/

We should update the ChimeraX alphafold match and search commands to use the new sequence database.  But the searches will be very slow, maybe 30 minutes or an hour, using our current BLAT and BLAST search methods.  To handle 200 million sequences we will probably need to use mmseqs2.  It is claimed that mmseqs2 can run about 100 times faster than BLAST at the same sensitivity, and up to 10000 times faster at reduced sensitivity.

It is going to be hard to achieve fast searches.  The 200 million sequences would take 70 Gbytes uncompressed if the average sequence length is 350 (a guess).  The mmseqs2 docs indicate that it wants to hold the entire index in memory for the fast searches it reports, and that the index takes 7 times the uncompressed sequence size, so about 500 Gbytes of memory.  ColabFold, which is by the mmseqs2 authors, uses a server with 48 cores and 768 Gbytes of memory to allow fast searches of the few billion sequences used for AlphaFold prediction.  mmseqs2 can also run with less memory by loading the index in blocks.
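
As a rough check of the memory arithmetic above, here is a minimal Python sketch.  The function name estimate_index_memory and the one byte per residue figure are illustrative assumptions, and the inputs are the guesses from this ticket (350 average length, index 7 times the sequence size), not measured values.

```python
# Back-of-envelope memory estimate for an mmseqs2 index of the AlphaFold database.
# All inputs are this ticket's rough figures, not measurements.

def estimate_index_memory(num_sequences, avg_length, index_factor, bytes_per_residue=1):
    '''Return (sequence_gbytes, index_gbytes), taking 1 Gbyte = 1e9 bytes.'''
    # Assumes one byte per residue for the uncompressed sequences.
    sequence_bytes = num_sequences * avg_length * bytes_per_residue
    # Assumes the in-memory index is a fixed multiple of the sequence size.
    index_bytes = sequence_bytes * index_factor
    return sequence_bytes / 1e9, index_bytes / 1e9

seq_gb, index_gb = estimate_index_memory(
    num_sequences=200_000_000,  # July 2022 AlphaFold database release
    avg_length=350,             # guessed average sequence length
    index_factor=7,             # index size relative to uncompressed sequences
)
print(f'sequences: {seq_gb:.0f} Gbytes, index: {index_gb:.0f} Gbytes')
# prints: sequences: 70 Gbytes, index: 490 Gbytes
```

The 490 Gbytes result is the roughly 500 Gbytes quoted above.  Changing avg_length shows how sensitive the estimate is to the guessed sequence length.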

At any rate, the server infrastructure and maintenance may be beyond our resources.  It would be better if we did not have to maintain the sequence database and server.

Possibly we can use the EBI sequence search which includes the AlphaFold database.

Another possibility is to use the mmseqs web server, https://search.mmseqs.com.  Unfortunately that server's web page says it is not available now because they don't have enough resources to run both it and ColabFold.  But it says ""MMseqs2 search will be back"".



"	enhancement	closed	moderate		Structure Prediction		fixed					tic20, zjp, meng	all	ChimeraX
