﻿id	summary	reporter	owner	description	type	status	priority	milestone	component	version	resolution	keywords	cc	blockedby	blocking	notify_on_close	platform	project
7388	BLAST of NR database uses 180 Gbytes of resident memory on server	Tom Goddard	Zach Pearson	"This ticket is about high server memory use by BLAST searches of the NR database.  I'm not sure there is anything simple we can do about it.  Mostly it is good to know about the problem in case we experience server problems.

A single BLAST sequence search of the NR (non-redundant) database uses 180 Gbytes of resident memory on the server (franklin.cgl.ucsf.edu) and takes 50 minutes to search with a 268 amino acid query sequence (PDB 7u0u chain A).  That is a large share of the machine: about half (48%) of the 384 Gbytes of server memory.  Here is the output of ""top"" near the completion of the job.

{{{
   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                  
102779 apache    20   0  247.7g 180.6g 180.6g D  94.9 48.0  49:37.17 blastp                                   
}}}

I'm not sure what would happen if a few NR searches ran at the same time.  Maybe the server would function poorly with lots of thrashing.  BLAST is using 1 thread.
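
One way to get a better idea of what concurrent searches would do is to see how much of the resident memory is file-backed rather than private to the process.  In the top output above SHR equals RES, which suggests the pages are mostly file-backed (for example memory-mapped database files), so several searches might share them.  That is an assumption, not something I have confirmed.  Below is a minimal sketch for checking it, assuming the server runs Linux 4.5 or later, where /proc/<pid>/status reports RssAnon, RssFile and RssShmem.  The function names are hypothetical.

```python
import re

# Fields of /proc/<pid>/status that describe resident memory (values in kB).
# Assumption: Linux 4.5+; older kernels only report VmRSS.
_RSS_LINE = re.compile(r'(VmRSS|RssAnon|RssFile|RssShmem):\s+(\d+)\s+kB')


def read_status(pid):
    # Return the raw text of /proc/<pid>/status for a running process.
    with open('/proc/%s/status' % pid) as f:
        return f.read()


def rss_breakdown(status_text):
    # Parse the resident-memory fields into a dict of field name -> kB.
    fields = {}
    for line in status_text.splitlines():
        m = _RSS_LINE.match(line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields


def file_backed_fraction(fields):
    # Fraction of resident memory backed by files (page cache, mmap).
    # Returns 0.0 when VmRSS is missing or zero.
    total = fields.get('VmRSS', 0)
    if total == 0:
        return 0.0
    return fields.get('RssFile', 0) / total
```

Running file_backed_fraction(rss_breakdown(read_status(pid))) with the pid of a running blastp job would show whether most of the 180 Gbytes is file-backed.  A value near 1 would suggest that memory is reclaimable page cache rather than private allocation.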

I've been looking at how to do sequence searches of the AlphaFold database of 214 million sequences (ticket #7358), which is large (93 GB fasta file and 65 GB blast database) but not as large as NR, which has a 179 GB blast database (nr*.psq files in /databases/mol/blast/db_current on plato).  I did not find any blastp command-line options that can limit the memory use.  I also tested BLAT, which is used by the ChimeraX ""alphafold match"" command, on plato with the AlphaFold database.  It also brought the whole database (93 GB) resident in memory at once, and BLAT also has no options to reduce memory use.  I also tested mmseqs2, which used even more memory (280 GB) with index files about 7 times larger than the sequence data.  mmseqs2 has some options that may limit memory use.  I'm noting what I find about those AlphaFold database search methods in ticket #7358."	enhancement	assigned	moderate		Sequence				chimerax-programmers Scooter Morris				all	ChimeraX
