Anthony Edwards

SPAdes Manual: Installation and Running Guide for SPAdes Genome Assembler



How to Download and Use SPAdes Assembler




SPAdes - St. Petersburg genome assembler - is an assembly toolkit that contains various assembly pipelines for different types of sequencing data. It was originally developed for de novo assembly of bacterial and viral genomes from single-cell or isolate samples, but it has been extended to support metagenomic, plasmid, transcriptomic, and biosynthetic gene cluster assembly as well. SPAdes can also perform hybrid assembly using short reads (Illumina or IonTorrent) and long reads (PacBio, Oxford Nanopore, or Sanger). SPAdes is one of the most widely used assemblers in the field, and it has several advantages over other assemblers, such as:


  • It can handle complex repeat structures and large genome variations.



  • It can produce high-quality assemblies with low error rates and high gene completeness.



  • It can assemble genomes from low-coverage or unevenly distributed data.



  • It can assemble multiple genomes from mixed samples.



  • It can assemble novel sequences that are not present in reference genomes.



In this article, I will show you how to download and use SPAdes assembler for your own genome assembly projects. I will cover the following topics:







  • How to download SPAdes binaries or source code for Linux or Mac.



  • How to verify your installation and run a self-test.



  • How to provide input data and command line options for different assembly pipelines.



  • How to evaluate the output files and statistics.



By the end of this article, you should be able to perform de novo genome assembly using SPAdes with confidence and ease. Let's begin!


Downloading SPAdes




The first step is to download SPAdes from its official website: http://cab.spbu.ru/software/spades/. You can choose to download either the pre-compiled binaries or the source code, depending on your operating system and preference. The latest version of SPAdes is 3.15.5, which was released on July 14th, 2022 under GPLv2 license.


Downloading SPAdes binaries for Linux




If you are using a Linux system (64-bit only), you can download the pre-compiled binaries from the website. The file name is SPAdes-3.15.5-Linux.tar.gz. You can use the following command to download it:


wget http://cab.spbu.ru/files/release3.15.5/SPAdes-3.15.5-Linux.tar.gz


Alternatively, you can use a web browser to download it manually. After downloading, you need to extract the file using the following command:


tar -xzf SPAdes-3.15.5-Linux.tar.gz


This will create a folder named SPAdes-3.15.5-Linux, which contains the executable files and other resources for SPAdes.


Downloading SPAdes binaries for Mac




If you are using a Mac system (64-bit only), you can download the pre-compiled binaries from the website as well. The file name is SPAdes-3.15.5-Darwin.tar.gz. You can use the following command to download it:




wget http://cab.spbu.ru/files/release3.15.5/SPAdes-3.15.5-Darwin.tar.gz


Alternatively, you can use a web browser to download it manually. After downloading, you need to extract the file using the following command:


tar -xzf SPAdes-3.15.5-Darwin.tar.gz


This will create a folder named SPAdes-3.15.5-Darwin, which contains the executable files and other resources for SPAdes.
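

The Linux and Mac downloads differ only in the archive name, so the choice can be scripted. The following is a minimal sketch, not part of SPAdes: the function names spades_archive_name and spades_download_url are hypothetical helpers, and the URL pattern is simply the one used in the commands above. It only prints the URL and downloads nothing.

```shell
#!/bin/sh
# Hypothetical helpers: pick the SPAdes binary archive for a given OS.
SPADES_VERSION="3.15.5"

spades_archive_name() {
    # $1: kernel name as printed by `uname -s` (Linux or Darwin)
    case "$1" in
        Linux|Darwin) echo "SPAdes-${SPADES_VERSION}-$1.tar.gz" ;;
        *) echo "no pre-compiled binaries for $1; use the source archive" >&2
           return 1 ;;
    esac
}

spades_download_url() {
    # Same URL layout as the wget commands in this article.
    archive=$(spades_archive_name "$1") || return 1
    echo "http://cab.spbu.ru/files/release${SPADES_VERSION}/${archive}"
}

# Print the URL for this machine; nothing is downloaded here.
spades_download_url "$(uname -s)" || true
```

You can pass the printed URL to wget, or to curl -O on a Mac, which typically ships with curl rather than wget.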


Downloading SPAdes source code




If you prefer to compile SPAdes from source code, or if you are using a different operating system, you can download the source code from the website as well. The file name is SPAdes-3.15.5.tar.gz. You can use the following command to download it:


wget http://cab.spbu.ru/files/release3.15.5/SPAdes-3.15.5.tar.gz


Alternatively, you can use a web browser to download it manually. After downloading, you need to extract the file using the following command:


tar -xzf SPAdes-3.15.5.tar.gz


This will create a folder named SPAdes-3.15.5, which contains the source code and other resources for SPAdes.


To compile SPAdes from source code, you need to have some prerequisites installed on your system, such as CMake, GCC, Python 2 or 3, zlib, bzip2, and Boost libraries. You can check the detailed instructions on how to install these prerequisites on the SPAdes website: http://cab.spbu.ru/software/spades/#prereq. Once you have installed the prerequisites, you can use the following commands to compile SPAdes:


cd SPAdes-3.15.5
./spades_compile.sh


This will create an executable file named spades.py in the bin folder.
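

Before compiling, it can help to confirm that the build tools are actually on your PATH. This is a minimal sketch under assumptions: missing_tools is a hypothetical helper, and the tool names checked (cmake, g++, python3) are only my reading of the prerequisites listed above, so adjust them to your system.

```shell
#!/bin/sh
# Report which of the given tools are missing from PATH.
missing_tools() {
    missing=""
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
    done
    # Trim the leading space before printing.
    echo "${missing# }"
}

# Tool names are an assumption based on the prerequisites listed above.
result=$(missing_tools cmake g++ python3)
if [ -n "$result" ]; then
    echo "Missing before compiling: $result"
else
    echo "All listed build tools found"
fi
```

This only checks executables; libraries such as zlib and bzip2 still need to be verified through your package manager.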


Installing SPAdes




After downloading and extracting (or compiling) SPAdes, you need to install it on your system. The installation process is very simple and straightforward. You just need to add the bin folder of SPAdes to your system's PATH variable, so that you can run SPAdes from any directory.


Installing SPAdes on Linux




If you are using a Linux system, you can add the bin folder of SPAdes to your PATH variable by editing your .bashrc file (or equivalent) in your home directory. You can use the following command to open the file with a text editor (such as nano):


nano ~/.bashrc


Then, add the following line at the end of the file (replace /path/to/SPAdes-3.15.5-Linux/bin with the actual path of your SPAdes bin folder):


export PATH=$PATH:/path/to/SPAdes-3.15.5-Linux/bin


Save and close the file, and then run the following command to apply the changes:


source ~/.bashrc


You can now run SPAdes from any directory by typing spades.py.


Installing SPAdes on Mac




If you are using a Mac system, you can add the bin folder of SPAdes to your PATH variable by editing the .bash_profile file in your home directory (or .zshrc if your shell is zsh, the default on recent macOS versions). You can use the following command to open the file with a text editor (such as nano):


nano ~/.bash_profile


Then, add the following line at the end of the file (replace /path/to/SPAdes-3.15.5-Darwin/bin with the actual path of your SPAdes bin folder):


export PATH=$PATH:/path/to/SPAdes-3.15.5-Darwin/bin


Save and close the file, and then run the following command to apply the changes:


source ~/.bash_profile


You can now run SPAdes from any directory by typing spades.py.
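

If you source your startup file several times, a plain export line appends the same folder to PATH again and again. The sketch below shows one way to avoid that. The function name path_append is hypothetical, and the path is the same placeholder used above.

```shell
#!/bin/sh
# Append a directory to PATH only if it is not already present.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already on PATH, do nothing
        *) PATH="$PATH:$1" ;;
    esac
}

# Placeholder path; replace it with your real SPAdes bin folder.
path_append "/path/to/SPAdes-3.15.5-Linux/bin"
path_append "/path/to/SPAdes-3.15.5-Linux/bin"   # second call is a no-op
export PATH
```

You could place the function and a single path_append call in your startup file instead of the plain export line.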


Verifying SPAdes installation and running a self-test




After installing SPAdes, you should verify that it works properly on your system. You can do this by running a self-test that comes with SPAdes. The self-test will run SPAdes on a small dataset and check if the output matches the expected results.


To run the self-test, go to the bin folder of SPAdes, where spades.py is located. Replace the path below with the location of your own SPAdes folder:


cd /path/to/SPAdes-3.15.5/bin


Then, you can run the self-test by typing:


./spades.py --test


This will launch SPAdes in test mode and run it on a small dataset of E. coli reads. The test will take a few minutes to complete, and it will generate some output files in a folder named spades_test. You should see something like this at the end of the test:


========= TEST PASSED CORRECTLY.


This means that SPAdes ran successfully and produced the correct output. If you see any errors or warnings, you should check the log file (spades.log) for more details and troubleshoot the problem.
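

If you run the self-test from a script, you can check the log instead of reading the terminal output. This is a minimal sketch under assumptions: self_test_passed is a hypothetical helper, the log location is the one named above, and because the exact wording of the success line may differ between versions, it matches the phrase "test passed" case-insensitively.

```shell
#!/bin/sh
# Return success if a SPAdes log reports a passed self-test.
# Matching "test passed" case-insensitively is an assumption,
# not an official SPAdes interface.
self_test_passed() {
    [ -f "$1" ] && grep -qi "test passed" "$1"
}

log="spades_test/spades.log"
if self_test_passed "$log"; then
    echo "Self-test passed"
else
    echo "Self-test did not pass or $log is missing; inspect the log"
fi
```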


Running SPAdes




Now that you have installed and verified SPAdes, you are ready to use it for your own genome assembly projects. To run SPAdes, you need to provide some input data and some command line options for different assembly pipelines.


Providing input data




The input data for SPAdes are sequencing reads from one or more samples. SPAdes can handle various types of reads, such as:


  • Illumina paired-end (PE) or mate-pair (MP) reads.



  • IonTorrent PE or MP reads.



  • PacBio single-molecule real-time (SMRT) reads.



  • Oxford Nanopore MinION or GridION reads.



  • Sanger reads.



  • Mixed reads from different sources.



You need to specify the type and format of your input reads using different command line options. The most common options are:


Option                  Description

-1 <filename>           File with forward PE reads (FASTQ or FASTA).
-2 <filename>           File with reverse PE reads (FASTQ or FASTA).
--s1 <filename>         File with unpaired reads (FASTQ or FASTA).
--pacbio <filename>     File with PacBio SMRT reads (FASTQ or FASTA).
--nanopore <filename>   File with Oxford Nanopore reads (FASTQ or FASTA).
--sanger <filename>     File with Sanger reads (FASTQ or FASTA).
--pe1-12 <filename>     File with interlaced forward and reverse PE reads (FASTQ or FASTA).
--mp1-12 <filename>     File with interlaced forward and reverse MP reads (FASTQ or FASTA).


You can use multiple options to provide reads from different sources or libraries. For example, if you have PE reads from Illumina and SMRT reads from PacBio, you can use the following options:


-1 illumina_pe_1.fastq -2 illumina_pe_2.fastq --pacbio pacbio_smrt.fastq


You can also use the --dataset <filename> option to provide a YAML file that describes your input data in more detail. For example, you can specify the library type, the read orientation, and the read files that belong to each library. You can find more information on how to create a YAML file on the SPAdes website: http://cab.spbu.ru/software/spades/#dataset.
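

As an illustration, the sketch below writes a dataset file for one paired-end Illumina library and one PacBio library. The file name my_dataset.yaml and the read file names are placeholders. The keys follow the dataset format as I understand it from the SPAdes manual, so check them against the manual for your version before relying on them.

```shell
#!/bin/sh
# Write a hypothetical dataset description: one paired-end library
# plus one PacBio library. All file names are placeholders.
cat > my_dataset.yaml <<'EOF'
[
  {
    orientation: "fr",
    type: "paired-end",
    left reads: ["illumina_pe_1.fastq"],
    right reads: ["illumina_pe_2.fastq"]
  },
  {
    type: "pacbio",
    single reads: ["pacbio_smrt.fastq"]
  }
]
EOF
echo "Wrote my_dataset.yaml"
```

You would then pass the file to SPAdes with --dataset my_dataset.yaml in place of the individual read options.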


Choosing command line options for different assembly pipelines




The next step is to choose the appropriate command line options for the assembly pipeline that suits your data and goal. SPAdes has several assembly pipelines for different types of data, such as:


  • --sc: Single-cell assembly pipeline for genomes amplified from single cells (MDA data).



  • --meta: Metagenomic assembly pipeline for mixed microbial communities.



  • --plasmid: Plasmid assembly pipeline for plasmid detection and extraction.



  • --rna: Transcriptomic assembly pipeline for RNA-Seq data.



  • --isolate: Isolate assembly pipeline for bacterial or viral genomes from isolate samples.



  • --moleculo: Moleculo assembly pipeline for long synthetic reads from Moleculo technology.



  • --bio: Biosynthetic gene cluster assembly pipeline (biosyntheticSPAdes) for secondary metabolite gene clusters.



You can use one of these options to run the corresponding pipeline, or you can omit them to run the default pipeline, which is suitable for most cases. For example, if you want to assemble a bacterial genome from single-cell data, you can use the following option:


--sc


If you want to assemble a metagenomic sample from mixed reads, you can use the following option:


--meta


If you want to assemble a transcriptome from RNA-Seq data, you can use the following option:


--rna


In addition to these pipeline options, you can also use some other options to customize your assembly process, such as:


  • -k <value>: The k-mer size to use for assembly. You can specify a single value (e.g. -k 21) or a comma-separated list of values (e.g. -k 21,33,55). The default value is auto, which means that SPAdes will choose the optimal k-mer size based on your data.



  • -t <value>: The number of threads to use for assembly. The default value is 16.



  • -m <value>: The memory limit for assembly in GB. SPAdes terminates if it reaches this limit. The default value is 250.



  • --careful: The option to run SPAdes in careful mode, which will reduce the number of mismatches and short indels in the resulting assembly.



  • --only-assembler: The option to run only the assembly module of SPAdes, skipping read error correction.



  • --continue: The option to resume a previously interrupted run of SPAdes from the last available checkpoint.



You can find more information on the available command line options on the SPAdes website: http://cab.spbu.ru/software/spades/#manual.
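

To see how these options combine, the sketch below builds a full command line and prints it without running anything, so you can review it before launching a long job. The function name build_spades_cmd is hypothetical, and the thread count (8), memory limit (32 GB), and k-mer list are illustrative values, not recommendations.

```shell
#!/bin/sh
# Assemble a SPAdes command line from a few settings and print it.
# Nothing is executed; this is a dry run for review.
build_spades_cmd() {
    # $1 forward reads, $2 reverse reads, $3 output folder,
    # $4 threads, $5 memory limit in GB
    echo "spades.py --careful -k 21,33,55 -t $4 -m $5 -1 $1 -2 $2 -o $3"
}

cmd=$(build_spades_cmd illumina_pe_1.fastq illumina_pe_2.fastq my_project 8 32)
echo "$cmd"
```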


Evaluating SPAdes output




After running SPAdes, you will find the output files in the folder you specified with the -o option. For example, if you run SPAdes with the following command:


./spades.py -1 illumina_pe_1.fastq -2 illumina_pe_2.fastq --pacbio pacbio_smrt.fastq -o my_project


You will get a folder named my_project, which contains the following files and subfolders:


File or subfolder        Description

spades.log               The log file that records the progress and status of SPAdes.
params.txt               The file that contains the parameters and options used for SPAdes.
dataset.info             The file that contains the information about the input data.
corrected/               The subfolder that contains the error-corrected reads.
mismatch_corrector/      The subfolder that contains the mismatch-corrected contigs and scaffolds (produced with --careful).
K21/ K33/ K55/ ...       The subfolders that contain the intermediate assemblies for each k-mer size.
scaffolds.fasta          The final assembly file that contains the scaffolds (sequences that may contain gaps).
contigs.fasta            The final assembly file that contains the contigs (sequences without gaps).
assembly_graph.fastg     The final assembly graph file in FASTG format.
scaffolds.paths          The file that contains the paths in the assembly graph corresponding to scaffolds.fasta.
contigs.paths            The file that contains the paths in the assembly graph corresponding to contigs.fasta.
spades.yaml              The file that contains the summary statistics and quality metrics of the final assembly.
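

Once contigs.fasta exists, you can run a quick contiguity check without installing extra tools. The sketch below computes N50, the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembly length. The function name n50 is hypothetical, and the script assumes a plain (uncompressed) FASTA file.

```shell
#!/bin/sh
# Print the N50 of a FASTA file.
n50() {
    # Step 1: print one length per sequence (sequences may span lines).
    awk '/^>/ { if (len) print len; len = 0; next }
         { len += length($0) }
         END { if (len) print len }' "$1" |
    # Step 2: sort lengths from longest to shortest.
    sort -rn |
    # Step 3: accumulate until half of the total length is reached.
    awk '{ lens[NR] = $1; total += $1 }
         END {
             half = total / 2
             for (i = 1; i <= NR; i++) {
                 acc += lens[i]
                 if (acc >= half) { print lens[i]; exit }
             }
         }'
}

# Run only if an assembly is present (path from the example above).
if [ -f my_project/contigs.fasta ]; then
    n50 my_project/contigs.fasta
fi
```

For contigs of length 8, 6, 4, and 2, the total is 20 and the running sum reaches 10 at the second contig, so the N50 is 6.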


To evaluate the quality and accuracy of your assembly, you can look at some of these output files and statistics. For example, you can check the following metrics:


  • The number and length of scaffolds and contigs. You can use tools like QUAST or MetaQUAST to generate a comprehensive report on these metrics.



  • The N50 and NG50 values of scaffolds and contigs. These are measures of the contiguity of your assembly. Higher values generally indicate a more contiguous assembly, although they do not by themselves guarantee correctness.


A: You can cite SPAdes using the following reference:


Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA. SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing. Journal of Computational Biology. 2012 May;19(5):455-77. doi: 10.1089/cmb.2012.0021.


You can also use the BibTeX format:


@article{bankevich2012spades,
  title={SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing},
  author={Bankevich, Anton and Nurk, Sergey and Antipov, Dmitry and Gurevich, Alexey A and Dvorkin, Mikhail and Kulikov, Alexander S and Lesin, Vladislav M and Nikolenko, Sergey I and Pham, Son and Prjibelski, Andrey D and Pyshkin, Alexey V and Sirotkin, Alexander V and Vyahhi, Nikolay and Tesler, Glenn and Alekseyev, Max A and Pevzner, Pavel A},
  journal={Journal of Computational Biology},
  volume={19},
  number={5},
  pages={455--477},
  year={2012},
  publisher={Mary Ann Liebert Inc}
}


Q: How do I

