BLAST

Buzzle Staff

With over 65 databases containing DNA, RNA, and protein sequences, how do scientists find the sequence and information they need? Instead of searching each database individually, they use a tool called BLAST, a free search tool that accesses scientific databases all over the world. This sure ain’t Google.

The ingenious BLAST tool was developed in 1990 by Stephen Altschul, Warren Gish, Webb Miller, Eugene Myers, and David Lipman at the National Institutes of Health (NIH), USA.

Frequently in the course of their research, scientists come across questions that cannot be answered through a simple Google search. For example, if you have a specific protein sequence, you may need to know where it came from, or which organisms have proteins identical or similar to it. Would you trust Yahoo for that? No. Do you have to find it out through trial and error by sequencing a whole bunch of proteins? No, that would take years. Do you have to read every study associated with your sequence? No. How would you even search for that? Instead, you use a brilliant search tool called BLAST.

BLAST stands for Basic Local Alignment Search Tool and is provided by the National Center for Biotechnology Information (NCBI). Despite its rather bland and non-descriptive name, it's a go-to tool for anyone involved in biomedical science, genetics, proteomics, biotechnological research, and bioinformatics.

It’s basically a giant search engine that accesses a host of databases including GenBank, Gene Expression Omnibus (GEO), protein databases, and about 65 others that include DNA, RNA, and protein sequences as well as the relevant information obtained through sequencing and analytical studies of the genes and proteins of several organisms.

BLAST is free to use and open to the public, but unless you have an advanced science degree, the interpretation of the results is a difficult task.

Searching

First you need to decide which BLAST search to use, and that depends on what you’re starting with (i.e., the query sequence) and what you want to find out (i.e., the objective of the search). For example, if you have a DNA sequence and want to find identical or similar DNA sequences, use nucleotide BLAST (blastn).

If you have a protein sequence and want to search for identical or similar protein sequences, use protein BLAST (blastp). Going beyond that, if you have a nucleotide sequence, you can search for proteins similar to the one it encodes (blastx), or if you have a protein sequence, you can search for nucleotide sequences that could encode similar proteins (tblastn).
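To make the choice concrete, here is a minimal sketch of the decision as a lookup table. The four program names (blastn, blastp, blastx, tblastn) are the standard BLAST programs for these combinations; the function name and the "nucleotide"/"protein" labels are just illustrative choices for this example.

```python
# Map (query type, database type) to the standard BLAST program name.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",   # the query is translated before comparison
    ("protein", "nucleotide"): "tblastn",  # the database is translated before comparison
}


def choose_blast_program(query_type, database_type):
    """Return the BLAST program for a query/database combination."""
    key = (query_type.lower(), database_type.lower())
    if key not in BLAST_PROGRAMS:
        raise ValueError(f"unsupported combination: {key}")
    return BLAST_PROGRAMS[key]


# A DNA sequence searched against protein sequences:
print(choose_blast_program("nucleotide", "protein"))
```

So a DNA query against a protein database points you to blastx, and a protein query against a nucleotide database points you to tblastn.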

In addition to these basic searches, BLAST also provides options for specialized searches. Primer-BLAST helps you find primers that are specific to your PCR template, GEO shows gene expression profiles, IgBLAST analyzes antibody (immunoglobulin) and T-cell receptor sequences, and searching against the PubChem BioAssay database gives you protein or nucleotide sequences along with their chemical structures and biological activities.

You can select a particular species for your search, or search against all the sequences available in the database, depending on the purpose of the search.

Then you enter your query sequence or the database accession number of the desired sequence, define your parameters, set and optimize the filters, and choose from a list of results.
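If you would rather script this step than fill in the web form, the sketch below shows roughly what a submission looks like. It only builds the request and sends nothing. It assumes the parameter names used by NCBI's BLAST URL API (CMD, PROGRAM, DATABASE, QUERY, EXPECT, HITLIST_SIZE), so check the current NCBI documentation before relying on them. The function name, the default values, and the toy sequence are made up for illustration.

```python
from urllib.parse import urlencode

# Assumed endpoint of the NCBI BLAST URL API.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_blast_submission(query, program="blastn", database="nt",
                           expect=10.0, hitlist_size=50):
    """Build (url, form body) for a search submission without sending it.

    `query` is the sequence itself or a database accession number.
    """
    params = {
        "CMD": "Put",                  # submit a new search
        "PROGRAM": program,            # which BLAST search to run
        "DATABASE": database,          # which database to search against
        "QUERY": query,
        "EXPECT": expect,              # E-value cutoff for reported hits
        "HITLIST_SIZE": hitlist_size,  # maximum number of hits to keep
    }
    return BLAST_URL, urlencode(params)


url, body = build_blast_submission("ACGTACGTACGT")  # toy query sequence
print(url)
print(body)
```

Tightening `expect` or lowering `hitlist_size` plays the same role as the filters on the web form: fewer, more relevant results.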

Results

After you enter your search, you’ll see the page updating itself for 2-3 seconds or so. Remember, it's sifting through more than 65 separate databases here, and each one of those is quite extensive. Once the search finishes, the results are displayed. The first thing you will see is a graphical representation showing colored lines that are aligned to the query sequence in a position-specific manner.

Below that is a list of the search hits arranged in decreasing order of their similarity to the query sequence. Similarity is judged by the alignment scores obtained by aligning the query sequence with those in the database. The more refined your search is, obviously, the shorter the list will be.
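To get a feel for what an alignment score is, here is a small sketch that scores local alignments and ranks a toy database by score. It uses the classic Smith-Waterman dynamic-programming method with simple linear gap penalties. This is not what BLAST itself runs (BLAST uses a much faster heuristic), and the scoring values and sequences are made up for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (Smith-Waterman, linear gaps)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


query = "ACGTTGCA"
database = {"seq1": "ACGTTGCA", "seq2": "ACGTAGCA", "seq3": "TTTTTTTT"}

# Rank hits from most to least similar, as in the BLAST results list.
ranked = sorted(database,
                key=lambda name: local_alignment_score(query, database[name]),
                reverse=True)
print(ranked)
```

The exact copy of the query comes first, the sequence with a single mismatch second, and the unrelated run of T's last.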

To the right of the list is the E-value (Expect value) for each hit, which indicates the number of hits with a score at least that good that you would expect to see simply by chance in a database of that size. The lower the E-value, the higher the significance of the match.
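For the curious, the E-value comes from the Karlin-Altschul formula E = K * m * n * exp(-lambda * S), where S is the raw alignment score, m is the query length, and n is the database length. The sketch below is a simplified illustration: the lambda and K values depend on the scoring system and are placeholders here, and real BLAST uses adjusted "effective" lengths rather than the raw ones.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalize a raw score so it no longer depends on the scoring system."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance hits scoring at least raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


# Illustrative parameters only (assumed, not taken from a real search).
LAMBDA, K = 0.267, 0.041

for raw in (40, 80, 160):
    e = expect_value(raw, query_len=250, db_len=1_000_000_000, lam=LAMBDA, k=K)
    print(raw, round(bit_score(raw, LAMBDA, K), 1), f"{e:.3g}")
```

Notice how quickly the E-value shrinks as the score grows: each extra point of raw score multiplies it by exp(-lambda), which is why strong matches show E-values that are tiny fractions.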

Details

Clicking on a headline shows you the name of the molecule or gene, and the organism it comes from. There will be a list to the right with links to additional information about the gene in whatever database it appears, and a complete sequence of the gene.

The bar at the top of the alignment tells you how many positions are identical to your query (identities), how many are positive matches (identical or similar residues), and how many gaps are in the alignment.
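Those numbers are easy to reproduce by hand. The sketch below counts identities and gaps for a pair of already-aligned sequences, with "-" marking a gap. It is a simplified illustration with a made-up function name and toy sequences. It leaves out positives, which for proteins need a substitution matrix to decide which residues count as similar.

```python
def alignment_summary(aligned_query, aligned_subject):
    """Count identical positions and gap positions in a pairwise alignment."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    pairs = list(zip(aligned_query, aligned_subject))
    identities = sum(1 for q, s in pairs if q == s and q != "-")
    gaps = sum(1 for q, s in pairs if q == "-" or s == "-")
    return {"length": len(pairs), "identities": identities, "gaps": gaps}


summary = alignment_summary("ACGT-ACGT", "ACGTTACCT")
print(summary)
percent = 100 * summary["identities"] / summary["length"]
print(f"Identities = {summary['identities']}/{summary['length']} ({percent:.0f}%)")
```

Here 7 of the 9 aligned positions are identical, one is a gap, and the remaining one is a mismatch.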

Clicking on the "Graphics" link takes you to an NCBI page that features labeled regions, additional links to studies, the different domains present in the molecule, and sometimes even a 3D representation of the molecule.

BLAST is tough to figure out. Like most scientific tools, it requires a good understanding of the search parameters and the different scores and statistical values that accompany the results. It enables researchers to search and retrieve sequences with utmost convenience and speed, and any studies involving sequence analysis would be incomplete without BLAST.