A sequence alignment viewer with Bokeh and Panel

July 11 2019


Sequence alignment is a common task in molecular biology. There are many tools for viewing such alignments like AliView, Clustalx and Mega. Following from a previous article on using pyviz/panel for dashboards, this post details how to make our own multiple sequence alignment viewer with Panel and Bokeh. Bokeh is an interactive plotting library for Python for use in web browsers and dashboards. Bokeh renders its graphics using HTML and JavaScript and can be very flexible if you are willing to invest the time in learning how to customise it. Here we want to be able to:

  • display text in a grid with a row for each sequence
  • be able to scroll from left to right
  • color by base
  • have a preview plot showing an overview

We will then make a basic app using Panel widgets to show how it can be used in a dashboard. As before the code is written, tested and run inside a Jupyter notebook. It can be deployed as a standalone server too. To install all requirements for this example run the following. Note I tested with python3 for ubuntu use pip3 to ensure you install for the right version. You can also use conda.

pip install jupyter biopython panel bokeh


Once you have started a new jupyter notebook you can import the required packages as follows:

import os, io, random
import string
import numpy as np

from Bio.Seq import Seq
from Bio.Align import MultipleSeqAlignment
from Bio import AlignIO, SeqIO

import panel as pn
import panel.widgets as pnw

from bokeh.plotting import figure
from bokeh.models import ColumnDataSource, Plot, Grid, Range1d
from bokeh.models.glyphs import Text, Rect
from bokeh.layouts import gridplot

Bokeh sequence plotter

This code creates a bokeh figure and plots text glyphs in a grid. This is done by creating a ColumnDataSource object. This is a key structure in bokeh that deals with interpreting your data to the glyphs you see in the plot. It can be created from pandas dataframes, numpy arrays or lists. Here we make the coordinates, sequence text and colors into 1D arrays and pass them all to the constructor. The columns in this object are later referenced when adding plot elements. Note that we use the same source object for both plots. You can refer to the Bokeh documentation for more details. The function takes a biopython alignment object as input.

def view_alignment(aln, fontsize="9pt", plot_width=800):
    """Bokeh sequence alignment view"""

    #make sequence and id lists from the aln object
    seqs = [rec.seq for rec in (aln)]
    ids = [rec.id for rec in aln]    
    text = [i for s in list(seqs) for i in s]
    colors = get_colors(seqs)    
    N = len(seqs[0])
    S = len(seqs)    
    width = .4

    x = np.arange(1,N+1)
    y = np.arange(0,S,1)
    #creates a 2D grid of coords from the 1D arrays
    xx, yy = np.meshgrid(x, y)
    #flattens the arrays
    gx = xx.ravel()
    gy = yy.flatten()
    #use recty for rect coords with an offset
    recty = gy+.5
    h= 1/S
    #now we can create the ColumnDataSource with all the arrays
    source = ColumnDataSource(dict(x=gx, y=gy, recty=recty, text=text, colors=colors))
    plot_height = len(seqs)*15+50
    x_range = Range1d(0,N+1, bounds='auto')
    if N>100:
    #view_range is for the close up view
    view_range = (0,viewlen)
    tools="xpan, xwheel_zoom, reset, save"

    #entire sequence view (no text, with zoom)
    p = figure(title=None, plot_width= plot_width, plot_height=50,
               x_range=x_range, y_range=(0,S), tools=tools,
               min_border=0, toolbar_location='below')
    rects = Rect(x="x", y="recty",  width=1, height=1, fill_color="colors",
                 line_color=None, fill_alpha=0.6)
    p.add_glyph(source, rects)
    p.yaxis.visible = False
    p.grid.visible = False  

    #sequence text view with ability to scroll along x axis
    p1 = figure(title=None, plot_width=plot_width, plot_height=plot_height,
                x_range=view_range, y_range=ids, tools="xpan,reset",
                min_border=0, toolbar_location='below')#, lod_factor=1)          
    glyph = Text(x="x", y="y", text="text", text_align='center',text_color="black",
    rects = Rect(x="x", y="recty",  width=1, height=1, fill_color="colors",
                line_color=None, fill_alpha=0.4)
    p1.add_glyph(source, glyph)
    p1.add_glyph(source, rects)

    p1.grid.visible = False
    p1.xaxis.major_label_text_font_style = "bold"
    p1.yaxis.minor_tick_line_width = 0
    p1.yaxis.major_tick_line_width = 0

    p = gridplot([[p],[p1]], toolbar_location='below')
    return p

We can test this plotter by loading an alignment from a file and showing the plot in a Panel Bokeh pane.

aln = AlignIO.read('test.aln','fasta')
p = view_alignment(aln, plot_width=900)

A sequence alignment tool

Now we return to our previous example of a multiple sequence alignment app using Panel(PyViz). This one is an updated version using the Bokeh plotter and some new options. The widgets use Panel and allow us to select a varying number of sequences and lengths.

Helper functions

The following functions help us do some sequence utilities. For alignment we will use muscle which you can easily install on linux. The sequences are created by making rudimentary mutations to a single template.

def make_seq(length=40):    
    return ''.join([random.choice(['A','C','T','G']) for i in range(length)])

def mutate_seq(seq):
    """mutate a sequence randomly"""
    seq = list(seq)
    pos = np.random.randint(1,len(seq),6)    
    for i in pos:
        seq[i] = random.choice(['A','C','T','G'])
    return ''.join(seq)

def get_colors(seqs):
    """make colors for bases in sequence"""
    text = [i for s in list(seqs) for i in s]
    clrs =  {'A':'red','T':'green','G':'orange','C':'blue','-':'white'}
    colors = [clrs[i] for i in text]
    return colors

def muscle_alignment(seqs):
    """Align 2 sequences with muscle"""
    filename = 'temp.faa'
    SeqIO.write(seqs, filename, "fasta")
    name = os.path.splitext(filename)[0]
    from Bio.Align.Applications import MuscleCommandline
    cline = MuscleCommandline(input=filename, out=name+'.txt')
    stdout, stderr = cline()
    align = AlignIO.read(name+'.txt', 'fasta')
    return align

Aligner dashboard

Finally we can now create the dashboard that makes the random mutated sequences and displays them.

title = pn.pane.Markdown('## Sequence aligner')
load_btn = pn.widgets.FileInput()
aln_btn = pn.widgets.Button(name='align',width=100,button_type='primary')
randomseq_btn = pn.widgets.Button(name='random seqs',width=100,button_type='primary')
numseqs_input = pn.widgets.IntSlider(name='sequences',start=2,end=50,value=5,width=200)
length_input = pn.widgets.IntSlider(name='length',start=10,end=500,value=50,width=200)

seq_pane = pn.pane.Str(name='sequences',height=300)
result = pn.pane.Str("empty",width=600)
bokeh_pane = pn.pane.Bokeh(height=100,margin=10)

def create_sequences(event):
    #creates a set of sequences using widget values
    seqlen = length_input.value
    startseq = make_seq(seqlen)
    num = numseqs_input.value
    for i in range(num):
        seq = mutate_seq(startseq)
        name = ''.join([random.choice(string.ascii_lowercase) for i in range(10)])
        s+='>%s\n' %name + seq+'\n'
    seq_pane.object = s

def align(event):
    #this function does the alignment using the textinput values    
    s = seq_pane.object
    sequences = SeqIO.parse(io.StringIO(s),format='fasta')
    sequences = [rec for rec in sequences]
    aln = muscle_alignment(sequences)    
    #the result widget is then updated    
    result.object = aln
    #aligned = [rec.seq for rec in (aln)]
    bokeh_pane.object = view_alignment(aln,fontsize="7pt",plot_width=600)

aln_btn.param.watch(align, 'clicks')
randomseq_btn.param.watch(create_sequences, 'clicks')

top = pn.Row(aln_btn,randomseq_btn,length_input,numseqs_input)
bottom = pn.Row(seq_pane,bokeh_pane, sizing_mode='stretch_height')
app = pn.Column(title,top,bottom)

The final result will look like this: