Posts

Plot a phylogenetic tree with bokeh and biopython

25 October 2024

Bokeh can be used to make a wide variety of interactive plots. Here we use it to plot an interactive phylogenetic tree. We skip the more complicated part of tree drawing by using the Biopython Phylo module to read a Newick tree and then get the tip coordinates from the tree object. From here it’s a simple matter of connecting the tips with the branches and adding labels and so on. This code accepts a dataframe with metadata for the...

Read more...

Bacterial species identification from WGS using 16S genes

04 June 2024

The 16S rRNA gene is present in all bacteria and contains both highly conserved and hypervariable regions. The conserved regions allow for the design of universal primers that can amplify the gene across a wide range of bacteria and archaea. The hypervariable regions (e.g., V1-V9) provide species-specific sequences that can be used to distinguish between different taxa. This is a widely used means of species identification. With WGS we have the whole genome and relying on 16S alone is not...

Read more...

Plot and colour a minimum spanning tree with networkx

31 May 2024

A Minimum Spanning Tree (MST) is a concept from graph theory. Given a connected, undirected graph with weighted edges, an MST is a subset of the edges that connects all vertices together, without any cycles, and with the minimum possible total edge weight. While MSTs and phylogenetic trees both deal with the concept of connecting nodes in a graph, their purposes are different. Phylogenetic trees represent evolutionary relationships among species or genes and have a root (representing a common ancestor)...
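To make the MST definition above concrete, here is a minimal sketch using Kruskal's algorithm in plain Python. It is not the networkx code from the post; the function name `kruskal_mst` and the toy graph are hypothetical and only illustrate the idea of picking the cheapest edges that do not create a cycle.

```python
def kruskal_mst(n_nodes, edges):
    """Return the MST edges of a connected, undirected weighted graph.

    edges is a list of (weight, u, v) tuples, nodes numbered 0..n_nodes-1.
    """
    # Each node starts in its own component (union-find structure)
    parent = list(range(n_nodes))

    def find(x):
        # Follow parent pointers to the component root, shortening the path
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    # Consider edges from lightest to heaviest
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            # The edge joins two separate components, so it adds no cycle
            parent[root_u] = root_v
            mst.append((weight, u, v))
    return mst


# Hypothetical toy graph: 4 nodes, 5 weighted edges
edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
mst = kruskal_mst(4, edges)
total_weight = sum(w for w, _, _ in mst)
```

For a connected graph with n nodes the result always has n - 1 edges, which is a quick sanity check on any MST output.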

Read more...

An updated convenience function for ggtree with heatmaps

30 May 2024

Previously we looked at a convenience function for drawing and colouring phylogenetic trees with ggtree. This post contains an updated version of this function with some improvements. Recall that the appropriate metadata is provided as a data.frame object with row names matching tip names of the tree. The first column in cols is used for the tip colors. You also need to provide corresponding cmap values for the colormaps. Numeric data is just coloured with a predefined gradient. This...

Read more...

Calculate PubMLST sequence types using Python

29 March 2024

PubMLST, or Public databases for Molecular Typing and Microbial Genome Diversity, is a web-based platform that provides access to databases of microbial genetic sequences. It is primarily used for microbial typing, strain comparison, and epidemiological studies. The primary purpose of PubMLST is to facilitate the standardization and sharing of microbial sequence data. The databases hosted by PubMLST typically include sequence types (STs), allelic profiles, and sometimes additional metadata for each strain. Users can query the databases, submit their own sequences...

Read more...

Two quick ways of building a bacterial species phylogeny

22 March 2024

Performing phylogenetic analysis with whole or core genome sequences maximizes the information used to estimate phylogenies and the resolution of closely related species. Usually sequences are aligned with a reference species or strain. However, genome alignment is a process that does not scale well computationally. Even for small numbers of genomes it can be time-consuming. Here are two relatively painless ways to make a bacterial species phylogeny that you can do yourself. This might be useful if you are concerned...

Read more...

Filtering a QTableView with QSortFilterProxyModel

28 January 2024

Qt is a popular GUI toolkit for writing desktop applications. It has bindings for Python using either PySide2 or PyQt5 (which use essentially identical syntax). QTableView is the default class for table representations. It uses a QAbstractTableModel as the data source class. These can be subclassed to use whatever data backend you want. Here is some code using a pandas DataFrame as the data source. It’s not actually essential for the example, but could be extended with more complex...

Read more...

Finding genes in a genome or assembly with Python

21 January 2024

There are lots of tools for finding specific genes inside genome sequences. The well-established technique is to BLAST the query gene sequence against your own (the target). You generally need to use some threshold of percentage identity and coverage of the sequence to filter results. Here is a method in Python that uses BLAST to search a genome sequence in a fasta file. The sequence can be anything like full genomes, assembly contigs or a short segment of sequence....
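The identity and coverage filtering described above can be sketched as follows. This is not the method from the post: the function name `filter_hits`, the default thresholds and the example hits are all hypothetical. The `pident` and `length` keys follow the column names of BLAST tabular output, and coverage is approximated here as alignment length over query length.

```python
def filter_hits(hits, query_length, min_identity=90.0, min_coverage=80.0):
    """Keep only hits that pass both identity and coverage thresholds.

    hits: list of dicts with 'pident' (percent identity) and
    'length' (alignment length) keys.
    """
    kept = []
    for hit in hits:
        # Approximate query coverage as a percentage of the query length
        coverage = 100.0 * hit["length"] / query_length
        if hit["pident"] >= min_identity and coverage >= min_coverage:
            kept.append({**hit, "coverage": coverage})
    return kept


# Hypothetical hits for a 1000 bp query gene
hits = [
    {"sseqid": "contig_1", "pident": 98.5, "length": 950},   # passes both
    {"sseqid": "contig_2", "pident": 99.0, "length": 300},   # too short
    {"sseqid": "contig_3", "pident": 75.0, "length": 1000},  # too divergent
]
good = filter_hits(hits, query_length=1000)
```

Suitable thresholds depend on how divergent the target gene is expected to be, so the defaults here should be treated as placeholders.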

Read more...

Fetch assemblies and associated biosample data using Entrez tools with Biopython

19 January 2024

This is a somewhat altered version of code from an old post for downloading assemblies. GenBank provides access to information on all its assembled genomes via the assembly database. You have several options: download assemblies individually or in bulk via the website, or use the Entrez database search system with command line tools without the website. The latter is good for large automated searches/downloads. The Biopython package provides an interface to the Entrez tools too. esearch - searches an NCBI database...

Read more...

Excess mortality in Ireland still high in 2023

05 January 2024

Previously we have looked at potential excess deaths in Ireland using data retrieved from RIP.ie. These represent quite accurate real-time all-cause mortality estimates that are an alternative to the official GRO data, which lag behind by many months. It appears that the unusually high mortality signal present since around the start of 2021 continues to the present (December 2023). Below are updated plots, now including 2023 data. Note that total deaths are used for the remaining analysis below. Strictly...

Read more...
