Posts

Calculate PubMLST sequence types using Python

29 March 2024

PubMLST, or Public databases for Molecular Typing and Microbial Genome Diversity, is a web-based platform that provides access to databases of microbial genetic sequences. It is primarily used for microbial typing, strain comparison, and epidemiological studies. Its main purpose is to facilitate the standardization and sharing of microbial sequence data. The databases hosted by PubMLST typically include sequence types (STs), allelic profiles, and sometimes additional metadata for each strain. Users can query the databases, submit their own sequences...

Read more...
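As a rough illustration of the idea behind the PubMLST post (not the post's own code): a sequence type is assigned by matching the allele numbers found at each locus of a typing scheme against the scheme's table of allelic profiles, where each unique combination corresponds to one ST. The locus names, profile numbers and the `assign_st` helper below are hypothetical placeholders, not a real PubMLST scheme, and the sketch does not contact PubMLST.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical scheme: locus order and profiles are made up for illustration.
# Real schemes are downloaded from PubMLST.
LOCI: Tuple[str, ...] = ("locusA", "locusB", "locusC")

PROFILES: Dict[int, Tuple[int, ...]] = {
    1: (1, 1, 1),
    2: (1, 2, 1),
    3: (2, 2, 3),
}

# Invert the table once so each allelic profile maps straight to its ST
PROFILE_TO_ST: Dict[Tuple[int, ...], int] = {p: st for st, p in PROFILES.items()}


def assign_st(alleles: Dict[str, int], loci: Sequence[str] = LOCI) -> Optional[int]:
    """Return the ST whose allelic profile exactly matches, or None."""
    try:
        # Build the profile in the scheme's fixed locus order
        key = tuple(alleles[locus] for locus in loci)
    except KeyError:
        # An allele was not called at some locus, so no complete profile exists
        return None
    # A complete but unknown profile (a possible novel ST) also returns None
    return PROFILE_TO_ST.get(key)


print(assign_st({"locusA": 1, "locusB": 2, "locusC": 1}))
```

Exact matching is used here because an ST is defined by the full allele combination; partial matches would need a separate "closest ST" rule.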

Two quick ways of building a bacterial species phylogeny

22 March 2024

Performing phylogenetic analysis with whole or core genome sequences maximizes the information used to estimate phylogenies and the resolution of closely related species. Usually sequences are aligned to a reference species or strain. However, genome alignment does not scale well computationally, and even for small numbers of genomes it can be time consuming. Here are two relatively painless ways to make a bacterial species phylogeny that you can do yourself. This might be useful if you are concerned...

Read more...

Filtering a QTableView with QSortFilterProxyModel

28 January 2024

Qt is a popular GUI toolkit for writing desktop applications. It has Python bindings in PySide2 and PyQt5 (which use essentially identical syntax). QTableView is the standard view class for tables. It uses a QAbstractTableModel as its data source. These can be subclassed to use whatever data backend you want. Here is some code using a pandas DataFrame as the data source. It's not actually essential for the example, but could be extended with more complex...

Read more...

Finding genes in a genome or assembly with Python

21 January 2024

There are lots of tools for finding specific genes inside genome sequences. The well-established technique is to BLAST the query gene sequence against your own sequence (the target). You generally need to apply thresholds on percentage identity and sequence coverage to filter the results. Here is a method in Python that uses BLAST to search a genome sequence in a FASTA file. The sequence can be anything, such as a full genome, assembly contigs or a short segment of sequence....

Read more...
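As a minimal sketch of the filtering step only (the post's own code is not reproduced here), assume BLAST+ was run with the default tabular output (`-outfmt 6`) and the query gene length is known. Each hit can then be kept or dropped by percentage identity and query coverage. The `filter_hits` name and the thresholds of 90% identity and 80% coverage are illustrative assumptions, not values taken from the post.

```python
from typing import Dict, Iterable, List

# Column names of BLAST+ default tabular output (-outfmt 6)
OUTFMT6 = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(lines: Iterable[str], query_length: int,
                min_ident: float = 90.0, min_cov: float = 80.0) -> List[Dict]:
    """Keep hits that meet percent identity and query coverage thresholds."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        hit: Dict = dict(zip(OUTFMT6, line.split("\t")))
        pident = float(hit["pident"])
        # Coverage of this hit = span of query bases aligned / query length
        span = abs(int(hit["qend"]) - int(hit["qstart"])) + 1
        qcov = span / query_length * 100
        if pident >= min_ident and qcov >= min_cov:
            hit["pident"] = pident
            hit["qcov"] = round(qcov, 2)
            kept.append(hit)
    return kept


example = [
    "gene1\tcontig_3\t99.5\t1000\t5\t0\t1\t1000\t5001\t6000\t0.0\t1820",
    "gene1\tcontig_7\t85.0\t400\t60\t0\t1\t400\t100\t499\t1e-50\t300",
    "gene1\tcontig_9\t98.0\t300\t6\t0\t1\t300\t10\t309\t1e-100\t500",
]
for h in filter_hits(example, query_length=1000):
    print(h["sseqid"], h["pident"], h["qcov"])
```

Coverage is computed per hit from query coordinates; a gene split across several contigs would need its hits combined before applying the coverage threshold.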

Fetch assemblies and associated biosample data using Entrez tools with Biopython

19 January 2024

This is a somewhat altered version of code from an old post for downloading assemblies. GenBank provides access to information on all its assembled genomes via the Assembly database. You have several options: download assemblies individually or in bulk via the website, or use the Entrez database search system through command line tools, bypassing the website. The latter is good for large automated searches/downloads. The Biopython package provides an interface to the Entrez tools too. esearch - searches an NCBI database...

Read more...

Excess mortality in Ireland still high in 2023

05 January 2024

Previously we have looked at potential excess deaths in Ireland using data retrieved from RIP.ie. These represent quite accurate real-time all-cause mortality estimates and are an alternative to the official GRO data, which lag behind by many months. It appears that the unusually high mortality signal present since around the start of 2021 continues to the present (December 2023). Below are updated plots, now including 2023 data. Note that total deaths are used for the remaining analysis below. Strictly...

Read more...

A Panel app for image-to-image generation

29 December 2023

Previously we saw how to implement the Stable Diffusion image-to-image model using the Python diffusers library. There are plenty of websites now that offer AI image generation since it has become so popular. This post simply shows how you can make your own basic web dashboard with Panel that does something similar. The app allows someone to upload an image and generate new ones with a prompt and some of the settings previously demonstrated. Generated images are placed in a...

Read more...

image-to-image with Stable Diffusion in Python

18 December 2023

Previously we saw how to implement the Stable Diffusion text-to-image model using the Python Diffusers library, a library of state-of-the-art pre-trained diffusion models hosted by Hugging Face. You can also use the image-to-image pipeline to make text-guided image-to-image generations. This essentially uses one image as a template to make another. This would be useful for converting simple sketches to refined-looking artwork or concept art. Or maybe you want to constrain a particular...

Read more...

Speech diarization with OpenAI whisper and pyannote.audio

12 November 2023

OpenAI's Whisper library is an effective and free means of doing speech-to-text analysis. It's easy to use once installed and will output a set of files with timestamps for each sentence spoken. This is ideal for things like subtitling videos. It does not identify individual speakers, however, so it won't group the conversation into passages according to who is speaking. This process is called speech diarization and can be achieved using the pyannote.audio library. This is based on PyTorch and hosted...

Read more...

Condition Stable Diffusion images with ControlNet

21 September 2023

This is another aspect of the Stable Diffusion AI art library, covered previously. With ControlNet, users can ‘condition’ the generation of an image with a spatial context such as a segmentation map or a scribble. That is, you can weight the model to produce images that are constrained to the form of another. We can turn a cartoon drawing into a realistic photo, for example, or place another face in a portrait. We can still provide a prompt to guide...

Read more...

All posts