AI art with Stable Diffusion in Python

September 19 2023

Background

Stable Diffusion is a text-to-image model trained on 512x512 images from a subset of the LAION-5B dataset. You can run this model on your own computer using the Python Diffusers library, a library of state-of-the-art pre-trained diffusion models for generating images, audio, and 3D structures. It is maintained by Hugging Face. The workings of the library are beyond the scope of this post, but there are guides here if you are interested. To make this run fast enough you should have a reasonably modern graphics card (the code below assumes an NVIDIA GPU with CUDA). There is also a GUI for Stable Diffusion if you don’t want to use Python.

Install

You can install everything with pip. It’s recommended to create a virtualenv for this. Note that the exact install process may change over time.

pip install -q diffusers==0.14.0 transformers xformers git+https://github.com/huggingface/accelerate.git
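
Before downloading any model weights it can help to confirm that the install worked. The following is a minimal sketch rather than part of the original workflow: `check_environment` is a hypothetical helper name, and the package list simply mirrors the pip command above.

```python
import importlib.util

def check_environment(packages=("torch", "diffusers", "transformers",
                                "xformers", "accelerate")):
    """Return a dict mapping each package name to True if it is importable."""
    return {name: importlib.util.find_spec(name) is not None
            for name in packages}

status = check_environment()
for name, ok in status.items():
    print("%s: %s" % (name, "found" if ok else "MISSING"))

# Only query CUDA if torch is actually installed
if status["torch"]:
    import torch
    print("CUDA available:", torch.cuda.is_available())
```

If CUDA is reported as unavailable, the `pipe.to("cuda")` call used later in this post will fail.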

Code

import os, glob
import random, math
import numpy as np
import pandas as pd
import torch
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

model_id = "stabilityai/stable-diffusion-2-1"
#model_id = "stabilityai/stable-diffusion-2"

# Use the Euler scheduler here 
scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, torch_dtype=torch.float16, safety_checker=None)
pipe = pipe.to("cuda")

We can then call the pipe object to create an image from a text prompt. The prompt function below is a convenient way to make multiple images at once and save them to the same folder with unique names. Prompting is a hit-and-miss process, so you will find yourself discarding a lot of images to get the one you want.

def prompt(prompt, n=1, style=None, path='.', negative_prompt=None):
    """Generate n images for a prompt and save them in path with unique names."""
    if style is not None:
        prompt += ' by %s' % style
    if negative_prompt is None:
        negative_prompt = 'disfigured, lowres, bad anatomy, worst quality, low quality'
    os.makedirs(path, exist_ok=True)
    image = None
    for c in range(n):
        print(prompt)
        image = pipe(prompt, negative_prompt=negative_prompt).images[0]
        # Find the first unused file name for this prompt
        i = 1
        imgfile = os.path.join(path, prompt[:90] + '_%s.png' % i)
        while os.path.exists(imgfile):
            i += 1
            imgfile = os.path.join(path, prompt[:90] + '_%s.png' % i)
        image.save(imgfile, 'png')
    # Every image is saved to disk; only the last one is returned
    return image
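
The function above builds file names directly from the prompt text, so a prompt containing characters such as `/` or `:` may produce an invalid path on some systems. As a hedged sketch of one way around this, the hypothetical helpers `safe_stem` and `unique_path` below (neither is part of the original code) sanitise the prompt first and then apply the same counter scheme.

```python
import os
import re

def safe_stem(text, max_len=90):
    """Reduce a prompt to letters, digits, '-' and '_' for use in a file name."""
    stem = re.sub(r'[^A-Za-z0-9_-]+', '_', text).strip('_')
    return stem[:max_len] or 'image'

def unique_path(folder, text, ext='.png'):
    """Return the first path <stem>_<i><ext> in folder that does not exist yet."""
    os.makedirs(folder, exist_ok=True)
    stem = safe_stem(text)
    i = 1
    while True:
        candidate = os.path.join(folder, '%s_%d%s' % (stem, i, ext))
        if not os.path.exists(candidate):
            return candidate
        i += 1

print(safe_stem('a cat/dog: "portrait" by Alfred Sisley'))
# a_cat_dog_portrait_by_Alfred_Sisley
```

Inside the loop, `image.save(unique_path(path, prompt), 'png')` could then replace the manual counter.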

Examples

prompt('flowers',style='Alfred Sisley',n=5,path='test')

We can use the style of many artists and even combine them as shown below. Sometimes the picture will be a quite obvious combination of the two styles.
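
One simple way to combine artists is to join their names into a single phrase and pass it as the `style` argument of the prompt function. This is only a sketch: `combine_styles` is a hypothetical helper, and the second artist is an arbitrary choice for illustration.

```python
def combine_styles(artists):
    """Join artist names into one phrase, e.g. 'A, B and C'."""
    artists = list(artists)
    if not artists:
        return None  # the prompt function treats None as "no style"
    if len(artists) == 1:
        return artists[0]
    return ', '.join(artists[:-1]) + ' and ' + artists[-1]

style = combine_styles(['Alfred Sisley', 'Vincent van Gogh'])
print(style)  # Alfred Sisley and Vincent van Gogh

# With the pipeline loaded, this could then be used as:
# prompt('flowers', style=style, n=5, path='test')
```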

Other examples

Stable Diffusion can often capture the likeness of well-known people, depending on how well that person is represented in its training data. Here is an example of a famous actor:

Sometimes the style of the artist overwhelms the subject and you get a totally unexpected image. I added the word ‘impressionist’ to the prompt to get it to make paintings of Mr. Spock.

Finally, to illustrate the breadth of styles that can be used to depict one subject, here are some pictures of Elvis:

Here are some styles/media to try in the prompts. You can find lots of web pages with more detailed guides.

"linocut"
"crayon drawing"
"pencil"
"engraving"
"risograph print"
"illustration"
"pen and watercolor"
"oil"
"pen and ink"
"3D model"
"analog film"
"anime"
"cinematic"
"craft clay"
"digital art"
"fantasy art"
"isometric"
"line art"
"lowpoly"
"neonpunk"
"origami"
"pixel art"
"texture"
"papercraft collage"

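To compare several media for one subject, it may be convenient to build one prompt per medium and send each to its own folder. The sketch below is an assumption about how this could be organised: `media_prompts`, the subject text and the `media_test` folder name are all hypothetical.

```python
import os

# A few entries taken from the list above
media = ["linocut", "crayon drawing", "pen and ink", "pixel art"]

def media_prompts(subject, media, root='media_test'):
    """Yield (prompt_text, output_folder) pairs, one per medium."""
    for medium in media:
        text = '%s, %s' % (subject, medium)
        folder = os.path.join(root, medium.replace(' ', '_'))
        yield text, folder

for text, folder in media_prompts('a lighthouse at dusk', media):
    print(text, '->', folder)
    # With the pipeline loaded: prompt(text, n=2, path=folder)
```
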
Grid image code

Here is the code for putting multiple images in a grid as used above. It may be of use elsewhere.

def tile_images(image_paths, outfile, grid=False, tile_width=300):
    """Make a tiled image from a list of image paths. Assumes images are the same size."""

    from PIL import Image, ImageDraw
    images = [Image.open(path) for path in image_paths]
      
    ratio = images[0].height / images[0].width
    tile_height = int( tile_width * ratio )
    num_rows = int(math.sqrt(len(image_paths)))
    # Calculate number of cols
    num_columns = (len(images) + num_rows - 1) // num_rows

    tiled_width = num_columns * tile_width
    tiled_height = num_rows * tile_height
    tiled_image = Image.new("RGB", (tiled_width, tiled_height))

    for idx, image in enumerate(images):      
        row = idx // num_columns
        col = idx % num_columns
        x_offset = col * tile_width
        y_offset = row * tile_height
        tiled_image.paste(image.resize((tile_width, tile_height)), (x_offset, y_offset))
    if grid:
        draw = ImageDraw.Draw(tiled_image)
        # Draw borders around each tile
        for row in range(num_rows):
            for col in range(num_columns):
                x1 = col * tile_width
                y1 = row * tile_height
                x2 = x1 + tile_width
                y2 = y1 + tile_height
                draw.rectangle([x1, y1, x2, y2], outline=(0, 0, 0), width=3)  

    tiled_image.save(outfile)
    return tiled_image

We can use it like this:

files = glob.glob('images/*.png')
# Sample a square number of images (4, 9, 16, ...) to fill the grid completely
x = random.sample(files, min(9, len(files)))
tile_images(x, 'tiled.png', grid=True)
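
The grid dimensions in tile_images come from taking the integer square root of the image count as the number of rows and then rounding the column count up. The short sketch below repeats that arithmetic on its own (`grid_shape` is a hypothetical name, and at least one image is assumed) to show why square counts fill the grid exactly while other counts leave empty tiles.

```python
import math

def grid_shape(n):
    """Rows and columns that tile_images would use for n >= 1 images."""
    rows = int(math.sqrt(n))
    cols = (n + rows - 1) // rows  # ceiling division of n by rows
    return rows, cols

for n in (4, 5, 9, 10):
    rows, cols = grid_shape(n)
    print(n, 'images ->', rows, 'x', cols, 'grid,', rows * cols - n, 'empty tiles')
```

Empty tiles are left black, since the canvas created by `Image.new("RGB", ...)` defaults to black.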