Initially, all the fun for me with Stable Diffusion was with txt2img – you type what you want to see and it attempts to make you a picture of it.
Then you discover img2img.
You take an existing image (or one you've just generated using txt2img) and modify it using a text prompt. You can control how much of the original image is kept and how creative the model gets with your text input.
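In code, that workflow can be sketched with Hugging Face's diffusers library – the model ID, `strength` and `guidance_scale` values below are illustrative assumptions, not a recipe, and the `stylise` helper name is mine:

```python
# Sketch of img2img with the diffusers library: a text prompt reworks an
# existing picture, and `strength` controls how far it strays from it.

def denoising_steps(num_inference_steps, strength):
    """Roughly how many scheduler steps actually run: img2img skips the
    early steps in proportion to (1 - strength), so strength=1.0 redraws
    everything and low values keep most of the original image."""
    return min(int(num_inference_steps * strength), num_inference_steps)

def stylise(input_path, prompt, strength=0.6):
    # Heavy imports kept local so the helper above loads without a GPU setup.
    import torch
    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",  # assumed v1.5 checkpoint
        torch_dtype=torch.float16,
    ).to("cuda")
    init = Image.open(input_path).convert("RGB").resize((512, 512))
    # strength near 0.0 barely touches the input; near 1.0 it mostly ignores it
    return pipe(prompt=prompt, image=init, strength=strength,
                guidance_scale=7.5).images[0]
```

So at 50 steps and strength 0.75, only around 37 denoising steps actually run on your image – which is why low-strength runs stay so close to the original.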
Well, that sounds like fun!
Being a closet narcissist, I immediately started experimenting with images of myself. Perhaps, as the image at the head of this post will surely show, it would be possible to make myself look cooler & more handsome!
I very quickly discovered that method wasn’t going to get me what I wanted. Not enough control, not enough scope, fun but ultimately unsatisfying.
That’s when I discovered Dreambooth, a way to train a version of the Stable Diffusion model using images of a person, so that you could add their name (or some other chosen word) to a prompt in txt2img generation and get that person in any style you could imagine.
Things in the AI art space are moving at a ridiculous pace. When I first became aware of Dreambooth you needed some serious GPU muscle (24GB of VRAM, as I remember) to run it locally.
Four times what my aging GTX1060 has :(
Seemed I was out of luck.
(*Not true anymore – you can now run it locally on much less VRAM, or even on your CPU, granted that takes A LOT longer*)
Then I discovered Google Colab.
After another day of a steep learning curve, right at the edge of what I could understand coding-wise, I could use Shivam Shrirao's implementation of Dreambooth in Google Colab to train a model on some photos of myself, using free GPU time from Google over the web.
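For the curious, a Dreambooth run of this kind boils down to a single training command. This is a hedged sketch in the style of the diffusers `train_dreambooth.py` script that the Shivam Shrirao Colab builds on – the paths, step count and the rare token "sks" are all illustrative stand-ins, not the exact values I used:

```shell
# Sketch of a Dreambooth training run (diffusers-style train_dreambooth.py).
# "sks person" binds a rare token to the new subject; class images with
# prior preservation stop the model forgetting what "person" means.
accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --instance_data_dir="./photos_of_me" \
  --instance_prompt="a photo of sks person" \
  --class_data_dir="./person_class_images" \
  --class_prompt="a photo of a person" \
  --with_prior_preservation \
  --resolution=512 \
  --train_batch_size=1 \
  --learning_rate=1e-6 \
  --max_train_steps=800 \
  --output_dir="./dreambooth-me"
```

After training, prompting the saved model with "a photo of sks person as …" is what drops you into any style you can imagine.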
Then things started to get weird.
The first time I tried my newly trained model I got this:
I was speechless. A completely new photoreal picture of myself. WTAF?
(It was generated in the Google Colab right after my model finished training, and I stupidly forgot to write down the exact prompt and seed used, but it was definitely something about ‘wet plate photograph’)
Using the Stable Diffusion image search site lexica.art to discover interesting prompts, I became (and still am!!!) obsessed with generating images of myself in various styles.
It’s an odd process, some prompts generate something amazing on the first try, others require several iterations until something stands out.
Inevitably some artists' names creep in, though I have tried to remove them from my prompts whenever possible whilst still getting interesting results.
It quickly becomes clear why some artists' names appear in SO many prompts, though (Greg Rutkowski, anyone?), as they do tend to generate some really beautiful images.
The ethics of this (discussed briefly in Part 2) however still leave me somewhat conflicted. I must admit I feel much less conflicted about it when generating images of myself, especially from artists who are no longer with us.
BELOW: A gallery of some images of myself, generated locally using a version of the Stable Diffusion model trained with Dreambooth
I was really enjoying these images; it’s pretty amazing to see these utterly new images of yourself appear from the latent space.
Of all of the ones I’ve produced so far, this one sticks in my mind:
A prompt stolen from lexica.art, unfortunately containing a long list of artists (NOT Mr Rutkowski though :).
It is however probably the one image in these experiments that feels most like looking in a very flattering mirror in an alternate universe.
It is the one I keep coming back to.
I’m not sure what it is about it, but it speaks to me as much as if it had been painted by a human artist; it captures something about “me” that is frankly a little scary.
The implications of this technology became very personal and real.
I unashamedly fucking love it and I thank every one of the artists who appear in the prompt for producing all their beautiful work to make it possible.
I haven’t even begun to discuss inpainting (don’t like one small element in a picture? Mask it and ask the model to replace just that part) and outpainting (512×512 not doing it for you? Add more pixels on any or all of the sides, exactly matching the original picture).
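The inpainting idea is simple enough to sketch: you hand the model the image plus a black-and-white mask, and only the white region gets regenerated from the prompt. Below is a minimal sketch using diffusers – the checkpoint name is an assumption, and `make_mask`/`replace_region` are hypothetical helper names of mine:

```python
# Sketch of inpainting with diffusers: white areas of the mask are
# regenerated from the prompt, black areas of the image are preserved.
from PIL import Image, ImageDraw

def make_mask(size, box):
    """Build an inpainting mask: white (255) inside `box` is the region
    the model will repaint; black (0) elsewhere is kept as-is."""
    mask = Image.new("L", size, 0)
    ImageDraw.Draw(mask).rectangle(box, fill=255)
    return mask

def replace_region(image_path, box, prompt):
    # Heavy imports kept local so make_mask above loads without a GPU setup.
    import torch
    from diffusers import StableDiffusionInpaintPipeline

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting",  # assumed inpainting checkpoint
        torch_dtype=torch.float16,
    ).to("cuda")
    image = Image.open(image_path).convert("RGB").resize((512, 512))
    mask = make_mask(image.size, box)
    return pipe(prompt=prompt, image=image, mask_image=mask).images[0]
```

Outpainting is the same trick inverted: paste the original into a larger canvas and mask the new border pixels so the model fills them in to match.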
It was of course not long until I took my new Dreambooth model and applied it to what I’d learnt about Deforum animations. Throw in a dash of EbSynth and what I can already do in After Effects, and voilà.
Jaw on the floor time for the 100th time.
I owe a big debt to the people in the Stable Diffusion subreddit and Discord, the Deforum Discord, the AItrepreneur, Olivio Sarikas, Enigmatic_E, Nerdy Rodent and Sebastian Kamph YouTube channels, and everyone who is developing these tools, new workflows and problem-solving in the community. And of course to all the artists whose work the model has been trained on (mine included).
It is because of all these people that I have been able to do any of this and to realise how much more there is to learn and explore.
I’m barely scratching the surface.
It’s been, and continues to be, one of the most exciting, inspiring and downright fun technologies to come along in some time.
I’m hooked and excited for what developments the amazing open source community comes up with next!