
#aiart #stablediffusion
Published: 2024-03-04 05:37:06 +0000 UTC; Views: 1481; Favourites: 4; Downloads: 0
Description
Results of some experimentation with a Stable Diffusion AI algorithm.
Broadly speaking (because I am not capable of explaining this in detail!), Stable Diffusion's overall process can be likened to cloud-watching: one person points to whatever formation of clouds (ideally, cumulus) happens to be in the sky and says something like "that one looks like a cat," explaining the details they see, while another person tries to visualize it from their description and where they're pointing.
Stable Diffusion, similarly, initializes a field of random data, then "denoises" it (sharpens, transforms, diffuses -- hence the name) according to the prompt you provided (the analysis of which is itself handled by another algorithm). Notably, this is performed not on prospective image data (i.e. pixels), but on its abstract (or "latent") internal representation. This also explains why the algorithm occasionally weirds out in spectacular ways, e.g. adding extra body parts here and there: if the drawing algorithm "interprets" a region of the image as "looks like a leg goes here," then the renderer is going to draw a leg, darnit, with basically no consideration for whether that makes any logical, structural, or compositional sense.
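As a very loose illustration of that seeded-noise-then-iterative-refinement idea (this is a toy, not the real architecture -- actual Stable Diffusion runs a trained neural network over latent tensors, conditioned on the prompt; here a plain "target" list stands in for that guidance):

```python
import random

def toy_denoise(seed: int, target: list, steps: int = 50) -> list:
    """Toy stand-in for latent diffusion: start from seeded random
    'latent' values and nudge them toward a target over many small
    steps. (Real Stable Diffusion predicts and removes noise with a
    trained network; 'target' merely plays that role here.)"""
    rng = random.Random(seed)                    # the RNG seed fixes the starting noise
    latent = [rng.gauss(0, 1) for _ in target]   # initial random field
    for _ in range(steps):
        # each pass strips away a bit of "noise", sharpening the field
        latent = [x + 0.1 * (t - x) for x, t in zip(latent, target)]
    return latent
```

The point of the sketch is only the shape of the process: random start, then many small correction steps, with the randomness fully determined by the seed.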
But about that initial starting point: since computers are not "truly" random, that initial random field is itself the output of a "noise generator" process, which in turn is initialized with an RNG seed number. Much like an RNG seed in a "roguelike" video game, Stable Diffusion's RNG seed controls basically everything not actually mentioned in the input prompt itself (i.e., run the same prompt with the same seed and the generator will yield the same results, down to the individual pixels), albeit in esoteric, opaque, and unpredictable ways. The input prompt, on the other hand, is obviously not random, and controls the output in (mostly) predictable ways. Meaning: if you fix a specific seed value, you can add or tweak prompt details to fine-tune a given result.
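That "same seed, same noise, down to the last bit" behavior is just ordinary seeded pseudo-randomness, and is easy to demonstrate with any PRNG (Python's stdlib generator here, standing in for Stable Diffusion's internal one):

```python
import random

def seeded_noise(seed: int, n: int = 8) -> list:
    """Reproducible pseudo-random 'starting noise': the same seed
    yields an identical field every single time, while a different
    seed yields a completely unrelated one."""
    rng = random.Random(seed)
    return [rng.gauss(0, 1) for _ in range(n)]
```

Re-running with seed 1234 reproduces the field exactly; seed 1235 produces something entirely different -- the roguelike-seed behavior described above.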
But, back to the subject at hand! The above "cast" of characters was created from a single seed value by applying minor variations to otherwise identical prompts, resulting in consistent, predictable differences between each permutation: notice how all characters share the same basic posture (overlay them and the positions of their feet/tail/head/hands actually line up almost perfectly!) and mostly similar composition, even though their "fingerprint" details (specific horns/ears, tail length, etc.) vary enough for us to regard them as unique (but possibly related) individuals.
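Mechanically, a grid like this amounts to holding the seed constant and iterating over the prompt substitutions. A sketch of that bookkeeping (the template text is a hypothetical placeholder, not the exact prompt used for these images):

```python
from itertools import product

SEED = 1234  # any fixed value works, as long as it never changes across the grid

def build_prompts(base: str = "digitigrade {species}, {sex}, wearing {only}a pendant") -> list:
    """Expand one template into the eight permutations discussed below:
    fox/dragon x male/female x clothed/'only'-clause. The wording is
    illustrative, not the author's actual prompt."""
    prompts = []
    for species, sex, only in product(("fox", "dragon"),
                                      ("male", "female"),
                                      ("", "only ")):
        prompts.append(base.format(species=species, sex=sex, only=only))
    return prompts
```

Each prompt would then be fed to the generator with the same fixed seed, so every difference between outputs traces back to the prompt text rather than to the starting noise.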
Now for each variation in turn:
Variation 1 - "fox" vs. "dragon" (upper vs. lower sets)
Pretty self-explanatory, but notice that I did not actually specify anthropomorphic versions -- Stable Diffusion inferred that on its own, from my use of the term "digitigrade" (the animalistic, "elongated tiptoe" leg structure), without which the result would have been very different.
Many real animals already have digitigrade hind legs, but this is not considered noteworthy, so the algorithm's training data presumably associated the term more with bipedalism/anthropomorphism (perhaps due to the furry fandom's usage of the term; even specifying "plantigrade" -- heel on the ground, like a human -- would still bias the design toward anthropomorphism). If I ran enough seeds I could probably have generated a quadruped version, but that would be a rare exception to the trend.
Variation 2 - "male" vs. "female" (left vs. right sets)
Also pretty self-explanatory. The male is recognizable by his broader shoulders, stronger arms, and angular chest (traits derived from humans, and by extension, anthropomorphic animals), whereas the female is slimmer and rounder (including in her chest) by comparison. This results from how the algorithm links certain attributes to certain keywords (a form of "algorithmic bias"; more on that later).
Additionally, I previously discovered that merely using a gendered pronoun (his/her/etc.) is sufficient for the algorithm to prioritize selecting male or female attributes. It's also known that certain words which are gendered in real life are treated as gender-neutral by the AI (e.g. "female lion" yields different results than "lioness").
Variation 3 - but why are some of them naked? (central pairs vs. outer pairs)
...And this is where algorithmic bias gets very interesting!
Notice how the female fox wears a full shirt/blouse with dress skirt, while the male fox is topless with long shorts. The female dragon wears a form-fitting tank and split dress skirt, while the male dragon wears a tank and short pants, with arm- and ankle-wraps.
Why do their outfits differ so widely from each other? Because I never included (in the prompt) any specific kind of clothes, leaving it up to the drawing algorithm to decide on its own.
Then why are the rest naked?
... because I never actually specified any clothes at all?
Well, yes but it's actually more complicated than that. Here goes.
The drawing algorithm doesn't actually know what nudity "is", or why it should (or should not) even be drawing it. It only knows what something "looks like" (based on its corpus of training data) and what other things it can infer or associate with it (including clothing, and by exclusion nudity). "Nudity" as a concept is defined uniquely by humans, almost exclusively for humans. We don't generally attach this concept to any other species (even though nudity is the default state for every body, pun intended), thus the drawing algorithm isn't trained to associate clothes with animals the way it associates clothes with humans.
Consider that in all versions of the prompt, I included that they are "wearing a pendant" -- a piece of jewelry, which correlates to our use of clothing in general. See where this is going? But for the nude characters I specified that they wear "only" a pendant; this may have broken the association between jewelry and other clothing, effectively telling the algorithm to prioritize "non-clothed" references. And anthropomorphic animals (by virtue of their "animal" aspect) are more easily depicted in their body's natural state (i.e. naked) without triggering our perception of nudity.
Final thoughts:
Not shown here are about an hour's worth of iterations through maybe half a dozen seed values (I actually started with just one parameter, male vs. female), some of which produced less consistent results than others.
This actually began with me submitting a prompt of "Renamon" (yes, just "Renamon") on a whim, and being surprised that Stable Diffusion correctly associated those letters with the Digimon's actual colors and general body shape. So after recording a specific seed value, I attempted to refine the results with further parameters like her purple arm sleeves. After including "wearing armwraps" resulted in the addition of legwraps as well (forming a sort of full "martial arts fighter" outfit), I discovered that this generator blocked (among other things) attempts to mention nudity directly. Disappointing but understandable (like how you don't image-search for Sonic or Gardevoir fanart, right?) ... and, as with all simple filters, it was trivial to work around this by tweaking my prompt from "wearing armwraps" to "wearing only armwraps".
Specifying "only" proved to yield consistent results with other forms of clothing or accessories (e.g. "anthro wearing (only?) a shirt/neckerchief/hat/etc"), so I set the Renamon prompt aside and began working on the set of results here. I started with just the fox version of this prompt, with "male/female" as the only variable parameter, before adding "wearing (only?) a pendant" as a second parameter to see if it affected the results the same way it did for Renamon (it did); then I changed the prompt from "fox" to "dragon" on a whim, and got surprisingly consistent results there too.
And, in the process I discovered a tangent worth taking a deeper dive into separately -- but that's a story for another day.
...and if you've actually read this far down ... well, thanks for attending "my TED talk"!
Comments: 2
Dediggefedde [2024-03-04 19:02:23 +0000 UTC]
Stratadrake In reply to Dediggefedde [2024-03-04 19:45:23 +0000 UTC]