Stable Diffusion XL puts AI-generated visual worlds at your GPU’s command


Defenestrar

Ars Legatus Legionis
10,195
Subscriptor++
The blending of two inputs seems to be extremely useful. Oftentimes one wants to have a subject and a scene, for example, and it's difficult (in my very limited experience) to merge those well. The county fair is coming up; perhaps I should revisit the Stability installation I played with a year ago and see what can be cooked up in the digital art realm.

I had previously even tried it out for work, synthesizing images for safety presentations - the kind of scenes you'd never actually stage for a photo shoot, for obvious safety reasons.

I was also in a training seminar today when HR said they've licensed software to read text aloud in multiple types of AI-trained voices for slide and video voiceovers. I remember some of the articles here a while back that introduced me to that tech. Now it's at least one fully fledged commercial product. That alone will save so much time versus the amateur voice acting we have to do for our web-based training sessions. Not to mention it'll kick up the variety and possibilities for keeping things more engaging.

This is the fastest technology singularity I've been alive to witness so far. It's pretty amazing.
 
Upvote
48 (51 / -3)
And here we thought that social media had plumbed the depths of human depravity. Gentle reader, you ain't seen nothin' yet.
Every major advancement in media also seems to be an advancement in depravity, and I'm not sure how I'm going to feel about it over the next few years. To paraphrase: I just don't feel like I have any choice but to ride this wave while the scientists are busy figuring out how to make it work instead of asking themselves if they should.
 
Upvote
-19 (20 / -39)

Defenestrar

Ars Legatus Legionis
10,195
Subscriptor++
Up to now, I have found that Midjourney is about 3 months ahead of Stable Diffusion in terms of prompt coherence, image detail, and image quality. But I haven't compared them for a little while - has anything changed?
There's a really good summary of recent updates to Stable Diffusion here. Very much worth a read.
 
Upvote
-8 (9 / -17)
It would be nice to have some approximate numbers here.
According to the release announcement, "This two-stage architecture allows for robustness in image generation without compromising on speed or requiring excess compute resources. SDXL 1.0 should work effectively on consumer GPUs with 8GB VRAM or readily available cloud instances."


Over time, the community has improved speed and lowered memory requirements for previous versions of Stable Diffusion, so hopefully that will be the case for SDXL as well.

Looking forward to trying this out. I think most people (including myself) have still been using the 512x512 models, so a jump to 1024x1024 is pretty significant!
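For anyone else who wants to try it, here's a minimal sketch using Hugging Face's diffusers library. The model ID, the fp16 settings, and the CPU-offload call are my assumptions for squeezing onto a roughly 8GB card, not anything from the announcement:

```python
# Minimal SDXL sketch with the diffusers library (assumed API and model ID).
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # assumed Hugging Face model ID
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
# Offload idle model parts to system RAM; helps fit in ~8 GB of VRAM.
pipe.enable_model_cpu_offload()

image = pipe(
    prompt="a photo of a county fair at golden hour, detailed, 35mm",
    width=1024, height=1024,  # SDXL's native resolution
).images[0]
image.save("fair.png")
```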
 
Upvote
33 (33 / 0)

Defenestrar

Ars Legatus Legionis
10,195
Subscriptor++
Nah, it's just the author's preference. Tons of examples, and Kidman is nowhere to be seen.
In the past I know Mr. Edwards has used the same prompt (maybe even seeds) for different generations of the software. I wouldn't be surprised if this is the case for the familiarity of certain images in the article series.
 
Upvote
26 (26 / 0)
The blending of two inputs seems to be extremely useful. Oftentimes one wants to have a subject and a scene, for example, and it's difficult (in my very limited experience) to merge those well. The county fair is coming up; perhaps I should revisit the Stability installation I played with a year ago and see what can be cooked up in the digital art realm.
You might look into the ControlNet stuff. I haven't played with it myself, but supposedly it's good for distinguishing/manipulating aspects like subject vs. background, poses, depth (in 3D), etc.
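For instance, a pose-guided run might look roughly like this with diffusers (the model IDs and the pose reference file are assumptions on my part, and this is the SD 1.5 flavor rather than SDXL):

```python
# Rough ControlNet sketch: condition generation on a pose image (assumed model IDs).
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # SD 1.5 base; SDXL ControlNets may differ
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

pose = load_image("pose_reference.png")  # hypothetical openpose skeleton image
image = pipe(
    "a blacksmith demonstrating at a county fair",
    image=pose,  # the pose constrains the subject; the prompt supplies the scene
).images[0]
image.save("fair_subject.png")
```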

 
Upvote
20 (20 / 0)
And here we thought that social media had plumbed the depths of human depravity. Gentle reader, you ain't seen nothin' yet.
I followed the Unstable Diffusion NSFW Discord a while ago and occasionally peek in to see what they are up to. Some interesting stuff, for sure. This box has been opened and will never be closed again.
 
Upvote
36 (36 / 0)
I followed the Unstable Diffusion NSFW Discord a while ago and occasionally peek in to see what they are up to. Some interesting stuff, for sure. This box has been opened and will never be closed again.
The Internet is for porn, and so is AI. I doubt OP was concerned about NSFW; it seems to be about "stolen" pictures.
 
Upvote
12 (19 / -7)

benjedwards

Smack-Fu Master, in training
81
Ars Staff
In the past I know Mr. Edwards has used the same prompt (maybe even seeds) for different generations of the software. I wouldn't be surprised if this is the case for the familiarity of certain images in the article series.
It's true, I was attempting to replicate the now-famous "Stable Diffusion lady" from my original Stable Diffusion article last year, but with SDXL.
 
Upvote
64 (64 / 0)
It's true, I was attempting to replicate the now-famous "Stable Diffusion lady" from my original Stable Diffusion article last year, but with SDXL.
I suppose this could be a younger version. She has the same evil eyebrows.

Still, "techinica?" It even takes liberties with specific text prompts?
 
Upvote
15 (15 / 0)
I suppose this could be a younger version. She has the same evil eyebrows.

Still, "techinica?" It even takes liberties with specific text prompts?

Clear text with an obvious typo is streets ahead of older versions of SD, which usually produce text that looks like worn-off labels from Amazon Scrabble-tile-jumble brands.
 
Upvote
29 (29 / 0)
Every major advancement in media also seems to be an advancement in depravity, and I'm not sure how I'm going to feel about it over the next few years. To paraphrase: I just don't feel like I have any choice but to ride this wave while the scientists are busy figuring out how to make it work instead of asking themselves if they should.
I can't wait for the images of a certain would-be president being French-kissed by another inmate with spider-web tattoos to start showing up. Is that the kind of depravity you meant?
 
Upvote
-1 (8 / -9)
One might expect a difference between AI and human art because humans learn from the universe of data differently than AIs do. Humans do not do statistical averaging; they use induction. Human output is derived by combining and particularizing from universals, rather than as a statistic of a dataset.
 
Last edited:
Upvote
-8 (3 / -11)
Looking at Stable Diffusion's announcement, it appears that AMD GPUs are now supported on Linux (albeit with a much higher RAM requirement). As somebody who hasn't been following this space all that closely, is that AMD support likely to come to Windows in the foreseeable future, or am I going to have to spin up a Linux distro on my PC if I want to try this out?
 
Upvote
6 (6 / 0)

Fatesrider

Ars Legatus Legionis
19,035
Subscriptor
It would be nice to have some approximate numbers here.
You need both decent GPU/CPU speeds and lots of VRAM as well as regular RAM.

Yeah, those aren't numbers, but this runs differently than a chat AI, which needs VRAM for best results.

As a comparison, I used Stable Diffusion on my old rig with an AMD 2700X, 32 GB RAM (probably 5400?), and an Nvidia 2070 (8 GB VRAM), and it would take up to 10 minutes to run a batch of 64 500x500 images on default settings.

I loaded EasyDiffusion onto my new Linux computer with an AMD 5900X, 64 GB (6000-something) RAM, and an Nvidia 3060 (12 GB VRAM), and it whips through a batch of 64 500x500 images on default settings in about a minute.

Times will vary depending on iterations and engine, so numbers are REALLY subjective and entirely based on your system's capacity. It's not that it WON'T run; it's that some things take longer to run, even on the same system, depending on the settings.

I'll probably have to wait for the Linux community to bring this to my system, because the information for getting it to run is Windows-centric, but I'll keep my eye out for it. It's entertaining when I'm bored, and it might help me with cover art (I do my own anyhow using other programs, so I'm not putting a starving artist out on the streets by doing that).

But I'll still probably resort to my CGI models and program to get it done anyhow. In the end, AI art is just an entertaining distraction for me.
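If anyone wants something slightly more objective than "about a minute," a quick timing loop like this gives a per-batch number for your own hardware (a diffusers-based sketch; the model ID and settings are just examples):

```python
# Quick-and-dirty throughput check (assumed diffusers setup; adjust for your rig).
import time
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

start = time.time()
pipe(
    "a lighthouse on a cliff at dusk",
    num_images_per_prompt=8,   # batch size; 64 may not fit in 8 GB of VRAM
    num_inference_steps=25,    # the "iterations" knob, the biggest factor for speed
    height=512, width=512,
)
print(f"batch of 8 at 512x512: {time.time() - start:.1f} s")
```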
 
Upvote
15 (16 / -1)

marsilies

Ars Tribunus Angusticlavius
19,161
Subscriptor++
How could a text engine that was explicitly told to show a sign with "Ars Technica" actually get the spelling wrong??
Because the image generator doesn't understand text. It knows the general visual patterns of text, but it doesn't understand what individual letters or words are.

This is actually an improvement. I believe before, anything that was supposed to be text just tended to be gibberish, often with nonsense symbols.

https://www.reddit.com/r/StableDiffusion/comments/112z3pt/eli5_please_why_does_ai_struggle_producing_text/
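You can see this directly by poking at the text encoder's tokenizer: the model is conditioned on multi-character token chunks, never on individual letters. A small sketch with the transformers library (the exact token splits may vary):

```python
# The text encoder sees BPE token chunks, not letters (illustrative sketch).
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
print(tokenizer.tokenize("Ars Technica"))
# Prints a handful of multi-character chunks (something like
# ['ars</w>', 'tech', 'nica</w>']), so the model never "reads" letter by letter.
```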
 
Upvote
39 (39 / 0)

marsilies

Ars Tribunus Angusticlavius
19,161
Subscriptor++
In the past I know Mr. Edwards has used the same prompt (maybe even seeds) for different generations of the software. I wouldn't be surprised if this is the case for the familiarity of certain images in the article series.

I think that's likely, for example, from a September 2022 article:
[Image: stable_diffusion_hero_8-800x448.jpg]

 
Upvote
17 (17 / 0)

Benovite

Smack-Fu Master, in training
71
I'm gonna be the odd man out and say that there's been a regression instead of progression in what I'm seeing here. Although I gather it's perhaps easier to make worse AI-generated images than before? Is that the gist?

Trying to quantify all of this while laughing at the giant hand with backwards fingers. That's from an older version, though, right? ¯\_(ツ)_/¯
(It's entirely possible I just lack understanding of any of this.)
 
Upvote
-12 (5 / -17)
Why is it so easy to tell that an image is AI generated?
Because we've barely scratched the surface of what models like these are really capable of. In the blink of an eye, we've gone from barely coherent collections of almost human elements to fantastically detailed and coherent images of virtually anything you can type out. This tech is moving extremely fast, so it won't be long before wonky hands are no longer a tell.
 
Upvote
41 (41 / 0)
I just got a local instance running. Taking requests.

Hi Randomcat, may I offer a suggestion on that, please? One of the ways this has been played with has been to create "what happened outside the photo" in relation to LP covers. A few months back someone used a previous generative engine to show the wide view of Roxy Music's "Stranded" album cover (it showed more rocks and shoreline, who knew?). Perhaps a wide view of a favourite album cover of yours?
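That trick is usually done with outpainting: paste the cover onto a bigger canvas and let an inpainting model fill in the masked border. A rough sketch with diffusers (the model ID, sizes, and prompt are my own assumptions):

```python
# Rough outpainting sketch: extend an album cover sideways (assumed model ID).
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

cover = Image.open("album_cover.png").convert("RGB").resize((512, 512))

# Wider canvas with the original cover in the middle; white mask = area to fill.
canvas = Image.new("RGB", (1024, 512))
canvas.paste(cover, (256, 0))
mask = Image.new("L", (1024, 512), 255)
mask.paste(Image.new("L", (512, 512), 0), (256, 0))

out = pipe(
    prompt="rocky shoreline at dusk, wide-angle photograph",
    image=canvas, mask_image=mask,
    width=1024, height=512,
).images[0]
out.save("album_cover_wide.png")
```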
 
Upvote
12 (13 / -1)