mo pixels, mo problems —

Stability AI releases Stable Diffusion XL, its next-gen image synthesis model

New SDXL 1.0 release allows hi-res AI image synthesis that can run on a local machine.

Local control, open philosophy

We downloaded the Stable Diffusion XL 1.0 model and ran it locally on a Windows machine using an RTX 3060 GPU with 12GB of VRAM. Interfaces such as ComfyUI and AUTOMATIC1111's Stable Diffusion web UI make the process more user-friendly than when Stable Diffusion first launched last year, but it still requires some technical finagling to get it working. If you want to try it, this tutorial can point you in the right direction.

Overall, we saw image generations with a dreamlike quality, angling more toward the style of commercial AI image generator Midjourney. SDXL shines by providing greater detail in larger image sizes, as mentioned above. It also seems to follow prompts with more fidelity, although that's debatable.

Other notable improvements include slightly better hand rendering than previous SD models and improved text rendering in images. But as with earlier models, generating quality images is still like pulling a slot machine lever and hoping for a good result. Experts find that careful prompting (and lots of trial and error) is the key to better results.

There are also drawbacks to running SDXL locally on consumer hardware, such as higher memory requirements and slower generation times than with Stable Diffusion 1.x and 2.x. (On our test rig, a 1024x1024 image at 20 steps, Euler Ancestral, CFG 8, rendered in 23.3 seconds for SD 1.5 and 26.4 seconds for SDXL 1.0. The resulting SDXL image had fewer repeating elements than the SD 1.5 image.)
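For readers who want to reproduce settings like those in the timing test above, here is a minimal sketch using Hugging Face's diffusers library (1024x1024, 20 steps, Euler Ancestral, CFG 8). Note the assumptions: the prompt and output filename are placeholders, and the article's own tests were run through interfaces like ComfyUI and AUTOMATIC1111's web UI, not this script. Running it requires a CUDA GPU and downloads several gigabytes of model weights.

```python
# Sketch: generating an SDXL image with settings similar to the article's
# benchmark. Assumes a CUDA GPU with ~12GB of VRAM, as on the test rig.
import torch
from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler

# Load the SDXL 1.0 base model in half precision to fit in limited VRAM
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# Swap in the Euler Ancestral sampler used in the timing comparison
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

image = pipe(
    prompt="a lighthouse on a rocky coast at dusk, photorealistic",  # placeholder
    width=1024,
    height=1024,
    num_inference_steps=20,  # 20 steps
    guidance_scale=8.0,      # CFG 8
).images[0]
image.save("sdxl_output.png")
```

SDXL 1.0 also ships an optional refiner model for a second denoising pass, which this sketch omits for brevity.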

So far, SD hobbyists lament that SDXL lacks the numerous fine-tuned LoRAs available for SD 1.5-style models, add-ons that enhance aesthetics (such as a 3D-rendered style) or provide more detailed backdrops for certain scenes. But they expect that the community will fill in those gaps soon enough.

Community is key where Stable Diffusion is concerned since the model can run locally without oversight. That's a boon to an underground scene of amateur synthographers who utilize the software to craft interesting artwork. But it also means that the software can be used to create deepfakes, pornography, and disinformation. To Stability AI, the trade-off between some negative aspects and openness is worth it.

In a technical report on SDXL posted to arXiv earlier this month, Stability complains that "black box" models (such as OpenAI's DALL-E and Midjourney) that don't let users download the weights "make it challenging to assess the biases and limitations of these models in an impartial and objective way." The company further claims that the closed nature of those models "hampers reproducibility, stifles innovation, and prevents the community from building upon these models to further the progress of science and art."

That kind of idealism is likely small comfort for artists who feel threatened by technology that trains models like SDXL on scrapes of their work, used without permission. And it won't quiet the lawsuits over copyright. But despite the ethical issues surrounding image synthesis, the technology keeps rolling along, and that's exactly the way Stable Diffusion hobbyists like it.
