1

58

submitted 2 years ago by [email protected] to c/[email protected]

14 comments fedilink

This is a copy of /r/stablediffusion wiki to help people who need access to that information

Howdy and welcome to r/stablediffusion! I'm u/Sandcheeze and I have collected these resources and links to help enjoy Stable Diffusion whether you are here for the first time or looking to add more customization to your image generations.

If you'd like to show support, feel free to send us kind words or check out our Discord. Donations are appreciated, but not necessary as you being a great part of the community is all we ask for.

Note: The community resources provided here are not endorsed, vetted, nor provided by Stability AI.

#Stable Diffusion

Local Installation

Active Community Repos/Forks to install on your PC and keep it local.

Online Websites

Websites with usable Stable Diffusion right in your browser. No need to install anything.

Mobile Apps

Stable Diffusion on your mobile device.

Tutorials

Learn how to improve your skills in using Stable Diffusion even if a beginner or expert.

Dream Booth

How-to train a custom model and resources on doing so.

Models

Specially trained towards certain subjects and/or styles.

Embeddings

Tokens trained on specific subjects and/or styles.

Bots

Either bots you can self-host, or bots you can use directly on various websites and services such as Discord, Reddit etc

3rd Party Plugins

SD plugins for programs such as Discord, Photoshop, Krita, Blender, Gimp, etc.

Other useful tools

Diffusion Toolkit - Image viewer/organizer that scans your images for PNGInfo generated.
Pixiz Morphing - Easily transition between 2 photos.
Bulk Image Resizing Made Easy 2.0

#Community

Games

PictionAIry : (Video|2-6 Players) - The image guessing game where AI does the drawing!

Podcasts

This is Not An AI Art Podcast - Doug Smith talks about Ai Art and provides the prompts/workflow on his site.

Databases or Lists

AiArtApps
Stable Diffusion Akashic Records
Questianon's SD Updates 1
Questianon's SD Updates 2
SW-Yw's Stable Diffusion Repo List
Plonk's SD Model List (NSFW)
Nightkall's Useful Lists
Civitai - Website with a list of custom models.

Still updating this with more links as I collect them all here.

FAQ

How do I use Stable Diffusion?

Check out our guides section above!

Will it run on my machine?

Stable Diffusion requires a 4GB+ VRAM GPU to run locally. However, much beefier graphics cards (10, 20, 30 Series Nvidia Cards) will be necessary to generate high resolution or high step images. However, anyone can run it online through DreamStudio or hosting it on their own GPU compute cloud server.
Only Nvidia cards are officially supported.
AMD support is available here unofficially.
Apple M1 Chip support is available here unofficially.
Intel based Macs currently do not work with Stable Diffusion.

How do I get a website or resource added here?

*If you have a suggestion for a website or a project to add to our list, or if you would like to contribute to the wiki, please don't hesitate to reach out to us via modmail or message me.

2

8

MackinationsAi/Window_Trellis: Windows - One click Install (github.com)

submitted 4 days ago by [email protected] to c/[email protected]

0 comments fedilink

3

4

yuvraj108c/ComfyUI-Video-Depth-Anything (github.com)

submitted 4 days ago by [email protected] to c/[email protected]

0 comments fedilink

4

-1

mcmonkeyprojects/SwarmUI: SwarmUI (formerly StableSwarmUI) 0.9.5-Beta Release (github.com)

submitted 6 days ago by [email protected] to c/[email protected]

0 comments fedilink

Release: https://github.com/mcmonkeyprojects/SwarmUI/releases/tag/0.9.5-Beta

5

4

Lumina-Image 2.0 : An Apache-2.0 licensed Efficient, Unified and Transparent Image Generative Model (github.com)

submitted 1 week ago by [email protected] to c/[email protected]

0 comments fedilink

Code: https://github.com/Alpha-VLLM/Lumina-Image-2.0

Models: https://huggingface.co/Alpha-VLLM/Lumina-Image-2.0

6

18

U.S. Copyright Office - Copyright and Artificial Intelligence, Part 2: Copyrightability (copyright.gov)

submitted 1 week ago* (last edited 1 week ago) by [email protected] to c/[email protected]

5 comments fedilink

7

10

rupeshs/fastsdcpu Release v1.0.0 Beta120 (github.com)

submitted 1 week ago by [email protected] to c/[email protected]

0 comments fedilink

Add img2img and Image Variations tabs to the Qt GUI by @monstruosoft

Release: https://github.com/rupeshs/fastsdcpu/releases/tag/v1.0.0-beta.120

8

3

ostris/Flex.1-alpha · Hugging Face (huggingface.co)

submitted 2 weeks ago by [email protected] to c/[email protected]

0 comments fedilink

Flex.1 began as a finetune of FLUX.1-schnell, which allows the model to retain the Apache 2.0 license. It is designed to be fine tunable with Day 1 LoRA training support in AI-Toolkit.

9

7

Introducing ComfyUI RFC Process: Shaping the Future Together (blog.comfy.org)

submitted 2 weeks ago by [email protected] to c/[email protected]

0 comments fedilink

10

7

Packsod/Ref-Picker: Blender image folder organizing tool inspired by PureRef (github.com)

submitted 2 weeks ago by [email protected] to c/[email protected]

0 comments fedilink

11

14

🎂 ComfyUI Turns 2: A Journey and Call for Talent (blog.comfy.org)

submitted 3 weeks ago by [email protected] to c/[email protected]

0 comments fedilink

12

11

ComfyUI now supports Nvidia Cosmos (blog.comfy.org)

submitted 3 weeks ago by [email protected] to c/[email protected]

0 comments fedilink

13

20

DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation (i.imgur.com)

submitted 1 month ago by [email protected] to c/[email protected]

0 comments fedilink

Abstract

Story visualization, the task of creating visual narratives from textual descriptions, has seen progress with text-to-image generation models. However, these models often lack effective control over character appearances and interactions, particularly in multi-character scenes. To address these limitations, we propose a new task: customized manga generation and introduce DiffSensei, an innovative framework specifically designed for generating manga with dynamic multi-character control. DiffSensei integrates a diffusion-based image generator with a multimodal large language model (MLLM) that acts as a text-compatible identity adapter. Our approach employs masked cross-attention to seamlessly incorporate character features, enabling precise layout control without direct pixel transfer. Additionally, the MLLM-based adapter adjusts character features to align with panel-specific text cues, allowing flexible adjustments in character expressions, poses, and actions. We also introduce MangaZero, a large-scale dataset tailored to this task, containing 43,264 manga pages and 427,147 annotated panels, supporting the visualization of varied character interactions and movements across sequential frames. Extensive experiments demonstrate that DiffSensei outperforms existing models, marking a significant advancement in manga generation by enabling text-adaptable character customization. The code, model, and dataset will be open-sourced to the community.

Paper: https://arxiv.org/abs/2412.07589

Code: https://github.com/jianzongwu/DiffSensei

Project Page: https://jianzongwu.github.io/projects/diffsensei/

14

10

VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control (vmix-diffusion.github.io)

submitted 1 month ago by [email protected] to c/[email protected]

2 comments fedilink

Abstract

While diffusion models show extraordinary talents in text-to-image generation, they may still fail to generate highly aesthetic images. More specifically, there is still a gap between the generated images and the real-world aesthetic images in finer-grained dimensions including color, lighting, composition, etc. In this paper, we propose Cross-Attention Value Mixing Control (VMix) Adapter, a plug-and-play aesthetics adapter, to upgrade the quality of generated images while maintaining generality across visual concepts by (1) disentangling the input text prompt into the content description and aesthetic description by the initialization of aesthetic embedding, and (2) integrating aesthetic conditions into the denoising process through value-mixed cross-attention, with the network connected by zero-initialized linear layers. Our key insight is to enhance the aesthetic presentation of existing diffusion models by designing a superior condition control method, all while preserving the image-text alignment. Through our meticulous design, VMix is flexible enough to be applied to community models for better visual performance without retraining. To validate the effectiveness of our method, we conducted extensive experiments, showing that VMix outperforms other state-of-the-art methods and is compatible with other community modules (e.g., LoRA, ControlNet, and IPAdapter) for image generation.

Paper: https://arxiv.org/abs/2412.20800

Code: https://github.com/fenfenfenfan/VMix (Coming soon)

Project Page: https://vmix-diffusion.github.io/VMix/

15

45

1.58-bit FLUX (i.imgur.com)

submitted 1 month ago* (last edited 1 month ago) by [email protected] to c/[email protected]

18 comments fedilink

Abstract

We present 1.58-bit FLUX, the first successful approach to quantizing the state-of-the-art text-to-image generation model, FLUX.1-dev, using 1.58-bit weights (i.e., values in {-1, 0, +1}) while maintaining comparable performance for generating 1024 x 1024 images. Notably, our quantization method operates without access to image data, relying solely on self-supervision from the FLUX.1-dev model. Additionally, we develop a custom kernel optimized for 1.58-bit operations, achieving a 7.7x reduction in model storage, a 5.1x reduction in inference memory, and improved inference latency. Extensive evaluations on the GenEval and T2I Compbench benchmarks demonstrate the effectiveness of 1.58-bit FLUX in maintaining generation quality while significantly enhancing computational efficiency.

Paper: https://arxiv.org/abs/2412.18653

Code: https://github.com/Chenglin-Yang/1.58bit.flux (coming soon)

16

5

bghira/SimpleTuner Release v1.2.2 Sana support and SD3.5 (Large + Medium) training fixes (github.com)

submitted 1 month ago by [email protected] to c/[email protected]

0 comments fedilink

17

5

SDNext Xmass Edition 2024-12 (github.com)

submitted 1 month ago by [email protected] to c/[email protected]

0 comments fedilink

Change Log for SD.Next

SD.Next Xmass edition: What's new?

While we have several new supported models, workflows and tools, this release is primarily about quality-of-life improvements:

New memory management engine
list of changes that went into this one is long: changes to GPU offloading, brand new LoRA loader, system memory management, on-the-fly quantization, improved gguf loader, etc.
but main goal is enabling modern large models to run on standard consumer GPUs
without performance hits typically associated with aggressive memory swapping and needs for constant manual tweaks
New documentation website
with full search and tons of new documentation
New settings panel with simplified and streamlined configuration

We've also added support for several new models such as highly anticipated NVLabs Sana (see supported models for full list)
And several new SOTA video models: Lightricks LTX-Video, Hunyuan Video and Genmo Mochi.1 Preview

And a lot of Control and IPAdapter goodies

for SDXL there is new ProMax, improved Union and Tiling models
for FLUX.1 there are Flux Tools as well as official Canny and Depth models,
a cool Redux model as well as XLabs IP-adapter
for SD3.5 there are official Canny, Blur and Depth models in addition to existing 3rd party models
as well as InstantX IP-adapter

Plus couple of new integrated workflows such as FreeScale and Style Aligned Image Generation

And it wouldn't be a Xmass edition without couple of custom themes: Snowflake and Elf-Green!
All-in-all, we're around ~180 commits worth of updates, check the changelog for full list

ReadMe | ChangeLog | Docs | WiKi | Discord

18

7

pharmapsychotic/comfy-cliption: Image to text with CLIP ViT-L/14 in ComfyUI (github.com)

submitted 1 month ago by [email protected] to c/[email protected]

0 comments fedilink

19

8

InvokeAI v5.5 Released (www.youtube.com)

submitted 1 month ago* (last edited 1 month ago) by [email protected] to c/[email protected]

0 comments fedilink

Release: https://github.com/invoke-ai/InvokeAI/releases/

20

13

MangoLion/justaaa: React web app that lets creators generate AI images and 3D models using free services like AI Horde and HuggingFace. Built with Vite and ShadCN UI. (justaaa.fyrean.com)

submitted 1 month ago* (last edited 1 month ago) by [email protected] to c/[email protected]

3 comments fedilink

Source: https://github.com/MangoLion/justaaa

21

8

Current best video models? (poptalk.scrubbles.tech)

submitted 1 month ago* (last edited 1 month ago) by [email protected] to c/[email protected]

2 comments fedilink

Has anyone been using any great video models? I've used SVD and it's been... fine, but I see some really great videos online and don't know how they're generated. I'm thinking mostly Img2Txt style, in ComfyUI so I can generate an image and then animate it. (My perfect case would be something like multimodal, where I give it an image and a prompt, and a video would be spit out)

22

12

One Diffusion to Generate Them All (imgur.com)

submitted 1 month ago by [email protected] to c/[email protected]

3 comments fedilink

Abstract

We introduce OneDiffusion, a versatile, large-scale diffusion model that seamlessly supports bidirectional image synthesis and understanding across diverse tasks. It enables conditional generation from inputs such as text, depth, pose, layout, and semantic maps, while also handling tasks like image deblurring, upscaling, and reverse processes such as depth estimation and segmentation. Additionally, OneDiffusion allows for multi-view generation, camera pose estimation, and instant personalization using sequential image inputs. Our model takes a straightforward yet effective approach by treating all tasks as frame sequences with varying noise scales during training, allowing any frame to act as a conditioning image at inference time. Our unified training framework removes the need for specialized architectures, supports scalable multi-task training, and adapts smoothly to any resolution, enhancing both generalization and scalability. Experimental results demonstrate competitive performance across tasks in both generation and prediction such as text-to-image, multiview generation, ID preservation, depth estimation and camera pose estimation despite relatively small training dataset. Our code and checkpoint are freely available at this https URL

Paper: https://arxiv.org/abs/2411.16318

Code: https://github.com/lehduong/OneDiffusion?tab=readme-ov-file

Model: https://huggingface.co/lehduong/OneDiffusion

Project Page: https://lehduong.github.io/OneDiffusion-homepage/