“less biased” W being the preferable solution, where possible.
Not necessarily. There are two parts to a diffusion model: a tokenizer, and a neural network with a series of layers (W in this case would be a single layer) that react in some way to some tokens. What you really want is a W “with more information”, regardless of whether some tokens refer to a more or less “fair” (less biased) portion of it.
It doesn’t really matter if “girl = 99% chance of white girl + 1% of [other skin tone] girl”, and “asian girl = sexualized asian girl”… as long as the “biased” token associations don’t reduce the number of “[skin tone] girl” variants you can extract with specific prompts, and the model still reacts correctly to negative prompts like “asian girl -sexualized”.
LoRAs are a way to bludgeon a whole model into a strong bias, like “everything is a manga”, or “everything is birds”, or “all skin is frogs”, and so on. The interesting thing about LoRAs is that, if you take a base model where “girl = sexualized white girl”, and add an “all faces are asian” LoRA, and a “no sexualized parts” LoRA… then well, you’ve beaten the model into submission without having to use prompts (kind of a pyrrhic victory).
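To make the “stacking LoRAs” idea concrete, here’s a rough sketch, assuming the usual formulation where a LoRA is a low-rank update added onto a layer’s weights (W' = W + scale · B·A). The matrices, the scales, and the names `apply_loras`, `lora_faces` and `lora_no_nsfw` are all made up for illustration; real LoRAs patch many layers at once and don’t map this cleanly to a single concept.

```python
# Toy illustration only: plain Python lists stand in for weight tensors.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def apply_loras(w, loras):
    """Return W' = W + sum(scale * B @ A) over every (B, A, scale) in loras."""
    merged = [row[:] for row in w]  # copy, so the base weights stay untouched
    for b, a, scale in loras:
        delta = matmul(b, a)  # rank is capped by the inner dimension of B and A
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += scale * value
    return merged

# A 3x3 "layer" W and two rank-1 LoRAs (all values are hypothetical).
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
lora_faces = ([[1.0], [0.0], [0.0]], [[0.0, 2.0, 0.0]], 0.5)     # B: 3x1, A: 1x3
lora_no_nsfw = ([[0.0], [0.0], [1.0]], [[1.0, 0.0, 0.0]], -1.0)  # negative scale

W_merged = apply_loras(W, [lora_faces, lora_no_nsfw])
```

Both updates land on the same W, which is why the result is biased for every prompt and not just the ones that mention the concept.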
That approach breaks down, though, if you want something like a “multiracial female basketball team”.
That would require the model to encode the “race” as multiple sets of features, then pick one at random for every player in whatever proportion you find acceptable… but for that, you’re likely better off adding an LLM preprocessor stage to pick a random set of races in your desired proportion, then have it instruct a bounding-box diffusion model to draw each player with a specific prompt, so the bias of the model’s tokens would again become irrelevant.
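A minimal sketch of what that preprocessor stage could look like, with a plain weighted random sampler standing in for the LLM. Everything here is hypothetical: `plan_team`, the group labels, the weights, and the `(x, y, width, height)` box format are assumptions, and the output would still need to be fed to whatever region-prompting interface your diffusion pipeline actually has.

```python
import random

def plan_team(groups, weights, boxes, seed=None):
    """Assign one group per bounding box, sampled in the requested proportion.

    Returns one {"box", "prompt"} dict per player, so each region gets its own
    explicit prompt and the model's default for a bare "girl" never applies.
    """
    rng = random.Random(seed)
    picks = rng.choices(groups, weights=weights, k=len(boxes))
    return [
        {"box": box, "prompt": f"{group} female basketball player"}
        for box, group in zip(boxes, picks)
    ]

# Hypothetical labels and proportions; pick whatever you find acceptable.
groups = ["asian", "black", "white", "latina"]
weights = [0.25, 0.25, 0.25, 0.25]

# Five players side by side, as (x, y, width, height) in pixels.
boxes = [(i * 200, 100, 180, 400) for i in range(5)]

plan = plan_team(groups, weights, boxes, seed=42)
for player in plan:
    print(player["box"], "->", player["prompt"])
```

The proportion is now enforced outside the model, so it only has to draw each specific prompt well.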
Forcing the model to encode more variants per token is where you start needing a larger model, or start losing quality.