• 2 Posts
  • 118 Comments
Joined 3 months ago
Cake day: May 11th, 2024

  • Isn’t your comment more of a perspective on the public perception of AI, the missteps surrounding its implementation, and its current role - rather than an examination of the potential role (practically speaking) of generative AI in a more general AI model? As is the thrust of the post, generative AI will necessarily be part of a larger AI, in part to make up for its weaknesses, in part to utilize its strengths.

    That said, generative AI isn’t nearly as endangered by generated training data as is commonly assumed. Even if it were that bad, embodiment is rapidly changing the landscape. There are a ton of papers on using larger models to make smaller models more effective, and on using generative AI to improve generative AI, alongside steady efficiency gains. Heck, novel efficiencies get developed almost as regularly as novel use cases. We’re always learning how to do more with less.





  • The voice library 11labs added includes some really reliable and expressive models. I’ve only trained a few voice clones, but I find them totally usable for swapping out short lines to avoid having to bring a subject back in to record. I’ll fabricate a sentence or two, but for longer-form stuff, I only use AI for the rough cuts. Then I’ll record the real thing as a last step, once everything’s gone through revision cycles. The “generate a few and chop em together” method is fine for short clips, but becomes tedious for longer stuff.

    Funnily enough, when I say roto, I really just mean tracing the subject to remove it from the background. Background removal’s so baked into things now, I dunno if people even think of it as roto. But I still mostly prefer the Adobe solution here, Roto Brush in After Effects, for the AI/manual collaboration. As for roto in the A Scanner Darkly sense, I’ve played with a few of the video-to-video models, but mostly as a lark for fluff B-roll.


  • Full open source, nice! I respect the effort that went into that implementation. I pretty much exclusively use 11 Labs for TTS/RVC, turn up the style, turn down the stability, generate a few, and pick the best. I do find that longer generations tend to lose the thread, so it’s better to batch smaller script segments.

    Unless I misunderstand ya, your controlnet setup is for what would be rigging and animation rather than roto. I do agree that while I enjoy the outputs of pretty much all the automated animators, they’re not ready for prime time yet. Although I’m about to dive into KREA’s new keyframing feature and see if that’s any better for that use case.
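
    On the "batch smaller script segments" point, here's a rough sketch of how that splitting could be done before sending anything to a TTS tool. It's plain Python with no TTS calls at all; the function name `batch_script` and the 300-character default are just assumptions for illustration, not anything from a particular service.

```python
import re


def batch_script(script: str, max_chars: int = 300) -> list[str]:
    """Split a script into sentence-aligned segments of roughly max_chars.

    max_chars is an arbitrary, assumed limit; tune it to wherever longer
    generations start to drift. A single sentence longer than max_chars
    is kept whole rather than cut mid-sentence.
    """
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    segments: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the segment.
            segments.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments
```

    Each segment can then be generated a few times on its own and the best take kept, which is the same pick-the-best routine, just applied per segment.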