TikTok’s parent company, ByteDance, has been secretly using OpenAI’s technology to develop its own competing large language model (LLM). “This practice is generally considered a faux pas in the AI world,” writes The Verge’s Alex Heath. “It’s also in direct violation of OpenAI’s terms of service, which state that its model output can’t be used ‘to develop any artificial intelligence models that compete with our products and services.’”

  • TootSweet@lemmy.world · 11 months ago · 107 up, 9 down

    OpenAI will steal a whole internet’s worth of everybody’s data to train their large language model, but gets pissed when others do the same to them.

        • FaceDeer@kbin.social · 11 months ago · 10 up, 3 down

          No, even then it isn’t. It’s not stealing. There is literally a whole different body of law defining stealing versus the body of law that defines copyright and intellectual property. The data is still exactly where it was to begin with, therefore it hasn’t been stolen.

          I wish people would stop using wildly inaccurate loaded terminology in these discussions simply to score emotional points.

    • crazyCat@sh.itjust.works · 11 months ago · 3 up

      Their take on it, via Sam Altman, is that the AI is reading and learning from the internet, and we can’t fault them for that, right? You don’t fault a human for using what they’ve learned, do you? That’s the rationale… I don’t know what I think about it, though.

      • Hacksaw@lemmy.ca · 11 months ago · 2 up, 4 down

        It’s not a PERSON. The only person involved is literally copying the internet and duct-taping it together to form ChatGPT. Then they say “the AI is reading and learning like any human would”. No, brother, the AI IS MADE FROM a copy of all the stolen words. Before the theft, there is no AI that you can put the words into and have it learn. It’s just a matrix filled with trillions of zeroes. It’s only an AI AFTER you build it from the stolen data.

    • AdamEatsAss@lemmy.world · 11 months ago · 10 up, 29 down

      They didn’t really “steal” the internet data. I don’t think most websites and data logs they used explicitly said “don’t use this to train a large language model.”