I’m interested in hosting something like this, and I’d like to hear about your experiences with it.

My main reasons for self-hosting are privacy and being able to integrate my own PKM data (mainly markdown files).

Feel free to recommend me videos, articles, other Lemmy communities, etc.

  • sj_zero@lotide.fbxl.net
    8 months ago

    I’ve been using a number of different tools that I interface with my Nextcloud.

    My main Nextcloud has an LLM plugin that was really easy to install: you install the plug-in, make sure Python is configured properly and in your path, and then run an occ command to download one of a few models.

    https://localai.io/

    I also hosted LocalAI, which was a little more involved, but the website did a decent job of explaining everything you need to do to get the different types of AI models working. Besides LLMs, it also supports text-to-speech, speech-to-text, and image generation.

    Two things are important. First, if your server doesn’t have a fairly advanced video card, you’re going to be using the CPU exclusively for AI, and that’ll be pretty slow. Second, I found out very quickly that the amount of RAM you have is critical. My main server is a Core i5 4th gen, so I put the AI software on another one of my servers, which is a Core i5 7th gen. You would think the latter would work a lot better, but it had half the RAM, and it basically wasn’t even able to get started.
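
As a rough illustration of why RAM matters so much, here is a back-of-the-envelope sketch. It assumes the memory needed for the weights alone is roughly parameters × bits per weight / 8; actual usage is higher because of the context cache and runtime overhead, so treat the numbers as a lower bound. The function name `estimate_weights_gb` is made up for this example.

```shell
# Rough lower bound on the RAM needed just to hold a model's weights.
# Assumption: gigabytes ~= (parameters in billions) * (bits per weight) / 8.
# $1 = parameter count in billions (e.g. 13 for a 13B model)
# $2 = bits per weight (e.g. 16 for fp16, 4 for 4-bit quantization)
estimate_weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

estimate_weights_gb 7 4    # 7B model, 4-bit quantized
estimate_weights_gb 13 4   # 13B model, 4-bit quantized
estimate_weights_gb 13 16  # 13B model, fp16
```

By this estimate a 4-bit 13B model needs about 6.5 GB for weights alone, which may be why a machine with half the RAM could not even load a model.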

    Besides hosting ai, if you have a desktop computer or gaming laptop you can run local AI models. There’s a fantastic piece of software called Faraday that works pretty well on my laptop. You can get more and more sophisticated models depending on how much memory you have.

    https://youtu.be/aLy_vVLUHZk

    Krita has DALL-E-style AI image generation available as a plug-in. I haven’t used it yet because I only started the download last night before I went to bed, but the installation process as described in the video seems accurate, and it was extremely easy and mostly automated.

    https://youtu.be/AU8NDSBIS1U

  • The Assman@sh.itjust.works
    8 months ago

    Programming is my main use case for LLMs and I think it’s too slow right now. In the 30 seconds it takes a local llama to give me an answer I’ve already figured it out.

    I’m definitely keeping an eye on it though. Our org is really interested in training models on our code base and customer data.

    • TCB13@lemmy.world
      8 months ago

      “Uncensored” models are bullshit, anything but uncensored. Just ask them for a Windows XP Pro key and you’ll see how uncensored they really are.

  • Buffalobuffalo@lemmy.dbzer0.com
    8 months ago

    Dbzero Lemmy has a relationship with the Horde AI shared LLM group. My primary use is chat roleplay, but they have streamlined guides to hosting your own models for personal or horde use. One of the primary interfaces is SillyTavern, but they integrate numerous models.

  • SuperiorOne@lemmy.ml
    8 months ago

    I’m actively using ollama with Docker to run the llama2:13b model. It generally works fine but is heavy on resources, as expected.
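
For anyone wanting to try something similar, here is a minimal sketch based on the usage documented for the `ollama/ollama` Docker image; it is not necessarily the exact setup described above. The volume and container name (`ollama`) are conventional choices, and the `RUN` variable is a hypothetical switch added here so the script only prints the commands unless you opt in. This is the CPU-only form; GPU use needs extra Docker flags and drivers.

```shell
# Model tag taken from the comment above.
MODEL="llama2:13b"

# Start the ollama server in a container, keeping models in a named volume.
START_CMD="docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama"
# Pull the model if needed and open an interactive chat inside the container.
CHAT_CMD="docker exec -it ollama ollama run $MODEL"

if [ "${RUN:-0}" = "1" ] && command -v docker >/dev/null 2>&1; then
  $START_CMD && $CHAT_CMD
else
  # Dry run: only print what would be executed.
  echo "$START_CMD"
  echo "$CHAT_CMD"
fi
```

Run it with `RUN=1` set to actually start the container; without it, the two commands are only printed.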

  • Haggunenons@lemmy.world
    8 months ago

    Mixtral is an amazing one that isn’t super slow and doesn’t require incredible hardware for decent speed.

    In general this guy has really good videos/tutorials for the latest tools.

  • Imacat@lemmy.dbzer0.com
    8 months ago

    There’s a local llama subreddit with a lot of good information, and 4chan’s /g/ board will usually have a good thread with a ton of helpful links in the first post. I don’t think there’s anything on Lemmy yet. You can run some good models on a decent home PC, but training and fine-tuning will likely require renting some cloud GPUs.

  • amzd@kbin.social
    8 months ago

    ollama + codellama works perfectly. I use it from neovim with a plug-in called gen.nvim, I think.

  • CubitOom@infosec.pub
    8 months ago

    Check out ollama.

    There are a lot of models you can pull from the official library.

    Using ollama, you can also run external gguf models found on places like huggingface if you use a modelfile with something as simple as

    echo "FROM ~/Documents/ollama/models/$model_filepath" >| ~/Documents/ollama/modelfiles/$model_name.modelfile
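
To round that out, here is a hedged sketch of the full flow: write the modelfile, then register it with `ollama create`. The model name and paths (`my-model`, `my-model.gguf`, and the `MODELFILE_DIR` override) are hypothetical placeholders, not values from the comment above. The script only calls ollama if it is installed and the GGUF file actually exists; otherwise it prints what it would do.

```shell
# Hypothetical names for illustration; replace with your own.
model_name="my-model"
model_file="$HOME/Documents/ollama/models/my-model.gguf"
modelfile_dir="${MODELFILE_DIR:-$HOME/Documents/ollama/modelfiles}"
modelfile="$modelfile_dir/$model_name.modelfile"

mkdir -p "$modelfile_dir"
# The simplest modelfile is a single FROM line pointing at the GGUF weights.
printf 'FROM %s\n' "$model_file" > "$modelfile"

if command -v ollama >/dev/null 2>&1 && [ -f "$model_file" ]; then
  # Register the external model under the chosen name.
  ollama create "$model_name" -f "$modelfile"
  echo "Next: ollama run $model_name"
else
  echo "Would run: ollama create $model_name -f $modelfile"
fi
```

Once the model is created, `ollama run my-model` starts an interactive session with it.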
    
  • beta_tester@lemmy.ml
    8 months ago

    Not successfully yet, but I’ve been using Hugging Face for a couple of days. You may want to have a look at it.

  • TCB13@lemmy.world
    8 months ago

    Yes, mostly https://gpt4all.io/, only to find out that even the “uncensored” models are bullshit and won’t even provide you with a Windows XP Pro key. That’s kind of my benchmark for models nowadays. :P