[–] [email protected] 2 points 20 hours ago (4 children)

Any use for programming? Preferably local hosting only.

[–] [email protected] 7 points 20 hours ago* (last edited 20 hours ago) (3 children)

I mean, if you have a huge GPU, sure. Or at least 12GB of free VRAM, or a big Mac.

Local LLMs for coding are kind of a niche, because most people don't have a 3090 or 7900 lying around, and you really need 12GB+ of free VRAM before the models start being "smart" and worth using over free LLM APIs, much less cheap paid ones.

But if you do have the hardware and the time to set up a server, the DeepSeek R1 models or the FuseAI merges are great for "slow" answers, where the model thinks things out before replying. Qwen 2.5 32B Coder is great for quick answers on 24GB of VRAM, and Arcee 14B is great for 12GB of VRAM.
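
If you want to sanity-check which of those tiers your card can actually fit, here's a rough sketch using PyTorch (assumes an NVIDIA GPU and a local `torch` install; the GB cutoffs just mirror the numbers above and aren't exact):

```python
# Rough free-VRAM check to pick a model tier.
# Assumes an NVIDIA GPU and PyTorch installed; thresholds mirror the comment above.
import torch

free_bytes, total_bytes = torch.cuda.mem_get_info()  # (free, total) on the current device
free_gb = free_bytes / 1024**3

if free_gb >= 24:
    print(f"{free_gb:.1f} GB free: a 32B-class model (e.g. Qwen 2.5 32B Coder) should fit quantized.")
elif free_gb >= 12:
    print(f"{free_gb:.1f} GB free: try a 14B-class model (e.g. Arcee 14B or an R1 distill).")
else:
    print(f"{free_gb:.1f} GB free: probably not enough for a 'smart' local coding model.")
```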

Sometimes running a small model on a "fast" but less VRAM-efficient backend is better for things like Cursor code completion.

[–] [email protected] 1 points 15 hours ago (1 children)
[–] [email protected] 1 points 2 hours ago* (last edited 2 hours ago)

Yes! Try this model: https://huggingface.co/arcee-ai/Virtuoso-Small-v2

Or the 14B thinking model: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
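
To grab either of those repos locally, a minimal sketch with `huggingface_hub` (the destination folder is just an example path I picked, not anything official):

```python
# Download one of the models linked above from Hugging Face.
# Assumes `pip install huggingface_hub`; local_dir is a hypothetical destination folder.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="arcee-ai/Virtuoso-Small-v2",  # or "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"
    local_dir="models/Virtuoso-Small-v2",
)
```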

But for speed and coherence, instead of ollama I'd recommend running it through Aphrodite or TabbyAPI as a backend, depending on whether you prioritize speed or long inputs. They both act as generic OpenAI endpoints.
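
Since both backends expose an OpenAI-compatible API, querying them from Python looks roughly like this (the port and model name are placeholders; use whatever your server is actually configured with):

```python
# Query a local Aphrodite/TabbyAPI server through its OpenAI-compatible endpoint.
# Assumes `pip install openai`; base_url port and model name are placeholders for your own setup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5000/v1", api_key="not-needed-locally")

response = client.chat.completions.create(
    model="Virtuoso-Small-v2",  # whatever model name your server reports
    messages=[{"role": "user", "content": "Write a Python function that reverses a linked list."}],
)
print(response.choices[0].message.content)
```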

I'll even step you through it and upload a quantization for your card if you want, as it looks like there isn't a good-sized exl2 on Hugging Face.
