Yes! Try this model: https://huggingface.co/arcee-ai/Virtuoso-Small-v2
Or the 14B thinking model: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
But for speed and coherence, instead of Ollama, I'd recommend running it through Aphrodite or TabbyAPI as a backend: Aphrodite if you prioritize raw speed, TabbyAPI if you prioritize long inputs. Both expose a generic OpenAI-compatible endpoint.
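Since both speak the OpenAI API, your client code stays the same whichever backend you pick. A minimal sketch with the official `openai` Python client; the port, API key, and model name are placeholders, substitute whatever your backend config actually uses:

```python
# Minimal sketch of querying a local OpenAI-compatible backend
# (TabbyAPI or Aphrodite). Port, key, and model name below are
# assumptions, not guaranteed defaults -- check your server config.
from openai import OpenAI  # pip install openai

client = OpenAI(
    base_url="http://localhost:5000/v1",  # point at your local backend
    api_key="dummy",  # local backends typically accept any/self-defined key
)

response = client.chat.completions.create(
    model="Virtuoso-Small-v2",  # whatever name the backend exposes
    messages=[{"role": "user", "content": "Hello! Give me a one-line test reply."}],
)
print(response.choices[0].message.content)
```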
I'll even step you through it and upload a quantization for your card, if you want, since it looks like there isn't a good-sized exl2 quant on Hugging Face.
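For reference, making your own exl2 quant is basically one script from the exllamav2 repo. A rough sketch, assuming you've cloned exllamav2 and are running from its repo root; the paths and the 5.0 bits-per-weight target are placeholders you'd tune to your VRAM:

```python
# Hedged sketch: invoking exllamav2's convert.py to produce an exl2 quant.
# Assumes CWD is the exllamav2 repo root; all paths below are hypothetical.
import subprocess

subprocess.run(
    [
        "python", "convert.py",
        "-i", "models/Virtuoso-Small-v2",            # source HF-format model dir
        "-o", "work/",                               # scratch dir for the measurement pass
        "-cf", "quant/Virtuoso-Small-v2-5.0bpw-exl2",  # compiled output dir
        "-b", "5.0",                                 # target bits per weight
    ],
    check=True,  # raise if the conversion fails
)
```

Lower the `-b` value for smaller cards; the right number depends on model size versus your VRAM, which is why I offered to pick one for your card.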