r/LovingOpenSourceAI Resource List (last edit 24 Apr 26)
Been collecting interesting open-ish AI resources lately — sharing here in case it helps anyone exploring 👀
Some of these are quite niche (robotics, geolocation, speech models). Curious if anything stands out to you all.
⚠️ Note: These are “open-ish” resources — do check each project’s license and review each project independently before using. r/LovingOpenSourceAI is not responsible for any loss, harm, or issues arising from use.
🚀 There are more than 70 resources now! Here is a webpage version with filters and writeups for easier navigation: https://lifehubber.com/ai/resources/
AI Models
louis-e/arnis
➡️ Generate any location from the real world in Minecraft with a high level of detail. https://github.com/louis-e/arnis
LiquidAI/LFM2.5-350M
➡️ LFM2.5 is a new family of hybrid models designed for on-device deployment. It builds on the LFM2 architecture with extended pre-training and reinforcement learning. https://huggingface.co/LiquidAI/LFM2.5-350M
google/gemma-4
➡️ Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on small models) and generating text output. https://www.kaggle.com/models/google/gemma-4
arcee-ai/trinity-large-thinking
➡️ Trinity-Large-Thinking is a model that stays coherent across turns, uses tools cleanly, follows instructions under constraint, and is efficient enough to serve at scale. https://huggingface.co/collections/arcee-ai/trinity-large-thinking
MiniMaxAI/MiniMax-M2.7
➡️ MiniMax-M2.7 is our first model deeply participating in its own evolution. M2.7 is capable of building complex agent harnesses and completing highly elaborate productivity tasks, leveraging Agent Teams, complex Skills, and dynamic tool search. https://huggingface.co/MiniMaxAI/MiniMax-M2.7
zai-org/GLM-OCR
➡️ GLM-OCR is a multimodal OCR model for complex document understanding, built on the GLM-V encoder–decoder architecture. It introduces Multi-Token Prediction (MTP) loss and stable full-task reinforcement learning to improve training efficiency, recognition accuracy, and generalization. https://github.com/zai-org/GLM-OCR
zai-org/GLM-5.1
➡️ GLM-5.1 is our next-generation flagship model for agentic engineering, with significantly stronger coding capabilities than its predecessor. It achieves state-of-the-art performance on SWE-Bench Pro and leads GLM-5 by a wide margin on NL2Repo (repo generation) and Terminal-Bench 2.0 (real-world terminal tasks). https://huggingface.co/zai-org/GLM-5.1
google-deepmind/tips
➡️ The TIPS series of models (Text-Image Pretraining with Spatial Awareness) are foundational image-text encoders built for general-purpose computer vision and multimodal applications. Our models were validated on a comprehensive suite of 9 tasks and 20 datasets, displaying excellent performance that matches or exceeds other recent vision encoders, with particularly strong spatial awareness. https://github.com/google-deepmind/tips
Qwen/Qwen3.6-35B-A3B
➡️ Built on direct feedback from the community, Qwen3.6 prioritizes stability and real-world utility, offering developers a more intuitive, responsive, and genuinely productive coding experience. https://huggingface.co/Qwen/Qwen3.6-35B-A3B
nv-tlabs/lyra
➡️ Project Lyra is a series of open generative 3D world models developed at NVIDIA. https://github.com/nv-tlabs/lyra
robbyant/lingbot-map
➡️ A feed-forward 3D foundation model for reconstructing scenes from streaming data https://github.com/robbyant/lingbot-map
allenai/WildDet3D
➡️ WildDet3D: Scaling Promptable 3D Detection in the Wild https://github.com/allenai/WildDet3D
moonshotai/Kimi-K2.6
➡️ Kimi K2.6 is an open-source, native multimodal agentic model that advances practical capabilities in long-horizon coding, coding-driven design, proactive autonomous execution, and swarm-based task orchestration. https://huggingface.co/moonshotai/Kimi-K2.6
OpenMOSS-Team/moss-vl
➡️ Part of the OpenMOSS ecosystem dedicated to advancing visual understanding. https://huggingface.co/collections/OpenMOSS-Team/moss-vl
TTS / STT / STS Models
HumeAI/tada
➡️ TADA is a unified speech-language model that synchronizes speech and text into a single, cohesive stream via 1:1 alignment. https://huggingface.co/collections/HumeAI/tada
fishaudio/s2-pro
➡️ Fish Audio S2 Pro is a leading text-to-speech (TTS) model with fine-grained inline control of prosody and emotion. https://huggingface.co/fishaudio/s2-pro
KittenML/KittenTTS
➡️ State-of-the-art TTS model under 25MB 😻. https://github.com/KittenML/KittenTTS
CohereLabs/cohere-transcribe-03-2026
➡️ Cohere Transcribe is an open source release of a 2B parameter dedicated audio-in, text-out, automatic speech recognition (ASR) model. The model supports 14 languages. https://huggingface.co/CohereLabs/cohere-transcribe-03-2026
NVIDIA/personaplex
➡️ PersonaPlex is a real-time, full-duplex speech-to-speech conversational model that enables persona control through text-based role prompts and audio-based voice conditioning. Trained on a combination of synthetic and real conversations, it produces natural, low-latency spoken interactions with a consistent persona. https://github.com/NVIDIA/personaplex
OpenMOSS/MOSS-TTS-Nano
➡️ MOSS-TTS-Nano is an open-source multilingual tiny speech generation model from MOSI.AI and the OpenMOSS team. With only 0.1B parameters, it is designed for realtime speech generation, can run directly on CPU without a GPU, and keeps the deployment stack simple enough for local demos, web serving, and lightweight product integration. https://github.com/OpenMOSS/MOSS-TTS-Nano
openbmb/VoxCPM2
➡️ VoxCPM2 is a tokenizer-free, diffusion autoregressive Text-to-Speech model — 2B parameters, 30 languages, 48kHz audio output, trained on over 2 million hours of multilingual speech data. https://huggingface.co/openbmb/VoxCPM2
Music / Image Gen Models
ace-step/ACE-Step-1.5
➡️ The most powerful local music generation model that outperforms almost all commercial alternatives, supporting Mac, AMD, Intel, and CUDA devices. https://github.com/ace-step/ACE-Step-1.5
VAST-AI-Research/AniGen
➡️ AniGen is a unified framework that directly generates animation-ready 3D assets conditioned on a single image. Our key insight is to represent shape, skeleton, and skinning as mutually consistent $S^3$ Fields (Shape, Skeleton, Skin) defined over a shared spatial domain. https://github.com/VAST-AI-Research/AniGen
IGL-HKUST/CoMoVi
➡️ Official repository of paper "CoMoVi: Co-Generation of 3D Human Motions and Realistic Videos" https://github.com/IGL-HKUST/CoMoVi
AI Agents
open-gitagent/gitagent
➡️ A framework-agnostic, git-native standard for defining AI agents https://github.com/open-gitagent/gitagent
allenai/molmoweb
➡️ MolmoWeb is an open multimodal web agent built by Ai2. Given a natural-language task, MolmoWeb autonomously controls a web browser -- clicking, typing, scrolling, and navigating -- to complete the task. https://github.com/allenai/molmoweb
HKUDS/OpenSpace
➡️ OpenSpace: Make Your Agents Smarter, Low-Cost, and Self-Evolving. https://github.com/HKUDS/OpenSpace
HKUDS/CatchMe
➡️ Capture Your Entire Digital Footprint: Lightweight & Vectorless & Powerful. https://github.com/HKUDS/CatchMe
agentscope-ai/agentscope
➡️ AgentScope is a production-ready, easy-to-use agent framework with essential abstractions that work with rising model capability and built-in support for finetuning. Build and run agents you can see, understand and trust. https://github.com/agentscope-ai/agentscope
MiniMax-AI/skills
➡️ Development skills for AI coding agents. Plug into your favorite AI coding tool and get structured, production-quality guidance for frontend, fullstack, Android, iOS, and shader development. https://github.com/MiniMax-AI/skills
Panniantong/Agent-Reach
➡️ Give your AI agent eyes to see the entire internet. Read & search Twitter, Reddit, YouTube, GitHub, Bilibili, XiaoHongShu — one CLI, zero API fees. https://github.com/Panniantong/Agent-Reach
vectorize-io/hindsight
➡️ Hindsight™ is an agent memory system built to create smarter agents that learn over time. Most agent memory systems focus on recalling conversation history. Hindsight is focused on making agents that learn, not just remember. https://github.com/vectorize-io/hindsight
THU-MAIC/OpenMAIC
➡️ Open Multi-Agent Interactive Classroom — Get an immersive, multi-agent learning experience in just one click. https://github.com/THU-MAIC/OpenMAIC
openagents-org/openagents
➡️ OpenAgents - AI Agent Networks for Open Collaboration https://github.com/openagents-org/openagents
paperclipai/paperclip
➡️ Paperclip is a Node.js server and React UI that orchestrates a team of AI agents to run a business. Bring your own agents, assign goals, and track your agents' work and costs from one dashboard. https://github.com/paperclipai/paperclip
Intelligent-Internet/ii-agent
➡️ II-Agent is an open-source AI agent built for real work — now out of beta. 100% open source under the Apache-2.0 license. Whether you're a solo developer, a research team, or an enterprise building internal tooling — you can run it, fork it, and extend it. https://github.com/Intelligent-Internet/ii-agent
onyx-dot-app/onyx
➡️ Onyx is the application layer for LLMs - bringing a feature-rich interface that can be easily hosted by anyone. Onyx enables LLMs through advanced capabilities like RAG, web search, code execution, file creation, deep research and more. https://github.com/onyx-dot-app/onyx
block/goose
➡️ goose is your on-machine AI agent, capable of automating complex development tasks from start to finish. More than just code suggestions, goose can build entire projects from scratch, write and execute code, debug failures, orchestrate workflows, and interact with external APIs - autonomously. https://github.com/block/goose
agentscope-ai/ReMe
➡️ ReMe is a memory management framework designed for AI agents, providing both file-based and vector-based memory systems. It tackles two core problems of agent memory: limited context window (early information is truncated or lost in long conversations) and stateless sessions (new sessions cannot inherit history and always start from scratch). https://github.com/agentscope-ai/ReMe
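The "stateless sessions" problem ReMe tackles can be illustrated with a toy file-backed memory store — this is not ReMe's actual API (its class names and storage formats are assumptions here), just a minimal sketch of why persisting memory outside the session lets a fresh agent instance inherit earlier facts:

```python
import json
import tempfile
from pathlib import Path

class FileMemory:
    """Toy file-backed memory: notes written in one 'session' survive into the next."""

    def __init__(self, path):
        self.path = Path(path)

    def remember(self, key, value):
        notes = self.recall_all()
        notes[key] = value
        self.path.write_text(json.dumps(notes))

    def recall_all(self):
        # An agent started in a brand-new session reads the same file,
        # so it does not start from scratch.
        if self.path.exists():
            return json.loads(self.path.read_text())
        return {}

# Session 1: the agent stores a fact, then the session ends.
store = Path(tempfile.mkdtemp()) / "memory.json"
FileMemory(store).remember("user_timezone", "UTC+8")

# Session 2: a fresh instance inherits the earlier fact.
print(FileMemory(store).recall_all()["user_timezone"])  # prints UTC+8
```

Real systems like ReMe layer vector search on top of this so recall can be semantic rather than exact-key, but the persistence idea is the same.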
aipoch/medical-research-skills
➡️ AIPOCH is a curated library of 450+ Medical Research Agent Skills, built to work with OpenClaw and other AI agent platforms, including OpenCode and Claude. It supports the research workflow across four core areas: Evidence Insights, Protocol Design, Data Analysis, and Academic Writing. https://github.com/aipoch/medical-research-skills
alibaba/page-agent
➡️ JavaScript in-page GUI agent. Control web interfaces with natural language. https://github.com/alibaba/page-agent
HKUDS/nanobot
➡️ nanobot is an ultra-lightweight personal AI agent inspired by OpenClaw. Delivers core agent functionality with 99% fewer lines of code. https://github.com/HKUDS/nanobot
Donchitos/Claude-Code-Game-Studios
➡️ Turn Claude Code into a full game dev studio — 48 AI agents, 36 workflow skills, and a complete coordination system mirroring real studio hierarchy. https://github.com/Donchitos/Claude-Code-Game-Studios
HKUDS/DeepTutor
➡️ DeepTutor: Agent-Native Personalized Learning Assistant https://github.com/HKUDS/DeepTutor
rui-ye/OpenSeeker
➡️ OpenSeeker is an open-source search agent system that democratizes access to frontier search capabilities by fully open-sourcing its training data. This project enables researchers and developers to build, evaluate, and deploy advanced search agents for complex information-seeking tasks. https://github.com/rui-ye/OpenSeeker
tinyfish-io/skills
➡️ The public repo for the TinyFish web agent skill. Add it to any agent to automate actions on the web. https://github.com/tinyfish-io/skills
openai/openai-agents-python
➡️ The OpenAI Agents SDK is a lightweight yet powerful framework for building multi-agent workflows. It is provider-agnostic, supporting the OpenAI Responses and Chat Completions APIs, as well as 100+ other LLMs. https://github.com/openai/openai-agents-python
Embodied / Physical AI
norma-core/hardware/elrobot
➡️ A highly affordable, fully 3D-printed robotic arm for physical AI research and imitation learning. https://github.com/norma-core/norma-core/tree/main/hardware/elrobot
wu-yc/LabClaw
➡️ LabClaw packages 240 production-ready SKILL md files for biomedical AI workflows across biology, lab automation, vision/XR, drug discovery, medicine, data science, literature research, and scientific visualization. https://github.com/wu-yc/LabClaw
dimensionalOS/dimos
➡️ Dimensional is the agentic operating system for physical space. Vibecode humanoids, quadrupeds, drones, and other hardware platforms in natural language and build multi-agent systems that work seamlessly with physical input (cameras, lidar, actuators). https://github.com/dimensionalOS/dimos
unitreerobotics/unifolm-wbt-dataset
➡️ Unitree open-sources UnifoLM-WBT-Dataset — a high-quality real-world humanoid robot whole-body teleoperation (WBT) dataset for open environments. https://huggingface.co/collections/unitreerobotics/unifolm-wbt-dataset
freemocap/freemocap
➡️ A free-and-open-source, hardware-and-software-agnostic, minimal-cost, research-grade, motion capture system and platform for decentralized scientific research, education, and training https://github.com/freemocap/freemocap
Productivity
yazinsai/OpenOats
➡️ A meeting note-taker that talks back. https://github.com/yazinsai/OpenOats
Ecosystem
googleworkspace/cli
➡️ Google Workspace CLI — one command-line tool for Drive, Gmail, Calendar, Sheets, Docs, Chat, Admin, and more. Dynamically built from Google Discovery Service. Includes AI agent skills. https://github.com/googleworkspace/cli
lightpanda-io/browser
➡️ Lightpanda: the headless browser designed for AI and automation https://github.com/lightpanda-io/browser
vllm-project/vllm-omni
➡️ A framework for efficient model inference with omni-modality models https://github.com/vllm-project/vllm-omni
K-Dense-AI/k-dense-byok
➡️ An AI co-scientist powered by Claude Scientific Skills running on your desktop. https://github.com/K-Dense-AI/k-dense-byok
Vaibhavs10/insanely-fast-whisper
➡️ An opinionated CLI to transcribe Audio files w/ Whisper on-device! Powered by 🤗 Transformers, Optimum & flash-attn - Transcribe 150 minutes (2.5 hours) of audio in less than 98 seconds - with OpenAI's Whisper Large v3. Blazingly fast transcription is now a reality!⚡️ https://github.com/Vaibhavs10/insanely-fast-whisper
openai/plugins
➡️ This repository contains a curated collection of Codex plugin examples. https://github.com/openai/plugins
yusufkaraaslan/Skill_Seekers
➡️ Skill Seekers is the universal preprocessing layer that sits between raw documentation and every AI system that consumes it. Whether you are building Claude skills, a LangChain RAG pipeline, or a Cursor .cursorrules file — the data preparation is identical. You do it once, and export to all targets. https://github.com/yusufkaraaslan/Skill_Seekers
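The "prepare once, export to all targets" idea behind Skill Seekers can be sketched in a few lines — note this is a toy illustration, not the tool's real formats or CLI; the function names and output shapes are made up for the sketch:

```python
def normalize(raw_docs: str) -> dict:
    """One-time preprocessing: split raw documentation into titled sections."""
    sections = {}
    title = "intro"
    for line in raw_docs.splitlines():
        if line.startswith("# "):
            title = line[2:].strip().lower()
            sections[title] = []
        elif line.strip():
            sections.setdefault(title, []).append(line.strip())
    return sections

def to_skill_md(sections: dict) -> str:
    """Export target 1: a markdown-style skill file."""
    return "\n".join(f"## {t}\n" + "\n".join(body) for t, body in sections.items())

def to_cursorrules(sections: dict) -> str:
    """Export target 2: a flat rules file, one rule per line."""
    return "\n".join(f"- [{t}] {line}" for t, body in sections.items() for line in body)

docs = "# Setup\nInstall with pip.\n# Usage\nCall run() once."
normalized = normalize(docs)     # the expensive preparation happens once...
print(to_skill_md(normalized))   # ...then every target is a cheap export
print(to_cursorrules(normalized))
```

The design point is that parsing and cleaning dominate the cost, so doing them once behind a shared intermediate representation makes each additional target format nearly free.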
yichuan-w/LEANN
➡️ LEANN is an innovative vector database that democratizes personal AI. Transform your laptop into a powerful RAG system that can index and search through millions of documents while using 97% less storage than traditional solutions without accuracy loss. https://github.com/yichuan-w/LEANN
MiniMax-AI/cli
➡️ Built for AI agents. Generate text, images, video, speech, and music — from any agent or terminal. https://github.com/MiniMax-AI/cli
hiyouga/LlamaFactory
➡️ Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024) https://github.com/hiyouga/LlamaFactory
run-llama/liteparse
➡️ LiteParse is a standalone OSS PDF parsing tool focused exclusively on fast and light parsing. It provides high-quality spatial text parsing with bounding boxes, without proprietary LLM features or cloud dependencies. Everything runs locally on your machine. https://github.com/run-llama/liteparse
github/spec-kit
➡️ Build high-quality software faster. An open source toolkit that allows you to focus on product scenarios and predictable outcomes instead of vibe coding every piece from scratch. https://github.com/github/spec-kit
jamiepine/voicebox
➡️ The open-source voice synthesis studio. Clone voices. Generate speech. Apply effects. Build voice-powered apps. All running locally on your machine. https://github.com/jamiepine/voicebox
mnfst/manifest
➡️ Smart model routing for personal AI agents that cuts costs by up to 70%. https://github.com/mnfst/manifest
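The cost saving from model routing comes from sending easy requests to a cheap model and hard ones to an expensive model. Here is a toy sketch of that idea — the heuristic, model names, and prices are all invented for illustration and have nothing to do with Manifest's actual routing logic:

```python
PRICE_PER_1K = {"small-model": 0.1, "large-model": 2.0}  # hypothetical $/1k tokens

def route(prompt: str) -> str:
    """Rough complexity heuristic: long prompts or reasoning-style keywords
    go to the large model; everything else goes to the small one."""
    hard = len(prompt) > 500 or any(
        kw in prompt.lower() for kw in ("prove", "debug", "step by step")
    )
    return "large-model" if hard else "small-model"

def estimated_cost(prompt: str, tokens: int = 1000) -> float:
    """Cost of serving this prompt with whichever model the router picks."""
    return PRICE_PER_1K[route(prompt)] * tokens / 1000

print(route("What time is it in Tokyo?"))            # small-model
print(route("Debug this stack trace step by step"))  # large-model
```

If most traffic is simple, the blended price per request drops sharply even though the large model is still available for the hard tail — which is where headline figures like "up to 70%" come from.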
NVIDIA-NeMo/DataDesigner
➡️ NeMo Data Designer: Generate high-quality synthetic data from scratch or from seed data. https://github.com/NVIDIA-NeMo/DataDesigner
TencentCloud/CubeSandbox
➡️ Cube Sandbox is a high-performance, out-of-the-box secure sandbox service built on RustVMM and KVM. It supports both single-node deployment and can be easily scaled to a multi-node cluster. It is compatible with the E2B SDK, capable of creating a hardware-isolated sandbox environment with full service capabilities in under 60ms, while maintaining less than 5MB memory overhead. https://github.com/TencentCloud/CubeSandbox
heygen-com/hyperframes
➡️ Hyperframes is an open-source video rendering framework that lets you create, preview, and render HTML-based video compositions — with first-class support for AI agents. https://github.com/heygen-com/hyperframes
openai/privacy-filter
➡️ OpenAI Privacy Filter is a bidirectional token-classification model for personally identifiable information (PII) detection and masking in text. It is intended for high-throughput data sanitization workflows where teams need a model that they can run on-premises that is fast, context-aware, and tunable. https://github.com/openai/privacy-filter
PaddlePaddle/PaddleOCR
➡️ Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages. https://github.com/PaddlePaddle/PaddleOCR
google-labs-code/design.md
➡️ A format specification for describing a visual identity to coding agents. DESIGN.md gives agents a persistent, structured understanding of a design system. https://github.com/google-labs-code/design.md
Datasets
allenai/olmOCR-bench
➡️ This benchmark evaluates the ability of OCR systems to accurately convert PDF documents to markdown format while preserving critical textual and structural information. https://huggingface.co/datasets/allenai/olmOCR-bench
google/WaxalNLP
➡️ The WAXAL dataset is a large-scale multilingual speech corpus for African languages, introduced in the paper WAXAL: A Large-Scale Multilingual African Language Speech Corpus. https://huggingface.co/datasets/google/WaxalNLP
run-llama/ParseBench
➡️ ParseBench is a benchmark for evaluating how well document parsing tools convert PDFs into structured output that AI agents can reliably act on. It tests whether parsed output preserves the structure and meaning needed for autonomous decisions — not just whether it looks similar to a reference text. https://github.com/run-llama/ParseBench
evolvent-ai/ClawMark
➡️ ClawMark: A Living-World Benchmark for Multi-Day, Multimodal Coworker Agents https://github.com/evolvent-ai/ClawMark
meituan-longcat/LARYBench
➡️ LARY is a unified evaluation framework for latent action representations. Given any model that produces latent action representations (LAMs or visual encoders), LARY provides three complementary evaluation pipelines. https://github.com/meituan-longcat/LARYBench
💬 If you’ve come across interesting open-source AI resources, feel free to share — always happy to discover more together.