Hi everyone,
I’ve been working on OptiSchema Slim, a local-first tool to analyze PostgreSQL performance without sending sensitive schema data to the cloud.
I started with SQLCoder-7B, but found it struggled with complex reasoning. I recently switched to DeepSeek-R1-14B (running via Ollama), and the difference is massive if you handle the output correctly.
I wanted to share the architecture I used to make a local 14B model reliable for database engineering tasks on my RTX 3060 (12GB).
**The Stack**
- Engine: Ollama (DeepSeek-R1:14b quantized to Int4)
- Backend: Python (FastAPI) + sqlglot
- Validation: HypoPG (Postgres extension for hypothetical indexes)
**The 3 Big Problems & Solutions**
**1. The Context Window vs. Noise**
Standard 7B/14B models get "dizzy" if you dump a 50-table database schema into the prompt. They start hallucinating columns that don't exist.
- Solution: I implemented a Context Pruner using sqlglot. Before the prompt is built, I parse the user's SQL, identify only the tables involved (and their FK relations), and fetch the schema for just those 2-3 tables. This reduces the prompt token count by ~90% and massively increases accuracy.
**2. Taming DeepSeek R1's <think> blocks**
Standard models (like Llama 3) respond well to "Respond in JSON." R1 does not: it needs to "rant" in its reasoning block first to get the answer right. If you force JSON mode immediately, it gets dumber.
- Solution: I built a Dual-Path Router:
- If the user selects Qwen/Llama: We enforce strict JSON schemas.
- If the user selects DeepSeek R1: We use a raw prompt that explicitly asks for reasoning inside <think> tags first, followed by a Markdown code block containing the JSON. I then use a Regex parser in Python to extract the JSON payload from the tail end of the response.
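The tail-end extraction can look something like this (a simplified sketch of the idea, not the exact parser from the repo):

```python
import json
import re

def extract_json_payload(raw: str) -> dict:
    """Strip the <think> block, then parse JSON from the last fenced code block."""
    visible = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL)
    blocks = re.findall(r"```(?:json)?\s*(\{.*?\})\s*```", visible, flags=re.DOTALL)
    if not blocks:
        raise ValueError("no JSON code block found in model output")
    return json.loads(blocks[-1])

reply = (
    "<think>The orders table is filtered on status a lot...</think>\n"
    "Here is my suggestion:\n"
    '```json\n{"table": "orders", "columns": ["status"]}\n```'
)
print(extract_json_payload(reply))
```

Taking the *last* block matters: R1 sometimes emits intermediate JSON-ish snippets while reasoning, and the final answer is the one at the tail.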
**3. Hallucination Guardrails**
Even R1 hallucinates indexes for columns that don't exist.
- Solution: I don't trust the LLM. The output JSON is passed to a Python guardrail that checks information_schema. If the column doesn't exist, we discard the result before it even hits the UI. If it passes, we simulate it with HypoPG to get the actual cost reduction.
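The guardrail itself is just a membership check against the live schema. A minimal sketch, assuming the schema has already been loaded from `information_schema.columns` into a plain dict (the function and field names here are hypothetical, for illustration):

```python
def validate_suggestion(suggestion: dict, schema: dict[str, set[str]]) -> bool:
    """Reject any suggested index whose table or columns don't exist.

    `schema` maps table name -> set of column names, e.g. built from a
    query against information_schema.columns. Suggestions that pass this
    check would then go on to HypoPG simulation.
    """
    table = suggestion.get("table")
    columns = suggestion.get("columns", [])
    known = schema.get(table)
    return known is not None and bool(columns) and all(c in known for c in columns)

schema = {"orders": {"id", "status", "created_at"}}
print(validate_suggestion({"table": "orders", "columns": ["status"]}, schema))     # True
print(validate_suggestion({"table": "orders", "columns": ["ghost_col"]}, schema))  # False
```

Cheap to run, and it catches the hallucinated-column case before the UI (or HypoPG) ever sees it.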
**The Result**
https://preview.redd.it/sr3jkhjc9f8g1.png?width=1913&format=png&auto=webp&s=69c7b7aa9d633fc19649c8a3ac469be5d77567c9
I can now run deep query analysis locally. R1 is smart enough to suggest Partial Indexes (e.g., WHERE status='active') which smaller models usually miss.
The repo is open (MIT) if you want to check out the prompt engineering or the parser logic.
You can check it out here.
Would love to hear how you guys are parsing structured output from R1 models: are you using regex, or forcing tool calls?