Dquan’s LLM Notes

Posts

Apr 17, 2026
Harnesses Aren't Portable — Why Each CLI Agent Has Its Own
Apr 5, 2026
Anatomy of a Claude Code Session — What's Built-in, What's Configurable, and What You Control
Every time you launch Claude Code, a small orchestra of context layers assembles before you type a single character. The system prompt loads. Built-in tools register. Your CLAUDE.md files get read. Skills get discovered. MCP servers connect. Memory loads from previous sessions. Most of this is invisible, and understanding it tells you where to invest your customization effort.
Apr 3, 2026
Skills vs Custom Commands in Claude Code — When to Use Which
If you’ve been building workflows in Claude Code, you’ve probably noticed two ways to create slash commands: Skills (.claude/skills/<name>/SKILL.md) and Custom Commands (.claude/commands/<name>.md). They both create /name in the slash menu. They both accept $ARGUMENTS. So what’s the difference, and when should you use each?
Apr 3, 2026
Prompt Engineering — Why It Works, Not Just How
There are hundreds of posts about how to write better prompts. This isn’t one of them. This post is about why prompts work — what’s happening mathematically when you add a system prompt, give few-shot examples, or describe the problem context. Once you understand the mechanism, the “tips and tricks” become obvious consequences.
Apr 3, 2026
Knowledge Graph RAG — The Promise of Structured Retrieval and the Hidden Cost of Building It
My thesis was on knowledge graph embeddings, so when GraphRAG started trending I was genuinely excited. Finally, knowledge graphs getting the attention they deserve in the LLM era. But having lived in that world, I also know what people aren’t talking about: the cost of actually building and maintaining a knowledge graph from scratch.
Apr 3, 2026
Claude Code as Your Team's Knowledge Layer — CLAUDE.md, Hooks, Skills, and the Onboarding Problem
Think about what happens when a new developer joins your team. There’s a knowledge transfer session — someone walks them through the architecture, the coding conventions, the “we tried X but it didn’t work” stories. They spend weeks absorbing tribal knowledge that lives in people’s heads and Slack threads.
Apr 2, 2026
Why Claude Code Works — And Why You Might Not Need RAG (Embedding) Anymore
I’ve spent the past year and a half building LLM-powered applications, from early embedding-based RAG pipelines to agentic coding workflows. Here’s what I’ve learned about why Claude Code succeeds where traditional embedding-based RAG often struggles.
Apr 2, 2026
From Prompt Hacks to Structured Output — How LLMs Learned to Speak JSON
When you build software, almost everything speaks JSON. APIs, configs, databases, frontend-backend communication — it’s all JSON. So when LLMs came along and could only return free-form text, the first thing we all wanted was: can you just give me a JSON object?
Apr 2, 2026
Prompt Priority — Who Wins When Instructions Conflict, and Why Caching Order Matters
There are two things about LLM prompts that most people don’t think about carefully enough: who wins when instructions conflict, and what order your prompt is actually assembled in. These are related, and once you understand both, you’ll structure your prompts very differently.
Apr 2, 2026
Prompt Caching — The Hidden Layer That Saves You Money and Time
If you’re building LLM-powered applications and not thinking about prompt caching, you’re probably paying more than you need to. This is one of those features that doesn’t get enough attention compared to model capabilities, but it has a direct impact on cost and latency.
Apr 2, 2026
PDF Meets LLM — The Tools, Trade-offs, and Pricing of Document Processing
PDF processing was one of the first things I worked on as an AI engineer. Back then it was all about OCR pipelines. Now with multimodal LLMs, you can send a document page as an image and ask the model to understand it. But that doesn’t mean OCR is dead — far from it.
Apr 2, 2026
How to Make LLM Output Consistent — Lessons from Building a Scoring System
If you’ve worked with LLMs long enough, you’ve hit this problem: you run the same prompt twice and get different results. For a chatbot, that’s fine. For a scoring system where you need reliable, repeatable judgments? It’s a real problem.
Apr 2, 2026
Why I Chose arq and RQ Over Celery for LLM Workloads
If you’re building LLM-powered applications with FastAPI, you need a task queue. LLM API calls are slow — 2 to 30 seconds per request. You can’t block your web server on that. But the default answer in the Python world has always been Celery, and for LLM workloads, Celery is overkill.