Retrieval-Augmented Generation (RAG) is the pattern behind almost every serious LLM application in production today. Instead of asking a model to answer from memory (which leads to hallucinations), you retrieve relevant context from your own data and inject it into the prompt. The model answers from that context, not from training data.
This post walks through the exact implementation I used for the AI chat on this portfolio.
The Stack
- PostgreSQL + pgvector – stores embeddings and handles ANN search via an HNSW index
- Voyage AI – generates 1024-dimensional text embeddings
- Claude Sonnet – the generation step
- Laravel – ties it all together
Step 1: Schema
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE embeddings (
    id bigserial PRIMARY KEY,
    embeddable_type text NOT NULL,
    embeddable_id bigint NOT NULL,
    chunk_index int NOT NULL DEFAULT 0,
    chunk_text text NOT NULL,
    embedding vector(1024),
    created_at timestamptz DEFAULT now()
);

CREATE INDEX embeddings_hnsw_idx
    ON embeddings USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);
The HNSW index gives sub-millisecond ANN search even with hundreds of thousands of vectors.
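Recall is tunable at query time: pgvector exposes hnsw.ef_search (default 40) as an ordinary Postgres setting, so you can trade latency for recall per connection. From Laravel, a one-liner before the query does it; the value 80 here is purely illustrative:

use Illuminate\Support\Facades\DB;

// Widen the HNSW candidate list for this connection only; higher values
// improve recall at the cost of latency (pgvector defaults to 40).
DB::statement('SET hnsw.ef_search = 80');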
Step 2: Ingestion
For each piece of content (post, project description), I split the text into ~400-token chunks with a 50-token overlap, then embed each chunk:
$chunks = $this->splitter->split($text, chunkSize: 400, overlap: 50);

foreach ($chunks as $i => $chunk) {
    $vector = $this->voyageClient->embed($chunk);

    // pgvector accepts a JSON-style array literal ('[0.1, 0.2, ...]')
    // as input for the vector column, so json_encode is enough here.
    Embedding::updateOrCreate(
        ['embeddable_type' => $type, 'embeddable_id' => $id, 'chunk_index' => $i],
        ['chunk_text' => $chunk, 'embedding' => json_encode($vector)]
    );
}
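The splitter itself isn't anything clever. Here is a minimal sketch, assuming whitespace-delimited words are a close-enough stand-in for real token counts (the class name and internals are illustrative, not the exact code behind $this->splitter):

final class TextSplitter
{
    /** @return string[] overlapping word-window chunks */
    public function split(string $text, int $chunkSize = 400, int $overlap = 50): array
    {
        $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
        $step = $chunkSize - $overlap;
        $chunks = [];

        // Slide a window of $chunkSize words, advancing by $step so each
        // chunk shares $overlap words with the previous one.
        for ($start = 0; $start < count($words); $start += $step) {
            $chunks[] = implode(' ', array_slice($words, $start, $chunkSize));
        }

        return $chunks;
    }
}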
Step 3: Retrieval
When a question arrives, embed it and do a cosine similarity search:
$queryVector = $this->voyageClient->embed($question);
$vector = json_encode($queryVector);

// <=> is pgvector's cosine distance operator. The vector is bound once
// per placeholder: PDO won't reuse a single named parameter when
// prepare emulation is off, so positional bindings are safer here.
$chunks = DB::select("
    SELECT chunk_text,
           1 - (embedding <=> ?::vector) AS similarity
    FROM embeddings
    ORDER BY embedding <=> ?::vector
    LIMIT 8
", [$vector, $vector]);
Step 4: Generation
Build a system prompt with the retrieved chunks, then call Claude:
$context = collect($chunks)->pluck('chunk_text')->implode("\n\n---\n\n");

$response = $this->claude->messages()->create([
    'model' => 'claude-sonnet-4-6',
    'max_tokens' => 1024,
    'system' => "You are an assistant answering questions about Al Amin Ahamed's work.\n\nContext:\n{$context}",
    'messages' => [['role' => 'user', 'content' => $question]],
]);
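Pulling the answer text out depends on which client wrapper you use; with most Anthropic SDK wrappers the shape is roughly the following, but treat the property path as an assumption rather than a fixed API:

// Messages API responses carry an array of content blocks; the first
// block usually holds the answer text (property names vary by SDK).
$answer = $response->content[0]->text ?? '';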
Performance Notes
- HNSW ef_search = 40 gives 99%+ recall at ~2ms p99
- Cache the query embedding for identical questions (sketched after this list)
- Stream the response via SSE – users see tokens as they arrive
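The query-embedding cache is only a few lines. A minimal sketch, assuming Laravel's cache facade and keying on a hash of the normalized question (the key prefix and TTL are arbitrary choices; embeddings for a fixed model are stable, so you could cache far longer):

use Illuminate\Support\Facades\Cache;

// Identical questions hit the cache instead of the embeddings API.
$queryVector = Cache::remember(
    'query-embedding:' . sha1(mb_strtolower(trim($question))),
    now()->addHour(),
    fn () => $this->voyageClient->embed($question),
);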
The full source is on GitHub if you want to dig deeper.