Building a RAG Pipeline in Laravel with pgvector

Retrieval-Augmented Generation (RAG) is the pattern behind almost every serious LLM application in production today. Instead of asking a model to answer from memory (which leads to hallucinations), you retrieve relevant context from your own data and inject it into the prompt. The model answers from that context, not from training data.

This post walks through the exact implementation I used for the AI chat on this portfolio.

The Stack

  • PostgreSQL + pgvector – stores embeddings and handles ANN search via an HNSW index
  • Voyage AI – generates 1024-dimension text embeddings
  • Claude Sonnet – the generation step
  • Laravel – ties it all together

Step 1: Schema

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE embeddings (
    id bigserial PRIMARY KEY,
    embeddable_type text NOT NULL,
    embeddable_id bigint NOT NULL,
    chunk_index int NOT NULL DEFAULT 0,
    chunk_text text NOT NULL,
    embedding vector(1024),
    created_at timestamptz DEFAULT now()
);

CREATE INDEX embeddings_hnsw_idx
    ON embeddings
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

The HNSW index keeps ANN search at millisecond scale even with hundreds of thousands of vectors.

Step 2: Ingestion

For each piece of content (post, project description), I split the text into ~400-token chunks with a 50-token overlap, then embed each chunk:

$chunks = $this->splitter->split($text, chunkSize: 400, overlap: 50);

foreach ($chunks as $i => $chunk) {
    $vector = $this->voyageClient->embed($chunk);

    Embedding::updateOrCreate(
        ['embeddable_type' => $type, 'embeddable_id' => $id, 'chunk_index' => $i],
        ['chunk_text' => $chunk, 'embedding' => json_encode($vector)]
    );
}
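The splitter itself isn't shown above. A minimal sketch of the idea, approximating tokens with whitespace-delimited words (a hypothetical standalone function, not the actual splitter; a real implementation would count tokens with a proper tokenizer):

```php
// Hypothetical word-based splitter: approximates ~400-token chunks
// by word count, with a fixed overlap so context isn't cut mid-thought.
// Assumes $chunkSize > $overlap.
function splitIntoChunks(string $text, int $chunkSize = 400, int $overlap = 50): array
{
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $chunks = [];
    $step = $chunkSize - $overlap;

    for ($start = 0; $start < count($words); $start += $step) {
        // Each chunk starts $overlap words before the previous one ended.
        $chunks[] = implode(' ', array_slice($words, $start, $chunkSize));
        if ($start + $chunkSize >= count($words)) {
            break;
        }
    }

    return $chunks;
}
```

The overlap matters: without it, a sentence split across a chunk boundary is unretrievable from either side.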

Step 3: Retrieval

When a question arrives, embed it and do a cosine similarity search:

$queryVector = $this->voyageClient->embed($question);

// Bind the vector under two names: PDO's native prepares on pgsql
// reject reusing a single named placeholder.
$chunks = DB::select("
    SELECT chunk_text,
           1 - (embedding <=> :v1::vector) AS similarity
    FROM embeddings
    ORDER BY embedding <=> :v2::vector
    LIMIT 8
", [
    'v1' => json_encode($queryVector),
    'v2' => json_encode($queryVector),
]);
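One refinement worth considering on top of the LIMIT: drop weak matches before they reach the prompt, so an off-topic question doesn't get padded with irrelevant chunks. A sketch, where the 0.5 cutoff is illustrative rather than a tuned value from this pipeline:

```php
// Hypothetical post-filter: keep only rows whose cosine similarity
// clears a floor. The 0.5 default is illustrative, not tuned.
function filterBySimilarity(array $rows, float $minSimilarity = 0.5): array
{
    return array_values(array_filter(
        $rows,
        fn ($row) => $row->similarity >= $minSimilarity
    ));
}
```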

Step 4: Generation

Build a system prompt with the retrieved chunks, then call Claude:

$context = collect($chunks)->pluck('chunk_text')->implode("\n\n---\n\n");

$response = $this->claude->messages()->create([
    'model' => 'claude-sonnet-4-6',
    'max_tokens' => 1024,
    'system' => "You are an assistant answering questions about Al Amin Ahamed's work.\n\nContext:\n{$context}",
    'messages' => [['role' => 'user', 'content' => $question]],
]);
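One guard worth adding around that implode: cap the stitched context so eight long chunks can't blow the prompt budget. A sketch, where both the function and the 12,000-character limit (roughly 3,000 tokens) are illustrative assumptions, not part of the pipeline above:

```php
// Hypothetical budget guard: stitch chunks with the same "---" separator,
// but stop before the context exceeds a character cap. The 12000-char
// default is illustrative (~3000 tokens), not a tuned value.
function capContext(array $chunkTexts, int $maxChars = 12000): string
{
    $context = '';
    foreach ($chunkTexts as $text) {
        $next = $context === '' ? $text : $context . "\n\n---\n\n" . $text;
        if (strlen($next) > $maxChars) {
            break; // chunks arrive best-first, so dropping the tail is safe
        }
        $context = $next;
    }
    return $context;
}
```

Because retrieval returns chunks in descending similarity order, truncating from the tail discards the least relevant context first.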

Performance Notes

  • HNSW ef_search=40 gives 99%+ recall at ~2ms p99
  • Cache the query embedding for identical questions
  • Stream the response via SSE so users see tokens as they arrive
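The SSE framing in that last bullet is small enough to sketch. A minimal, framework-free formatter, assuming one token per event (in Laravel this would be emitted inside a streamed response; the function name is hypothetical):

```php
// Minimal SSE framing: one model token becomes one "data:" event.
// JSON-encoding keeps tokens containing newlines inside a single event,
// since a blank line is what terminates an SSE event.
function formatSseEvent(string $token): string
{
    return 'data: ' . json_encode(['token' => $token]) . "\n\n";
}
```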

The full source is on GitHub if you want to dig deeper.
