Skip to content
How I built RAG retrieval against a SQLite database using pure-PHP cosine similarity — when it scales, when it doesn't,...

RAG on SQLite: Pure-PHP Cosine Similarity Without pgvector

Al Amin Ahamed

Al Amin Ahamed

Senior Software Engineer

· Updated 16 hours ago 9 min read

pgvector is the default answer for vector search in Postgres. But this portfolio runs on SQLite in production — single-file backups, zero ops, fits the traffic profile. SQLite has no vector type, no HNSW index, no <=> cosine operator. So how do you do RAG without it?

The honest answer: pure-PHP cosine similarity, computed on the application side. It scales further than you'd think.

When SQLite + PHP Cosine Is Enough

Below ~50k chunks and ~10 QPS, scanning all vectors per query is fine. A 1024-dim cosine over 50k vectors takes ~150ms in PHP. Faster than network round-trips to a vector DB.

Above that, you want pgvector + HNSW. Below it, the simplicity of "one SQLite file = entire knowledge base" wins.

Schema

SQL
CREATE TABLE embeddings (
  id INTEGER PRIMARY KEY,
  embeddable_type TEXT NOT NULL,
  embeddable_id INTEGER NOT NULL,
  chunk_index INTEGER NOT NULL DEFAULT 0,
  chunk_text TEXT NOT NULL,
  embedding TEXT NOT NULL,
  created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX embeddings_owner_idx ON embeddings (embeddable_type, embeddable_id);

The embedding column is a JSON-encoded float array. SQLite stores it as TEXT — no extension required.

RetrievalService — Driver-Aware

The trick is one service that picks the right backend at runtime:

PHP
final class RetrievalService
{
    public function __construct(private EmbeddingService $embeddings) {}

    public function search(string $query, int $limit = 8): Collection
    {
        $vector = $this->embeddings->embed($query);

        return DB::connection()->getDriverName() === 'pgsql'
            ? $this->searchPgvector($vector, $limit)
            : $this->searchCosinePhp($vector, $limit);
    }
}

The PHP Cosine Path

PHP
private function searchCosinePhp(array $query, int $limit): Collection
{
    $queryNorm = $this->norm($query);

    return Embedding::query()
        ->select(['id', 'embeddable_type', 'embeddable_id', 'chunk_text', 'embedding'])
        ->cursor()
        ->map(function (Embedding $row) use ($query, $queryNorm) {
            $vector = json_decode($row->embedding, true);
            $score = $this->cosine($query, $vector, $queryNorm);

            return ['row' => $row, 'score' => $score];
        })
        ->sortByDesc('score')
        ->take($limit)
        ->values();
}

private function cosine(array $a, array $b, float $aNorm): float
{
    $dot = 0.0;
    $bNorm = 0.0;

    for ($i = 0, $n = count($a); $i < $n; $i++) {
        $dot += $a[$i] * $b[$i];
        $bNorm += $b[$i] * $b[$i];
    }

    return $dot / ($aNorm * sqrt($bNorm));
}

private function norm(array $v): float
{
    $sum = 0.0;
    foreach ($v as $x) {
        $sum += $x * $x;
    }
    return sqrt($sum);
}

cursor() instead of get() — embeddings are heavy (1024 floats × 4 bytes per chunk) and you don't want them all in memory.

The pgvector Path (For When You Outgrow It)

PHP
private function searchPgvector(array $query, int $limit): Collection
{
    $vector = '[' . implode(',', $query) . ']';

    return Embedding::query()
        ->selectRaw("*, 1 - (embedding <=> ?::vector) AS score", [$vector])
        ->orderByRaw("embedding <=> ?::vector", [$vector])
        ->limit($limit)
        ->get();
}

Same interface, different SQL. Switching backends is a single env var.

Numbers from Production

Portfolio embeddings table: 1,247 chunks. Per-query latency:

  • PHP cosine: 4-7ms
  • Embedding generation (Ollama mxbai-embed-large): 80-120ms
  • Total RAG retrieval: ~100ms

Embedding generation dominates. Optimizing the cosine math past ~1ms is wasted effort.

Migration Path

When you outgrow pure-PHP scan:

  1. Move SQLite → Postgres (pgloader handles this in one command)
  2. Add embedding vector(1024) column
  3. Backfill: UPDATE embeddings SET embedding = (json_text)::vector
  4. Create HNSW index
  5. Flip the env var

The RetrievalService doesn't change. The interface absorbed the difference from day one.

When Not To Do This

  • Multi-tenant apps with per-tenant embeddings — Postgres row-level security is cleaner
  • More than 100k chunks — full scan latency stops being acceptable
  • Multiple workers needing concurrent writes — SQLite's single-writer model bottlenecks

For a personal site, a small SaaS, or a side project, pure-PHP cosine on SQLite is the right amount of engineering.

Share 𝕏 in
Al Amin Ahamed

Al Amin Ahamed

Senior software engineer & AI practitioner. 5+ years shipping Laravel platforms, WordPress plugins, WooCommerce extensions, and AI-driven products.

About me →

More from the blog

Need this kind of work shipped?

Available for freelance and consulting.

Laravel platforms, WordPress plugins, WooCommerce extensions, and AI integrations.