I maintain several Python services that each manage their own embedding pipeline. The pattern is always the same: API calls to an external model, error handling, retry logic, something to keep the embeddings in sync with the rows. I wanted to eliminate that boilerplate entirely: not move it elsewhere, but declare it in the schema and let the database handle it.
The result is ai-native-pg: a PostgreSQL extension that adds ai.embed() and ai.classify() as IMMUTABLE SQL functions running local inference via ONNX Runtime with no external API calls. I designed the interface and supervised the implementation. The code was written by AI. Four days of spare time, $127 in tokens.
How the AI was used
Almost every line of code was written by AI. The design was not.
I defined the foundation: the SQL interface, the IMMUTABLE/STABLE contract, the core architecture. From there, the design evolved through iteration. The AI generated code, I reviewed, corrected, and constrained it, and both the implementation and the design shifted as we worked through what actually held up.
It did not work cleanly every time. The AI would confidently produce code that looked right but violated PostgreSQL semantics in ways that only surface under specific conditions. Recognizing when an approach was wrong, not just broken, was the consistent human contribution.
The AI accelerated implementation. The design and correctness remained mine.
Token cost
Total token cost across the full project was $127, covering implementation, iteration, tests, and documentation. It does not account for human time, domain expertise, or the design decisions that shaped the work.
314 million tokens total. The low cost is almost entirely explained by prompt caching. The same context was reused heavily across a multi-day session, so most of those tokens were reads against a warm cache rather than fresh input. Without caching, the same work would have cost around $950.
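As a sanity check on those numbers, a back-of-envelope calculation. The 10x cache discount here is an illustrative assumption, not actual provider pricing:

```python
# Back-of-envelope: what share of the 314M tokens must have been cache
# reads to get $127 instead of $950? Assumes (illustratively) that a
# cached read costs 10x less than a fresh input token.
COST_ACTUAL = 127.0
COST_UNCACHED = 950.0
CACHE_DISCOUNT = 10.0  # assumed fresh/cached price ratio

ratio = COST_ACTUAL / COST_UNCACHED
cached_fraction = (1 - ratio) / (1 - 1 / CACHE_DISCOUNT)
print(f"~{cached_fraction:.0%} of tokens served from cache")  # ~96%
```

Under that assumption, roughly 96% of the traffic was warm-cache reads, which is consistent with reusing the same context across a multi-day session.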
The experiment was about feasibility, not eliminating engineering work.
What this actually looks like
In most systems, embedding generation is an application concern: you call an external model before writing to the database and manage consistency yourself. The idea here is different: move that logic into the schema, so the model runs at write time as part of the row definition.
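The application-side version of that pattern looks roughly like this. A minimal sketch: `embed_api`, the retry policy, and the table layout are all hypothetical stand-ins, not code from any real service:

```python
import time

def embed_with_retry(embed_api, text, retries=3, backoff=0.5):
    """Call an external embedding API, retrying on transient failure.

    This is the boilerplate the extension eliminates; every service
    ends up carrying some version of it.
    """
    for attempt in range(retries):
        try:
            return embed_api(text)
        except ConnectionError:
            if attempt == retries - 1:
                raise
            time.sleep(backoff * 2 ** attempt)  # exponential backoff

def insert_review(conn, embed_api, body):
    # The embedding is computed outside the database, then written with
    # the row: keeping the two consistent is the application's problem.
    vec = embed_with_retry(embed_api, body)
    conn.execute(
        "INSERT INTO reviews (body, embedding) VALUES (%s, %s)", (body, vec)
    )
```

With a generated column, both functions disappear: the write path is a plain INSERT.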
Start with a running database:
docker run --rm -p 5432:5432 -e POSTGRES_PASSWORD=postgres ghcr.io/dmonroy/ai-native-pg:dev
Generate an embedding:
SELECT ai.embed('Postgres can do more than you think');
That returns a 768-dimensional vector. No API call. No network round-trip. The model is already loaded.
Now use it in a schema. A product reviews table where embeddings are generated automatically on every insert:
CREATE TABLE reviews (
id serial primary key,
body text,
embedding vector(768) GENERATED ALWAYS AS (ai.embed(body)) STORED
);
CREATE INDEX ON reviews USING hnsw (embedding vector_cosine_ops);
Insert rows normally:
INSERT INTO reviews (body) VALUES
('The product exceeded my expectations'),
('Terrible quality, broke after a week'),
('Decent value for the price');
Query by semantic similarity:
SELECT body
FROM reviews
ORDER BY embedding <=> ai.embed('poor build quality')
LIMIT 3;
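The <=> operator is pgvector's cosine distance. For intuition, here is the same computation in plain Python; a sketch of the math, not pgvector's implementation:

```python
import math

def cosine_distance(a, b):
    """Cosine distance as pgvector's <=> defines it: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

# Identical directions -> distance 0; orthogonal -> distance 1.
print(cosine_distance([1, 0], [2, 0]))  # 0.0
print(cosine_distance([1, 0], [0, 1]))  # 1.0
```

ORDER BY sorts ascending, so the smallest distances, the most similar reviews, come first.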
Classification works the same way. Add a sentiment label as a generated column:
ALTER TABLE reviews
ADD COLUMN sentiment text
GENERATED ALWAYS AS (
ai.classify(body, ARRAY['positive', 'negative', 'neutral'])
) STORED;
One practical note: adding a stored generated column to an existing table causes a full table rewrite. On a large table, plan for that cost before running this against production data.
Now every insert gets both an embedding and a sentiment label with no application code involved:
INSERT INTO reviews (body) VALUES ('Outstanding, would buy again');
SELECT body, sentiment FROM reviews;
And once the data is enriched at write time, reading it requires nothing special:
SELECT body FROM reviews WHERE sentiment = 'negative';
No embeddings. No similarity operators. No model at query time. Plain SQL against a plain column.
That is what AI-native means here: AI expressed at the schema level, running as part of normal database operations, with no external coordination required.
IMMUTABLE functions
The core design decision is how ai.embed and ai.classify are declared in the PostgreSQL catalog: ai.embed and the text-array variant of ai.classify are IMMUTABLE, while the variant that classifies against a PostgreSQL enum type is STABLE.
PostgreSQL's IMMUTABLE contract says the function returns the same result for the same arguments and does not do database lookups or use information not directly present in its arguments. The optimizer uses this to pre-evaluate constant-argument calls, cache results, and allow functions to appear in generated column definitions and expression indexes.
The extension satisfies this by convention: no I/O happens during function execution. The ONNX model is loaded from disk once per backend process, via a _PG_init() hook that runs when the shared library is loaded into the backend. After that, every call to ai.embed is pure computation: tokenize the input, run the matrix operations, normalize the output, return the vector. No file reads. No network calls. This is why each backend process holds its own copy of the model, and why connection pooling matters.
Loading in _PG_init() is established community convention for extensions that need expensive setup done once per session rather than once per call. There is no single canonical reference from the PostgreSQL project for the pattern, but it is widely used and understood by extension authors.
This is why ai.embed can be used in a generated column:
CREATE TABLE docs (
content text,
embedding vector(768) GENERATED ALWAYS AS (ai.embed(content)) STORED
);
CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops);
Every inserted row automatically gets an embedding. The index stays current. No triggers, no background jobs, no application-side embedding logic.
The alternative is calling an external embedding API from application code. That works, but it means your embeddings live outside the database transaction. You have to manage consistency yourself: what happens when a row is written but the embedding call fails? What happens when you backfill? Generated columns handle this automatically. The embedding is part of the row, computed by the same engine that writes it, consistent by definition.
What IMMUTABLE does not cover
The enum-based ai.classify(content, NULL::my_enum) variant is STABLE, not IMMUTABLE. PostgreSQL enum types can be altered; labels can be added. Promising the same output for the same input across all future states of the type would be wrong. STABLE means consistent within a single transaction, which is the correct and honest contract.
This also means the enum variant cannot appear in a GENERATED ALWAYS AS column definition. PostgreSQL requires IMMUTABLE for generated columns. The generated column examples above use the text-array variant, which is IMMUTABLE because the label set is fixed in the function arguments. If you want auto-classification against a PostgreSQL enum type at write time, use a trigger instead.
Memory and connections
The model lives in process-private memory, not shared memory. Each backend process (one per active connection) holds its own copy. At 137MB per backend for nomic-embed-text-v1.5, 50 connections costs roughly 6.9GB of RAM. Connection pooling (PgBouncer or similar) is required for production. This is the same constraint that applies to any extension loading significant data per backend; it is not specific to AI models.
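The sizing arithmetic, in a form you can adapt to your own connection count. This counts only model memory (137MB is the quantized size quoted in this post) and ignores normal backend overhead:

```python
MODEL_MB = 137  # quantized nomic-embed-text-v1.5, loaded per backend

def backend_memory_gb(connections):
    """Model memory across backends: each connection's backend holds its own copy."""
    return connections * MODEL_MB / 1000

print(f"{backend_memory_gb(50):.2f} GB")  # 6.85 GB at 50 direct connections
print(f"{backend_memory_gb(20):.2f} GB")  # 2.74 GB with a pool capped at 20
```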
The model stack
ONNX Runtime
ONNX Runtime is the inference engine. The model is exported from its training framework (PyTorch, in this case) into the ONNX format once, then loaded and run by ONNX Runtime with no Python, no training framework, no GPU driver required.
The reason for ONNX is determinism. CPU inference through ONNX Runtime is deterministic for practical purposes on a given binary build: the same model, same weights, same ONNX Runtime version, same CPU will produce the same output for the same input. ONNX Runtime does not formally guarantee bit-for-bit reproducibility across runs, but for a feedforward embedding model with no stochastic operators, the output is stable in practice. Across different CPU microarchitectures, floating-point results can differ at the bit level due to FMA and SIMD differences, but this does not violate PostgreSQL's IMMUTABLE contract, which applies to a single Postgres installation, not cross-machine portability.
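The cross-hardware caveat reduces to a basic fact: floating-point arithmetic is sensitive to evaluation order, which is exactly what FMA and SIMD lane widths change. A minimal demonstration of the underlying effect:

```python
# Floating-point addition is not associative: regrouping the same terms
# (which is what FMA and SIMD reductions effectively do) can change the
# last bits of the result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False
print(left, right)    # 0.6000000000000001 0.6
```

The same sums, grouped differently, differ in the final bit. Scale that up to millions of multiply-accumulates and two CPU microarchitectures can disagree at the bit level while both being correct.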
GPU inference is out of scope for this experiment. Respecting PostgreSQL's volatility categories is a core design principle here, and GPU floating-point is not reproducible across runs, which rules out IMMUTABLE. That is a constraint accepted by design, not worked around.
INT8 quantization reduces the model's on-disk and in-memory footprint and speeds up inference on CPUs with integer multiply-accumulate support. It does not eliminate floating-point math entirely: activations and dequantization still involve FP operations.
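Mechanically, INT8 quantization looks like this. A minimal symmetric per-tensor sketch, not ONNX Runtime's actual quantization scheme:

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization: store one float scale plus
    one signed byte per weight instead of a 4-byte float per weight."""
    scale = max(abs(v) for v in values) / 127 or 1.0
    q = [round(v / scale) for v in values]  # integers in [-127, 127]
    return q, scale

def dequantize(q, scale):
    # Inference dequantizes back to float, so FP math does not disappear.
    return [x * scale for x in q]

weights = [0.8, -0.31, 0.02, -1.27]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Roundtrip error is bounded by scale / 2 per weight.
print(max(abs(w - r) for w, r in zip(weights, restored)))
```

The 4x size reduction (one byte instead of four per weight, plus a scale) is where the 137MB figure comes from, and the dequantize step is why activations still involve floating-point operations.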
nomic-embed-text-v1.5
The current model is nomic-embed-text-v1.5. It generates 768-dimensional embeddings with a context window of 8192 tokens.
It was chosen for its Apache 2.0 license, which puts no restrictions on use or distribution. At 137MB quantized (on-disk size), it fits in a Docker image without inflating it into a problem. It is also a Matryoshka model: the architecture is trained so that the first N dimensions of the full 768-dim output are independently meaningful. You can truncate to 64, 128, 256, or 512 dimensions post-inference and retain a coherent embedding, trading storage and query cost against precision without switching models. The current implementation always returns 768 dims, which is the right tradeoff for this experiment.
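Consuming a Matryoshka embedding at lower dimensionality is just truncation plus renormalization. A sketch, where a 4-dim unit vector stands in for the real 768-dim output:

```python
import math

def truncate_embedding(vec, dims):
    """Keep the first `dims` Matryoshka dimensions and re-normalize,
    so cosine comparisons still behave as expected."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

full = [0.5, -0.5, 0.5, 0.5]  # stand-in for a 768-dim unit vector
small = truncate_embedding(full, 2)
print(len(small))                 # 2
print(sum(x * x for x in small))  # ~1.0 -- still a unit vector
```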
The model scores 62.28 on MTEB, which puts it in the competitive range for its size class. It is not the highest-scoring model available, but it is close, and the other factors matter more for this use case.
Constraints
This is a database extension. It runs in your Postgres instance.
- It does not replace application logic. It provides primitives; what you build with them is still your responsibility
- It is not a vector database. pgvector provides the vector type and HNSW index support, but the goal is AI primitives in SQL, not vector search as a product
- The AI is not hidden. ai.embed and ai.classify are explicit function calls visible in your schema definitions and queries
- It is not autonomous. Nothing runs without a write, a query, or a schema definition that calls these functions
- Cost at scale is real. At 137MB per backend, 50 connections is roughly 6.9GB of RAM. Connection pooling is not optional for production workloads
Why Postgres
Postgres already provides transactions, rich types, extension points, deterministic execution, and a trusted ecosystem. Those are the technical reasons.
The honest reason is also familiarity. I know Postgres well, trust it, and wanted to work in it. That shaped the question: how far can these primitives be pushed before reaching for something new? This project is one answer to that.
What this enables
The most immediate use for me is removing embedding logic from application code. I maintain Python services that each manage their own embedding pipeline: calls to an external API, error handling, retry logic, consistency checks. Replacing all of that with a generated column is not hypothetical; it is something I am actively doing across my own projects. I have run it on a 40,000-document corpus with embedding generation at 5-10ms per call and HNSW search performance of 21-32ms.
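Those per-call numbers make backfill time easy to estimate. A rough single-connection figure; real throughput depends on batch size and parallelism:

```python
DOCS = 40_000
MS_PER_EMBED = (5 + 10) / 2  # midpoint of the observed 5-10ms range

total_minutes = DOCS * MS_PER_EMBED / 1000 / 60
print(f"~{total_minutes:.0f} minutes for a single-connection backfill")
```

At the observed rates, re-embedding the whole 40,000-document corpus is minutes of work, not hours, which is what makes the generated-column rewrite cost tolerable.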
Because inference runs locally inside the database process, embeddings and classifications never leave your infrastructure. There are no API keys to manage, no egress costs, no external model provider to depend on. A row is written, the model runs, the result is stored. All within the same transaction, on the same machine, with no network hop.
It also means no changes to application code, no separate infrastructure to run. The schema carries the AI logic. Applications that already write to Postgres keep writing to Postgres. They get AI-enriched data back without knowing the model exists.
Code
- GitHub: https://github.com/dmonroy/ai-native-pg
- Extension name: ai-native-pg
- License: Apache 2.0
- Docker images: available for amd64 and arm64 via GitHub Container Registry. All builds carry a -dev suffix at this stage; stable tags will follow a versioned release.
  - ghcr.io/dmonroy/ai-native-pg:dev (alias for pg18-dev)
  - ghcr.io/dmonroy/ai-native-pg:pg18-dev
  - ghcr.io/dmonroy/ai-native-pg:pg17-dev
  - ghcr.io/dmonroy/ai-native-pg:pg16-dev
  - ghcr.io/dmonroy/ai-native-pg:pg15-dev
  - ghcr.io/dmonroy/ai-native-pg:pg14-dev
Status
I use it in my own production workflows and consider it stable for that context: classification and semantic search on real datasets, including a 40,000-document corpus. The inference and schema contract are stable. What may still change is the SQL interface. API stability is the main gap before a versioned release.
The extension ships with 19 SQL test files and a C unit test suite covering embedding, classification, the IMMUTABLE contract, and concurrent operations. CI runs on every push against PostgreSQL 14-18.
This is a low-level systems design experiment testing whether constrained AI-assisted coding sessions can produce high-quality code on real problems, and what that means for developer experience at the systems level. Not a product announcement.
If you want to contribute, the highest-value gap right now is a test suite that exercises the IMMUTABLE contract under schema migrations. Open an issue or a draft PR. If you find a bug, file an issue and I'll look at it.