We Just Launched a Live Embeddings API. Here's the Whole Story.

For months, the only way to use PyckLM's embeddings was to install the MCP server locally, index your codebase, and let it run. That was intentional - we wanted to prove the model worked before selling access to it.

It works. The API is now live.

Here is the full story: what we built, why it took this long, and what you can do with it today.

How We Got Here

Pyckle started as a code search problem.

AI coding assistants were hallucinating, missing context, and returning vague answers on anything beyond a trivial codebase. The root cause was always the same: the tool didn't understand the code. It was matching keywords and hoping for the best.

We believed the fix was an embedding model that actually understood code - not general text, not documentation prose, but function signatures, naming conventions, internal abstractions, and the specific vocabulary teams use when they write software.

So we trained one.

Not a fine-tuned version of something off Hugging Face. A proprietary model trained on over 57,000 code-to-query pairs across five programming languages, plus domain-specific triplets drawn from real developer workflows.

We called it PyckLM.

What Makes PyckLM Different

Generic embedding models don't understand that validate_token is the answer to "where does session verification happen." They don't know that useAuthorizationMiddleware is about authentication. They weren't trained to bridge the gap between how developers describe something and how that thing is actually written.

PyckLM was.

The training objective is a contrastive triplet loss. Each training example is a triple: a query, a code chunk that answers it, and a chunk that doesn't. The model learns to pull queries toward their answers and push them away from the non-answers. Repeated across tens of thousands of examples, this teaches the model to connect the way developers describe code with the way that code is actually written.

The results validated the approach. On held-out triplets, cosine accuracy reached 91.6%. On real codebases:

L0 queries (query directly describes the code): near-perfect hit rate
L1 queries (query uses different vocabulary than the code): high hit rate - this is the core problem
L2 queries (query describes behavior or symptoms, not code): meaningful improvement over baselines, with room to grow
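The objective and the cosine accuracy metric above can be sketched in a few lines. This is a minimal illustration, not PyckLM's training code: the post doesn't say which distance or margin the loss uses, so cosine distance and a margin of 0.2 are assumptions. Cosine accuracy is computed the way it is commonly defined for triplet evaluation (the fraction of triplets where the query is more similar to the answering chunk than to the distractor), and the toy two-dimensional vectors stand in for real embeddings.

```python
import math
from typing import Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def triplet_loss(query: Vector, positive: Vector, negative: Vector, margin: float = 0.2) -> float:
    """Hinge triplet loss on cosine distance (the margin value is an assumption).

    Zero once the answering chunk is closer to the query than the distractor
    by at least `margin`; otherwise it grows linearly with the violation.
    """
    d_pos = 1.0 - cosine(query, positive)  # distance to the chunk that answers the query
    d_neg = 1.0 - cosine(query, negative)  # distance to the chunk that doesn't
    return max(0.0, d_pos - d_neg + margin)


def cosine_accuracy(triplets: Sequence[Tuple[Vector, Vector, Vector]]) -> float:
    """Fraction of (query, positive, negative) triplets the embeddings rank correctly."""
    hits = sum(1 for q, p, n in triplets if cosine(q, p) > cosine(q, n))
    return hits / len(triplets)


# Toy vectors standing in for model outputs.
easy = ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0])  # answer already much closer than the distractor
hard = ([1.0, 0.0], [0.0, 1.0], [1.0, 0.1])  # distractor closer than the answer
print(triplet_loss(*easy))  # 0.0
print(triplet_loss(*hard))  # roughly 1.195
print(cosine_accuracy([easy, hard]))  # 0.5
```

Only violating triplets contribute gradient, which is why training spends its effort on the hard cases where developer vocabulary and code vocabulary diverge.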

More importantly, PyckLM is designed to get better with usage. Real query logs become training data, so the model can be fine-tuned on what users actually search for. A generic embedding model trained on internet text has no such feedback loop. PyckLM does.

The API: What It Does

The Embeddings API exposes PyckLM as a hosted service. You send text - code, queries, documentation, anything - and you get back dense vector embeddings.
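A minimal client sketch using only the Python standard library. The endpoint URL, the JSON request body with an "input" list, the bearer-token header, and the response shape are all hypothetical placeholders chosen for illustration; the real contract is in the docs at pyckle.co.

```python
import json
import urllib.request
from typing import List

# Hypothetical endpoint: check the docs at pyckle.co for the real URL and schema.
API_URL = "https://api.pyckle.co/v1/embeddings"


def build_embedding_request(texts: List[str], api_key: str) -> urllib.request.Request:
    """Build (but don't send) a POST request asking for one embedding per input text."""
    body = json.dumps({"input": texts}).encode("utf-8")  # assumed request schema
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def embed(texts: List[str], api_key: str, timeout: float = 30.0) -> List[List[float]]:
    """Send the request and return one vector per input text, in input order."""
    request = build_embedding_request(texts, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # Assumed response shape: {"data": [{"embedding": [...]}, ...]}
    return [item["embedding"] for item in payload["data"]]
```

Code and queries go through the same call, for example embed(["def validate_token(token): ...", "where does session verification happen"], api_key), which is what lets both land in one vector space.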

What you do with those embeddings is up to you. Common use cases:

Semantic code search - embed your codebase, embed queries, retrieve by cosine similarity. Find handleAuthCallback by searching "where does OAuth complete."
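Retrieval itself is simple once you have vectors. The sketch below assumes embeddings have already been fetched (toy 3-d vectors stand in for real ones, and the query vector is an invented stand-in for the embedding of "where does OAuth complete") and ranks them by cosine similarity with a linear scan.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search(query_vec: Sequence[float],
           index: List[Tuple[str, Sequence[float]]],
           k: int = 3) -> List[Tuple[str, float]]:
    """Return the k (name, score) pairs most similar to the query, best first."""
    scored = ((name, cosine(query_vec, vec)) for name, vec in index)
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy vectors standing in for embeddings returned by the API.
index = [
    ("handleAuthCallback", [0.9, 0.1, 0.0]),
    ("parseConfig", [0.1, 0.9, 0.1]),
    ("renderSidebar", [0.0, 0.2, 0.9]),
]
query = [1.0, 0.0, 0.1]  # stand-in for the embedding of "where does OAuth complete"
print(search(query, index, k=1))
```

A linear scan is fine for a few thousand chunks; for a large codebase you would typically move the vectors into a vector database or an approximate-nearest-neighbor index.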

Retrieval-augmented generation - embed your documentation or codebase, retrieve relevant context before a model call, reduce hallucination and improve answer quality.
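The retrieve-then-generate step mostly comes down to packing the best context into the prompt. A minimal sketch, assuming the chunks arrive already ranked by cosine similarity; the character budget is a simple stand-in for a model's token budget, and the prompt wording is illustrative.

```python
from typing import List


def build_prompt(question: str, ranked_chunks: List[str], max_chars: int = 4000) -> str:
    """Pack the highest-ranked chunks into a prompt without exceeding max_chars of context.

    Chunks are taken in rank order and packing stops at the first chunk that would
    overflow the budget, so lower-ranked context is the first to be dropped.
    The budget counts chunk characters only, not the separators or instructions.
    """
    selected: List[str] = []
    used = 0
    for chunk in ranked_chunks:
        if used + len(chunk) > max_chars:
            break
        selected.append(chunk)
        used += len(chunk)
    context = "\n\n---\n\n".join(selected)
    return (
        "Answer using only the context below. If the answer isn't there, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Telling the model to answer only from the supplied context is what turns better retrieval into fewer hallucinations: if the right chunk is in the prompt, the model has no reason to invent one.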

Code similarity and deduplication - find near-duplicate functions across a large codebase, surface candidates for consolidation.
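Near-duplicate detection is a pairwise similarity threshold over function embeddings. A minimal sketch with toy vectors; the 0.95 threshold is an assumed starting point that you would tune against pairs you have reviewed by hand.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def near_duplicates(functions: List[Tuple[str, Sequence[float]]],
                    threshold: float = 0.95) -> List[Tuple[str, str, float]]:
    """Return every pair of functions at or above the similarity threshold, most similar first."""
    pairs = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(functions, 2):
        score = cosine(vec_a, vec_b)
        if score >= threshold:
            pairs.append((name_a, name_b, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Toy vectors standing in for embedded function bodies.
functions = [
    ("formatDate", [0.8, 0.6, 0.0]),
    ("formatTimestamp", [0.79, 0.61, 0.02]),
    ("sendEmail", [0.0, 0.1, 0.99]),
]
print(near_duplicates(functions))
```

Comparing every pair is quadratic, which is manageable for thousands of functions; beyond that, querying each function's nearest neighbors from an index keeps the cost down.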

Change impact analysis - embed function signatures before and after a refactor, surface semantically related functions that may be affected.
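One way to read the change-impact workflow as code: find the refactored functions whose signature embedding moved noticeably, then rank untouched functions that sit close to either the old or the new version. The two thresholds and the example data are assumptions for illustration.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def impacted_functions(before: Dict[str, Vector], after: Dict[str, Vector],
                       others: Dict[str, Vector], drift_threshold: float = 0.98,
                       related_threshold: float = 0.8) -> List[str]:
    """Rank untouched functions related to anything the refactor meaningfully changed."""
    # A function counts as changed if its signature embedding drifted noticeably.
    changed = [name for name in before
               if name in after and cosine(before[name], after[name]) < drift_threshold]
    best: Dict[str, float] = {}
    for name in changed:
        for other, vec in others.items():
            # Related to either the old or the new shape of the changed function.
            score = max(cosine(before[name], vec), cosine(after[name], vec))
            if score >= related_threshold:
                best[other] = max(score, best.get(other, 0.0))
    return sorted(best, key=best.get, reverse=True)


# Toy signature embeddings before and after a refactor.
before = {"fetchUser": [1.0, 0.0, 0.0], "logEvent": [0.0, 0.0, 1.0]}
after = {"fetchUser": [0.6, 0.8, 0.0], "logEvent": [0.0, 0.0, 1.0]}  # logEvent unchanged
others = {
    "fetchUserProfile": [1.0, 0.1, 0.0],
    "cacheUser": [0.1, 1.0, 0.0],
    "renderFooter": [0.0, 0.0, 1.0],
}
print(impacted_functions(before, after, others))
```

Checking against both the old and the new embedding catches callers written around the previous behavior as well as code that now overlaps with the new one.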

Knowledge base search - embed notes, architecture decisions, and runbooks, retrieve them via natural language.

The model was trained on code, but it handles mixed-content retrieval well. Technical prose, documentation, and code all embed in the same space.

Pricing and Access

Three tiers:

Free - available through the MCPize marketplace integration. Rate limited to 10 requests per minute. Good for prototyping and small codebases.

Pro - monthly or yearly, purchased through direct checkout at pyckle.co/products with no marketplace cut. 60 requests per minute and priority response.

Enterprise - custom rate limits, SLA, private deployment options. Contact us.
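Clients can stay under these per-minute limits by throttling locally. A minimal sliding-window sketch: the post doesn't say how the API enforces its limits server-side, so this only keeps your own request rate at or below 10 (Free) or 60 (Pro) per rolling minute. The injectable clock is there so the behavior can be tested without waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most max_requests calls in any rolling window_seconds interval."""

    def __init__(self, max_requests: int, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self._sent: Deque[float] = deque()  # timestamps of requests still inside the window

    def try_acquire(self) -> bool:
        """Record and allow a request if the window has room; otherwise refuse it."""
        now = self.clock()
        # Forget requests that have aged out of the rolling window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            self._sent.append(now)
            return True
        return False

    def wait(self) -> None:
        """Block until a request is allowed (polls with real sleeps; use the default clock)."""
        while not self.try_acquire():
            time.sleep(0.1)


free_tier = SlidingWindowLimiter(max_requests=10)
pro_tier = SlidingWindowLimiter(max_requests=60)
```

Calling wait() before each request smooths bursts on the client instead of relying on the server to reject them.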

Free tier keys are provisioned through MCPize. Pro keys are provisioned immediately after checkout - you get a pk_live_ key, and you're querying within 60 seconds.

How the Billing Architecture Works

We run two billing paths:

Path 1 - Direct (LemonSqueezy): Checkout on pyckle.co, webhook fires on purchase, API key provisioned with embeddings_pro tier immediately. No intermediary, no marketplace fee.
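Path 1 boils down to two steps: verify the webhook, then mint a key. A sketch, assuming Lemon Squeezy's webhook signing scheme (an X-Signature header carrying the hex HMAC-SHA256 of the raw request body, keyed with your webhook signing secret). The user_email payload path, the in-memory key store, and the random suffix after pk_live_ are hypothetical, not Pyckle's implementation.

```python
import hashlib
import hmac
import json
import secrets
from typing import Dict


def verify_signature(raw_body: bytes, signature_hex: str, signing_secret: str) -> bool:
    """Check the hex HMAC-SHA256 digest the sender computed over the raw body."""
    expected = hmac.new(signing_secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking, via timing, how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), signature_hex.encode("utf-8"))


def handle_purchase_webhook(raw_body: bytes, signature_hex: str, signing_secret: str,
                            key_store: Dict[str, dict]) -> str:
    """Verify a purchase webhook and provision an embeddings_pro key for the buyer."""
    if not verify_signature(raw_body, signature_hex, signing_secret):
        raise PermissionError("webhook signature mismatch")
    event = json.loads(raw_body)
    # Assumed payload path for the buyer's email; adjust to the real event schema.
    email = event["data"]["attributes"]["user_email"]
    api_key = "pk_live_" + secrets.token_urlsafe(24)  # hypothetical format after the prefix
    key_store[api_key] = {"tier": "embeddings_pro", "email": email}
    return api_key
```

The signature must be checked against the raw bytes as received, before any JSON parsing. Production handlers usually also de-duplicate on the order ID, since webhook senders retry on failure and a retry shouldn't mint a second key.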

Path 2 - Marketplace (MCPize): Discovery and install via the MCPize MCP marketplace. MCPize attaches a proxy secret to each request it relays, and requests carrying that secret bypass the tier gate at the API layer.
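The tier gate with a proxy-secret bypass can be sketched as a single check. How the secret travels (header name) and where it is stored aren't described, so the parameters here are hypothetical; the tier name embeddings_pro comes from Path 1.

```python
import hmac
from typing import Optional


def passes_tier_gate(key_tier: Optional[str], proxy_secret: Optional[str],
                     expected_proxy_secret: str,
                     required_tier: str = "embeddings_pro") -> bool:
    """Decide whether a request may reach a tier-gated endpoint.

    Requests relayed by the marketplace carry the shared proxy secret and skip the
    tier check; direct requests need a key provisioned at the required tier.
    """
    if proxy_secret is not None and hmac.compare_digest(
        proxy_secret.encode("utf-8"), expected_proxy_secret.encode("utf-8")
    ):
        return True
    return key_tier == required_tier
```

Because the secret skips the gate entirely, it has to stay server-to-server: anyone holding it gets gated access, so it should never be shipped to clients.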

Both paths produce working API keys. The difference is how they're issued and what revenue cut applies. We kept both because they serve different discovery patterns. Developers browsing MCPize find it there. Developers on pyckle.co find it there.

What's Next

The current model is strong. It is not the final model.

Fine-tuning on real query logs is the next step. Usage data from the API becomes training signal - actual searches, graded by whether they returned useful results.
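One way graded logs could become training signal, assuming each logged search records the ranked results and which of them the user marked useful (a hypothetical schema): every useful result becomes a positive, and every result returned alongside it that wasn't useful becomes a hard negative. The output is the same (query, answer, non-answer) triplet format the model was originally trained on.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class LoggedSearch:
    """Hypothetical log record for one API search."""
    query: str
    results: List[str]                             # chunk IDs, in the order returned
    useful: Set[str] = field(default_factory=set)  # chunk IDs graded as useful


def triplets_from_logs(logs: List[LoggedSearch]) -> List[Tuple[str, str, str]]:
    """Turn graded searches into (query, positive, hard negative) training triplets."""
    triplets = []
    for log in logs:
        positives = [r for r in log.results if r in log.useful]
        negatives = [r for r in log.results if r not in log.useful]
        # Searches with no useful result produce no positives and are skipped.
        for pos in positives:
            for neg in negatives:
                triplets.append((log.query, pos, neg))
    return triplets
```

Negatives drawn from the same result list are "hard" because the current model already ranked them highly, which is exactly where retraining helps. The trade-off is noise: a result the user didn't mark useful isn't always wrong.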

Knowledge distillation is on the roadmap: a larger teacher model training a smaller student model that runs cheaper and faster while holding the same quality bar.

Domain adaptation is the long game. The more a developer uses the API, the more their specific codebase vocabulary gets represented in the model's training distribution.

Getting Started

The API endpoint is live. Docs, quickstarts, and integration examples are at pyckle.co.

For direct access: pyckle.co/products -> Embeddings API section -> checkout.

For MCP-native access: find Pyckle in the MCPize marketplace.

The model trained to understand your codebase. Now it's available to anything that can make an HTTP request.

Pyckle is building persistent memory for developer AI workflows - semantic search that gets more accurate the longer you use it.