- Pyckle account (free tier works)
- Python 3.9+ or curl
- API key from your dashboard
Embeddings turn text into vectors — fixed-length arrays of floats that encode meaning. Once you have a vector, you can compare it to others with cosine similarity, feed it into a search index, or cluster it. This guide gets you from zero to a working embedding call in five minutes.
Step 1: Get Your API Key from the Dashboard
Log into your Pyckle dashboard and navigate to Settings → API Keys. Create a new key and copy it somewhere safe — you won't see it again. Set it as an environment variable so it never touches your code.
export PYCKLE_API_KEY="pyk_live_your_key_goes_here"
Every request authenticates with this key via the Authorization: Bearer header. If a request returns 401, this is the first thing to check.
Never hardcode your API key in source files. Use environment variables or a secrets manager. Leaked keys can be rotated from the dashboard under Settings → API Keys → Revoke.
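The authentication rules above can be sketched as a preflight check that fails before any request goes out. This is a minimal sketch, assuming only the environment variable name and header format from this step; `build_auth_headers` is a hypothetical helper, not part of any Pyckle SDK.

```python
import os


def build_auth_headers(env=None):
    """Build request headers, failing early if the key is missing.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    key = env.get("PYCKLE_API_KEY", "").strip()
    if not key:
        raise RuntimeError("PYCKLE_API_KEY is not set; export it before calling the API")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }


# A stand-in environment, so the example runs without a real key.
headers = build_auth_headers({"PYCKLE_API_KEY": "pyk_live_your_key_goes_here"})
print(headers["Authorization"])
```

Catching a missing key locally gives you a clearer error than a 401 from the API.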
Step 2: Make Your First Embedding Request with curl
The Pyckle embedding endpoint accepts the same request shape as the OpenAI embeddings API, which is why the openai client works against it in Step 3. The body is a model and an input, nothing more. Point your client at https://api.pyckle.dev/v1 and you're live.
curl https://api.pyckle.dev/v1/embeddings \
  -H "Authorization: Bearer $PYCKLE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "pyckle-embed-general",
    "input": "How do I reduce customer churn in a SaaS business?"
  }'
You'll get back a JSON response immediately. The embedding vector lives at data[0].embedding. It's a list of floats — that's your text, now represented as a point in high-dimensional space.
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0231, -0.0847, 0.1203, ..., 0.0512]
    }
  ],
  "model": "pyckle-embed-general",
  "usage": {
    "prompt_tokens": 13,
    "total_tokens": 13
  }
}
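If you handle the raw JSON yourself instead of using a client library, pulling the vector out looks roughly like this. It is a minimal sketch, assuming the response shape shown above; `first_embedding` is a hypothetical helper, and the three-value vector is a shortened stand-in rather than real API output.

```python
import json

# Stand-in payload with the same shape as the response above.
# The embedding is cut to three values for readability.
SAMPLE_RESPONSE = """
{
  "object": "list",
  "data": [
    {"object": "embedding", "index": 0, "embedding": [0.0231, -0.0847, 0.1203]}
  ],
  "model": "pyckle-embed-general",
  "usage": {"prompt_tokens": 13, "total_tokens": 13}
}
"""


def first_embedding(payload):
    """Parse a response body and return the vector at data[0].embedding."""
    body = json.loads(payload)
    return body["data"][0]["embedding"]


vector = first_embedding(SAMPLE_RESPONSE)
print(len(vector), vector)
```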
Drop-in compatible means your existing embedding code works against Pyckle with one change: swap the base URL. No SDK changes, no refactoring, no new abstractions.
Step 3: Embed Text with the Python Client
Install the openai Python package if you don't have it — it serves as the compatible client for Pyckle's API. Point it at Pyckle's base URL and pass your API key. Everything else is standard SDK usage.
pip install openai
import os
from openai import OpenAI
client = OpenAI(
    api_key=os.environ["PYCKLE_API_KEY"],
    base_url="https://api.pyckle.dev/v1",
)
response = client.embeddings.create(
    model="pyckle-embed-general",
    input="How do I reduce customer churn in a SaaS business?",
)
vector = response.data[0].embedding
print(f"Dimensions: {len(vector)}")
print(f"First 5 values: {vector[:5]}")
print(f"Tokens used: {response.usage.total_tokens}")
Run it. You'll see the dimension count and a sample of the float values printed to stdout. A 1536-dimension result means you're using the general-purpose model. If you switch to the code-optimized model, you'll get 384 dimensions instead — smaller, faster, tuned for source code retrieval.
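The dimension counts above make a cheap sanity check before you write vectors to an index. This is a minimal sketch that uses the two model names and sizes stated in this guide; `check_dimensions` is a hypothetical helper, not an SDK function.

```python
# Dimension counts as stated in this guide.
EXPECTED_DIMS = {
    "pyckle-embed-general": 1536,
    "pyckle-embed-code": 384,
}


def check_dimensions(model, vector):
    """Return the expected size, or raise if the vector does not match the model."""
    if model not in EXPECTED_DIMS:
        raise KeyError(f"no expected dimension recorded for model {model!r}")
    expected = EXPECTED_DIMS[model]
    if len(vector) != expected:
        raise ValueError(f"{model} should return {expected} dimensions, got {len(vector)}")
    return expected


# Zero-filled stand-in vector; a real one comes from response.data[0].embedding.
print(check_dimensions("pyckle-embed-code", [0.0] * 384))
```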
To benchmark against your previous provider, swap base_url back to that provider's endpoint and use their key. The same client code runs, which is the point of a compatible API: you can compare both services side by side without touching your application logic.
Step 4: Embed a Batch of Strings
Sending one string at a time is fine for experimentation. In production, batch your inputs. The API accepts a list under input and returns a corresponding list under data, ordered by index. One round trip, many vectors.
import os
from openai import OpenAI
client = OpenAI(
    api_key=os.environ["PYCKLE_API_KEY"],
    base_url="https://api.pyckle.dev/v1",
)
texts = [
    "How do I reduce customer churn in a SaaS business?",
    "What is the difference between gross and net revenue retention?",
    "How do cohort analysis and churn rate relate to LTV?",
    "Best practices for customer success team structure at Series B.",
]
response = client.embeddings.create(
    model="pyckle-embed-general",
    input=texts,
)
# Each item carries the index of the input string it was computed from.
for item in response.data:
    dims = len(item.embedding)
    print(f"[{item.index}] {dims}d - {texts[item.index][:60]}")
print(f"\nTotal tokens: {response.usage.total_tokens}")
The same pattern works with curl: pass an array as the input value. The response data array has one entry per string, each with its own index field, so you can match vectors back to your originals. The index is the position within that request's input array, so if you parallelize across batches, keep track of each batch's offset as well.
curl https://api.pyckle.dev/v1/embeddings \
  -H "Authorization: Bearer $PYCKLE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "pyckle-embed-general",
    "input": [
      "How do I reduce customer churn in a SaaS business?",
      "What is net revenue retention?",
      "How does LTV relate to churn rate?"
    ]
  }'
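For lists too long to send in one request, you can split the work into chunks and use each item's index to put the vectors back in input order. This is a minimal sketch under stated assumptions: the batch size of 64 is an arbitrary illustrative value, not a documented Pyckle limit, and `embed_in_batches` and `fake_embed_batch` are hypothetical names. The fake function stands in for the API so the example runs offline.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[Tuple[int, Vector]]],
    batch_size: int = 64,  # assumed value, not a documented limit
) -> List[Vector]:
    """Embed texts chunk by chunk and return vectors in input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    vectors: List[Vector] = [[] for _ in texts]
    for start in range(0, len(texts), batch_size):
        chunk = list(texts[start:start + batch_size])
        # Each index is relative to its chunk, so add the chunk's offset.
        for index, vector in embed_batch(chunk):
            vectors[start + index] = vector
    return vectors


def fake_embed_batch(chunk):
    """Offline stand-in for the API: one-number vectors, returned out of order."""
    pairs = [(i, [float(len(text))]) for i, text in enumerate(chunk)]
    return list(reversed(pairs))


result = embed_in_batches(["a", "bb", "ccc"], fake_embed_batch, batch_size=2)
print(result)
```

With the client from Step 3, `embed_batch` could wrap `client.embeddings.create(...)` and return `[(item.index, item.embedding) for item in response.data]`.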
Step 5: Understand the Response — Dimensions, Model, Usage
Every response has the same structure. Know what each field is telling you before you start building on top of it.
{
  "object": "list",                  // always "list" for embedding responses
  "data": [
    {
      "object": "embedding",         // always "embedding"
      "index": 0,                    // position in your input array
      "embedding": [...]             // the vector: a list of float32 values
    }
  ],
  "model": "pyckle-embed-general",   // model that produced the embedding
  "usage": {
    "prompt_tokens": 42,             // tokens in your input
    "total_tokens": 42               // same as prompt_tokens for embeddings
  }
}
Dimensions depend on which model you use. pyckle-embed-general produces 1536-dimensional vectors, which suit semantic search over prose, documents, and Q&A. pyckle-embed-code produces 384-dimensional vectors, which are smaller and purpose-built for source code retrieval, function lookup, and codebase search. Fewer dimensions mean faster similarity search and lower storage cost at scale.
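To put the storage difference in numbers, here is a rough calculation. It is a sketch that assumes vectors are stored as raw float32 values (4 bytes each) and ignores index structures and metadata, so real usage will be higher; `raw_storage_bytes` is a hypothetical helper.

```python
BYTES_PER_FLOAT32 = 4


def raw_storage_bytes(num_vectors, dims):
    """Raw vector payload only: no index overhead, no metadata."""
    return num_vectors * dims * BYTES_PER_FLOAT32


for model, dims in (("pyckle-embed-general", 1536), ("pyckle-embed-code", 384)):
    gigabytes = raw_storage_bytes(1_000_000, dims) / 1e9
    print(f"{model}: {gigabytes:.3f} GB per million vectors")
```

Under these assumptions the 384-dimensional vectors take a quarter of the space, since storage scales linearly with dimension count.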
Model tells you exactly what produced the vector. Store it alongside your embeddings in your database. Vectors from different models are not comparable, so cosine similarity across model boundaries is meaningless. If you switch models later, re-embed everything you plan to compare.
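One way to enforce this is to keep the model name on every stored record and refuse to compare across models. The sketch below is illustrative only: `StoredEmbedding` and `require_comparable` are hypothetical names, and the two-value vectors are stand-ins for real embeddings.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StoredEmbedding:
    doc_id: str
    model: str  # copied from the response's "model" field
    vector: Tuple[float, ...]


def require_comparable(a, b):
    """Raise if two records were produced by different models or differ in size."""
    if a.model != b.model:
        raise ValueError(f"cannot compare {a.model} vectors with {b.model} vectors")
    if len(a.vector) != len(b.vector):
        raise ValueError("vectors have different dimensions")


prose = StoredEmbedding("doc-1", "pyckle-embed-general", (0.1, 0.2))
code = StoredEmbedding("doc-2", "pyckle-embed-code", (0.3, 0.4))

try:
    require_comparable(prose, code)
except ValueError as err:
    print(err)
```

Call the guard before any similarity computation so a mixed index fails loudly instead of returning meaningless scores.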
Usage tracks token consumption for billing. Embedding tokens are cheap, but at scale (millions of documents) the cost adds up. Batch your inputs and cache embeddings for static content. There's no reason to re-embed text that hasn't changed.
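Caching can be as simple as keying each vector by a hash of its text plus the model name. This is a minimal in-memory sketch; `EmbeddingCache` is a hypothetical class, and the lambda stands in for a real API call so the example runs offline. In production you would likely back it with a database or key-value store.

```python
import hashlib
from typing import Callable, Dict, List


class EmbeddingCache:
    """In-memory cache keyed by model name and a SHA-256 of the text."""

    def __init__(self, model: str, embed_fn: Callable[[str], List[float]]):
        self.model = model
        self.embed_fn = embed_fn
        self.api_calls = 0  # counts cache misses, i.e. calls to embed_fn
        self._store: Dict[str, List[float]] = {}

    def _key(self, text: str) -> str:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        # Including the model keeps vectors from different models apart.
        return f"{self.model}:{digest}"

    def get(self, text: str) -> List[float]:
        key = self._key(text)
        if key not in self._store:
            self.api_calls += 1
            self._store[key] = self.embed_fn(text)
        return self._store[key]


# The lambda is a stand-in for a real embedding request.
cache = EmbeddingCache("pyckle-embed-general", lambda text: [float(len(text))])
cache.get("unchanged text")
cache.get("unchanged text")  # served from the cache
cache.get("edited text")
print(cache.api_calls)
```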
PyckLM is Pyckle's code embedding model — trained from scratch on code search tasks and deployed as the engine behind semantic search in the Pyckle platform. The public embedding API exposes PyckLM for code retrieval and a general-purpose model for prose. You pick the model that fits your retrieval task.
To compare two embeddings in Python, use cosine similarity. NumPy keeps the function short:
import numpy as np

def cosine_similarity(a, b):
    # Convert the lists to arrays, then divide the dot product by the norms.
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# 1.0 = same direction (closest meaning), 0.0 = orthogonal (unrelated), -1.0 = opposite direction
# `response` is the batch response from Step 4.
score = cosine_similarity(response.data[0].embedding, response.data[1].embedding)
print(f"Similarity: {score:.4f}")
Values above 0.85 are typically strong semantic matches. The threshold that works for your use case depends on your data — measure it against labeled pairs rather than guessing.
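Measuring a threshold against labeled pairs can be sketched as a small sweep over candidate cutoffs. The scores and labels below are invented toy values for illustration, not measurements from any Pyckle model, and `best_threshold` is a hypothetical helper that optimizes plain accuracy. You may prefer precision, recall, or F1 depending on your use case.

```python
def best_threshold(scores, labels):
    """Return (threshold, accuracy) for the cutoff that classifies the most pairs correctly.

    A pair is predicted a match when its score is >= the threshold.
    Candidate thresholds are the observed scores; ties keep the lowest one.
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and the same length")
    best_t, best_acc = None, -1.0
    for t in sorted(set(scores)):
        correct = sum((s >= t) == bool(y) for s, y in zip(scores, labels))
        accuracy = correct / len(scores)
        if accuracy > best_acc:
            best_t, best_acc = t, accuracy
    return best_t, best_acc


# Toy data: similarity scores for five pairs, 1 = human-labeled match.
toy_scores = [0.92, 0.88, 0.81, 0.74, 0.60]
toy_labels = [1, 1, 1, 0, 0]

threshold, accuracy = best_threshold(toy_scores, toy_labels)
print(f"threshold={threshold}, accuracy={accuracy}")
```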