# Getting Started with the Pyckle Embeddings API

This guide covers everything you need to go from no account to making your first embedding request. Estimated time: 10 minutes.

---

## Prerequisites

- A pyckle.co account (free to create)
- Python 3.9+ or Node.js 18+ (examples in both)
- `httpx` (Python) or the native `fetch` built into Node.js 18+ (`node-fetch` also works)
- `numpy` for the similarity example in Step 5 (Python only)

---

## Step 1: Get an API Key

**Free tier (via MCPize):**

1. Go to [mcpize.com](https://mcpize.com) and search for Pyckle
2. Install the Pyckle MCP server
3. MCPize provisions a free-tier key automatically — it's included in the MCP configuration

**Pro tier (direct):**

1. Go to [pyckle.co/products](https://pyckle.co/products)
2. Scroll to the Embeddings API section
3. Choose monthly or yearly, click checkout
4. After purchase, your `pk_live_` API key is shown immediately and emailed to you

Pro keys are provisioned instantly. No waiting for approval.
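
Rather than pasting the key into source files, it is generally safer to read it from the environment. A minimal sketch, assuming you export the key under a variable named `PYCKLE_API_KEY` (that name and the `load_api_key` helper are illustrative choices, not something the API requires):

```python
import os


def load_api_key(env_var: str = "PYCKLE_API_KEY") -> str:
    """Read the API key from the environment.

    The variable name is an assumption made for this example; use whatever
    name fits your deployment.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key
```

You could call `load_api_key()` once at startup and pass the result wherever the examples in this guide use a hard-coded key.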

---

## Step 2: Make Your First Request

### Python

```python
import httpx

API_KEY = "pk_live_your_key_here"  # pragma: allowlist secret

response = httpx.post(
    "https://api.pyckle.co/v1/embeddings",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "input": ["where does JWT token validation happen"],
        "model": "pycklelm-1",
    },
    timeout=30,
)
response.raise_for_status()

result = response.json()
embedding = result["data"][0]["embedding"]

print(f"Embedding dimensions: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")
```

Example output (the values shown are illustrative):
```
Embedding dimensions: 768
First 5 values: [0.023, -0.041, 0.087, 0.012, -0.059]
```

### Node.js

```javascript
// Run as an ES module (for example, a .mjs file) so top-level await works.
const apiKey = "pk_live_your_key_here"; // pragma: allowlist secret

const response = await fetch("https://api.pyckle.co/v1/embeddings", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    input: ["where does JWT token validation happen"],
    model: "pycklelm-1",
  }),
});

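// fetch does not reject on HTTP errors, so check the status explicitly.
if (!response.ok) {
  throw new Error(`Request failed: ${response.status} ${response.statusText}`);
}

const result = await response.json();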
const embedding = result.data[0].embedding;

console.log(`Embedding dimensions: ${embedding.length}`);
```

### cURL

```bash
curl -X POST https://api.pyckle.co/v1/embeddings \
  -H "Authorization: Bearer pk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "input": ["where does JWT token validation happen"],
    "model": "pycklelm-1"
  }'
```

---

## Step 3: Batch Embeddings

The API accepts up to 100 texts per request. Batch your inputs to minimize round trips:

```python
import httpx

def embed_texts(texts: list[str], api_key: str) -> list[list[float]]:
    """Embed a list of texts, up to 100 per call."""
    if len(texts) > 100:
        raise ValueError("Max 100 texts per request")

    response = httpx.post(
        "https://api.pyckle.co/v1/embeddings",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"input": texts, "model": "pycklelm-1"},
        timeout=30,
    )
    response.raise_for_status()

    # Sort by `index` so each embedding lines up with its input text
    data = response.json()["data"]
    return [item["embedding"] for item in sorted(data, key=lambda x: x["index"])]

# Embed multiple code snippets at once
code_snippets = [
    "def validate_token(token: str) -> bool: ...",
    "def create_user(email: str, password: str) -> User: ...",
    "async def send_welcome_email(user_id: int) -> None: ...",
]

embeddings = embed_texts(code_snippets, api_key="pk_live_...")
print(f"Got {len(embeddings)} embeddings")
```
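
If you have more than 100 texts, one approach is to split them into batches and embed each batch in turn. The sketch below is an illustration only: `chunked` and `embed_all` are hypothetical helpers, and `embed_all` takes the embedding function as a parameter (for example, a wrapper around `embed_texts` above) so that it stays self-contained.

```python
from typing import Callable

# Any function that maps a batch of texts to one vector per text.
Embedder = Callable[[list[str]], list[list[float]]]


def chunked(items: list[str], size: int = 100) -> list[list[str]]:
    """Split items into consecutive batches of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def embed_all(texts: list[str], embed_fn: Embedder, batch_size: int = 100) -> list[list[float]]:
    """Embed any number of texts by calling embed_fn once per batch, in order."""
    embeddings: list[list[float]] = []
    for batch in chunked(texts, batch_size):
        embeddings.extend(embed_fn(batch))
    return embeddings
```

A call might look like `embed_all(texts, lambda batch: embed_texts(batch, api_key=key))`. Each batch is a separate request, so the per-minute limits for your tier still apply.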

---

## Step 4: Handle Rate Limits

- Free tier: 10 requests per minute
- Pro tier: 60 requests per minute

The API returns `429 Too Many Requests` when rate limited. Handle it with exponential backoff:

```python
import time
import httpx

def embed_with_retry(
    texts: list[str],
    api_key: str,
    max_retries: int = 3
) -> list[list[float]]:
    for attempt in range(max_retries):
        try:
            response = httpx.post(
                "https://api.pyckle.co/v1/embeddings",
                headers={"Authorization": f"Bearer {api_key}"},
                json={"input": texts, "model": "pycklelm-1"},
                timeout=30,
            )

            if response.status_code == 429:
                wait = 2 ** attempt  # 1s, 2s, 4s
                print(f"Rate limited. Waiting {wait}s...")
                time.sleep(wait)
                continue

            response.raise_for_status()
            data = response.json()["data"]
            return [item["embedding"] for item in sorted(data, key=lambda x: x["index"])]

        except httpx.TimeoutException:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)

    raise RuntimeError("Max retries exceeded")
```
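
When several workers share one key, retrying on identical 1s, 2s, 4s schedules can make them collide again. A common refinement is to add random jitter to each delay. The sketch below uses the "full jitter" variant; `backoff_delay` is a hypothetical helper, and the `base` and `cap` values are illustrative assumptions, not limits documented by the API.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Return a random delay in [0, min(cap, base * 2**attempt)] seconds."""
    source = rng or random  # fall back to the module-level generator
    upper = min(cap, base * (2 ** attempt))
    return source.uniform(0.0, upper)
```

In `embed_with_retry`, `time.sleep(backoff_delay(attempt))` could stand in for `time.sleep(2 ** attempt)`.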

For large codebases (1000+ files), use an async version that spaces requests evenly so you stay within your rate limit:

```python
import asyncio
import httpx

async def embed_large_codebase(
    all_texts: list[str],
    api_key: str,
    requests_per_minute: int = 60,
) -> list[list[float]]:
    batch_size = 100
    batches = [all_texts[i:i+batch_size] for i in range(0, len(all_texts), batch_size)]

    # Calculate delay between requests to stay under rate limit
    delay = 60 / requests_per_minute  # seconds between requests

    results = []
    async with httpx.AsyncClient() as client:
        for i, batch in enumerate(batches):
            if i > 0:
                await asyncio.sleep(delay)

            response = await client.post(
                "https://api.pyckle.co/v1/embeddings",
                headers={"Authorization": f"Bearer {api_key}"},
                json={"input": batch, "model": "pycklelm-1"},
                timeout=30,
            )
            response.raise_for_status()

            data = response.json()["data"]
            batch_embeddings = [item["embedding"] for item in sorted(data, key=lambda x: x["index"])]
            results.extend(batch_embeddings)

            print(f"Indexed {min((i+1)*batch_size, len(all_texts))}/{len(all_texts)} chunks")

    return results
```
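
To get a feel for how long a paced run takes, the arithmetic can be worked out ahead of time. This sketch (the `estimate_pacing` name is hypothetical) counts the batches and the total sleep time between them. It ignores request latency, so real runs take somewhat longer.

```python
import math


def estimate_pacing(num_texts: int, requests_per_minute: int = 60,
                    batch_size: int = 100) -> tuple[int, float]:
    """Return (number of requests, total seconds spent sleeping between them)."""
    num_requests = math.ceil(num_texts / batch_size)
    delay = 60 / requests_per_minute
    # The loop sleeps before every request except the first.
    total_sleep = max(num_requests - 1, 0) * delay
    return num_requests, total_sleep


print(estimate_pacing(2_500))                          # (25, 24.0)
print(estimate_pacing(2_500, requests_per_minute=10))  # (25, 144.0)
```

At the Pro limit, 2,500 chunks need 25 requests and 24 seconds of pacing; at the free-tier limit the same job spends 144 seconds waiting.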

---

## Step 5: Compute Similarity

Once you have embeddings, compute cosine similarity to find the most relevant chunks:

```python
import numpy as np

def cosine_similarity(a: list[float], b: list[float]) -> float:
    a_arr = np.array(a)
    b_arr = np.array(b)
    return float(np.dot(a_arr, b_arr) / (np.linalg.norm(a_arr) * np.linalg.norm(b_arr)))

def find_most_similar(
    query_embedding: list[float],
    corpus_embeddings: list[list[float]],
    corpus_texts: list[str],
    top_k: int = 5
) -> list[dict]:
    scores = [
        (i, cosine_similarity(query_embedding, emb))
        for i, emb in enumerate(corpus_embeddings)
    ]
    scores.sort(key=lambda x: x[1], reverse=True)

    return [
        {"text": corpus_texts[i], "score": score}
        for i, score in scores[:top_k]
    ]

# Example usage
code_samples = [
    "def validate_jwt(token: str) -> dict: ...",
    "def create_user(email: str) -> User: ...",
    "def send_email(to: str, subject: str) -> None: ...",
    "def check_permissions(user: User, resource: str) -> bool: ...",
]

# Embed the corpus (uses embed_texts() from Step 3)
corpus_embeddings = embed_texts(code_samples, api_key="pk_live_...")

# Embed a query
query = "where does token verification happen"
query_embedding = embed_texts([query], api_key="pk_live_...")[0]

# Find most similar
results = find_most_similar(query_embedding, corpus_embeddings, code_samples)
for r in results:
    print(f"Score: {r['score']:.3f} | {r['text'][:60]}...")
```
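
If NumPy is not available, or if a vector might be all zeros (which makes the norm zero and the division undefined), a dependency-free variant is one option. This is a sketch using only the standard library; `cosine_similarity_safe` is a hypothetical name, and returning `0.0` for a zero vector is a convention chosen for this example.

```python
import math
from collections.abc import Sequence


def cosine_similarity_safe(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity without NumPy; returns 0.0 if either vector has zero norm."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

It accepts the same list-of-floats embeddings as `cosine_similarity` above, though it will be slower on large corpora than the NumPy version.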

---

## API Reference

### Endpoint

```
POST https://api.pyckle.co/v1/embeddings
```

### Headers

| Header | Value |
|--------|-------|
| `Authorization` | `Bearer pk_live_your_key` |
| `Content-Type` | `application/json` |

### Request Body

```json
{
  "input": ["text1", "text2", "..."],
  "model": "pycklelm-1"
}
```

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `input` | `string[]` | Yes | Texts to embed. Max 100 per request. |
| `model` | `string` | No | Model name. Default: `"pycklelm-1"` |
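
Catching invalid input on the client saves a round trip that would end in a `422`. The sketch below builds the request body and enforces the limit from the table. `build_payload` is a hypothetical helper, and the checks beyond the 100-text limit (non-empty list, string items) are assumptions about what the server would reject.

```python
MAX_TEXTS_PER_REQUEST = 100  # limit stated in the request body table


def build_payload(texts: list[str], model: str = "pycklelm-1") -> dict:
    """Return a JSON-serializable request body, or raise ValueError."""
    if not texts:
        raise ValueError("input must contain at least one text")
    if len(texts) > MAX_TEXTS_PER_REQUEST:
        raise ValueError(f"input has {len(texts)} texts; max is {MAX_TEXTS_PER_REQUEST}")
    if not all(isinstance(t, str) for t in texts):
        raise ValueError("every input item must be a string")
    return {"input": list(texts), "model": model}
```

The returned dict can be passed straight to `httpx.post(..., json=payload)`.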

### Response

```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.023, -0.041, 0.087, ...]
    }
  ],
  "model": "pycklelm-1",
  "usage": {
    "prompt_tokens": 12,
    "total_tokens": 12
  }
}
```

The response format is compatible with the OpenAI Embeddings API. If you're already using OpenAI embeddings, you can switch to Pyckle by changing the base URL, along with your API key and the model name (`pycklelm-1`).
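
A response shaped like the JSON above can be unpacked into ordered vectors plus the token count. A minimal sketch, assuming every item carries `index` and `embedding` as in the example; `parse_embeddings` is a hypothetical helper name, and the sample values are made up for illustration.

```python
def parse_embeddings(payload: dict) -> tuple[list[list[float]], int]:
    """Return (embeddings ordered by index, total_tokens) from a response body."""
    items = sorted(payload["data"], key=lambda item: item["index"])
    embeddings = [item["embedding"] for item in items]
    total_tokens = payload.get("usage", {}).get("total_tokens", 0)
    return embeddings, total_tokens


# Hand-written sample with items deliberately out of order.
sample = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.5, 0.6]},
        {"object": "embedding", "index": 0, "embedding": [0.1, 0.2]},
    ],
    "model": "pycklelm-1",
    "usage": {"prompt_tokens": 12, "total_tokens": 12},
}
vectors, tokens = parse_embeddings(sample)
print(vectors, tokens)  # [[0.1, 0.2], [0.5, 0.6]] 12
```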

### Rate Limits

| Tier | Requests/minute | Max input texts/request |
|------|----------------|------------------------|
| Free | 10 | 100 |
| Pro | 60 | 100 |
| Enterprise | Custom | 100 |

### Error Codes

| Code | Meaning |
|------|---------|
| `401` | Invalid or missing API key |
| `403` | Valid key, wrong tier (e.g., using free key on Pro endpoint) |
| `422` | Input validation error (e.g., more than 100 texts) |
| `429` | Rate limit exceeded |
| `500` | Internal server error — retry with backoff |
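
One way to act on this table is to separate errors worth retrying from errors that need a fix on the caller's side. The sketch below treats `429` and `500` as retryable, following Step 4 and the table. `should_retry` is a hypothetical helper, and other `5xx` codes are left out because this guide does not describe them.

```python
RETRYABLE_STATUS_CODES = frozenset({429, 500})


def should_retry(status_code: int) -> bool:
    """True for errors this guide treats as transient (rate limit, server error)."""
    return status_code in RETRYABLE_STATUS_CODES


for code in (401, 403, 422, 429, 500):
    print(code, "retry" if should_retry(code) else "fix the request")
```

A retry loop could check `should_retry(response.status_code)` before sleeping, and call `response.raise_for_status()` right away for everything else.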

---

## Next Steps

- **Build a full code search index**: See the Developer's Guide to Semantic Code Search
- **Build a codebase RAG pipeline**: See Building RAG Systems for Codebases
- **Install the MCP server**: For Claude Code and other MCP-compatible AI assistants, install the Pyckle MCP server via MCPize to get seamless semantic code search

---

*Questions? Reach out at pyckle.co/contact or find us on the MCPize marketplace.*
