Upgrade to Pro — Custom PyckLM Tuning

Fine-tune the embedding model on your codebase so search results match your team's patterns, not generic ones.

Custom tuning adapts PyckLM's embedding weights to your codebase's vocabulary, patterns, and naming conventions — so search results get sharper the more you use Pyckle.

Prerequisites
  • Active Pyckle Pro subscription
  • At least 500 files indexed
  • Python 3.9+

Step 1: Understand What Custom Tuning Changes

Custom tuning fine-tunes PyckLM's embedding model on your indexed codebase. It learns your project's naming patterns, domain terms, and code structure — things a general model treats as noise. It does not re-index your files, change your ChromaDB schema, or affect any other project's index.

The tuned model replaces the default embedder for search and session context only. Chunk vectors already in your index were generated by the base model and stay that way until you re-index after switching.

Key Insight

Tuning improves recall for domain-specific terms. If your codebase uses names like OrchestratorDispatchLoop or internal acronyms, the base model often misses them. A tuned model has seen them during training and is far more likely to surface them.

Step 2: Trigger a Tuning Job from the CLI

Run the tuning command against your indexed project path. Pyckle pulls training pairs from your session history and chunk graph — so the more searches you've run, the better the training signal.

pyckle tune start --path /home/kp112081/your-project --epochs 3 --output-tag my-project-v1

The --output-tag flag names the tuned model checkpoint. Use something you'll recognize. The job runs server-side; your terminal returns a job ID immediately.

Warning

You need at least 200 prior search queries logged to generate meaningful training pairs. Run pyckle tune check --path /home/kp112081/your-project first — it tells you if you have enough signal before billing starts.
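The check-then-start order described above can be scripted. The following is a minimal Python sketch, not an official Pyckle client: it only assembles the two commands shown in this guide and runs them in sequence. The function names are hypothetical, and it assumes that pyckle tune check exits with a non-zero status when there is not enough signal, which this guide does not confirm.

```python
import subprocess
from typing import Callable, List


def tune_check_cmd(path: str) -> List[str]:
    """Build the argv for the pre-flight check shown in this guide."""
    return ["pyckle", "tune", "check", "--path", path]


def tune_start_cmd(path: str, tag: str, epochs: int = 3) -> List[str]:
    """Build the argv for starting a tuning job (flags taken from this guide)."""
    if epochs < 1:
        raise ValueError("epochs must be at least 1")
    return [
        "pyckle", "tune", "start",
        "--path", path,
        "--epochs", str(epochs),
        "--output-tag", tag,
    ]


def check_then_start(
    path: str,
    tag: str,
    epochs: int = 3,
    runner: Callable = subprocess.run,
) -> bool:
    """Run the check first and start the job only if the check succeeds.

    ASSUMPTION: a non-zero exit code from `pyckle tune check` means
    "not enough signal". Verify this against your installed version.
    """
    check = runner(tune_check_cmd(path))
    if check.returncode != 0:
        return False  # Skip the billable job entirely.
    start = runner(tune_start_cmd(path, tag, epochs))
    return start.returncode == 0
```

Passing the runner as a parameter keeps the sketch testable without a Pyckle install; in normal use the default subprocess.run is enough.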

Step 3: Monitor Training Progress

Use the job ID returned at submission to poll status. Training typically finishes in 8–12 minutes for codebases under 10k chunks.

pyckle tune status --job-id tune_a3f92c

Output shows epoch progress, loss curve, and estimated time remaining. When status reads COMPLETE, the checkpoint is ready to activate.

pyckle tune status --job-id tune_a3f92c
# Job:     tune_a3f92c
# Status:  COMPLETE
# Epochs:  3/3
# Loss:    0.041 → 0.019
# Tag:     my-project-v1
# Ready:   yes

Try This

Run the status command under watch so you don't have to keep re-running it: watch -n 30 'pyckle tune status --job-id tune_a3f92c'
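If you would rather have a script stop on its own once the job finishes, the status output can be parsed and polled. This Python sketch assumes the output keeps the "Key: value" layout of the sample above, with or without the leading "#". The function names are hypothetical, and because this guide only documents the COMPLETE status, the loop gives up after a fixed number of polls instead of guessing at failure states.

```python
import subprocess
import time
from typing import Callable, Dict


def parse_status(text: str) -> Dict[str, str]:
    """Turn 'Key: value' lines into a dict with lower-cased keys."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.lstrip("# ").strip()  # Tolerate a leading '# ' prefix.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    return fields


def fetch_status(job_id: str) -> str:
    """Call the status command from this guide and return its stdout."""
    result = subprocess.run(
        ["pyckle", "tune", "status", "--job-id", job_id],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def wait_for_job(
    job_id: str,
    interval: float = 30.0,
    max_polls: int = 60,
    fetch: Callable[[str], str] = fetch_status,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Poll until Status reads COMPLETE, or raise after max_polls attempts."""
    for attempt in range(max_polls):
        fields = parse_status(fetch(job_id))
        if fields.get("status") == "COMPLETE":
            return fields
        if attempt < max_polls - 1:
            sleep(interval)
    raise TimeoutError(f"{job_id} not COMPLETE after {max_polls} polls")
```

With the defaults, wait_for_job("tune_a3f92c") checks every 30 seconds for up to 30 minutes, which comfortably covers the 8–12 minute range quoted above.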

Step 4: Switch to Your Tuned Model

Activate the checkpoint for your project. This updates the project config and swaps the embedder on next search — no restart needed.

pyckle tune activate --path /home/kp112081/your-project --tag my-project-v1

Confirm the switch took effect:

pyckle config show --path /home/kp112081/your-project | grep embedder
# embedder: tuned/my-project-v1
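The same confirmation can be done from a script, for example in a CI step that should fail if the project silently fell back to the base model. This Python sketch assumes the config output contains an "embedder: tuned/<tag>" line as in the sample above; the helper names are hypothetical.

```python
from typing import Optional


def active_embedder(config_text: str) -> Optional[str]:
    """Return the value of the 'embedder' line, or None if it is absent."""
    for line in config_text.splitlines():
        key, sep, value = line.lstrip("# ").partition(":")
        if sep and key.strip() == "embedder":
            return value.strip()
    return None


def tuned_model_is_active(config_text: str, tag: str) -> bool:
    """True when the config points at the tuned checkpoint with this tag."""
    return active_embedder(config_text) == f"tuned/{tag}"
```

Feed it the captured output of pyckle config show --path for your project.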

To revert to the base model at any time, run pyckle tune deactivate --path /home/kp112081/your-project. Your tuned checkpoints are preserved — deactivating does not delete them.

Key Insight

If you re-index after activating, Pyckle uses the tuned embedder to generate new chunk vectors. This compounds the quality gain, because chunk vectors and query vectors then come from the same model.

Step 5: Measure Search Quality Improvement

Run the built-in quality benchmark against a set of queries. Pyckle compares results from the base model and your tuned model side by side using nDCG@10.

pyckle tune benchmark \
  --path /home/kp112081/your-project \
  --queries queries.txt \
  --tag my-project-v1

queries.txt is a plain text file — one query per line. Use searches you actually run, not synthetic ones. Real queries expose the real gap.

# Sample queries.txt
authentication middleware
retry logic for API calls
database connection pool
error handling in async tasks
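If you collect real searches from several teammates, the list usually needs a cleanup pass before it becomes queries.txt. The Python sketch below normalizes whitespace, drops blank lines and exact duplicates, and writes one query per line. The function names are hypothetical. It also skips lines starting with "#", since this guide does not say whether Pyckle accepts comment lines in the file.

```python
from pathlib import Path
from typing import Iterable, List, Union


def clean_queries(raw: Iterable[str]) -> List[str]:
    """Normalize whitespace; drop blanks, '#' lines, and exact duplicates."""
    seen = set()
    cleaned: List[str] = []
    for query in raw:
        query = " ".join(query.split())  # Collapse runs of whitespace.
        if not query or query.startswith("#"):
            continue
        if query in seen:
            continue
        seen.add(query)
        cleaned.append(query)  # First occurrence wins; order is preserved.
    return cleaned


def write_queries(path: Union[str, Path], queries: Iterable[str]) -> int:
    """Write one query per line and return how many were written."""
    cleaned = clean_queries(queries)
    text = "".join(f"{q}\n" for q in cleaned)
    Path(path).write_text(text, encoding="utf-8")
    return len(cleaned)
```

Duplicates are matched exactly after whitespace cleanup, so "Retry logic" and "retry logic" are kept as separate queries.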

Output reports nDCG@10 for base vs. tuned, plus per-query rank deltas. A 10–25% nDCG improvement is typical on the first tuning run.
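To make the metric concrete, here is a small Python sketch of nDCG@10 using the common linear-gain formulation: each result's relevance grade is discounted by log2(rank + 1), and the sum is divided by the score of the ideal ordering. This illustrates the standard metric, not Pyckle's internal scorer. The exact gain function and the source of relevance labels Pyckle uses are not stated in this guide, and the sketch takes the ideal ordering from the returned results only.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the top-k results (linear gain)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG divided by the DCG of the same grades sorted best-first."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # No relevant results at all.
    return dcg_at_k(relevances, k) / ideal


# Hypothetical relevance grades for one query (1 = relevant, 0 = not),
# listed in the order each model returned its results.
base_ranking = [0, 1, 0, 1, 0]
tuned_ranking = [1, 1, 0, 0, 0]

print(f"base:  {ndcg_at_k(base_ranking):.3f}")
print(f"tuned: {ndcg_at_k(tuned_ranking):.3f}")
```

The tuned ranking scores higher here only because its relevant results sit at ranks 1 and 2 instead of 2 and 4, which is the kind of shift the per-query rank deltas report.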

Try This

Re-tune every two weeks if your codebase is actively growing. Session query history accumulates fast, and the second tuning run almost always beats the first.
