Enterprise Data Platform

Ingest. Enrich.
Embed. Search.

An ingestion and enrichment platform for any data source. Documents, APIs, and databases, structured or unstructured: Vector normalises, enriches, embeds, and indexes them all.

Feed in PDFs, DOCX, HTML, Markdown, CSV, JSON, API responses, database exports, or raw text. Vector's pipeline chunks, enriches with metadata, generates GPU-accelerated embeddings, and serves sub-50ms semantic search with intelligent reranking.

<50ms p95 Search Latency
Any Data Source
1M+ Vectors / Node
100% On-Premise

NEW: Candengo Mem

Your AI Remembers.

Cross-device, team-shared memory for AI coding agents. Every discovery, decision, and bugfix — captured automatically, synced via Candengo Vector, available everywhere.

terminal
$ npx candengo-mem init
Connected as david@team.com
$ claude
Session memory loaded: 12 observations
# Works across Claude, Codex, Cursor...

From Raw Data to Intelligence

A fully automated ingestion-to-query pipeline that handles any data source

Ingest: Documents, APIs, databases, CSV, JSON, and HTML, from any structured or unstructured source
Enrich: Metadata extraction, entity tagging, classification, and format normalisation
Embed: GPU-accelerated dense vector embeddings via transformer models
Index: HNSW vector index in Qdrant for fast approximate nearest-neighbour search
Search + Rerank: Semantic retrieval with cross-encoder reranking for precision

Built for Production

Everything you need to ingest, enrich, and search data at enterprise scale

📥

Multi-Format Ingestion

Ingest from any source — PDFs, DOCX, HTML, CSV, JSON, API responses, database exports, plain text, and more. One pipeline handles structured and unstructured data alike.

Data Enrichment Pipeline

Automatic metadata extraction, entity recognition, classification tagging, and format normalisation. Raw data goes in; enriched, searchable knowledge comes out.

GPU-Accelerated Embeddings

CUDA-powered transformer models generate dense vector embeddings at throughput rates that leave CPU-only solutions behind. Toggle GPU on or off from the admin dashboard.

🎯

Cross-Encoder Reranking

Two-stage retrieval: fast vector recall followed by neural cross-encoder reranking. Precision jumps 15-30% over embedding-only search, with selectable reranker models.
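
The two-stage idea can be illustrated with a minimal sketch. It assumes a small in-memory list of documents in place of the Qdrant index, and it uses a simple token-overlap score as a stand-in for the neural cross-encoder. The function names, the document layout, and the scoring function are all hypothetical and are not part of Candengo Vector.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall(query_vec, index, top_n):
    """Stage 1: cheap vector recall. Returns the top_n nearest documents."""
    ranked = sorted(index, key=lambda d: cosine(query_vec, d["vector"]), reverse=True)
    return ranked[:top_n]


def token_overlap(query, text):
    """Stand-in scorer (NOT a cross-encoder): share of query tokens found in text."""
    q_tokens = set(query.lower().split())
    t_tokens = set(text.lower().split())
    return len(q_tokens & t_tokens) / len(q_tokens) if q_tokens else 0.0


def rerank(query, candidates, score_fn, top_k):
    """Stage 2: rescore only the recalled candidates with a costlier scorer."""
    ranked = sorted(candidates, key=lambda d: score_fn(query, d["text"]), reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    # Hypothetical toy index with 2-dimensional vectors.
    index = [
        {"id": "a", "vector": [1.0, 0.0], "text": "data retention policy for EU clients"},
        {"id": "b", "vector": [0.9, 0.1], "text": "office retention of visitor badges"},
        {"id": "c", "vector": [0.0, 1.0], "text": "holiday rota"},
    ]
    candidates = recall([0.8, 0.2], index, top_n=2)
    best = rerank("data retention policy", candidates, token_overlap, top_k=1)
    print([d["id"] for d in candidates], best[0]["id"])
```

The point of the split is cost: the second scorer runs only on the handful of recalled candidates rather than on the whole collection.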

📦

Workpack System

Drop-in ZIP bundles that pre-load domain knowledge, prompt templates, and advisory workflows. Install via drag-and-drop — no code changes required.

🔐

Multi-Tenant Isolation

Full tenant isolation with per-tenant API keys, user management, and scoped data access. Superadmin console for fleet oversight across all tenants.

📈

Real-Time Dashboard

Live metrics with 30-minute time-series charts. Track ingestion rate, search volume, p95 latencies, vector counts, and per-API-key usage analytics.
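
For readers unfamiliar with p95 figures, here is a small sketch of how a 95th-percentile latency can be computed from raw samples using the nearest-rank method. It is only an illustration of the statistic. How the dashboard itself aggregates latencies is not stated here, and the function name is hypothetical.

```python
def percentile(samples_ms, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be an integer from 1 to 100")
    ordered = sorted(samples_ms)
    # Ceiling of (pct / 100) * n, computed in integer arithmetic to avoid float rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


if __name__ == "__main__":
    latencies_ms = [12, 18, 22, 25, 31, 33, 35, 38, 41, 47]
    print("p95:", percentile(latencies_ms, 95), "ms")
```

A p95 under 50 ms means that 95 of every 100 searches finished within that bound, so a few slow outliers do not hide behind a low average.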

🔑

Scoped API Keys

Issue API keys with granular scopes — ingest, search, admin, or any combination. Per-key usage stats, rotation support, and instant revocation from the UI.
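
A scope check of this kind might look like the sketch below. The scope names come from the list above. The `ApiKey` class and `is_allowed` function are hypothetical, and the sketch deliberately does not assume that `admin` implies the other scopes, since that behaviour is not described here.

```python
from dataclasses import dataclass

# Scope names taken from the feature description above.
VALID_SCOPES = frozenset({"ingest", "search", "admin"})


@dataclass
class ApiKey:
    """Hypothetical in-memory representation of an issued key."""
    key_id: str
    scopes: frozenset
    revoked: bool = False


def is_allowed(key, required_scope):
    """Return True only if the key is active and carries the required scope."""
    if required_scope not in VALID_SCOPES:
        raise ValueError(f"unknown scope: {required_scope}")
    return not key.revoked and required_scope in key.scopes


if __name__ == "__main__":
    search_only = ApiKey("key-1", frozenset({"search"}))
    print(is_allowed(search_only, "search"))  # allowed
    print(is_allowed(search_only, "ingest"))  # denied: scope not granted
    search_only.revoked = True
    print(is_allowed(search_only, "search"))  # denied: key revoked
```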

📊

Prometheus Metrics

Native /metrics endpoint exposing request counts, latency histograms, vector store size, and GPU utilisation. Plug straight into Grafana or your existing stack.
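
As a rough illustration of what a scrape of such an endpoint yields, the sketch below parses simple lines in the Prometheus text exposition format into a dictionary. It handles only plain `series value` lines without trailing timestamps, and the metric names in the sample are invented for the example; the actual names exposed by Vector are not listed here.

```python
def parse_metrics(text):
    """Parse simple 'series value' lines; skips comments and blank lines.

    Assumes no trailing timestamp on sample lines.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field; the rest is name plus labels.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


if __name__ == "__main__":
    # Hypothetical payload: metric names are illustrative only.
    sample = """\
# HELP example_requests_total Requests served.
# TYPE example_requests_total counter
example_requests_total{endpoint="/v1/search"} 1027
example_vector_store_size 1000000
"""
    for series, value in parse_metrics(sample).items():
        print(series, value)
```

In practice a Prometheus server would scrape the endpoint directly and Grafana would query Prometheus, so a hand-written parser like this is mainly useful for quick checks.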

📑

Smart Chunking

Format-aware splitting with configurable overlap, metadata injection, and structure preservation. Handles tables, headers, nested JSON, and multi-level hierarchies.
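
Configurable overlap is easiest to see in a stripped-down form. The sketch below splits plain text into fixed-size character windows that share `overlap` characters with their neighbour and attaches caller-supplied metadata to each chunk. It is not format-aware and does not reflect Vector's actual chunker; the function name and chunk layout are hypothetical.

```python
def chunk_text(text, chunk_size=200, overlap=50, metadata=None):
    """Split text into overlapping character windows, each tagged with metadata."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for index, start in enumerate(range(0, len(text), step)):
        chunks.append({
            "index": index,
            "start": start,
            "text": text[start:start + chunk_size],
            "metadata": dict(metadata or {}),  # copy so chunks do not share state
        })
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("abcdefghij", chunk_size=4, overlap=2, metadata={"dept": "legal"}):
        print(chunk["start"], chunk["text"], chunk["metadata"])
```

Overlap trades index size for recall: a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.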

🛡

PII-Safe Logging

Toggle PII-safe mode to strip sensitive data from all logs and audit trails. Built for regulated industries — healthcare, finance, legal — without sacrificing observability.
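
One common way to build a toggle like this is a logging filter that rewrites messages before they are emitted. The sketch below uses Python's standard `logging` module and redacts email addresses only. The class name, the regular expression, and the placeholder text are assumptions for illustration; which data types Vector's PII-safe mode strips is not specified here.

```python
import logging
import re

# Deliberately simple email pattern for illustration; real PII detection needs more than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class PiiRedactingFilter(logging.Filter):
    """Hypothetical filter: replaces email addresses in log messages when enabled."""

    def __init__(self, enabled=True):
        super().__init__()
        self.enabled = enabled

    def filter(self, record):
        if self.enabled:
            # Render the message with its arguments first, then redact the result.
            record.msg = EMAIL_RE.sub("[REDACTED_EMAIL]", record.getMessage())
            record.args = ()
        return True  # never drop the record, only rewrite it


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
    logger = logging.getLogger("pii_demo")
    logger.addFilter(PiiRedactingFilter(enabled=True))
    logger.info("Login by %s", "david@team.com")
```

Because the filter rewrites the record rather than dropping it, the log line and its timing survive for observability while the sensitive value does not.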

🔄

Source Connectors

Plug into REST APIs, webhooks, file shares, S3 buckets, and database connections. Schedule recurring ingestion or trigger on-demand — Vector pulls and processes automatically.

Workpacks

Pre-built knowledge bundles that turn Vector into a domain expert instantly

Advisory

IT Director Workpack

Strategic IT advisory covering digital transformation, cloud migration, cybersecurity governance, vendor selection, and technology roadmaps.

📄 50+ documents 🎯 Tuned prompts

Advisory

IT Manager Workpack

Operational IT guidance for infrastructure management, service desk optimisation, patch management, and team workflows.

📄 40+ documents 🎯 Tuned prompts

Advisory

Front Desk Workpack

Customer-facing knowledge base for reception, visitor management, booking systems, and front-of-house procedures.

📄 30+ documents 🎯 Tuned prompts

Compliance

GDPR Compliance Pack

Data protection policies, breach response procedures, DPIA templates, and regulatory guidance for GDPR compliance workflows.

📄 60+ documents ✅ Audit-ready

Industry

Healthcare Knowledge Pack

Clinical protocols, patient pathways, NHS Digital standards, and healthcare information governance frameworks.

📄 80+ documents 🛡 PII-safe

Industry

Legal Advisory Pack

Contract analysis templates, case law summaries, regulatory filing guides, and legal research workflows for in-house counsel.

📄 70+ documents 🔑 Scoped access

RESTful API.
Zero Friction.

Every capability in Candengo Vector is exposed through a clean, versioned REST API. Ingest documents, run semantic searches, and generate answers, all through one API. Authenticate with scoped API keys and get structured JSON responses.

  • POST /v1/ingest   Upload & embed documents
  • POST /v1/search   Semantic vector search
  • POST /v1/query   RAG-powered Q&A
  • GET /v1/documents   List ingested documents
  • DELETE /v1/documents/{id}   Remove a document
  • GET /health   Health & readiness check
curl
# Ingest a document
curl -X POST https://your-instance/v1/ingest \
  -H "Authorization: Bearer cv_key_..." \
  -F "file=@report.pdf" \
  -F "metadata={\"dept\":\"legal\"}"

# Semantic search
curl -X POST https://your-instance/v1/search \
  -H "Authorization: Bearer cv_key_..." \
  -H "Content-Type: application/json" \
  -d '{
    "q": "data retention policy for EU clients",
    "top_k": 5,
    "rerank": true
  }'

# RAG query with citations
curl -X POST https://your-instance/v1/query \
  -H "Authorization: Bearer cv_key_..." \
  -H "Content-Type: application/json" \
  -d '{
    "q": "What is our GDPR breach notification process?",
    "cite_sources": true
  }'
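
The same search call can be sketched in Python using only the standard library. The endpoint path, header names, and the `q`, `top_k`, and `rerank` fields mirror the curl sample; the helper name `build_search_request` is hypothetical, and the response shape is not documented here, so the sketch stops at building the request.

```python
import json
import urllib.request


def build_search_request(base_url, api_key, query, top_k=5, rerank=True):
    """Build (but do not send) a POST /v1/search request mirroring the curl sample."""
    body = json.dumps({"q": query, "top_k": top_k, "rerank": rerank}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/search",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Replace both placeholders with your own instance URL and API key.
    request = build_search_request(
        "https://your-instance",
        "YOUR_API_KEY",
        "data retention policy for EU clients",
    )
    print(request.get_method(), request.full_url)
    # To send it: urllib.request.urlopen(request) returns the HTTP response.
```

Keeping request construction separate from sending makes the payload easy to inspect and test without a live instance.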

Architecture

Production-ready stack, deployable in minutes

💫

FastAPI

Async Python backend with automatic OpenAPI schema generation and Pydantic validation

🖸

Qdrant

High-performance vector database with HNSW indexing, filtering, and payload storage

🌀

Transformer Models

Sentence-transformers for embeddings with optional CUDA acceleration on NVIDIA GPUs

💫

Docker Ready

Single-container deployment with docker-compose. Mount a volume and go, with no further configuration

Ready to Transform Your Knowledge Workflows?

Candengo Vector is enterprise software, licensed and deployed to your infrastructure. Self-hosted, air-gapped capable, with dedicated onboarding and support. Contact our sales team for pricing and a personalised demo.

Contact Sales

sales@candengo.com