Job Description
We’re hiring a Senior AI Engineer to build production-grade components for an AI-first, data-centric platform. You will implement agentic capabilities (intent, planner, router/composer), integrate knowledge-graph reasoning alongside a strong RAG baseline, and instrument robust evaluation and observability. The ideal candidate writes clean, reliable code, understands LLM systems and data retrieval trade-offs, and can optimize for latency, quality, and cost.

Key Responsibilities

Agent Implementation: Build and harden Intent, Planner, and Router/Composer agents with typed JSON I/O, retries/timeouts, and idempotency; emit call-graph traces and correlation IDs.
RAG Baseline & Retrieval: Implement document prep, chunking/embeddings, hybrid retrieval, and (where available) reranking; maintain a high-quality baseline path for side-by-side comparisons.
Prompt/Config Tuning: Version and tune prompts, routing policies (small→large model escalation), temperature/top-p settings, and caching; document routing outcomes and cost/latency budgets.
Evaluation Hooks: Integrate test sets and scoring (faithfulness/correctness, precision/recall, multi-hop coverage, latency); enable automated re-evaluation on any change (model/agent/prompt/data).
Observability & Cost Controls: Instrument traces/metrics/logs (token usage, latency P50/P95, error codes); surface cost-per-answer dashboards; implement backpressure and graceful degradation.
Security & Guardrails: Enforce policy-as-code and entitlement checks (role/row/column), PII/PHI handling, content moderation, and HITL approval prompts for state-changing actions.
Quality & CI/CD: Write unit/integration/contract tests; participate in PR reviews; ship via CI/CD with feature flags and environment promotion; maintain API/connector schemas and docs.

Required Skills

Applied LLM Engineering: 2-4+ years building production services; hands-on with LLM tool/function-calling, agent frameworks, and prompt/version management.
Knowledge & Retrieval: Practical experience with Knowledge Graphs (RDF/SPARQL or property graph/Gremlin) and RAG pipelines (chunking, embeddings, retrieval/reranking).
Data/Model Ecosystem: One or more vector DBs (pgvector, Pinecone, Weaviate, Milvus) and search (OpenSearch/Elasticsearch); familiarity with major model platforms (Azure OpenAI, Vertex, Anthropic, open-weights).
Backend Skills: Proficiency in Python and/or TypeScript/Node.js; strong REST/gRPC API design, JSON Schema/OpenAPI, retries/backoff/idempotency, and error taxonomies.
Observability & Reliability: OpenTelemetry (traces/metrics/logs), performance profiling, resiliency patterns (circuit breakers, bulkheads, DLQ/queues).
Security by Design: OIDC/SSO, secrets management, least-privilege access, audit logging, and secure coding for AI/data services.
CI/CD & Testing: Git-based workflows, automated pipelines, unit/integration/contract tests, and environment promotion practices.

Good to Have Skills

Evaluation Engineering: Judge-model setups, A/B testing, rubric design, and regression dashboards.
Performance & FinOps: Async I/O, caching strategies, connection pooling, and token/runtime budget enforcement.
Runtime & Platform: Containers/Kubernetes, service mesh/API gateways, feature flags, blue/green or canary releases.
UX for Explainability: Collaborating on rationale/explanations (source lists, subgraph summaries) and clear HITL approval prompts.

This role is ideal for a hands-on engineer who enjoys turning advanced reasoning patterns into robust, observable services, balancing quality, safety, and cost at enterprise scale.