vinzlercodes/README.md

Hi, I'm Vinayak Sengupta

AI Product & Platform Engineer | Agentic Workflows | LLM Systems | Retrieval, Fine-Tuning & Evals

I build governed agentic workflow platforms where LLMs, tools, human approvals, evals, and audit trails turn AI capability into usable enterprise products across healthcare, finance, risk, and developer automation.

LinkedIn · Medium · GitHub · Email · Resume

Vinayak Sengupta builds governed agentic workflow platforms across healthcare, finance, risk, developer tooling, and evals

Healthcare AI · Finance Decisioning · Risk Simulation · Agent Tooling · Evals


What I Build

Area | Current focus
Agent platforms | Multi-agent coordination, deterministic tool execution, human-in-the-loop confirmations, approval-gated writes, tool-call graphs, and enterprise workflow automation.
Domain AI products | Healthcare prior authorization, evidence-adjusted finance decisioning, risk simulation, document intelligence, and workflow copilots with clear non-production and safety boundaries.
LLM operations | SFT, DPO, PPO, QLoRA, Axolotl, vLLM, checkpoint recovery, model serving, and failure diagnostics for production AI workflows.
Retrieval and evaluation | RAG/GraphRAG over long-form and graph-structured documents, ranking quality, BM25, MMR, reranking, deterministic evals, replay artifacts, and nDCG@k measurement.
Explainability and analytics | Model-agnostic attribution, PDP workflows, KPI design, DuckDB/Arrow pipelines, SHAP alignment, and stakeholder-facing analytics products.
Open-source agent tooling | Codex/Hermes remote control, AI engineering telemetry, artifact-format evaluation, reusable CLI workflows, and privacy-safe publication pipelines.
Platform engineering | FastAPI services, TypeScript/Next.js apps, Kubernetes reconciliation loops, productized launch flows, health checks, observability, and reusable enterprise assets.

I care about systems that are measurable, debuggable, and useful to the people who have to run them after the demo is over.

Recent Work

Aible - Data Scientist, AI-Native Systems & Enterprise Workflows

  • Architected an NVIDIA NeMo Agent Toolkit LLM runtime for dynamic reasoning chains, multi-agent coordination, deterministic tool execution, and enterprise workflow automation.
  • Built a FastAPI execution layer that converts user/model intent into runnable agent configurations across tools, document workflows, prediction scoring, and human-in-the-loop approvals.
  • Designed a pluggable enterprise tool registry with persisted execution metadata and tool-call graphs for observability, reproducibility, workflow reuse, and operational debugging.
  • Productized an OpenClaw-based agent platform with sandbox launch flows, tenant configuration, health checks, gateway reachability checks, Slack access, streaming responses, and failure diagnostics.
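
The registry, approval gate, and tool-call graph described above can be illustrated with a minimal in-memory sketch. It is not the Aible implementation: `ToolRegistry`, `ToolCall`, the status strings, and the example tools are all hypothetical names chosen for illustration, and persistence is left out.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple


@dataclass
class ToolCall:
    """One node in the tool-call graph (hypothetical shape)."""
    call_id: int
    tool: str
    args: Dict[str, Any]
    parent_id: Optional[int]  # edge back to the call that triggered this one
    status: str = "pending"
    result: Any = None


@dataclass
class ToolRegistry:
    """Hypothetical registry: maps tool names to callables and records every call."""
    tools: Dict[str, Callable[..., Any]] = field(default_factory=dict)
    needs_approval: Dict[str, bool] = field(default_factory=dict)
    calls: List[ToolCall] = field(default_factory=list)

    def register(self, name: str, fn: Callable[..., Any], needs_approval: bool = False) -> None:
        self.tools[name] = fn
        self.needs_approval[name] = needs_approval

    def execute(self, name: str, args: Dict[str, Any],
                parent_id: Optional[int] = None, approved: bool = False) -> ToolCall:
        # Record the call before running it so blocked calls stay visible in the trace.
        call = ToolCall(len(self.calls), name, dict(args), parent_id)
        self.calls.append(call)
        if self.needs_approval[name] and not approved:
            call.status = "awaiting_approval"  # gated write: nothing is executed
            return call
        call.result = self.tools[name](**args)
        call.status = "ok"
        return call

    def edges(self) -> List[Tuple[int, int]]:
        """Parent -> child edges of the recorded tool-call graph."""
        return [(c.parent_id, c.call_id) for c in self.calls if c.parent_id is not None]


registry = ToolRegistry()
registry.register("lookup_score", lambda record_id: {"record_id": record_id, "score": 0.5})
registry.register("write_decision",
                  lambda record_id, decision: f"{record_id}:{decision}",
                  needs_approval=True)

root = registry.execute("lookup_score", {"record_id": "r1"})
blocked = registry.execute("write_decision", {"record_id": "r1", "decision": "approve"},
                           parent_id=root.call_id)
done = registry.execute("write_decision", {"record_id": "r1", "decision": "approve"},
                        parent_id=root.call_id, approved=True)
```

Recording the call before the approval check is a deliberate choice in this sketch: a blocked write still appears in the trace, so a reviewer can see what the agent attempted.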

AI Operations, Fine-Tuning & Reliability

  • Led a fault-tolerant fine-tune-and-serve platform for enterprise AI use cases, translating customer requirements into scalable experimentation and deployment workflows.
  • Implemented SFT and DPO workflows using Axolotl and QLoRA; automated checkpoint detection and recovery, reducing manual setup and monitoring effort by 80%.
  • Enabled a Fortune 50 telecom client to launch a security metadata classifier on schedule through a productionized fine-tuning and serving workflow.
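
Checkpoint detection can be sketched as a scan for the most recent checkpoint directory before a run resumes. This is an illustration only: it assumes checkpoints are written as `checkpoint-<step>` directories (the naming convention used by the Hugging Face Trainer), and `latest_checkpoint` is a hypothetical helper, not part of Axolotl or of the platform described above.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed naming convention: checkpoint-<global_step>
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir: Path) -> Optional[Path]:
    """Return the highest-numbered checkpoint directory, or None if there is none."""
    if not output_dir.is_dir():
        return None
    best_step, best_path = -1, None
    for child in output_dir.iterdir():
        match = _CHECKPOINT_RE.match(child.name)
        # Ignore stray files and directories that do not follow the convention.
        if match and child.is_dir():
            step = int(match.group(1))
            if step > best_step:  # compare numerically, not lexicographically
                best_step, best_path = step, child
    return best_path
```

Comparing the step as an integer matters: a plain string sort would rank `checkpoint-20` above `checkpoint-100`.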

Retrieval, Document Intelligence & Explainability

  • Designed retrieval workflows for graph-structured and long-form enterprise documents while balancing ranking quality, token constraints, modular experimentation, and production evaluation.
  • Improved retrieval nDCG@k by 25% through iterative tuning of BM25, Maximal Marginal Relevance, and reranking components.
  • Led a standardized model explainability/PDP analytics workflow that reduced per-feature computation time by 17x while maintaining median curve fidelity around 0.90.
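
For readers unfamiliar with the retrieval terms above, here is a minimal sketch of nDCG@k and Maximal Marginal Relevance. It is a textbook-style illustration, not the production pipeline. It assumes linear gain, computes the ideal ordering from the retrieved list's own relevance grades (a simplification), and takes precomputed similarity scores as input. The function names and the `lam` default are assumptions.

```python
import math
from typing import List, Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG normalized by the DCG of the same grades sorted best-first."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def mmr(query_sim: Sequence[float], doc_sim: Sequence[Sequence[float]],
        k: int, lam: float = 0.7) -> List[int]:
    """Greedy MMR: trade relevance to the query against similarity to earlier picks."""
    selected: List[int] = []
    candidates = list(range(len(query_sim)))
    while candidates and len(selected) < k:
        def score(i: int) -> float:
            # Redundancy is the closest match among documents already selected.
            redundancy = max((doc_sim[i][j] for j in selected), default=0.0)
            return lam * query_sim[i] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Documents 0 and 1 are near-duplicates; document 2 is less relevant but different.
query_sim = [0.9, 0.85, 0.3]
doc_sim = [[1.0, 0.95, 0.1],
           [0.95, 1.0, 0.1],
           [0.1, 0.1, 1.0]]
diverse = mmr(query_sim, doc_sim, k=2, lam=0.5)         # picks 0, then 2
relevance_only = mmr(query_sim, doc_sim, k=2, lam=1.0)  # picks 0, then 1
```

Lowering `lam` pushes the selection toward diversity, which is the kind of knob tuned against a ranking metric such as nDCG@k.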

Current Product Builds

Portfolio system map connecting healthcare, finance, risk, and developer tooling to a governed agent runtime

Build | Domain | What it explores
Open Prior Auth Workbench | Healthcare | FHIR-first prior authorization workbench for requirement discovery, deterministic questionnaire prefill, evidence intake, packet assembly, payer-status loops, human review, audit trails, and agent evals.
DecisionRisk | Risk | MiroFish-first decision rehearsal engine with evidence graphs, scenario ensembles, adversarial council review, transparent risk metrics, provenance validation, replay artifacts, and auditable Risk Dockets.
CanopyLedger (Coming Soon) | Finance | Private evidence-adjusted borrowing-base decisioning build for coffee trade finance, with immutable facility snapshots, reproducible collateral valuation, governed decisions, idempotent APIs, and integrity checks.
codex-telegram-remote | Agent tooling | Hermes-native Telegram control for Codex threads, workspace/status checks, remote steering, approval routing, and completion summaries through local app-server boundaries.
artifact-format-eval | Agent evals | API-key-free benchmark comparing Markdown, HTML, JSON-rendered, and notebook artifacts across cost, accessibility, security, reviewability, and reader-task coverage.
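
Two of the ideas named in the table, idempotent APIs and integrity checks over immutable snapshots, can be sketched generically. CanopyLedger is private, so none of this reflects its code: `snapshot_digest`, `DecisionStore`, and the record fields are hypothetical, and the store is in-memory only.

```python
import hashlib
import json
from typing import Any, Dict


def snapshot_digest(snapshot: Dict[str, Any]) -> str:
    """SHA-256 over a canonical JSON encoding, so key order does not change the hash."""
    canonical = json.dumps(snapshot, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class DecisionStore:
    """Hypothetical store: one decision record per idempotency key."""

    def __init__(self) -> None:
        self._by_key: Dict[str, Dict[str, Any]] = {}

    def submit(self, idempotency_key: str, snapshot: Dict[str, Any]) -> Dict[str, Any]:
        digest = snapshot_digest(snapshot)
        existing = self._by_key.get(idempotency_key)
        if existing is not None:
            # A retry with the same payload returns the original record unchanged.
            if existing["digest"] != digest:
                raise ValueError("idempotency key reused with a different payload")
            return existing
        record = {
            "key": idempotency_key,
            "digest": digest,
            "decision_id": len(self._by_key) + 1,
        }
        self._by_key[idempotency_key] = record
        return record

    def verify(self, idempotency_key: str, snapshot: Dict[str, Any]) -> bool:
        """Integrity check: does the snapshot still hash to the stored digest?"""
        record = self._by_key.get(idempotency_key)
        return record is not None and record["digest"] == snapshot_digest(snapshot)
```

Storing the digest alongside the key lets a later audit detect a snapshot that was altered after the decision was recorded.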

Technical Toolkit

Agentic AI & LLM systems: multi-agent orchestration · tool registries · approval gates · runtime traces · deterministic evals · AgentOps · human-in-the-loop flows · RAG · GraphRAG · SFT · DPO · PPO · QLoRA · Axolotl · vLLM · LangChain · LlamaIndex · NVIDIA NeMo Toolkit · NeMo Guardrails · OpenAI · Vertex AI

Platforms, data & backend: Python · TypeScript · SQL · Cypher · FastAPI · Flask · Next.js · PySpark · DuckDB · PostgreSQL · SQLite · MongoDB · Neo4j · Chroma · AWS · GCP · Docker · Kubernetes · GitHub Actions · OpenTelemetry · Langfuse

ML, product & analytics: PyTorch · TensorFlow · Keras · scikit-learn · LightGBM · SHAP · ONNX · PDP · KPI design · stakeholder discovery · PRDs · MVP roadmaps · success metrics

Selected Public Systems

Project | Signal
Open Prior Auth Workbench | Healthcare agent workflow substrate with ToolNet-style tools, approvals, traces, deterministic evals, synthetic data, and standards-shaped local gateway routes.
DecisionRisk | Risk simulation platform for consequential decision rehearsal, scenario replay, grounded rationale, dissent tracking, safety gates, and reviewable regression reports.
codex-telegram-remote | Open-source agentic developer tooling for controlling Codex from Telegram through Hermes with local runtime state and approval boundaries.
artifact-format-eval | Evaluation harness for agent-generated artifacts, measuring accessibility, security, reviewability, mutation impact, and reader-task coverage without API keys.
ai-coach-profile-publisher | Privacy-safe publisher for AI Engineer Coach metrics, sanitized JSON dispatch, GitHub Actions rendering, and public SVG profile cards.

Foundations

Earlier work spans gaming-industry analytics, customer churn prediction, recommendation systems, disaster-response NLP pipelines, SATD refactoring recommendation, hierarchical-attention document classification, and breast histopathology image classification research. That foundation now feeds more product-shaped agent, retrieval, eval, and decision-workflow systems.

Writing & Research

Auto-updated from my Medium RSS feed.

I write technical explainers that connect system design to reader-visible tradeoffs: retrieval quality, long-document summarization, RAG failure modes, clustering, applied analytics, and how AI outputs should be evaluated as artifacts rather than demos.

Other work: PPO post-training for Llama text-to-SQL, SATD detection and refactoring recommendation, and histopathology carcinoma classification using multi-level spatial fusion.

Talks & Community

  • Authored the core problem statement and evaluation metrics for the UC Berkeley AI Summit 2023 - Data Science Hackathon.
  • Represented Aible at Ai4 2023, Google Next 2024, and AWS Summit 2024, translating technical systems into demos and customer conversations.
  • Write long-form pieces on document summarization, retrieval systems, RAG evaluation, customer segmentation, applied AI, and gaming industry analysis.

GitHub Activity & Analytics

AI Engineering Coach Metrics

Sanitized AI Engineering Coach metrics, including practice score, anti-pattern rate, context health, prompt quality, review verification, tool mastery, and agentic SDLC coverage

These metrics summarize how I use AI coding tools in practice, not just how much AI-generated code I produce. The card is generated from sanitized local AI Engineer Coach aggregates and is meant to show AI engineering discipline across context quality, prompt clarity, review habits, tool usage, and agentic SDLC coverage.

Metric | What it means
AI Practice Score | Overall signal of AI-assisted engineering maturity across the tracked categories.
Anti-pattern Rate | Number of detected AI workflow anti-patterns per 100 requests. Lower is better.
Resolution Rate | Share of detected anti-patterns that were improved or resolved over the measured period.
Context Health | How well my projects provide the context an AI agent needs: instructions, workspace structure, and agent-readiness.
Prompt Quality | How clearly I frame tasks, constraints, expected outputs, and review criteria for AI tools.
Review / Verification | How consistently AI-generated work is checked through review, testing, validation, or manual inspection.
Tool Mastery | How effectively I use AI tools, workflows, and coding assistants beyond simple prompt-and-paste usage.
Agentic SDLC Coverage | How broadly I use AI across planning, implementation, testing, review, documentation, and iteration.
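
The two rate metrics in the table reduce to simple arithmetic. The sketch below follows the definitions as worded here; the actual formulas used by the coach tool are not documented in this README, so the function names, the zero-detection convention, and the example numbers are assumptions for illustration.

```python
def anti_pattern_rate(detected: int, requests: int) -> float:
    """Detected anti-patterns per 100 requests (lower is better)."""
    if requests <= 0:
        raise ValueError("requests must be positive")
    return 100.0 * detected / requests


def resolution_rate(resolved: int, detected: int) -> float:
    """Share of detected anti-patterns that were improved or resolved."""
    if detected == 0:
        return 0.0  # assumed convention when nothing was detected
    return resolved / detected


# Made-up numbers: 6 anti-patterns across 240 requests, 3 of 4 tracked ones resolved.
rate = anti_pattern_rate(6, 240)   # 2.5 per 100 requests
resolved = resolution_rate(3, 4)   # 0.75
```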

Public card only. Raw prompts, private code, workspace names, file paths, model names, screenshots, and detailed anti-pattern records are not published.
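
One way to enforce a boundary like this is an allowlist: only named numeric aggregates leave the machine, and everything else is dropped by default. The sketch below is a hypothetical illustration, not the ai-coach-profile-publisher code; `PUBLIC_FIELDS`, `sanitize`, and the field names are assumptions.

```python
from typing import Any, Dict

# Hypothetical allowlist of aggregate fields that may appear on the public card.
PUBLIC_FIELDS = (
    "practice_score",
    "anti_pattern_rate",
    "resolution_rate",
    "context_health",
    "prompt_quality",
)


def sanitize(aggregates: Dict[str, Any]) -> Dict[str, float]:
    """Keep only allowlisted numeric fields; unknown keys are never copied."""
    public: Dict[str, float] = {}
    for name in PUBLIC_FIELDS:
        value = aggregates.get(name)
        # Reject strings, nested objects, None, and booleans: numbers only.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        public[name] = float(value)
    return public


local = {
    "practice_score": 82,
    "anti_pattern_rate": 2.5,
    "prompt_quality": "see notes in /home/user/project",  # not numeric: dropped
    "workspace_path": "/home/user/project",               # not allowlisted: dropped
}
card = sanitize(local)
```

An allowlist fails closed: a new private field added upstream stays private until someone explicitly adds it to `PUBLIC_FIELDS`.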

Generated GitHub metrics for vinzlercodes, including contribution activity, repository counts, community stats, and most-used languages

Signal | What to look for
Languages | A practical mix of data, backend, notebooks, and web-facing work rather than a single narrow stack.
Repositories | Recent public systems show a shift from notebooks and ML pipelines toward agentic workflow products, eval harnesses, and developer tooling.
Writing | Medium activity makes the technical reasoning visible, especially around retrieval, summarization, RAG evaluation, and applied analytics.
Activity feed | Recent public GitHub events are generated below so profile movement is visible between larger project updates.

Recent GitHub Activity

  1. 🎉 Merged PR #83 in vinzlercodes/Open_Prior_Auth_Workbench
  2. 💪 Opened PR #83 in vinzlercodes/Open_Prior_Auth_Workbench
  3. 🎉 Merged PR #82 in vinzlercodes/Open_Prior_Auth_Workbench
  4. 💪 Opened PR #82 in vinzlercodes/Open_Prior_Auth_Workbench
  5. 🎉 Merged PR #81 in vinzlercodes/Open_Prior_Auth_Workbench
  6. 💪 Opened PR #81 in vinzlercodes/Open_Prior_Auth_Workbench
  7. 🎉 Merged PR #80 in vinzlercodes/Open_Prior_Auth_Workbench
  8. 🎉 Merged PR #79 in vinzlercodes/Open_Prior_Auth_Workbench
  9. 🎉 Merged PR #78 in vinzlercodes/Open_Prior_Auth_Workbench
  10. 💪 Opened PR #80 in vinzlercodes/Open_Prior_Auth_Workbench

Fun fact: I will absolutely over-analyze both fragrance notes and video-game industry trends.

Pinned

  1. DecisionRisk (Public)

    DecisionRisk: simulate, debate, and quantify the downside of consequential decisions before acting.

    Python

  2. Open_Prior_Auth_Workbench (Public)

    A FHIR-first prior authorization workbench that discovers coverage requirements, retrieves and prefills documentation questionnaires, assembles submission-ready packets, and tracks lifecycle state …

    TypeScript

  3. codex-telegram-remote (Public)

    Control Codex from Telegram through Hermes

    Python

  4. ai-coach-profile-publisher (Public)

    Publish privacy-safe AI Engineer Coach metrics to your GitHub profile README with a local CLI, sanitized JSON dispatch, GitHub Actions workflow, and SVG metrics card.

    JavaScript

  5. artifact-format-eval (Public)

    API-key-free benchmark for comparing Markdown, HTML, JSON-rendered, and notebook artifacts across cost, accessibility, security, reviewability, and reader-task coverage.

    HTML