ML and AI Systems Engineer with 3+ years building and deploying ML systems at scale. First hire on a customer-facing ML support team managing multi-node NVIDIA H100 GPU clusters at 99.9% uptime.
About
Hello! I'm Raphael. I get excited about linguistics, love diving into video game statistics, and have a bunch of other random interests floating around. Professionally, I work in the cool world of ML/AI.
Experience
Stealth Startup
GPU Cloud Infrastructure Provider
ML Solutions Engineer (First Hire) | Software Engineer
Feb 2024 - Present
- First engineer on a 5-person customer-facing ML support team serving several early-stage AI startups on GPU infrastructure
- Managed multi-node NVIDIA H100 GPU Kubernetes clusters, maintaining 99.9% uptime across enterprise AI infrastructure
- Helped design and implement automated node repair processes, reducing mean time to repair from 36-72 hours to under 1 hour
- Built a RAG system over the internal knowledge base and ticketing codebase to accelerate issue resolution across the support team
- Leading infrastructure readiness for next-generation NVIDIA GB100 (Blackwell) GPU deployments
- Developed bidirectional Jira-Kubernetes operators in Go, automating incident response for ~20 node failures per week
- Optimized Slurm and Kubernetes ML workflows, improving training job throughput for customers running distributed workloads
Weights & Biases
Machine Learning Support Engineer
Jan 2023 - Jan 2024
- Debugged and resolved 600+ technical issues for ML practitioners at OpenAI, NVIDIA, and Microsoft, covering model integrations, LLM deployments, and on-premises instances
- Triaged and traced 50+ bugs across the Weights & Biases SDK, web application, backend services, and managed instances, contributing bugfix PRs and new integrations
- Managed ~20 customer requests daily while running debugging sessions and cross-team syncs and building internal tooling (W&B integrations, frontend features)
Projects
Active
Clinical intelligence platform — query patient records in natural language via a 3-node multi-agent RAG pipeline with 20+ medical tools.
LangGraph, FastAPI, Next.js, pgvector, AWS Bedrock, ECS, RDS, Ollama
Active
MCP server exposing healthcare AI tools for RAG-powered clinical queries, document reranking, and FHIR data ingestion. Published on PyPI and Smithery.
Python, MCP
Kubernetes operator with 3 CRDs for automated metrics collection, threshold alerting, and a monitoring dashboard.
Go, Kubebuilder, Prometheus, Grafana, Next.js
Game performance analyzer with XGBoost ML scoring, 8-metric GPI breakdown, AI coaching, and live game overlays.
FastAPI, XGBoost, React, Chart.js, Ollama
AI agent office simulation — procedurally generated personalities (Big Five), emergent relationships, LLM-driven conversations, and a drama director.
Godot 4, GDScript, Ollama
MLP, CNN, and Transformer self-attention implemented from scratch with hand-derived forward and backward passes — no frameworks, no autograd.
Python, NumPy
Skills
Education
Master's in Artificial Intelligence
Penn State · May 2025
Bachelor of Science in Statistics
UCLA · June 2022