Empower Team-Wide Vibe Coding with LLM Gateway and Security-First MCPs
Gabriel Koo
Senior Lead Engineer
Rakshit Jain
AI Engineer II
Bowtie · AgentCon Hong Kong 2026
What to expect from this talk?
35% Acronyms (MCP, OAuth, DLP, VPC, JWT...)
25% Supply chain horror stories
20% Live demo (fingers crossed)
15% Actual takeaways
5% 廣東話 (Cantonese slang)
4 Layers of Safe Vibe Coding
We'll walk through each one
1. LLM Gateway
2. MCP Servers
3. Custom Skills
+ What's Next
😱 The BYOAI Problem
Everyone brought their own AI tools (Bring Your Own AI)
💸 貼錢返工 (tip3 chin2 faan1 gung1: "paying out of your own pocket to go to work")
The company offered no official AI tools, so people bought their own and found their own workarounds
🕵️ Shadow AI
LLM calls sent to platforms the company doesn't own or monitor
Personal API keys and personal-plan logins on personal devices (GitHub Copilot, Cursor)
🔓 Shared API Keys
One key shared by the whole team; hard to rotate without blocking everyone
Someone leaves → the key still works. Corporate API key used for personal projects → cost chaos.
🎯 Providing Tools ≠ Adoption
2022: Copilot launched. People joked it was
"autocomplete on steroids."
2026: We have agents, MCPs, skills.
Same adoption problem.
10% Champions · 60% Majority · 30% Laggards
Our lesson at Bowtie: make the safe path the easy path.
Started with 3 engineers full-time vibe coding. 6 months later, 50+ — including non-engineers. You need champions who pull the team forward, not mandates.
Layer 3 (Custom Skills): encode team standards into prompts
What's Next: autonomous agents, same governance, no human at the keyboard (WIP)
About Bowtie www.bowtie.com.hk
HK's first virtual insurer (licensed Dec 2018)
Direct-to-consumer, zero agents, zero commissions
Series C: US$70M (2025) · ARR > US$80M
In-house core system on AWS Cloud
Using Copilot since 2022, GenAI since early 2023
200+ employees, ~50 engineers
Digitalization with a human touch: AI handles the routine; humans handle the nuanced. We won't let AI run prod unsupervised until it's reliable enough.
ABOUT BOWTIE'S TOOLS
A software engineer's daily reality
Slack · Notion · Gmail · Google Drive · Zendesk · JIRA · AWS · Mixpanel · GitHub · Metabase … and more
🗂️ Tab chaos
Sometimes a single feature investigation means opening 6+ tabs across different SaaS tools just to piece together the full picture.
🎮 ⬆️⬆️⬇️⬇️⬅️➡️⬅️➡️🅱️🅰️🅱️🅰️
Knowing which SaaS to check, where to find the right page, and how to combine them is like memorizing a Konami code — tribal knowledge that only veterans have.
This is why we built what we built.
Layer 1
LLM Gateway
Centralized · Visible · Democratized
🔑 Why Developers Get to Vibe Code
Governance at the infrastructure layer means developers don't carry the compliance burden.
Without infra governance
Every developer must remember: right provider, right key, don't paste secrets, check DLP...
With infra governance
Remote deployment of AGENTS.md enforces a shared "SOUL.md" across all engineers' coding agents: consistent behavior, guardrails, and personality pushed centrally via MDM.
Our LLM Gateway
Single LiteLLM instance on AWS — serves engineers and internal services
🧑💻 Engineers Claude Code / Cursor / Copilot
🏥 Core System Insurance platform
🤖 Machine Agents IT bot / CS bot
→ 🔒 VPC: ALB (load balancer) → ECS LiteLLM · 🐘 PostgreSQL · 📦 S3
→ ☁️ Azure OpenAI · ☁️ Vertex AI · ☁️ AWS Bedrock · ☁️ xAI
Every Request Goes Through the Gateway
🤖 AI Coding Agent → 🔒 Firewall (block + scan) → 🔀 LLM Gateway (DLP · Cost · Auth) → ☁️ AI Provider (Azure OpenAI · Vertex AI · Amazon Bedrock)
DLP at Firewall
Blocks sensitive data before it leaves your network
DLP at Gateway
Scans prompts + responses; logs retained for auditing
Cost Attribution
Per-user, per-team spend visibility in real time
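To make the gateway-side DLP step concrete, here is a minimal sketch (illustrative patterns only, not LiteLLM's actual guardrail API): a pre-call check regex-scans prompt text and blocks the request before it reaches any provider.

```python
import re

# Hypothetical patterns for illustration; a real DLP layer would use a
# maintained ruleset (AWS key formats, HKID numbers, credit cards, ...).
PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key":    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "email":          re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
}

def scan_prompt(text: str) -> list[str]:
    """Return the names of every sensitive pattern found in `text`."""
    return [name for name, pat in PATTERNS.items() if pat.search(text)]

def guard(text: str) -> str:
    """Block the request if anything sensitive is found, else pass it through."""
    hits = scan_prompt(text)
    if hits:
        raise ValueError(f"DLP block: {', '.join(hits)}")
    return text
```

The same scan runs on responses before they are returned, and every hit is logged for the audit trail.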
🚨 3 Weeks Ago...
LiteLLM — the most popular LLM gateway proxy
(95M downloads/month)
1. Attackers compromised a CI dependency → stole LiteLLM's PyPI token
2. Published two backdoored versions — 47,000 installs in 3 hours
3. Every machine had secrets exfiltrated: SSH keys, cloud creds, the LLM API keys it was proxying
How was it caught? The attacker vibe-coded the malware — it crashed a researcher's laptop instead of running silently.
What if they hadn't been sloppy? Silent exfiltration, potentially undetected for weeks.
🛡️ Why This Didn't Hurt Us
47K machines auto-upgraded → Pinned versions; we never pulled v1.82.7
Creds exfiltrated to fresh domains → Cloudflare WARP + Gateway blocks new domains at DNS
Backdoor live within hours → 7-day upgrade cooldown; most attacks are found within 24–72 h
Poisoned package on public PyPI → Dockerized deployment; CI/CD pulls from an internal Docker mirror
Four practices that existed before this attack. None of them exotic.
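The cooldown practice is trivial to automate. A sketch of the policy check (the 7-day window is our assumed policy, not a property of any specific tool): an upgrade is allowed only once a release has outlived the window in which most poisoned packages get caught and yanked.

```python
from datetime import date, timedelta

# Most supply-chain attacks are discovered within 24-72h of publication,
# so a 7-day buffer lets the ecosystem catch them before you pull.
COOLDOWN = timedelta(days=7)

def safe_to_upgrade(release_date: date, today: date,
                    cooldown: timedelta = COOLDOWN) -> bool:
    """True once a release has survived the cooldown window."""
    return today - release_date >= cooldown
```

Run it in CI against each candidate version's publish date; anything younger than the window simply fails the pipeline.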
Centralized Guardrails
🔒 Firewall (Network Layer)
Block all AI providers except official list
DNS-level blocking (free tier for teams under 50 users)
Cloudflare Zero Trust category filter
Enforces BYOK policy at network level
✅ llm.bowtie.internal/chat/completions
❌ random-llm-proxy.com/v1/chat/completions
🔀 LLM Gateway (Application Layer)
Cost attribution by team/project/user
DLP: block sensitive data at request + response
Logging: full audit trail
Auth: per-user identity tracking
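The ✅/❌ contrast above boils down to a hostname allowlist. A minimal sketch (the hostnames are illustrative; real enforcement happens at the DNS resolver, not in application code):

```python
from urllib.parse import urlparse

# Illustrative allowlist: only the internal gateway may receive LLM traffic.
ALLOWED_HOSTS = {"llm.bowtie.internal"}

def allowed(url: str) -> bool:
    """True only if the request targets an approved LLM endpoint."""
    return urlparse(url).hostname in ALLOWED_HOSTS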
📊 Cost & Adoption Visibility
Everything flows through one gateway — so we see everything
💰 Cost per model / provider
Real-time spend breakdown by model, provider, team, and individual engineer
📈 Adoption & preference
Which engineers use which models, how often, and for what — no surveys needed
🧑💼 Engineering Manager view
See vibe coding adoption across the team — who's using it, who isn't, and where to invest training
🗺️ Planning & budgeting
Data-driven decisions on which providers to keep, scale, or drop
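Because every request carries a per-user identity, all of the views above are simple aggregations over the gateway's request log. A sketch with a made-up row schema (your gateway's actual log fields will differ):

```python
from collections import defaultdict

def spend_by(logs: list[dict], key: str) -> dict[str, float]:
    """Sum cost over gateway log rows, grouped by `key` (user, team, model...)."""
    totals: dict[str, float] = defaultdict(float)
    for row in logs:
        totals[row[key]] += row["cost_usd"]
    return dict(totals)

# Hypothetical log rows for illustration
logs = [
    {"user": "alice", "team": "core", "model": "claude", "cost_usd": 0.25},
    {"user": "bob",   "team": "core", "model": "gpt",    "cost_usd": 0.50},
    {"user": "alice", "team": "core", "model": "gpt",    "cost_usd": 0.25},
]
```

Group by `user` for the EM view, by `model` for provider planning; the data is the same either way.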
Why Not Just Pick One Provider?
Source: Artificial Analysis, Apr 2026
GPT → Claude → Gemini → DeepSeek → GPT → …
🔄 Engineers keep switching to the "best" model
No single provider wins on both intelligence and speed.
🧑💻 The editor problem
You can't force everyone onto one editor — nano vs vim vs VS Code vs Notepad++ vs JetBrains.
Same with LLMs — engineers have strong preferences on their "best coding model."
🔀 Gateway = freedom + control
Let engineers pick their model. The gateway enforces DLP, cost, and auth regardless.
A few companies build processors. Millions build applications.
🧠 Models = Processors (CPUs)
OpenAI, Anthropic, Google, Meta — a handful of companies
⚙️ Agent Runtimes = Operating Systems
Claude Code, Cursor, Copilot, OpenCode — dozens of runtimes
📱 Skills = Applications
Anyone can build these — engineers, CS, legal, finance
You don't need to understand CPUs to build an app. You don't need to train models to build skills.
Why Skills Beat Raw Intelligence
"Who do you want doing your taxes? A 300 IQ genius or an experienced tax professional?"
In 2026, the capability gap between models shrank. The differentiation is in the skills you build on top.
This isn't just for engineers
Fortune 100 companies are building enterprise skill libraries
Non-technical people (finance, legal, CS) writing skills — often the most useful ones
A Skill in Practice
API investigation — our most-used skill
Before
Engineer gets paged
Opens 3 tabs — admin panel, DB, logs
Figures out the right tenant filter
Manually queries the API
~20 min to get the answer
After
Engineer asks the AI
Skill handles the "Where/how to look"
MCP servers query the DB, logs and API server
Answer in seconds
Same permissions, same API — just faster
Same pattern for analytics (PII auto-masked), code review (per-repo CLAUDE.md), ticket triage (Linear workflows)
I think we should rewrite the whole API server in Rust
You're absolutely right! That's a brilliant idea. Rust would be perfect here. Let me start rewriting the entire module right away...
Actually wait, maybe Svelte instead?
You're absolutely right! Svelte is the much better choice here...
Sycophancy
The tendency of AI models to prioritize agreement over accuracy, telling users what they want to hear rather than what they need to hear.
A sycophantic agent isn't just annoying — it's a security risk. It'll do whatever a prompt injection tells it to.
Give Your AI a Spine: CLAUDE.md
Your AI should push back like a senior engineer, not agree like an intern on day one.
What goes in CLAUDE.md
Architecture patterns & conventions
Test requirements & coverage rules
PR process & review checklist
Deployment guardrails
Security policies (PII handling, etc.)
Personality — be opinionated, push back
The payoff
New hire on day one gets the same AI-guided standards that took senior engineers months to learn.
Standardized CLAUDE.md → consistent behavior across the entire org
SOPs that previously lived in someone's head — now version-controlled and enforced by AI.
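For flavor, an illustrative fragment of what such a file can contain (invented excerpt, not Bowtie's actual CLAUDE.md):

```markdown
# CLAUDE.md: team standards (illustrative excerpt)

## Conventions
- New services follow the layout described in `docs/architecture.md`.
- Every PR needs tests; coverage must not drop below the repo baseline.

## Guardrails
- Never log or echo PII; mask it before it reaches any prompt or log line.
- Never touch deployment config without an explicit human instruction.

## Personality
- Push back on questionable designs. Disagree with reasons; don't flatter.
- If a request conflicts with these rules, say so instead of complying.
```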
Vibe Coding Went Enterprise
This isn't early adopters anymore
87%
of Fortune 500 use AI coding tools in production
RunAICode, Feb 2026
76%
of orgs have ungoverned AI code in production
Digital Applied, Mar 2026
35–40%
reduction in time-to-first-commit with AI workflows
Kalvium Labs, 200+ engineers
The productivity gains are real. The governance gap is also real. That's why you need the 3 layers.
The Vibe Coding Tension
Andrej Karpathy
"I 'Accept All' always, I don't read the diffs anymore."
Feb 2025 — coined "vibe coding". Fine for throwaway weekend projects.
Simon Willison
"I won't commit any code I can't explain to someone else."
Mar 2025 — "If an LLM wrote it and you reviewed it, that's not vibe coding, it's software development."
So… which one is right?
Our approach: govern the infra, let developers vibe.
Even if a developer "Accept All"s — the gateway blocks sensitive data, MCP tools are scoped, skills encode standards.
Vibe Coding ≠ Just Writing Code
What "coding" actually looks like in practice
60% Research · Logs · Talking to users
20% Prototyping
20% Implementation
Every layer accelerates a different part
Gateway gives you safe model access. MCP tools let AI query your real systems. Skills encode your team's knowledge. Together, they accelerate the 80% that isn't typing code.
The biggest productivity gain isn't in writing code faster — it's in everything before the code.
From 3 Pioneers to 50+ People
3 engineers: Built the first skills & MCP servers. Proved the pattern worked.
Engineering: Standardized CLAUDE.md across repos. Consistent AI behavior org-wide.
Non-engineers: CS built ticket triage skills. Product built spec review. Underwriting built risk assessment.
Today: 10 MCP servers · 50+ people · new hires get Staff+ quality guidance on day one
And we're not alone — Stripe, Sentry, Linear all ship official MCP servers as building blocks
The Whole Stack
Layer 3: Skills — encode how to think
Layer 2: MCP + OAuth — scoped, auditable tools
Layer 1: LLM Gateway — DLP, cost, visibility
Each layer makes the next one possible. Together, they make vibe coding safe enough for production.
Agents Are Already Shipping Code
This isn't hypothetical — it's happening at the biggest companies in tech
Stripe: 1,300 PRs/week merged with zero human-written code. Engineers review everything; no human types the code.
OpenAI Harness: 7 engineers, ~1M lines of code, 0 manually written, 3.5 PRs/engineer/day for 5 months
GitHub: 60M Copilot code reviews — 1 in 5 PRs on GitHub now reviewed by AI
MSR 2026 Study: 932K agent PRs analyzed. 55% merged without any revision. Agent PRs = 10% of all GitHub commits
The pattern: agents write, humans review. Sound familiar?
The 24/7 Agent
Same 3 layers — just running without a human at the keyboard
⚡ Trigger (Slack msg / cron / GitHub alert) → 🤖 Agent Runtime (24/7 on your infra) → 🔒 Your 3 Layers (Gateway · MCP · Skills) → 📝 Output (PR / Slack reply / Jira update)
67%
of SIEM alerts go uninvestigated every day
AI triage: MTTR 4hrs → 20min
D3 Security, 2026
200+
eng hours/month saved by autonomous ops agents
Incidents: hours → minutes
NeuBird AI, Apr 2026
99%
of security triage automated at Google
1M tickets/year, auto-triaged
Caleb Sima, Mar 2026
If you built layers 1–3 well, the autonomous agent is just a different caller — not a new problem to solve.
Key Takeaways
Many "AI issues" were not caused by AI — only amplified. The fix was team-wide governance, not restriction.
Layer 1: LLM Gateway — One choke point so every developer gets safe, visible AI access
Layer 2: MCP — Per-user OAuth means no shared secrets; AI acts as your delegate
Layer 3: Skills — Team SOPs as code; new hires get Staff+ guidance on day one
Build these 3 layers well, and autonomous agents become a natural extension — not a new problem to solve.
Govern the infra. Let developers vibe.
"The question isn't whether your developers will use AI. They already are.
The question is whether you've made the safe path the easy path."