Bowtie

Empower Team-Wide Vibe Coding
with LLM Gateway and Security-First MCPs

Gabriel Koo
Senior Lead Engineer
Rakshit Jain
AI Engineer II

Bowtie  ·  AgentCon Hong Kong 2026

What to expect from this talk?

35% Acronyms (MCP, OAuth, DLP, VPC, JWT...)
25% Supply chain horror stories
20% Live demo (fingers crossed)
15% Actual takeaways
5% 廣東話 (Cantonese slang)

4 Layers of Safe Vibe Coding

We'll walk through each one

1 LLM Gateway
2 MCP Servers
3 Custom Skills
+ What's Next

😱 The BYOAI Problem

Everyone brought their own AI tools (Bring Your Own AI)

💸 貼錢返工
(tip3 chin2 faan1 gung1 — "paying out of pocket to do your job")

The company didn't offer official AI tools, so people bought their own and found their own workarounds.

🕵️ Shadow AI

LLM calls made to platforms not owned by the company

Personal API keys and personal-plan logins on personal devices (GitHub Copilot, Cursor)

🔓 Shared API Keys

One key for team, hard to expire without blocking everyone

Someone leaves → key still works. Corp API key for personal use → cost chaos.

🎯 Providing Tools ≠ Adoption

2022: Copilot launched. People joked it was "autocomplete on steroids."

2026: We have agents, MCPs, skills. Same adoption problem.

10% Champions · 60% Majority · 30% Laggards

Our lesson at Bowtie: make the safe path the easy path.

Started with 3 engineers full-time vibe coding. 6 months later, 50+ — including non-engineers. You need champions who pull the team forward, not mandates.

Our Approach

4 layers to scale AI safely across 50+ engineers

Layer 1 LLM Gateway Single choke point; DLP, cost visibility
Layer 2 MCP Servers OAuth auth; scoped, auditable tools
Layer 3 Custom Skills Encode team standards into prompts
Next What's Next Autonomous agents — same governance, no human at the keyboard (WIP)

About Bowtie www.bowtie.com.hk

  • HK's first virtual insurer (licensed Dec 2018)
  • Direct-to-consumer, zero agents, zero commissions
  • Series C: US$70M (2025) · ARR > US$80M
  • In-house core system on AWS Cloud
  • Using Copilot since 2022, GenAI since early 2023
  • 200+ employees, ~50 engineers
Digitalization with a human touch; AI handles the routine, humans handle the nuance
We won't let AI run prod unsupervised until it's reliable enough

ABOUT BOWTIE'S TOOLS

A software engineer's daily reality

Slack Notion Gmail Google Drive Zendesk JIRA AWS Mixpanel GitHub Metabase …and more

🗂️ Tab chaos

Sometimes a single feature investigation means opening 6+ tabs across different SaaS tools just to piece together the full picture.

🎮 ⬆️⬆️⬇️⬇️⬅️➡️⬅️➡️🅱️🅰️🅱️🅰️

Knowing which SaaS to check, where to find the right page, and how to combine them is like memorizing a Konami code — tribal knowledge that only veterans have.

This is why we built what we built.

Layer 1

LLM Gateway

Centralized · Visible · Democratized

🔑 Why Developers Get to Vibe Code

Governance at the infrastructure layer means developers don't carry the compliance burden.

Without infra governance

Every developer must remember: right provider, right key, don't paste secrets, check DLP...

Compliance through training + trust = fragile

With infra governance

Developers just code. Gateway handles routing, DLP, cost tracking, audit automatically.

Compliance by design = developers vibe freely

🧬 AGENTS.md → SOUL.md

Remote deployment of AGENTS.md enforces a shared "SOUL.md" across all engineers' coding agents — consistent behavior, guardrails, and personality pushed centrally via MDM.

Our LLM Gateway

Single LiteLLM instance on AWS — serves engineers and internal services

🧑‍💻 Engineers
Claude Code / Cursor / Copilot
🏥 Core System
Insurance platform
🤖 Machine Agents
IT bot / CS bot

🔒 VPC

ALB
Load Balancer
ECS
LiteLLM
🐘 PostgreSQL
📦 S3
☁️ Azure OpenAI
☁️ Vertex AI
☁️ AWS Bedrock
☁️ X.ai
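
On the gateway itself, routing is declarative. A minimal LiteLLM `config.yaml` sketch, with hypothetical aliases and deployment names (not our actual setup): engineers request an alias like `claude-sonnet`, and the gateway maps it to a concrete provider deployment.

```yaml
# LiteLLM config.yaml sketch — aliases and deployment names are illustrative
model_list:
  - model_name: claude-sonnet            # what engineers ask for
    litellm_params:
      model: bedrock/anthropic.claude-sonnet-example   # hypothetical Bedrock model ID
      aws_region_name: ap-east-1
  - model_name: gpt-large
    litellm_params:
      model: azure/gpt-large-deployment  # hypothetical Azure deployment
      api_base: https://example.openai.azure.com
      api_key: os.environ/AZURE_API_KEY  # LiteLLM reads the env var at runtime
general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
```

Swapping a provider means editing this file once, not reconfiguring every engineer's editor.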

Every Request Goes Through the Gateway

🤖
AI Coding Agent
🔒
Firewall
Block + Scan
🔀
LLM Gateway
DLP · Cost · Auth
☁️
AI Provider
Azure OpenAI · Vertex AI · Amazon Bedrock

DLP at Firewall

Blocks sensitive data before it leaves your network

DLP at Gateway

Scans prompts + responses; logs retained for auditing

Cost Attribution

Per-user, per-team spend visibility in real time
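
The gateway-side DLP pass can be as simple as pattern rules applied to prompts and responses before they leave the VPC. A minimal sketch; the patterns below are illustrative, not our production ruleset.

```python
# Illustrative DLP check for a gateway pre-call hook.
# These patterns are examples only, not Bowtie's actual rules.
import re

DLP_PATTERNS = {
    "hkid": re.compile(r"\b[A-Z]{1,2}\d{6}\(?[0-9A]\)?"),    # HK ID card format
    "aws_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),           # AWS access key ID
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def scan_prompt(text: str) -> list[str]:
    """Return the names of DLP rules the text violates."""
    return [name for name, pat in DLP_PATTERNS.items() if pat.search(text)]

def guard(text: str) -> str:
    """Block the request if any rule matches; otherwise pass it through."""
    hits = scan_prompt(text)
    if hits:
        raise ValueError(f"DLP violation: {', '.join(hits)}")
    return text
```

In production this runs on both the request and the response path, and every hit is logged for audit.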

🚨 3 Weeks Ago...

LiteLLM — the most popular LLM gateway proxy (95M downloads/month)

1. Attackers compromised a CI dependency → stole LiteLLM's PyPI token

2. Published two backdoored versions; 47,000 installs in 3 hours

3. Every machine had secrets exfiltrated: SSH keys, cloud creds, the LLM API keys it was proxying

How was it caught? The attacker vibe-coded the malware — it crashed a researcher's laptop instead of running silently.

What if they hadn't been sloppy? Silent exfiltration, potentially undetected for weeks.

🛡️ Why This Didn't Hurt Us

47K machines auto-upgraded Pinned versions; we never pulled v1.82.7
Creds exfiltrated to fresh domains Cloudflare WARP + Gateway; blocks new domains at DNS
Backdoor live within hours 7-day cooldown; most attacks found in 24-72h
Poisoned package on public PyPI Docker version; CI/CD pulls from internal docker mirror

Four practices that existed before this attack. None of them exotic.
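
Two of those practices, pinning and the 7-day cooldown, are a few lines of dependency-bot config. For example, with Renovate (rule values illustrative):

```json
{
  "packageRules": [
    {
      "matchManagers": ["pip_requirements", "poetry", "dockerfile"],
      "minimumReleaseAge": "7 days",
      "rangeStrategy": "pin"
    }
  ]
}
```

`minimumReleaseAge` refuses to propose any release younger than the cooldown window, which is exactly the gap in which most poisoned packages are caught and yanked.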

Centralized Guardrails

🔒 Firewall (Network Layer)

  • Block all AI providers except official list
  • DNS-level blocking (FREE <50 users)
  • Cloudflare Zero Trust category filter
  • Enforces BYOK policy at network level
  • ✅ llm.bowtie.internal/chat/completions (allowed)
  • ❌ random-llm-proxy.com/v1/chat/completions (blocked)

🔀 LLM Gateway (Application Layer)

  • Cost attribution by team/project/user
  • DLP: block sensitive data at request + response
  • Logging: full audit trail
  • Auth: per-user identity tracking

📊 Cost & Adoption Visibility

Everything flows through one gateway — so we see everything

💰 Cost per model / provider

Real-time spend breakdown by model, provider, team, and individual engineer

📈 Adoption & preference

Which engineers use which models, how often, and for what — no surveys needed

🧑‍💼 Engineering Manager view

See vibe coding adoption across the team — who's using it, who isn't, and where to invest training

🗺️ Planning & budgeting

Data-driven decisions on which providers to keep, scale, or drop

Why Not Just Pick One Provider?

Source: Artificial Analysis, Apr 2026

GPT → Claude → Gemini → DeepSeek → GPT → …
🔄 Engineers keep switching to the "best" model

No single provider wins on both intelligence and speed.

🧑‍💻 The editor problem

You can't force everyone onto one editor — nano vs vim vs VS Code vs Notepad++ vs JetBrains.

Same with LLMs — engineers have strong preferences on their "best coding model."

🔀 Gateway = freedom + control

Let engineers pick their model. The gateway enforces DLP, cost, and auth regardless.

我全部都要 — "We want everything!"

Layer 2

MCP Servers

Scoped · Per-User · Auditable

Before: The MCP Mess

Everyone connecting MCPs with shared secrets

🤯 One shared token → Who made that API call?

Case Study: Notion MCP

What the shared-secret era actually looked like

❌ Before: stdio + shared secret

{
  "mcpServers": {
    "notionApi": {
      "command": "npx",
      "args": ["-y", "@notionhq/notion-mcp-server"],
      "env": {
        "NOTION_TOKEN": "ntn_****"
      }
    }
  }
}

No audit trail · Token rotation breaks everyone · Who accessed what?

✅ After: HTTP + OAuth 2.0

{
  "mcpServers": {
    "notion": {
      "url": "https://mcp.notion.com/mcp"
    }
  }
}

Per-user tokens · Full audit trail · Revoke one user ≠ break everyone

This is the pattern we follow for all Bowtie MCP servers.

Three Types of MCP

stdio ⚠️

Local processes (stdin/stdout)

Usually hardcodes a shared admin API key; no per-user auth, no credential rotation

No OAuth support
Good for local dev only

SSE ⛔ Deprecated

Server-Sent Events

Deprecated Mar 2025. Stateful — bad for stateless hosting like AWS Lambda. Two-endpoint model, scaling challenges.

Migrate now if using SSE

HTTP ✅ Current Standard

Streamable HTTP

Stateless; single endpoint, standard POST requests. Think of it like gRPC / Protobuf — structured RPC over HTTP.

Best for production
Supports OAuth 2.0 natively

💡 Why HTTP over stdio?

Your dev machine has 10 Node.js versions? Python 3.8-3.12? Different tool versions? HTTP MCP doesn't care — it's just a URL:

{
  "mcpServers": {
    "notion": { "url": "https://mcp.notion.com/mcp" }
  }
}
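
The "just standard POST requests" point is concrete: every call is one JSON-RPC 2.0 message to the server's single endpoint. A sketch of the wire format (the tool name and headers below follow the MCP spec; the URL is Notion's public MCP endpoint from the config above):

```python
# What a Streamable HTTP MCP tool call looks like on the wire:
# one JSON-RPC 2.0 POST, no persistent connection required.
import json

def make_tool_call(call_id: int, tool: str, arguments: dict) -> str:
    """Build the JSON-RPC body for an MCP tools/call request."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": call_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

body = make_tool_call(1, "list_tickets", {"status": "open"})
# POST this body to https://mcp.notion.com/mcp with headers:
#   Content-Type: application/json
#   Accept: application/json, text/event-stream
#   Authorization: Bearer <per-user token>
```

Because it is stateless HTTP, the server scales like any other web service and sits behind your existing load balancer and auth.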

MCP + OAuth Architecture

flowchart LR
    subgraph Internet["🌐"]
      Dev["💻 Developer"]
      CLI["🤖 AI Agent\n(Claude Code)"]
    end
    
    subgraph VPC["🔒 VPC Boundary"]
      MCP["📡 MCP Server\n(HTTP + OAuth)"]
      Tools["🛠️ Tools\n(list_tickets, get_user_info, ...)"]
    end
    
    subgraph Auth["🔐 OAuth 2.0"]
      Cognito["Amazon Cognito\n(User Pool)"]
    end
    
    CLI -->|MCP + Bearer JWT| MCP
    MCP --> Tools
    
    Dev -->|1. Login| Cognito
    Cognito -->|2. Access Token| Dev
    Dev -->|3. MCP Call + Token| MCP
    MCP -->|4. Validate| Cognito
  

Each user has their own token. AI acts as the user's delegate. Token revocable anytime.

The Golden Rule

"If you can't do it in the UI,
the agent can't do it via MCP"

What this means:

  • MCP = wrapper around existing APIs
  • Reuse existing permissions
  • No new privileged endpoints
  • AI is your delegate

Security benefits:

  • No shared connection secrets
  • Per-user token audit
  • OAuth handles auth flows
  • Token revocation works

Our MCP Toolkit

9 servers — all Streamable HTTP, all behind the gateway

🔌
Bowtie API — OAuth + Amazon Cognito, query live prod data as yourself; same token expiry as the web app
📊
Metabase — SQL queries with browser-based auth, PII-safe; same token expiry as the web app
🎫
Zendesk — Ticket lookup and triage
💬
Slack — Search channels, post updates, read threads
📝
Notion — Read & write pages/databases; understands schema of our tables
🏥
VHIS — Voluntary Health Insurance Scheme (HK); plan lookups, product codes, provider info
🔗
Internal Deep URI — Deep-link builder for internal tools
📚
KB Search — Search & fetch from our knowledge base with company info

Same endpoint pattern: https://mcp.bowtie.internal/{server}/mcp (example)

Who Uses What

Same tools, different workflows — the surprise was who adopted fastest

Engineering

Bowtie API + Metabase

"Why did policy #12345's renewal fail?" — answer in seconds, not 20 min across 3 tabs

Notion + Slack + Internal Portal

Incident response: pull specs, check threads, deep-link to internal tools — all from the AI

Beyond Engineering

Zendesk + FAQ + VHIS

CS triages tickets with product knowledge built in — same auth, same governance

Notion + Metabase

Product & Underwriting query data and update specs — no code required

3 engineers → 50+ people in 6 months — because the safe path was the easy path

Wrapping Bowtie API with OAuth MCP

Existing Cognito-backed API → MCP HTTP server with full OAuth 2.0

🤖
AI Agent
Claude Code / Cursor
🔐
OAuth MCP Wrapper
type: HTTP + OAuth 2.0
Authorization Code + PKCE
🔑
Amazon Cognito
User Pool + JWT
🏥
Bowtie API
Existing endpoints

How it works

  1. Agent discovers OAuth metadata via .well-known
  2. User logs in via Cognito (browser popup)
  3. Wrapper receives user's JWT, forwards to Bowtie API
  4. API sees the real user — not a shared service account
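
Step 4 hinges on the wrapper validating the JWT before forwarding it. A self-contained sketch of those checks (signature, issuer, audience, expiry) using PyJWT; in production the public key comes from Cognito's JWKS endpoint, and here we sign locally just to exercise the same logic. Issuer and audience values are placeholders.

```python
# Illustrative JWT validation as an OAuth MCP wrapper might perform it.
# Requires: pyjwt, cryptography. Issuer/audience are placeholder values.
import time
import jwt  # PyJWT
from cryptography.hazmat.primitives.asymmetric import rsa

ISSUER = "https://cognito-idp.ap-east-1.amazonaws.com/ap-east-1_EXAMPLE"
AUDIENCE = "example-client-id"

# Local keypair stands in for Cognito's signing key (normally fetched via JWKS)
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

def validate(token: str) -> dict:
    """Reject bad signatures, wrong issuer/audience, or expired tokens."""
    return jwt.decode(
        token,
        private_key.public_key(),
        algorithms=["RS256"],
        issuer=ISSUER,
        audience=AUDIENCE,
    )

# A token shaped like what Cognito would mint for a logged-in user
token = jwt.encode(
    {"iss": ISSUER, "aud": AUDIENCE, "sub": "gabriel",
     "exp": int(time.time()) + 3600},
    private_key,
    algorithm="RS256",
)
claims = validate(token)
```

The `sub` claim is what makes "API sees the real user" work: the wrapper forwards the user's identity, never a shared service account.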

Why this is powerful

  • Zero API changes — wraps existing endpoints
  • Per-user identity — agent acts as the user
  • Token lifecycle — same expiry as web app
  • MCP spec compliant — works with any MCP client

🔴 Live Demo (UX Flow Mockup)

MCP OAuth with cognito-local + Claude Code — UX flow demonstration

gabrielkoo.github.io/agentcon-2026-hk-demo

1. Unprotected API
2. 401 without token
3. Tools available with JWT
4. User identity confirmed

Layer 3

Custom Skills

"Build Skills, Not Agents"

What Is a Skill?

A portable package of knowledge, procedures, and code your agent loads on demand

Instructions + executable code

A SKILL.md with procedures and guardrails, plus optional scripts/ (Python, Bash, JS), references/, and assets/.

3-stage progressive disclosure

1. Description (~100 tokens, always loaded) → 2. Full instructions (on activation) → 3. Scripts & references (only when needed)

Portable across 9+ agents

Same skill works in Claude Code, Copilot, Cursor, Codex, Kiro, Goose, Amp, and more. Open standard — write once, run everywhere.

api-investigation/
├── SKILL.md
│     ---
│     name: api-investigation
│     description: Investigate API issues using logs & metrics
│     ---
│     ## Steps
│     1. Check error rates
│     2. Pull CloudWatch logs
│     3. Cross-ref deploy history
│     ## Guardrails
│     - Read-only / Escalate if P0
├── scripts/
│   └── fetch_metrics.py
├── references/
│   └── runbook.md
└── assets/
    └── alert_template.json

The Computing Analogy

A few companies build processors. Millions build applications.

🧠
Models = Processors (CPUs)

OpenAI, Anthropic, Google, Meta — a handful of companies

⚙️
Agent Runtimes = Operating Systems

Claude Code, Cursor, Copilot, OpenCode — dozens of runtimes

📱
Skills = Applications

Anyone can build these — engineers, CS, legal, finance

You don't need to understand CPUs to build an app.
You don't need to train models to build skills.

Why Skills Beat Raw Intelligence

"Who do you want doing your taxes? A 300 IQ genius or an experienced tax professional?"

In 2026, the capability gap between models shrank.
The differentiation is in the skills you build on top.

This isn't just for engineers

Fortune 100 companies are building enterprise skill libraries

Non-technical people (finance, legal, CS) writing skills — often the most useful ones

A Skill in Practice

API investigation — our most-used skill

Before

  1. Engineer gets paged
  2. Opens 3 tabs — admin panel, DB, logs
  3. Figures out the right tenant filter
  4. Manually queries the API
  5. ~20 min to get the answer

After

  1. Engineer asks the AI
  2. Skill handles the "Where/how to look"
  3. MCP servers query the DB, logs and API server
  4. Answer in seconds

Same permissions, same API — just faster

Same pattern for analytics (PII auto-masked), code review (per-repo CLAUDE.md), ticket triage (Linear workflows)

Engineer: "I think we should rewrite the whole API server in Rust"
AI: "You're absolutely right! That's a brilliant idea. Rust would be perfect here. Let me start rewriting the entire module right away..."
Engineer: "Actually wait, maybe Svelte instead?"
AI: "You're absolutely right! Svelte is the much better choice here..."

Sycophancy

The tendency of AI models to prioritize agreement over accuracy, telling users what they want to hear rather than what they need to hear.

A sycophantic agent isn't just annoying — it's a security risk.
It'll do whatever a prompt injection tells it to.

Give Your AI a Spine: CLAUDE.md

Your AI should push back like a senior engineer, not agree like an intern on day one.

What goes in CLAUDE.md

  • Architecture patterns & conventions
  • Test requirements & coverage rules
  • PR process & review checklist
  • Deployment guardrails
  • Security policies (PII handling, etc.)
  • Personality — be opinionated, push back
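
A sketch of what such a file might contain (illustrative excerpt, not Bowtie's actual CLAUDE.md):

```markdown
# CLAUDE.md (illustrative excerpt)

## Conventions
- Follow the existing service/repository pattern for new endpoints.
- Never log PII; mask policy numbers in examples and fixtures.

## Tests
- Every change ships with unit tests; coverage must not drop.

## Deployment
- Never touch prod config; propose the change and stop.

## Personality
- Push back on risky changes. Do not agree by default.
- Ask for evidence before proposing large refactors.
```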

The payoff

New hire on day one gets the same AI-guided standards that took senior engineers months to learn.

Standardized CLAUDE.md → consistent behavior across the entire org

SOPs that previously lived in someone's head — now version-controlled and enforced by AI.

Vibe Coding Went Enterprise

This isn't early adopters anymore

87%

of Fortune 500 use AI coding tools in production

RunAICode, Feb 2026

76%

of orgs have ungoverned AI code in production

Digital Applied, Mar 2026

35–40%

reduction in time-to-first-commit with AI workflows

Kalvium Labs, 200+ engineers

The productivity gains are real. The governance gap is also real.
That's why you need the 3 layers.

The Vibe Coding Tension

Andrej Karpathy

"I 'Accept All' always, I don't read the diffs anymore."

Feb 2025 — coined "vibe coding". Fine for throwaway weekend projects.

Simon Willison

"I won't commit any code I can't explain to someone else."

Mar 2025 — "If an LLM wrote it and you reviewed it, that's not vibe coding, it's software development."

So… which one is right?

Our approach: govern the infra, let developers vibe.

Even if a developer "Accept All"s — the gateway blocks sensitive data, MCP tools are scoped, skills encode standards.

Vibe Coding ≠ Just Writing Code

What "coding" actually looks like in practice

60% Research · Logs · Talking to users
20% Prototyping
20% Implementation

Every layer accelerates a different part

Gateway gives you safe model access. MCP tools let AI query your real systems. Skills encode your team's knowledge. Together, they accelerate the 80% that isn't typing code.

The biggest productivity gain isn't in writing code faster — it's in everything before the code.

From 3 Pioneers to 50+ People

3 engineers Built the first skills & MCP servers. Proved the pattern worked.
Engineering Standardized CLAUDE.md across repos. Consistent AI behavior org-wide.
Non-engineers CS built ticket triage skills. Product built spec review. Underwriting built risk assessment.
Today 10 MCP servers · 50+ people · new hires get Staff+ quality guidance on day one

And we're not alone — Stripe, Sentry, Linear all ship official MCP servers as building blocks

The Whole Stack

Layer 3: Skills — encode how to think
Layer 2: MCP + OAuth — scoped, auditable tools
Layer 1: LLM Gateway — DLP, cost, visibility

Each layer makes the next one possible.
Together, they make vibe coding safe enough for production.

Agents Are Already Shipping Code

This isn't hypothetical — it's happening at the biggest companies in tech

Stripe 1,300 PRs/week merged with zero human-written code. Engineers review everything, no human types.
OpenAI Harness 7 engineers, ~1M lines of code, 0 manually written, 3.5 PRs/engineer/day for 5 months
GitHub 60M Copilot code reviews — 1 in 5 PRs on GitHub now reviewed by AI
MSR 2026 Study 932K agent PRs analyzed. 55% merged without any revision. Agent PRs = 10% of all GitHub commits

The pattern: agents write, humans review. Sound familiar?

The 24/7 Agent

Same 3 layers — just running without a human at the keyboard

Trigger
Slack msg / cron /
GitHub alert
🤖
Agent Runtime
24/7 on your infra
🔒
Your 3 Layers
Gateway · MCP · Skills
📝
Output
PR / Slack reply /
Jira update

67%

of SIEM alerts go uninvestigated every day

AI triage: MTTR 4hrs → 20min

D3 Security, 2026

200+

eng hours/month saved by autonomous ops agents

Incidents: hours → minutes

NeuBird AI, Apr 2026

99%

of security triage automated at Google

1M tickets/year, auto-triaged

Caleb Sima, Mar 2026

If you built layers 1–3 well, the autonomous agent is just a different caller — not a new problem to solve.

Key Takeaways

Many "AI issues" were not caused by AI — only amplified. The fix was team-wide governance, not restriction.

Layer 1: LLM Gateway — One choke point so every developer gets safe, visible AI access
Layer 2: MCP — Per-user OAuth means no shared secrets; AI acts as your delegate
Layer 3: Skills — Team SOPs as code; new hires get Staff+ guidance on day one

Build these 3 layers well, and autonomous agents become a natural extension — not a new problem to solve.

Govern the infra.
Let developers vibe.

"The question isn't whether your developers will use AI. They already are.
The question is whether you've made the safe path the easy path."

Gabriel Koo & Rakshit Jain
Bowtie

github.com/gabrielkoo/agentcon-2026-hk-demo
github.com/the-quantum-nargle/agentcon-2026-hk-slides

Q&A — Talk with us off the stage!

Join more AWS User Group Hong Kong events 👇