Empower Team-Wide Vibe Coding with LLM Gateway and Security-First MCPs
Gabriel Koo
Senior Lead Engineer
Rakshit Jain
AI Engineer II
Bowtie · AgentCon Hong Kong 2026
What to expect from this talk?
35% Acronyms (MCP, OAuth, DLP, VPC, JWT...)
25% Supply chain horror stories
20% Live demo (fingers crossed)
15% Actual takeaways
5% 廣東話 (Cantonese slang)
4 Layers of Safe Vibe Coding
We'll walk through each one
1. LLM Gateway
2. MCP Servers
3. Custom Skills
+ What's Next
😱 The BYOAI Problem
Everyone brought their own AI tools (Bring Your Own AI)
💸 貼錢返工 (tip3 chin2 faan1 gung1: "paying out of your own pocket to go to work")
The company offered no official AI tools, so people bought their own and found their own workarounds
🕵️ Shadow AI
LLM calls sent to platforms the company doesn't own or monitor
Personal API keys and personal-plan logins on personal devices (GitHub Copilot, Cursor)
🔓 Shared API Keys
One key shared by the whole team; hard to rotate without blocking everyone
Someone leaves → the key still works. Corporate API key used for personal projects → cost chaos.
🎯 Providing Tools ≠ Adoption
2022: Copilot launched. People joked it was
"autocomplete on steroids."
2026: We have agents, MCPs, skills.
Same adoption problem.
10% Champions · 60% Majority · 30% Laggards
Our lesson at Bowtie: make the safe path the easy path.
Started with 3 engineers full-time vibe coding. 6 months later, 50+ — including non-engineers. You need champions who pull the team forward, not mandates.
Layer 3 (Custom Skills): encode team standards into prompts
What's Next: autonomous agents, same governance, no human at the keyboard (WIP)
About Bowtie www.bowtie.com.hk
HK's first virtual insurer (licensed Dec 2018)
Direct-to-consumer, zero agents, zero commissions
Series C: US$70M (2025) · ARR > US$80M
In-house core system on AWS Cloud
Using Copilot since 2022, GenAI since early 2023
200+ employees, ~50 engineers
Digitalization with a human touch: AI handles the routine; humans handle the nuanced. We won't let AI run prod unsupervised until it's reliable enough.
ABOUT BOWTIE'S TOOLS
A software engineer's daily reality
Slack · Notion · Gmail · Google Drive · Zendesk · JIRA · AWS · Mixpanel · GitHub · Metabase … and more
🗂️ Tab chaos
Sometimes a single feature investigation means opening 6+ tabs across different SaaS tools just to piece together the full picture.
🎮 ⬆️⬆️⬇️⬇️⬅️➡️⬅️➡️🅱️🅰️🅱️🅰️
Knowing which SaaS to check, where to find the right page, and how to combine them is like memorizing a Konami code — tribal knowledge that only veterans have.
This is why we built what we built.
Layer 1
LLM Gateway
Centralized · Visible · Democratized
🔑 Why Developers Get to Vibe Code
Governance at the infrastructure layer means developers don't carry the compliance burden.
Without infra governance
Every developer must remember: right provider, right key, don't paste secrets, check DLP...
With infra governance
Remote deployment of AGENTS.md enforces a shared "SOUL.md" across all engineers' coding agents: consistent behavior, guardrails, and personality pushed centrally via MDM.
Our LLM Gateway
Single LiteLLM instance on AWS — serves engineers and internal services
🧑💻 Engineers Claude Code / Cursor / Copilot
🏥 Core System Insurance platform
🤖 Machine Agents IT bot / CS bot
→ 🔒 VPC: ALB (load balancer) → ECS LiteLLM · 🐘 PostgreSQL · 📦 S3
→ ☁️ Azure OpenAI · ☁️ Vertex AI · ☁️ AWS Bedrock · ☁️ xAI
Every Request Goes Through the Gateway
🤖 AI Coding Agent → 🔒 Firewall (block + scan) → 🔀 LLM Gateway (DLP · Cost · Auth) → ☁️ AI Provider (Azure OpenAI · Vertex AI · Amazon Bedrock)
DLP at Firewall
Blocks sensitive data before it leaves your network
DLP at Gateway
Scans prompts + responses; logs retained for auditing
Cost Attribution
Per-user, per-team spend visibility in real time
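To make the gateway-side DLP step concrete, here is a minimal sketch (illustrative patterns only, not LiteLLM's actual guardrail API): a pre-call check regex-scans prompt text and blocks the request before it reaches any provider.

```python
import re

# Hypothetical patterns for illustration; a real DLP layer would use a
# maintained ruleset (AWS key formats, HKID numbers, credit cards, ...).
PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key":    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "email":          re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
}

def scan_prompt(text: str) -> list[str]:
    """Return the names of every sensitive pattern found in `text`."""
    return [name for name, pat in PATTERNS.items() if pat.search(text)]

def guard(text: str) -> str:
    """Block the request if anything sensitive is found, else pass it through."""
    hits = scan_prompt(text)
    if hits:
        raise ValueError(f"DLP block: {', '.join(hits)}")
    return text
```

The same scan runs on responses before they are returned, and every hit is logged for the audit trail.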
🚨 3 Weeks Ago...
LiteLLM — the most popular LLM gateway proxy
(95M downloads/month)
1. Attackers compromised a CI dependency → stole LiteLLM's PyPI token
2. Published two backdoored versions — 47,000 installs in 3 hours
3. Every machine had secrets exfiltrated: SSH keys, cloud creds, the LLM API keys it was proxying
How was it caught? The attacker vibe-coded the malware — it crashed a researcher's laptop instead of running silently.
What if they hadn't been sloppy? Silent exfiltration, potentially undetected for weeks.
🛡️ Why This Didn't Hurt Us
47K machines auto-upgraded → Pinned versions; we never pulled v1.82.7
Creds exfiltrated to fresh domains → Cloudflare WARP + Gateway blocks new domains at DNS
Backdoor live within hours → 7-day upgrade cooldown; most attacks are found within 24–72 h
Poisoned package on public PyPI → Dockerized deployment; CI/CD pulls from an internal Docker mirror
Four practices that existed before this attack. None of them exotic.
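The cooldown practice is trivial to automate. A sketch of the policy check (the 7-day window is our assumed policy, not a property of any specific tool): an upgrade is allowed only once a release has outlived the window in which most poisoned packages get caught and yanked.

```python
from datetime import date, timedelta

# Most supply-chain attacks are discovered within 24-72h of publication,
# so a 7-day buffer lets the ecosystem catch them before you pull.
COOLDOWN = timedelta(days=7)

def safe_to_upgrade(release_date: date, today: date,
                    cooldown: timedelta = COOLDOWN) -> bool:
    """True once a release has survived the cooldown window."""
    return today - release_date >= cooldown
```

Run it in CI against each candidate version's publish date; anything younger than the window simply fails the pipeline.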
Centralized Guardrails
🔒 Firewall (Network Layer)
Block all AI providers except official list
DNS-level blocking (free tier for teams under 50 users)
Cloudflare Zero Trust category filter
Enforces BYOK policy at network level
✅ llm.bowtie.internal/chat/completions
❌ random-llm-proxy.com/v1/chat/completions
🔀 LLM Gateway (Application Layer)
Cost attribution by team/project/user
DLP: block sensitive data at request + response
Logging: full audit trail
Auth: per-user identity tracking
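The ✅/❌ contrast above boils down to a hostname allowlist. A minimal sketch (the hostnames are illustrative; real enforcement happens at the DNS resolver, not in application code):

```python
from urllib.parse import urlparse

# Illustrative allowlist: only the internal gateway may receive LLM traffic.
ALLOWED_HOSTS = {"llm.bowtie.internal"}

def allowed(url: str) -> bool:
    """True only if the request targets an approved LLM endpoint."""
    return urlparse(url).hostname in ALLOWED_HOSTS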
📊 Cost & Adoption Visibility
Everything flows through one gateway — so we see everything
💰 Cost per model / provider
Real-time spend breakdown by model, provider, team, and individual engineer
📈 Adoption & preference
Which engineers use which models, how often, and for what — no surveys needed
🧑💼 Engineering Manager view
See vibe coding adoption across the team — who's using it, who isn't, and where to invest training
🗺️ Planning & budgeting
Data-driven decisions on which providers to keep, scale, or drop
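Because every request carries a per-user identity, all of the views above are simple aggregations over the gateway's request log. A sketch with a made-up row schema (your gateway's actual log fields will differ):

```python
from collections import defaultdict

def spend_by(logs: list[dict], key: str) -> dict[str, float]:
    """Sum cost over gateway log rows, grouped by `key` (user, team, model...)."""
    totals: dict[str, float] = defaultdict(float)
    for row in logs:
        totals[row[key]] += row["cost_usd"]
    return dict(totals)

# Hypothetical log rows for illustration
logs = [
    {"user": "alice", "team": "core", "model": "claude", "cost_usd": 0.25},
    {"user": "bob",   "team": "core", "model": "gpt",    "cost_usd": 0.50},
    {"user": "alice", "team": "core", "model": "gpt",    "cost_usd": 0.25},
]
```

Group by `user` for the EM view, by `model` for provider planning; the data is the same either way.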
Why Not Just Pick One Provider?
Source: Artificial Analysis, Apr 2026
GPT → Claude → Gemini → DeepSeek → GPT → …
🔄 Engineers keep switching to the "best" model
No single provider wins on both intelligence and speed.
🧑💻 The editor problem
You can't force everyone onto one editor — nano vs vim vs VS Code vs Notepad++ vs JetBrains.
Same with LLMs — engineers have strong preferences on their "best coding model."
🔀 Gateway = freedom + control
Let engineers pick their model. The gateway enforces DLP, cost, and auth regardless.
A few companies build processors. Millions build applications.
🧠 Models = Processors (CPUs)
OpenAI, Anthropic, Google, Meta — a handful of companies
⚙️ Agent Runtimes = Operating Systems
Claude Code, Cursor, Copilot, OpenCode — dozens of runtimes
📱 Skills = Applications
Anyone can build these — engineers, CS, legal, finance
You don't need to understand CPUs to build an app. You don't need to train models to build skills.
Why Skills Beat Raw Intelligence
"Who do you want doing your taxes? A 300 IQ genius or an experienced tax professional?"
In 2026, the capability gap between models shrank. The differentiation is in the skills you build on top.
This isn't just for engineers
Fortune 100 companies are building enterprise skill libraries
Non-technical people (finance, legal, CS) writing skills — often the most useful ones
A Skill in Practice
API investigation — our most-used skill
Before
Engineer gets paged
Opens 3 tabs — admin panel, DB, logs
Figures out the right tenant filter
Manually queries the API
~20 min to get the answer
After
Engineer asks the AI
Skill handles the "Where/how to look"
MCP servers query the DB, logs and API server
Answer in seconds
Same permissions, same API — just faster
Same pattern for analytics (PII auto-masked), code review (per-repo CLAUDE.md), ticket triage (Linear workflows)
I think we should rewrite the whole API server in Rust
You're absolutely right! That's a brilliant idea. Rust would be perfect here. Let me start rewriting the entire module right away...
Actually wait, maybe Svelte instead?
You're absolutely right! Svelte is the much better choice here...
Sycophancy
The tendency of AI models to prioritize agreement over accuracy, telling users what they want to hear rather than what they need to hear.
A sycophantic agent isn't just annoying — it's a security risk. It'll do whatever a prompt injection tells it to.
Give Your AI a Spine: CLAUDE.md
Your AI should push back like a senior engineer, not agree like an intern on day one.
What goes in CLAUDE.md
Architecture patterns & conventions
Test requirements & coverage rules
PR process & review checklist
Deployment guardrails
Security policies (PII handling, etc.)
Personality — be opinionated, push back
The payoff
New hire on day one gets the same AI-guided standards that took senior engineers months to learn.
Standardized CLAUDE.md → consistent behavior across the entire org
SOPs that previously lived in someone's head — now version-controlled and enforced by AI.
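For flavor, an illustrative fragment of what such a file can contain (invented excerpt, not Bowtie's actual CLAUDE.md):

```markdown
# CLAUDE.md: team standards (illustrative excerpt)

## Conventions
- New services follow the layout described in `docs/architecture.md`.
- Every PR needs tests; coverage must not drop below the repo baseline.

## Guardrails
- Never log or echo PII; mask it before it reaches any prompt or log line.
- Never touch deployment config without an explicit human instruction.

## Personality
- Push back on questionable designs. Disagree with reasons; don't flatter.
- If a request conflicts with these rules, say so instead of complying.
```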
Vibe Coding Went Enterprise
This isn't early adopters anymore
87%
of Fortune 500 use AI coding tools in production
RunAICode, Feb 2026
76%
of orgs have ungoverned AI code in production
Digital Applied, Mar 2026
35–40%
reduction in time-to-first-commit with AI workflows
Kalvium Labs, 200+ engineers
The productivity gains are real. The governance gap is also real. That's why you need the 3 layers.
The Vibe Coding Tension
Andrej Karpathy
"I 'Accept All' always, I don't read the diffs anymore."
Feb 2025 — coined "vibe coding". Fine for throwaway weekend projects.
Simon Willison
"I won't commit any code I can't explain to someone else."
Mar 2025 — "If an LLM wrote it and you reviewed it, that's not vibe coding, it's software development."
So… which one is right?
Our approach: govern the infra, let developers vibe.
Even if a developer "Accept All"s — the gateway blocks sensitive data, MCP tools are scoped, skills encode standards.
Vibe Coding ≠ Just Writing Code
What "coding" actually looks like in practice
60% Research · Logs · Talking to users
20% Prototyping
20% Implementation
Every layer accelerates a different part
Gateway gives you safe model access. MCP tools let AI query your real systems. Skills encode your team's knowledge. Together, they accelerate the 80% that isn't typing code.
The biggest productivity gain isn't in writing code faster — it's in everything before the code.
From 3 Pioneers to 50+ People
3 engineers: Built the first skills & MCP servers. Proved the pattern worked.
Engineering: Standardized CLAUDE.md across repos. Consistent AI behavior org-wide.
Non-engineers: CS built ticket triage skills. Product built spec review. Underwriting built risk assessment.
Today: 10 MCP servers · 50+ people · new hires get Staff+ quality guidance on day one
And we're not alone — Stripe, Sentry, Linear all ship official MCP servers as building blocks
The Whole Stack
Layer 3: Skills — encode how to think
Layer 2: MCP + OAuth — scoped, auditable tools
Layer 1: LLM Gateway — DLP, cost, visibility
Each layer makes the next one possible. Together, they make vibe coding safe enough for production.
Agents Are Already Shipping Code
This isn't hypothetical — it's happening at the biggest companies in tech
Stripe: 1,300 PRs/week merged with zero human-written code. Engineers review everything; no human types the code.
OpenAI Harness: 7 engineers, ~1M lines of code, 0 manually written, 3.5 PRs/engineer/day for 5 months
GitHub: 60M Copilot code reviews — 1 in 5 PRs on GitHub now reviewed by AI
MSR 2026 Study: 932K agent PRs analyzed. 55% merged without any revision. Agent PRs = 10% of all GitHub commits
The pattern: agents write, humans review. Sound familiar?
The 24/7 Agent
Same 3 layers — just running without a human at the keyboard
⚡ Trigger (Slack msg / cron / GitHub alert) → 🤖 Agent Runtime (24/7 on your infra) → 🔒 Your 3 Layers (Gateway · MCP · Skills) → 📝 Output (PR / Slack reply / Jira update)
67%
of SIEM alerts go uninvestigated every day
AI triage: MTTR 4hrs → 20min
D3 Security, 2026
200+
eng hours/month saved by autonomous ops agents
Incidents: hours → minutes
NeuBird AI, Apr 2026
99%
of security triage automated at Google
1M tickets/year, auto-triaged
Caleb Sima, Mar 2026
If you built layers 1–3 well, the autonomous agent is just a different caller — not a new problem to solve.
Key Takeaways
Many "AI issues" were not caused by AI — only amplified. The fix was team-wide governance, not restriction.
Layer 1: LLM Gateway — One choke point so every developer gets safe, visible AI access
Layer 2: MCP — Per-user OAuth means no shared secrets; AI acts as your delegate
Layer 3: Skills — Team SOPs as code; new hires get Staff+ guidance on day one
Build these 3 layers well, and autonomous agents become a natural extension — not a new problem to solve.
Govern the infra. Let developers vibe.
"The question isn't whether your developers will use AI. They already are.
The question is whether you've made the safe path the easy path."