In 2026, GitHub is no longer just a place to host code. Assign an issue to Copilot and it spins up a dedicated VM inside GitHub Actions, clones your repo, runs RAG over the codebase, plans a change, pushes commits to a draft pull request, then runs Copilot Code Review and security scans on its own work. Cursor, Claude Code, and Codex plug in the same way. Software collaboration is shifting: humans define goals, write rules, and review outputs, while the repository itself becomes the agent's execution workspace. This runbook walks through the three GitHub agent product layers (Copilot coding agent, .github/agents/, Agentic Workflows) and pairs them with a dedicated Apple Silicon runner.
00 · The shift: from code hosting to an AI agent execution workspace
For the last decade, GitHub bundled code, issues, pull requests, Actions, and permissions into a graph optimized for human collaboration. People wrote code, opened PRs, reviewed PRs, and merged once CI was green. In 2025 and 2026, the actor that writes the code is changing. The repository is moving from a code store to an agent execution sandbox.
Three product layers are driving the shift. Copilot coding agent (cloud agent) starts from an issue assignment or a Chat prompt, boots a dedicated VM on GitHub Actions, retrieves code with RAG, plans tasks, and pushes commits to a draft pull request. .github/agents/ lets you declare multiple specialized roles: a performance optimizer agent, a test-writing agent, a docs agent. Agentic Workflows (gh-aw), in technical preview, compiles plain Markdown in .github/workflows/ into Actions YAML, then runs Copilot, Claude, or Codex inside a sandbox to drive event-triggered, long-running automation.
Stacked together, the GitHub flow now reads: a human writes a goal → the agent executes inside the repository → a human reviews the draft PR → CI/CD ships. Understanding this shift matters more than chasing any single tool. The compute contract that supports it (self-hosted runners, dedicated Apple Silicon nodes) becomes the new substrate.
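The flow above can be modeled as a tiny state machine. This is a minimal Python sketch, not GitHub's implementation: the stage names, the `advance` helper, and the actor strings ("human", "agent", "ci") are hypothetical. It encodes the one property that matters most, namely that CI/CD cannot start until a human has approved the agent's draft PR.

```python
from enum import Enum, auto


class Stage(Enum):
    GOAL_WRITTEN = auto()    # a human files an issue or a Chat prompt
    AGENT_RUNNING = auto()   # the agent works inside the repository
    DRAFT_PR_OPEN = auto()   # the agent pushed commits to a draft PR
    HUMAN_APPROVED = auto()  # a reviewer approved the draft PR
    SHIPPED = auto()         # CI/CD ran and the change shipped


# Each stage may only advance to the next one, and each target stage
# names the only actor allowed to move the work into it.
NEXT = {
    Stage.GOAL_WRITTEN: Stage.AGENT_RUNNING,
    Stage.AGENT_RUNNING: Stage.DRAFT_PR_OPEN,
    Stage.DRAFT_PR_OPEN: Stage.HUMAN_APPROVED,
    Stage.HUMAN_APPROVED: Stage.SHIPPED,
}
REQUIRED_ACTOR = {
    Stage.AGENT_RUNNING: "agent",
    Stage.DRAFT_PR_OPEN: "agent",
    Stage.HUMAN_APPROVED: "human",
    Stage.SHIPPED: "ci",
}


def advance(current: Stage, target: Stage, actor: str) -> Stage:
    """Move to `target` only if it is the next stage and `actor` may perform it."""
    if NEXT.get(current) is not target:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    if REQUIRED_ACTOR[target] != actor:
        raise PermissionError(f"{target.name} requires {REQUIRED_ACTOR[target]!r}, got {actor!r}")
    return target


# Usage: the agent cannot approve its own PR, and CI cannot skip the review.
stage = Stage.GOAL_WRITTEN
stage = advance(stage, Stage.AGENT_RUNNING, "agent")
stage = advance(stage, Stage.DRAFT_PR_OPEN, "agent")
stage = advance(stage, Stage.HUMAN_APPROVED, "human")
stage = advance(stage, Stage.SHIPPED, "ci")
print(stage.name)  # SHIPPED
```

Calling `advance(stage, Stage.SHIPPED, "ci")` while the PR is still a draft raises `ValueError`, which is the "human in the loop" switch expressed as code.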
PAIN · Four hidden costs of running agents on a minute pool
Many teams wire Copilot coding agent or Claude Code straight onto GitHub-hosted runners. It looks cheap. It is not, once four costs surface:
- Per-minute billing meets long agent jobs. Agent tasks rerun compile, tests, and security scans, and a single job often runs 30 minutes or more. Per-minute pricing, repeated passes, and noisy neighbors push monthly spend up far faster than any human-only CI you have measured before.
- Signing material exposure. On a shared runner, secrets are injected into the agent's environment. Under a prompt-injection attack or a malicious dependency, signing certificates and provisioning profiles gain a real exfiltration path. GitHub's SafeOutputs MCP Gateway, Agent Workflow Firewall (AWF), and threat detection job exist precisely to separate the agent's write capability from its execution environment.
- Unpredictable review pressure. An agent can open dozens of draft PRs overnight, and without AGENTS.md and custom agent guardrails, reviewers drown. GitHub deliberately requires human approval on the agent's draft PRs before CI/CD runs: a built-in "human in the loop" switch.
- Underestimated Apple Silicon costs. iOS and macOS projects need a pinned Xcode, signing keychains, DerivedData, and TestFlight uploads. On generic hosted macOS runners that is slow and expensive; on a dedicated node it is cheaper and auditable.
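To make the first cost concrete, here is a back-of-envelope model in Python. Every number in it (the per-minute rate, the flat node price, three passes per agent job, 22 working days) is a placeholder assumption, not a GitHub or NUKCLOUD price; plug in your own. The model is deliberately linear, so it understates the extra effect of retries and release-window spikes.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    jobs_per_day: int         # jobs started per working day
    minutes_per_pass: float   # one compile + test + scan pass
    passes_per_job: float     # human CI ~1; agents rerun the pass while iterating
    working_days: int = 22    # assumed working days per month


def billed_minutes(w: Workload) -> float:
    """Minutes billed per month, counting execution time only."""
    return w.jobs_per_day * w.passes_per_job * w.minutes_per_pass * w.working_days


def monthly_spend(w: Workload, rate_per_minute: float) -> float:
    return billed_minutes(w) * rate_per_minute


def breakeven_jobs_per_day(w: Workload, rate_per_minute: float, flat_monthly: float) -> float:
    """Daily job count at which per-minute spend matches a flat monthly node price."""
    spend_per_daily_job = w.passes_per_job * w.minutes_per_pass * w.working_days * rate_per_minute
    return flat_monthly / spend_per_daily_job


# Hypothetical numbers: same job count and pass length, but agents rerun the pass 3x.
RATE = 0.08   # placeholder per-minute price, not a quoted GitHub rate
FLAT = 600.0  # placeholder flat monthly price for a dedicated node
human = Workload(jobs_per_day=20, minutes_per_pass=12, passes_per_job=1)
agent = Workload(jobs_per_day=20, minutes_per_pass=12, passes_per_job=3)
for name, w in [("human CI", human), ("agent", agent)]:
    print(f"{name}: {billed_minutes(w):.0f} min, ${monthly_spend(w, RATE):.2f}/month")
print(f"flat rate wins above {breakeven_jobs_per_day(agent, RATE, FLAT):.1f} agent jobs/day")
```

With these placeholders, the same 20 daily jobs bill three times the minutes once agents rerun their passes, which is the multiplier to bring into a budget review.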
Put these into a review and the conclusion writes itself: an agent execution workspace needs dedicated compute plus an explicit security contract, not a generic runner.
01 · Three GitHub agent product layers, side by side
The table below aligns three concepts that are often confused. Use it when picking the right entry point for a team:
| Layer | Trigger | Execution environment | Main output | When to use |
|---|---|---|---|---|
| Copilot coding agent (cloud agent) | Issue assignment / Copilot Chat / gh CLI | Dedicated VM on GitHub Actions, RAG over the repo | Draft PR plus self Code Review plus security scans | Hand an agent an issue and ask for a reviewable change |
| .github/agents/ custom agents | Declared in the repo as roles and flows | Same VM, with per-role tools and prompts | Same, plus benchmark or diff reports | Encode the team runbook as agent behavior |
| Agentic Workflows (gh-aw) | Any Actions event (issue / PR / schedule / comment) | Markdown compiled to Actions YAML, Copilot / Claude / Codex in sandbox | Constrained writes via SafeOutputs: issue / comment / label / branch / PR | Issue triage, CI failure analysis, docs maintenance, compliance sweeps |
These layers are not exclusive. Mature teams use Copilot coding agent as the default "write code" entry point, lean on .github/agents/ to constrain roles (test agent, perf agent, docs agent), and add Agentic Workflows for event-driven, always-on automation such as analyzing a red CI run or triaging every new issue. Together they turn GitHub into a clean event → agent → controlled-write execution plane.
02 · Workspace security contract: SafeOutputs, AWF, human review
For any plan that lets an agent operate on a repository, the security contract matters more than the model choice. GitHub Agentic Workflows make this layer explicit, and the pattern is worth copying when you self-host:
- SafeOutputs MCP Gateway. The agent does not call the GitHub API directly. It declares intended write operations to the gateway, which buffers them as artifacts. After the agent exits, a separate job with write permissions validates, sanitizes, and rate-limits each request. The agent process is always read-only and has no secrets.
- Agent Workflow Firewall (AWF). The agent runs in an isolated container whose traffic iptables routes through a Squid proxy. A domain allowlist blocks data exfiltration and external command-and-control.
- Threat detection job. Before any patch is applied, a security-focused agent scans the diff for prompt injection, leaked credentials, and malicious patterns. Anything suspicious fails the workflow.
- Branch protection and human review. Copilot coding agent can only push to branches it created. Draft PRs require human approval before CI/CD runs. People remain the fail-safe valve before deployment.
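If you self-host, the SafeOutputs split is the part most worth copying. The Python sketch below shows only the validator side of that pattern, assuming the read-only agent has written its intended operations to a JSON buffer such as `[{"type": "add_comment", "body": "..."}]`. The operation types, limits, and regexes are hypothetical placeholders, not the SafeOutputs MCP Gateway's actual schema, and a real threat-detection job inspects the whole diff rather than just comment bodies.

```python
import json
import re
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical policy for the validator job; GitHub's real SafeOutputs config differs.
ALLOWED_TYPES = {"create_issue", "add_comment", "add_labels", "create_pull_request"}
MAX_OPS_PER_RUN = 5
MAX_BODY_CHARS = 10_000

# Crude credential shapes; a real threat-detection job scans far more than this.
SECRET_PATTERNS = [
    re.compile(r"gh[pousr]_[A-Za-z0-9]{36,}"),          # GitHub token prefixes
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private keys
]


@dataclass
class Verdict:
    accepted: List[dict] = field(default_factory=list)
    rejected: List[Tuple[str, str]] = field(default_factory=list)


def sanitize(body: str) -> str:
    """Break @-mentions with a zero-width space and cap the length."""
    return body.replace("@", "@\u200b")[:MAX_BODY_CHARS]


def validate(buffer_json: str) -> Verdict:
    """Runs in a separate job after the read-only agent exits; only this job holds write access."""
    verdict = Verdict()
    for op in json.loads(buffer_json):
        kind, body = op.get("type", ""), op.get("body", "")
        if kind not in ALLOWED_TYPES:
            verdict.rejected.append((kind, "type not allowed"))
        elif any(p.search(body) for p in SECRET_PATTERNS):
            verdict.rejected.append((kind, "possible credential"))
        elif len(verdict.accepted) >= MAX_OPS_PER_RUN:
            verdict.rejected.append((kind, "rate limited"))
        else:
            verdict.accepted.append({**op, "body": sanitize(body)})
    return verdict
```

The write-capable job would then call the GitHub API only for `verdict.accepted`, so the agent process never needs a token at all.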
03 · AGENTS.md: the repo as a spec, so agents stop guessing
The AGENTS.md spec, co-developed in 2025 by OpenAI, Cursor, Jules, Amp, and Factory, is the human side of the agent workspace contract. Put a "README for agents" at the repository root and you give every coding agent a predictable place for project structure, build commands, test commands, style rules, and security notes.
- Hierarchical resolution. Subdirectories can carry their own AGENTS.md, and the agent picks the closest one to the file being changed. OpenAI's main repository currently ships 88 of them.
- Must-follow versus Should. Hard rules ("no commits on main", "pin Xcode 16.2") sit next to soft preferences ("prefer Swift Testing"), and keeping the two apart makes agent behavior converge.
- README and AGENTS.md complement each other. README serves humans and outside contributors. AGENTS.md serves agents, with build steps, long-context rules, and security gotchas that would otherwise clutter the human doc.
- Copilot CLI compatible. JetBrains and VS Code Copilot CLI Agent already support a global ~/.copilot/agents/.agent.md, so personal preferences can layer on top of team rules.
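The "closest file wins" rule can be sketched as a walk up the directory tree. This Python version is an assumption about the resolution order, written for illustration; individual agents may also merge parent files or apply their own precedence, so check each tool's documentation.

```python
from pathlib import PurePosixPath
from typing import Optional, Set


def nearest_agents_md(changed_file: str, agents_files: Set[str]) -> Optional[str]:
    """Return the AGENTS.md closest to `changed_file`, walking up to the repo root."""
    for directory in PurePosixPath(changed_file).parents:
        # PurePosixPath(".") / "AGENTS.md" renders as "AGENTS.md" for the root file.
        candidate = str(directory / "AGENTS.md")
        if candidate in agents_files:
            return candidate
    return None


repo = {"AGENTS.md", "ios/AGENTS.md", "ios/App/Tests/AGENTS.md"}
print(nearest_agents_md("ios/App/Sources/Login.swift", repo))  # ios/AGENTS.md
```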
For iOS or macOS work, AGENTS.md should pin the Xcode version, list the xcodebuild entry point, document signing pre-steps, declare gates like -strict-concurrency=complete, name the dedicated runner labels, and call out off-limits directories. The more explicit the rules, the less the agent improvises. This mirrors the freeze-baseline, SSH-baseline, cache-partitioning, signing-isolation checklist in the existing NUKCLOUD dedicated Apple Silicon node runbook; AGENTS.md is its agent-readable form.
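A CI step can keep AGENTS.md honest against this checklist. The Python sketch below is a hypothetical lint, not part of the AGENTS.md spec: the item names, regexes, and sample file are illustrative assumptions, and a regex match only shows that a topic is mentioned, not that the rule itself is correct.

```python
import re
from typing import List

# Hypothetical lint: each entry names an item from the list above and a
# loose regex suggesting that AGENTS.md covers it. Tune the patterns to your wording.
REQUIRED_ITEMS = {
    "pinned Xcode version": re.compile(r"Xcode\s+\d+(\.\d+)+"),
    "xcodebuild entry point": re.compile(r"\bxcodebuild\b"),
    "signing pre-steps": re.compile(r"(?i)\bsigning\b"),
    "strict concurrency gate": re.compile(r"-strict-concurrency=complete"),
    "dedicated runner labels": re.compile(r"\bagent-macos\b"),
    "off-limits directories": re.compile(r"(?i)\boff-limits\b"),
}


def missing_items(agents_md: str) -> List[str]:
    """Return the checklist items that AGENTS.md does not appear to mention."""
    return [name for name, pattern in REQUIRED_ITEMS.items() if not pattern.search(agents_md)]


SAMPLE = """\
## Must-follow
- Use Xcode 16.2 and build with `xcodebuild -scheme App test`.
- Pass -strict-concurrency=complete; CI fails on new warnings.
- Run only on runners labeled agent-macos and xcode-16.
- Off-limits: Vendor/, Certificates/.
## Should
- Prefer Swift Testing.
"""
print(missing_items(SAMPLE))  # ['signing pre-steps']
```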
DATA · Order-of-magnitude numbers worth quoting in a review
The figures below come from common iOS and macOS CI plus agent rollouts. Treat them as anchors and measure your own:
- Agent job wall time. Copilot coding agent typically finishes a "fix a bug, add tests" task in 8 to 25 minutes; a "refactor a module plus full build" task takes 30 to 90 minutes. During release windows, shared minute pools still show P95 queue times of 15 to 45 minutes.
- Draft PR cadence. After enabling Copilot coding agent, mid-size teams see daily draft PR counts grow 2x to 5x. The bottleneck moves from "writing code" to "reviewing PRs," and AGENTS.md and `.github/agents` are the levers that bring it back under control.
- SafeOutputs benefit. With SafeOutputs and AWF enabled, the agent process holds no `GITHUB_TOKEN`, no secrets, and no MCP API key. Internal threat-detection rejection rates during the preview sit at roughly 0.5 to 2 percent, mostly false positives on prompt-injection-shaped content.
- Dedicated Apple Silicon benefit. On a NUKCLOUD dedicated node with `--concurrent-jobs` 2 to 4 reserved for agent work, a full `xcodebuild` job typically completes 25 to 40 percent faster than on a generic macOS runner. DerivedData cache hits save another 30 to 60 percent.
- Reviewer load. The agent runs Copilot Code Review on its own changes before opening the draft PR, which compresses first-round human review time. The trade-off: reviewers must focus on design intent and system boundaries, not syntax.
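The last two savings compound rather than add. A quick check in Python, assuming "25 to 40 percent faster" means 25 to 40 percent less wall time and that the DerivedData saving applies to the time that remains; the 40-minute baseline is hypothetical.

```python
def remaining_fraction(node_speedup: float, cache_saving: float) -> float:
    """Fraction of the generic-runner wall time left after both savings.

    Assumes the cache saving applies to the time that remains after the node
    speedup, so the two effects multiply instead of adding.
    """
    return (1 - node_speedup) * (1 - cache_saving)


BASELINE_MIN = 40  # hypothetical full xcodebuild job on a generic macOS runner
for node, cache in [(0.25, 0.30), (0.40, 0.60)]:
    left = remaining_fraction(node, cache)
    print(f"node -{node:.0%}, cache -{cache:.0%}: {BASELINE_MIN * left:.1f} min "
          f"({1 - left:.1%} saved)")
```

Under that assumption the quoted ranges combine to roughly 47 to 76 percent less wall time. If the two effects overlap (for example, the faster node already benefits from a warm cache), the real figure is lower.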
04 · Six-step runbook: turn your repo into an agent workspace
A minimal runnable runbook for pairing the GitHub agent stack with a dedicated Apple Silicon runner. Land it in order; do not chase a perfect end state on day one.
- 01. Write an AGENTS.md. Root file with Must-follow (Xcode version, build commands, off-limits directories, branch protection) and Should (style, commit format). Add nested ones where useful. Codex, Cursor, and Copilot CLI all parse them.
- 02. Declare .github/agents/ custom agents. At minimum: a test agent that only touches `*Tests*` paths and runs the full suite, a docs agent that only touches `.md` and changelog files, and a performance agent that benchmarks before changing anything. Constrain tool calls and required steps in each prompt.
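Prompts can be ignored, so it helps to enforce the same scopes outside the agent. The Python sketch below checks a PR's changed-file list against per-role path globs; the role names and globs are hypothetical, and it does not parse the .github/agents/ profile format itself. The performance agent is left out because its rule ("benchmark before changing") is a required step, not a path scope.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# Hypothetical role -> allowed path globs, mirroring step 02. fnmatch's "*"
# also matches "/", so "*Tests*" covers any path containing "Tests".
ROLE_SCOPES: Dict[str, List[str]] = {
    "test-agent": ["*Tests*"],
    "docs-agent": ["*.md", "*CHANGELOG*"],
}


def out_of_scope(role: str, changed_files: List[str]) -> List[str]:
    """Return the changed files this role may not touch; unknown roles may touch nothing."""
    globs = ROLE_SCOPES.get(role, [])
    return [path for path in changed_files
            if not any(fnmatchcase(path, g) for g in globs)]


changed = ["ios/AppTests/LoginTests.swift", "ios/App/Login.swift"]
print(out_of_scope("test-agent", changed))  # ['ios/App/Login.swift']
```

A non-empty result is a good reason to fail the check and bounce the draft PR back before any human spends time on it.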
- 03. Adopt Agentic Workflows (gh-aw). Write issue triage, CI failure auto-fix, and doc sweeps as Markdown. Keep SafeOutputs on so the agent process stays read-only and all writes flow through constrained jobs.
- 04. Register a dedicated Apple Silicon runner. Order a NUKCLOUD dedicated node from the order page, enroll it as a self-hosted runner, and label it `agent-macos`, `xcode-16`, and `signing-isolated`. Mount the signing keychain only on jobs with write permission.
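Routing works by label subset: a job whose `runs-on` lists `self-hosted`, `agent-macos`, and `xcode-16` is only picked up by a runner that carries all of those labels. The Python sketch below models that rule plus the keychain policy from this step. Treating "write permission" as `contents: write` is an assumption, and `should_mount_keychain` is a hypothetical hook you would wire into your own job setup.

```python
from typing import Dict, Set

# Custom labels from step 04 plus the defaults GitHub assigns to a
# self-hosted runner (self-hosted, operating system, architecture).
RUNNER_LABELS = {"self-hosted", "macOS", "ARM64", "agent-macos", "xcode-16", "signing-isolated"}


def can_run(runs_on: Set[str], runner_labels: Set[str]) -> bool:
    """A job goes to a self-hosted runner only if the runner has every label in runs-on.

    Label matching is case-insensitive, so compare lowercased sets.
    """
    return {label.lower() for label in runs_on} <= {label.lower() for label in runner_labels}


def should_mount_keychain(job_permissions: Dict[str, str]) -> bool:
    """Hypothetical policy: unlock the signing keychain only for jobs granted contents: write."""
    return job_permissions.get("contents") == "write"
```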
- 05. Lay down the security contract. On the node, mirror AWF with Squid plus iptables. Move secrets off the runner (OIDC plus a cloud KMS) so the agent container cannot read them. Require human review on draft PRs before CI/CD.
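Before writing the Squid ACLs, it can help to pin the allowlist semantics down in a testable form. The Python sketch below mirrors the leading-dot convention of Squid's `dstdomain` ACL; the domain list is a hypothetical starting point, and real enforcement still lives in Squid, with iptables redirecting the agent container's traffic to it.

```python
from typing import List

# Hypothetical allowlist. A leading dot follows Squid's dstdomain convention:
# ".github.com" matches github.com and every subdomain of it.
ALLOWLIST = [".github.com", ".githubusercontent.com", "developer.apple.com"]


def egress_allowed(host: str, allowlist: List[str] = ALLOWLIST) -> bool:
    """Return True if `host` may be reached through the proxy."""
    host = host.lower().rstrip(".")
    for entry in allowlist:
        entry = entry.lower()
        if entry.startswith("."):
            if host == entry[1:] or host.endswith(entry):
                return True
        elif host == entry:
            return True
    return False


for host in ["api.github.com", "evilgithub.com", "developer.apple.com", "apple.com"]:
    print(host, egress_allowed(host))
```

Note the lookalike case: `evilgithub.com` is rejected because it does not end in `.github.com`, which is exactly the class of exfiltration endpoint the allowlist exists to block.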
- 06. Close the review loop. Each week, track draft PR count, review latency, manual-reject rate, and threat-detection hit rate. Adjust AGENTS.md and custom agent prompts to match. Pair the rollout with the staged, reversible cadence in the Swift 6 strict concurrency CI gate runbook.
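The weekly numbers are simple to compute once you export draft PR events. The Python sketch assumes a hypothetical `DraftPR` record that you would fill from the GitHub API or webhook logs. The field names, the choice of median latency, and the all-PR denominators are assumptions, so align them with how your team already defines these metrics.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional


@dataclass
class DraftPR:
    opened_h: float                  # hours into the week when the draft PR opened
    first_review_h: Optional[float]  # hours into the week of the first human review, or None
    rejected: bool                   # closed by a human without merging
    threat_flagged: bool             # threat detection failed the agent run


def weekly_report(prs: List[DraftPR]) -> dict:
    """Compute the four step-06 metrics for one week of agent draft PRs."""
    n = len(prs)
    latencies = [p.first_review_h - p.opened_h for p in prs if p.first_review_h is not None]
    return {
        "draft_prs": n,
        "median_review_latency_h": median(latencies) if latencies else None,
        "manual_reject_rate": sum(p.rejected for p in prs) / n if n else 0.0,
        "threat_hit_rate": sum(p.threat_flagged for p in prs) / n if n else 0.0,
    }


week = [
    DraftPR(0, 4, False, False),
    DraftPR(10, 12, True, False),
    DraftPR(20, None, False, True),
    DraftPR(30, 36, False, False),
]
print(weekly_report(week))
```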
05 · Side-by-side: generic hosted runner vs dedicated Apple Silicon
Use the table below to align a review. Plug in your finance and network team numbers as you go.
| Dimension | Generic GitHub-hosted runner plus agent | NUKCLOUD dedicated Apple Silicon plus agent |
|---|---|---|
| Compute | Minute pool, noisy neighbors | Bare metal, no neighbors |
| Apple Silicon version | Platform-controlled, narrow update window | Pinned to your Xcode and macOS, your release cadence |
| Security contract | Platform default isolation | Add Squid, iptables, threat detection on the node |
| Signing material | Secrets land in shared runners, wide blast radius | Signing keychain mounts only on write-permission jobs |
| Cost | Long agent jobs and peak minutes inflate the bill | Monthly flat rate, agent work amortizes |
| Auditability | Queue and node opaque to tenants | Node-level logs, egress, and disk usage are observable |
The point is not "which one is cheaper." It is "can the agent execution workspace be defended in writing?" A dedicated node lets SafeOutputs, AWF, and human-review discipline land on real runner labels rather than slogans.
06 · FAQ
.github/workflows/ works with self-hosted runners too. To route agent jobs to a dedicated Apple Silicon node, label the runner agent-macos and xcode-16, and require those labels in AGENTS.md and your custom agent prompts.

.github/agents/: lock the directories an agent can touch, force tests, force change descriptions. Then push first-round review onto Agentic Workflows (analyze CI failures, summarize each PR). Humans then focus on design intent and system boundaries. The execution workspace does not delete review; it moves it up.