Job role insights

  • Date posted

    May 14, 2026

  • Closing date

    June 7, 2026

  • Offered salary

    Negotiable

  • Career level

    Junior, Mid-level, Senior

Description

Company: PostHog

Job Category: Site Reliability Engineering / Platform / AWS / Kubernetes

Contract Type: Full-Time, Permanent

Location: Remote (Americas, GMT-3 to GMT-8)

Salary: Location-adjusted via public calculator; open to exceeding ranges for top talent

Application Link: https://posthog.com/careers/site-reliability-engineer

Posted: Live as of May 14, 2026

Job Description: This is not a keep-the-lights-on SRE role. You'll turn a fast-growing, stateful system processing petabytes of data across thousands of cores into a predictable, well-automated platform. The work is about designing safe automation for traffic-heavy workloads, reducing operational stress, and building the tooling that lets the system scale without scaling human effort.

Key Responsibilities:

  • Operate EKS clusters across multiple environments with Karpenter autoscaling, Cilium networking, and ArgoCD-driven GitOps deployments
  • Manage and evolve a multi-account AWS organization: provisioning, networking, access control, and cross-account connectivity
  • Maintain the Terraform/Terragrunt IaC platform, including modules and automated plan-on-PR/apply-on-merge pipelines
  • Improve operational tooling around deploys, schema changes, backups, restores, and incident response
  • Reduce operational load by identifying repeat pain points and eliminating them through code and self-healing automation
  • Optimize cloud spend continuously
  • Participate in on-call and incident response, with a strong focus on making incidents rarer over time
  • Build AI agent-enabled infrastructure services using LLM tooling to automate alert management and observability

Requirements:

  • Deep hands-on Kubernetes production experience (EKS preferred), including debugging node pressure, networking issues, and deployment failures at scale (thousands of nodes)
  • Strong AWS infrastructure experience across multi-account organizations, IAM, and cross-account networking
  • Experience automating infrastructure with Terraform or Terragrunt at scale, including module design and state management
  • Solid Linux systems knowledge: disk, memory, networking, failure modes
  • Experience supporting stateful systems (databases, queues, storage)
  • Comfortable owning systems end-to-end, including on-call responsibilities

Nice to have: GitOps with ArgoCD; multi-region infrastructure experience; AI-enabled infra services.
