Contract Full Time Remote

Job role insights

Date posted: May 14, 2026

Closing date: May 14, 2026

Offered salary: Negotiable Price

Career level: Junior, Middle, Senior


Description

Company: PostHog

Job Category: Site Reliability Engineering / Platform / AWS / Kubernetes

Contract Type: Full-Time, Permanent

Location: Remote, Americas (GMT -3 to GMT -8)

Salary: Location-adjusted via public calculator; open to exceeding ranges for top talent

Application Link: https://posthog.com/careers/site-reliability-engineer

Posted: Live as of May 14, 2026

Job Description: This is not a keep-the-lights-on SRE role. You'll turn a fast-growing, stateful system processing petabytes of data across thousands of cores into a predictable, well-automated platform. The work is about designing safe automation for traffic-heavy workloads, reducing operational stress, and building the tooling that lets the system scale without scaling human effort.

Key Responsibilities:

Operate EKS clusters across multiple environments with Karpenter autoscaling, Cilium networking, and ArgoCD-driven GitOps deployments
Manage and evolve a multi-AWS account organization — provisioning, networking, access control, cross-account connectivity
Maintain the Terraform/Terragrunt IaC platform, including modules and automated plan-on-PR/apply-on-merge pipelines
Improve operational tooling around deploys, schema changes, backups, restores, and incident response
Reduce operational load by identifying repeat pain points and eliminating them through code and self-healing automation
Optimize cloud spend continuously
Participate in on-call and incident response, with a strong focus on making incidents rarer over time
Build AI agent-enabled infrastructure services using LLM tooling to automate alert management and observability

Requirements:

Deep hands-on Kubernetes production experience (EKS preferred), including debugging node pressure, networking issues, and deployment failures at scale (thousands of nodes)
Strong AWS infrastructure experience across multi-account organizations, IAM, and cross-account networking
Experience automating infrastructure with Terraform or Terragrunt at scale including module design and state management
Solid Linux systems knowledge: disk, memory, networking, failure modes
Experience supporting stateful systems (databases, queues, storage)
Comfortable owning systems end-to-end including on-call responsibilities

Nice to have: GitOps with ArgoCD; multi-region infrastructure experience; AI-enabled infra services.


Site Reliability Engineer (Cloud Foundations Team)
