Senior Technical Architect – Site Reliability Engineering & AIOps

Austin, TX
Requisition ID: 2026-119491
Category: Engineering & Software Development
Position type: Regular
Pay range: USD $210,000.00 - $240,000.00 / Year
Application deadline: 2026-03-07

Your opportunity


At Schwab, you’re empowered to make an impact on your career. Here, innovative thought meets creative problem solving, helping us “challenge the status quo” and transform the finance industry together.

We believe in the importance of in-office collaboration and fully intend for the selected candidate for this role to work on site in the specified location(s).

In this role, you’ll lead the technical vision and architecture for our Site Reliability Engineering (SRE) and AIOps function, shaping how reliability, automation, and intelligent operations scale across the enterprise. This is not a traditional production support role; it requires hands-on engineering and coding experience. You’ll work at the intersection of cloud-native platforms, distributed systems, and AI-driven operations, partnering closely with Engineering, Product, Security, and Infrastructure leaders to build resilient, self-healing systems that support millions of clients. This is a highly visible leadership role where your expertise influences both technology strategy and how teams operate day to day.

Key Responsibilities

  • SRE Architecture & Reliability Strategy — Define and own the end-to-end reliability architecture, including SLO/SLI frameworks, error budget policies, observability standards, and resilience patterns across distributed microservices environments.
  • AIOps Platform Architecture — Design and architect the AIOps platform encompassing ML-driven anomaly detection, predictive alerting, automated root cause analysis, event correlation, and intelligent remediation workflows.
  • Infrastructure & Platform Design — Lead architecture decisions for cloud-native infrastructure (GCP/AWS/Azure), Kubernetes orchestration, service mesh (Istio/Envoy), infrastructure-as-code (Terraform/Pulumi), and multi-region disaster recovery strategies.
  • Observability & Monitoring Architecture — Architect the unified observability stack integrating metrics, logs, traces, and events using technologies such as OpenTelemetry, Grafana, Datadog, and custom ML pipelines for intelligent alerting.
  • Automation & Self-Healing Systems — Drive the architecture of automated remediation frameworks, self-healing infrastructure, chaos engineering pipelines, and progressive deployment strategies (canary, blue-green, feature flags) to achieve zero-touch operations.
  • Technical Leadership & Governance — Establish architecture review boards, technical standards, design patterns, and reference architectures; lead technical due diligence and drive consistency across SRE and platform teams.
  • Team Development & Mentorship — Build, mentor, and grow a team of senior SRE architects and engineers; foster a culture of engineering excellence, continuous learning, and innovation in reliability and AI-driven operations.
  • Stakeholder & Executive Engagement — Partner with Engineering, Product, Security, and Infrastructure leadership to align reliability and AIOps investments with business priorities; present technical strategies to executive stakeholders.

What you have


Required Qualifications

  • 12+ years of experience in software development and engineering, infrastructure, or SRE, with 5+ years in a senior architecture or technical leadership role.
  • Deep expertise in distributed systems, cloud-native architectures, and large-scale production environments.
  • Hands-on experience with Kubernetes, Docker, service mesh, CI/CD pipelines, and infrastructure-as-code tools.
  • Strong understanding of ML/AI concepts and their application to operational intelligence — anomaly detection, predictive scaling, log analysis, and automated diagnostics.
  • Proven experience designing observability platforms using OpenTelemetry, Prometheus, Grafana, Datadog, Splunk, or equivalent.
  • Expertise in incident management frameworks, chaos engineering, and SLO-driven reliability practices.
  • Experience with major cloud platforms (AWS, GCP, Azure) at scale.
  • Strong communication and executive presence with the ability to translate complex technical concepts for non-technical stakeholders.

In addition to the salary range, this role is eligible for bonus or incentive opportunities.


What’s in it for you

At Schwab, you’re empowered to shape your future. We champion your growth through meaningful work, continuous learning, and a culture of trust and collaboration—so you can build the skills to make a lasting impact. Our Hybrid Work and Flexibility approach balances our ongoing commitment to workplace flexibility, serving our clients, and our strong belief in the value of being together in person on a regular basis.

We offer a competitive benefits package that takes care of the whole you – both today and in the future:

  • 401(k) with company match and employee stock purchase plan
  • Paid time for vacation and volunteering, and a 28-day sabbatical after every 5 years of service for eligible positions
  • Paid parental leave and family building benefits
  • Tuition reimbursement
  • Health, dental, and vision insurance

Eligible Schwabbies receive

  • Medical, dental and vision benefits

  • 401(k) and employee stock purchase plans

  • Tuition reimbursement to keep developing your career

  • Paid parental leave and adoption/family building benefits

  • Sabbatical leave available after five years of employment