About the Role
This is a senior technical leadership position for an experienced SRE professional ready to operate at an architectural and strategic level. You will serve as a technical authority across product areas, driving the reliability, observability, and automation of a large-scale Azure-based SaaS platform. This role blends deep hands-on expertise with cross-functional leadership — shaping cloud strategy, mentoring senior engineers, and elevating platform resilience across the organization.
Key Responsibilities
- Act as the technical lead on SRE initiatives spanning multiple product areas
- Architect scalable, secure, and automated solutions for client onboarding and live operations
- Lead the design of cross-cutting platform capabilities — observability, CI/CD pipelines, IaC standards, and DR frameworks
- Govern Azure implementation patterns to ensure standardization, reliability, and cost-efficiency
- Define SLOs, SLIs, and other platform reliability metrics, and guide their adoption across departments
- Lead major incident postmortems, root cause analysis, and reliability retrospectives
- Mentor and coach senior and lead engineers; build communities of practice around SRE principles
- Collaborate with Information Security, Platform Engineering, and Architecture teams on compliance and cloud controls
- Represent the SRE function in executive-level planning, roadmap definition, and technical due diligence
- Advise engineering leads and product owners on cloud architecture trade-offs and risk mitigation
Requirements
Experience & Education
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field
- 10+ years of experience in Site Reliability Engineering, Cloud Infrastructure, or Platform Architecture
Technical Skills
- Extensive expertise in Microsoft Azure — architecture, deployment, automation, and cost optimization
- Strong knowledge of Windows Server OS and Windows-based desktop application troubleshooting
- Mastery of Infrastructure as Code (IaC) using Terraform, ARM templates, and Bicep
- Deep familiarity with observability stacks: Azure Monitor, Log Analytics, Grafana, Application Insights
- Broad skill set across Kubernetes, Docker, CI/CD pipelines, SQL, APIs, and scripting
- Experience operating and administering complex enterprise system landscapes
- Experience with Citrix, Active Directory, and WAC/WSUS is a plus
Leadership & Soft Skills
- Strong foundation in ITIL processes with a strategic mindset for operational excellence
- Proven ability to influence senior stakeholders and align engineering with business priorities
- Background working in regulated, security-conscious environments (e.g., financial services)
- Passion for mentorship, knowledge sharing, and building a high-performing engineering culture