WHAT YOU WILL BE RESPONSIBLE FOR
Act as the technical lead on SRE initiatives across multiple Product Areas
Drive forward our strategic use of Microsoft Azure in onboarding and site reliability disciplines
Architect scalable, secure, and automated solutions for client onboarding and live operations
Lead the design and evolution of cross-cutting platform capabilities (e.g., observability, CI/CD pipelines, IaC standards, DR frameworks)
Shape and govern Azure implementation patterns to ensure platform standardization, reliability, and cost-efficiency
Solve the most complex and business-critical reliability challenges involving distributed cloud systems
Advise engineering leads and product owners on cloud platform decisions, including trade-offs and risk mitigation
Collaborate with Information Security, Platform Engineering, and Architecture teams on compliance and cloud controls
Guide the definition of SLOs, SLIs, and other reliability metrics across departments
Lead root cause analysis, major incident postmortems, and reliability retrospectives across teams
Provide thought leadership, mentoring, and coaching to senior and lead engineers
Build communities of practice to strengthen SRE principles and knowledge sharing within the organization
Represent the SRE function in executive-level planning, roadmap definition, and technical due diligence
You share our values: Caring, Customer Success Driven, Collaboration, Curious, Courageous
WHAT WE VALUE
Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field
10+ years of experience in Site Reliability Engineering, Cloud Infrastructure, or Platform Architecture roles
Extensive expertise in Microsoft Azure, including architecture, deployment, automation, and cost optimization
Strong grasp of cloud-native and hybrid architectures, distributed systems, networking, and security
Mastery of Infrastructure as Code (IaC) using Terraform, ARM templates, Bicep, and related tooling
Deep knowledge of observability stacks (Azure Monitor, Log Analytics, Grafana, Application Insights)
Experience leading complex incident and problem management efforts at scale
Broad technical skill set, including Kubernetes, Docker, CI/CD pipelines, SQL, APIs, and scripting
Strong foundation in ITIL processes with a strategic mindset for operational excellence
Proven ability to influence senior stakeholders, lead through ambiguity, and align engineering with business needs
Experience working in or guiding teams within regulated, security-conscious environments (e.g., financial services)
Demonstrated passion for mentorship, knowledge sharing, and building engineering culture
Ability to think strategically while delivering pragmatic, hands-on solutions