WHAT YOU WILL BE RESPONSIBLE FOR
 Act as the technical lead on SRE initiatives across multiple Product Areas
 Drive forward our strategic use of Microsoft Azure in onboarding and site reliability disciplines
 Architect scalable, secure, and automated solutions for client onboarding and live operations
 Lead the design and evolution of cross-cutting platform capabilities (e.g., observability, CI/CD pipelines, IaC standards, DR frameworks)
 Shape and govern Azure implementation patterns to ensure platform standardization, reliability, and cost-efficiency
 Solve the most complex and business-critical reliability challenges involving distributed cloud systems
 Advise engineering leads and product owners on cloud platform decisions, including trade-offs and risk mitigation
 Collaborate with Information Security, Platform Engineering, and Architecture teams on compliance and cloud controls
 Guide the definition of SLOs, SLIs, and other reliability metrics across departments
 Lead root cause analysis, major incident postmortems, and reliability retrospectives across teams
 Provide thought leadership, mentoring, and coaching to senior and lead engineers
 Build communities of practice to strengthen SRE principles and knowledge sharing within the organization
 Represent the SRE function in executive-level planning, roadmap definition, and technical due diligence
 You share our values: Caring, Customer Success Driven, Collaboration, Curious, Courageous

WHAT WE VALUE
 Bachelor’s or master’s degree in Computer Science, Engineering, or a related field
 10+ years of experience in Site Reliability Engineering, Cloud Infrastructure, or Platform Architecture roles
 Extensive expertise in Microsoft Azure, including architecture, deployment, automation, and cost optimization
 Strong grasp of cloud-native and hybrid architectures, distributed systems, networking, and security
 Mastery of Infrastructure as Code (IaC) using Terraform, ARM, Bicep, and related tooling
 Deep knowledge of observability stacks (Azure Monitor, Log Analytics, Grafana, Application Insights)
 Experience leading complex incident and problem management efforts at scale
 Broad technical skillset including Kubernetes, Docker, CI/CD pipelines, SQL, APIs, and scripting
 Strong foundation in ITIL processes with a strategic mindset for operational excellence
 Proven ability to influence senior stakeholders, lead through ambiguity, and align engineering with business needs
 Experience working in or guiding teams within regulated, security-conscious environments (e.g., financial services)
 Demonstrated passion for mentorship, knowledge sharing, and building engineering culture
 Ability to think strategically while delivering pragmatic, hands-on solutions

Principal Site Reliability Engineer