Lead Site Reliability Engineer - Canada, Remote (CA)
About This Gig
Our Team
The Incident Response Team sits at the intersection of every customer-impacting failure across Vista. We coordinate response to critical incidents, drive learning out of every event, and partner with engineering teams across the organisation to make Vista more reliable. Today the team is strong on incident handling, and we are deliberately raising the engineering bar to match.

What You Will Do
Identify patterns of failure across the organisation. Analyse incidents and post-incident reviews to find the recurring technical root causes behind customer impact, rather than treating each incident as a one-off.

Prioritise the biggest improvement levers. Focus reliability effort where it most reduces Mean Time to Detect and Mean Time to Resolve, and where it proactively prevents the next incident from happening at all.

Turn those patterns into the right engineering intervention and influence the teams who can build it. This includes safe deployment defaults, secret and credential rot
About the Seller
Cimpress/Vista
on Himalayas