Senior Site Reliability Engineer at Once For All in Basingstoke, England, United Kingdom | Kablio

Role

Description

backup recovery

infrastructure code

observability

capacity planning

incident response

slo management

Join our Reliability and Platform group, partnering with 10 Agile Scrum teams to scale and harden a suite of microservices on Microsoft Azure. You will own production reliability for tier-1 services, set and track SLOs, automate operations, and lead incident response to keep our next-generation Supplier Risk Assessment platform fast, secure, and available. This role is fully remote.

Establish backup, disaster recovery, and tested restore procedures with clear RPO and RTO.
Secure the stack with Managed Identity, Key Vault, workload identity, and network segmentation.
Build infrastructure as code with Terraform or Bicep. Enforce policy as code.
Mentor engineers and raise reliability standards across product teams.
Design safe releases with progressive delivery, automated rollbacks, and feature flags.
Plan capacity, tune performance, and optimize cost without impacting reliability.
Lead on-call rotations, incident response, postmortems, and corrective actions.
Design compliance controls and audit trails.
Implement end-to-end observability: metrics, logs, traces, dashboards, alerts.
Define SLOs, SLIs, and error budgets for critical services.
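
For readers less familiar with the SLO and error budget terms above, here is a minimal sketch of the arithmetic. It is an illustration only, not Once For All's method: the function names, the 99.9% target, and the 30-day window are all assumptions chosen for the example.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a window.

    slo_target is a fraction such as 0.999 (hypothetical example value).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1, exclusive")
    return (1.0 - slo_target) * window_days * 24 * 60


def burn_rate(observed_error_ratio: float, slo_target: float) -> float:
    """Ratio of observed errors to the allowed error ratio.

    1.0 means the budget is being spent exactly on schedule; higher
    values mean it will run out before the window ends.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1, exclusive")
    return observed_error_ratio / (1.0 - slo_target)


if __name__ == "__main__":
    # A 99.9% target over 30 days leaves roughly 43.2 minutes of budget.
    print(round(error_budget_minutes(0.999), 1))
    # A 0.5% error ratio against that target burns budget about 5x too fast.
    print(round(burn_rate(0.005, 0.999), 2))
```

A burn rate well above 1.0 sustained over a short window is a common basis for paging alerts, while slower burns are usually routed to tickets; the exact thresholds would depend on the service.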

Requirements

aks

terraform

python

opentelemetry

sre

azure

Tech Stack You Will Use

Azure, AKS, Terraform or Bicep, Azure DevOps or GitHub Actions, Docker, Helm, Service Bus, Storage, SQL Server, Cosmos DB, Key Vault, Azure Monitor, Log Analytics, Application Insights, Prometheus, Grafana, OpenTelemetry, and feature flagging tools.

Experience with performance testing, p95 and p99 tuning, caching and connection pool strategies.
Solid understanding of security hardening, container image scanning, SBOM, and least privilege.
Proven incident command experience with measurable MTTR and MTTD improvements.
6+ years operating Kubernetes in production, including at least 3 years on AKS (network policies, PodDisruptionBudgets, HPA/VPA, node pools, upgrade playbooks).
10+ years in SRE, platform, or production-facing engineering roles running large-scale systems.
Service mesh, eBPF, or advanced traffic shaping.
Architect resilient multi-region and zone-aware workloads on Azure and AKS.
FinOps practice with cost per request or per tenant KPIs.
5+ years designing observability and SLO-based alerting using OpenTelemetry and Kusto queries.
Multi-tenant SaaS and data sovereignty patterns.
7+ years hands-on with Microsoft Azure: AKS, Front Door or Application Gateway, VNets, Private Link, Key Vault, Monitor, Log Analytics, Application Insights, Service Bus, Storage, SQL or Cosmos DB.
4+ years running canary or blue-green deployments in Azure DevOps or GitHub Actions.
5+ years infrastructure as code with Terraform or Bicep and Git-based workflows.
Strong automation skills in Python or Go, plus Bash and PowerShell.

Benefits

Financial Benefits: Pension, Life Assurance (3x salary).
Everyday Perks: Home office budget, high-spec laptop and peripherals.
Work Setup: Fully remote within UK time zones, optional access to our Basingstoke office.
Time Off: 25 days holiday + 8 bank holidays, holiday purchase scheme (+5 days), paid and unpaid volunteering days.
Health and Wellbeing: Private Medical Insurance or wellness fund, 24/7 Employee Assistance Programme.
Growth and Development: Ongoing CPD, team offsites, and company events.

Training + Development

Information not given or found

Interview process

Intro and role overview with Talent.
Technical deep dive on Azure and AKS architecture.
Practical exercise: propose SLOs and an alert plan for a sample service, plus a release safety plan.
Culture and collaboration interview with Engineering.

Visa Sponsorship

Information not given or found

Security clearance

Information not given or found

Company

Overview

Born from a merger of specialist UK and French compliance firms, Once For All unified under a single platform headquartered in Paris.
Backed by private equity firm GTCR, and previously by Warburg Pincus, the company has grown through strategic acquisitions.
It delivers a SaaS network matching contractors with vetted suppliers, leveraging AI to streamline tendering, compliance and ESG risk.
Typical projects involve large main contractors in Europe needing pre-qualified subcontractors for construction, facilities or utilities work.
Its platform supports complex workflows — from health & safety checks to legal document validation — across multiple national schemes.
Once For All stands out for managing one of the largest proprietary ESG and compliance datasets in the built environment.
Its reach spans the UK, France, Belgium, Germany and Italy, with pan-European capabilities in energy, building and property sectors.
A standout fact: it powers specialist schemes like Constructionline, Actradis, BidWork and RISQS under one umbrella.

Culture + Values

Environment + Sustainability

0% Plastic Packaging

Sustainability Achievement

Awards were won for redesigning packaging to eliminate plastic while maintaining product protection.

2,500 Tonnes

Recycled Materials

Recycled plastics were processed, contributing to a circular economy.

2.1 Million

Refurbished Units

Remote control units were refurbished, reducing electronic waste.

Commit to continuously review and improve environmental footprint of products, production, and supply chain.
Eliminate single-use plastics by using 100% recyclable paper and pulp trays across most product packaging with minimal protective film.

Inclusion & Diversity

