Site Reliability Engineer - L1 Commander
Siemens
Siemens focuses on electrification, automation, and digitalization across various industries.
L1 SRE ensures stability, monitoring, incident response, and automation for critical systems.
Accurately categorizing incidents, prioritizing them by severity, and escalating to L2/L3 teams when necessary.
Collaborating with DevOps and L2 teams to automate manual processes for incident response and operational tasks.
Serving as the primary responder for incidents to tackle and resolve issues quickly, ensuring minimal impact on end-users.
Following predefined runbooks/playbooks to resolve known issues and documenting fixes for new problems.
Performing root cause analysis (RCA) of incidents using log aggregators and observability tools to identify patterns and recurring issues.
Ensuring systems meet Service Level Objectives (SLOs) and maintain uptime as per SLAs.
Monitoring and Alerting: Proactively monitoring system health, performance, and uptime using tools such as Datadog and Prometheus.
What you bring
AWS
Kubernetes
Datadog
ArgoCD
Python
SRE
Basic understanding of networking concepts (DNS, Load Balancers, Firewalls).
4 to 6 years of proven experience in SRE, DevOps, or Production Support, including work with monitoring tools (e.g., Prometheus, Datadog).
Proven understanding of Linux/Unix operating systems, basic scripting skills (Python, GitLab actions), and cloud platforms (AWS, Azure, or GCP).
Exposure to ArgoCD for implementing GitOps workflows and automated deployments for containerized applications.
Strong analytical skills to resolve production incidents efficiently.
Familiarity with container orchestration (Kubernetes, Docker, Helm charts) and CI/CD pipelines.
Good communication and interpersonal skills for incident communication and issue.
Preferred certifications: AWS Certified SysOps Administrator – Associate, AWS Certified Solutions Architect – Associate, or AWS Certified DevOps Engineer – Professional.
The SRE L1 Commander is responsible for ensuring the stability, availability, and performance of critical systems and services. As the first line of defense in incident management and monitoring, the role requires real-time response, proactive problem solving, and strong coordination skills to address production issues efficiently.