
Senior Site Reliability Engineer
Greenlite
The Role
Overview
Build and own the reliability, observability, and scalability of GreenLite’s AWS services.
Key Responsibilities
- CI/CD
- Observability
- Reliability engineering
- AWS migration
- Infrastructure design
- Security compliance
Tasks
- Continuously improve: identify systemic bottlenecks, build tooling that eliminates toil, and scale our platform without scaling pager fatigue.
- Own CI/CD: advance our GitHub Actions pipeline and introduce progressive delivery and automated rollbacks to maintain and improve deployment frequency and lead time for changes.
- Design & harden production infrastructure: AWS ECS/Fargate via AWS Copilot (migrating to Terraform), RDS/Postgres, S3, EventBridge, Bedrock.
- Lead the definition of hiring and on‑call processes at a high‑growth startup.
- Lead reliability engineering: SLO/SLA definition, error‑budget policies, capacity planning, and load testing ahead of major launches.
- Coach & collaborate: mentor engineers on SRE best practices, work closely with ML and product squads, and influence architecture decisions through strong opinions loosely held.
- Instrument & observe: deploy metrics, tracing, and logging (Datadog) and drive an on‑call culture focused on MTTR and learning reviews, not blame.
- Security & compliance: partner with engineers to automate patching, secrets management and rotation, least‑privilege IAM, and SOC 2 controls.
- 30 days: Stand up staging/production dashboards, own the on‑call rotation, and deliver a gap analysis of our reliability posture. Take ownership of our migration into AWS Control Tower, and contribute to the architecture for hosting our production applications, including AI engineering.
- 60 days: Roll out error‑budget policies, automated canary deploys, and service‑level telemetry across all microservices. Complete the migration off AWS Copilot. Plan the migration from RDS Postgres to Aurora Postgres, including metrics. Establish production infrastructure for AI engineering.
- 180 days: Mentor two mid‑level engineers into effective first responders and establish infrastructure for ML products. Train the team on the disaster recovery plan and do a dry run of restoration from backups.
Requirements
- AWS
- Terraform
- Observability
- Python
- SLOs
- Chaos engineering
What You Bring
- 6+ years building and operating production systems in AWS, GCP, or Azure (AWS preferred).
- Experience with infrastructure for ML workflows (model training, feature stores).
- Public track record (blog posts, OSS) advancing the SRE discipline.
- Deep familiarity with observability stacks (Datadog, Grafana, Prometheus, OTEL).
- Engineering velocity accelerates because infrastructure just works and developers ship confidently.
- Proficient with at least one modern language (Python, Rust, Go) and strong Bash skills.
- Certification: AWS Solutions Architect or DevOps Pro.
- Demonstrated ownership of SLOs, incident response, and post‑incident analysis.
- Experience introducing chaos engineering or game‑days.
- Expert in IaC (Terraform, CDK, Pulumi) and container orchestration (ECS, EKS, or K8s).
- 99.95% customer‑visible uptime with clearly defined SLAs.
- Prior work in construction‑tech, gov‑tech, or other regulated domains.
- Track record of raising the bar for security, compliance, and cost optimisation.
Benefits
- Hybrid Work Environment - Our team thrives on collaboration, so we’re in the office 4 days per week. In the summer, from Memorial Day to Labor Day, we switch to a 3-day in-office schedule to give everyone extra flexibility.
- Parental Leave - Generous parental leave for all parents to support your growing family.
- Premium Health Coverage - Comprehensive medical, dental, and vision insurance for full-time team members: 100% of premiums covered under our HDHP plan & 98% coverage for employees and their spouses.
- Company-Wide Team All Hands - Held twice a year, fostering transparency, alignment, and inspiration.
- Competitive Compensation - Generous base salary & access to our Employee Equity Program, so you can grow with us.
- Weekly Team Lunches - Enjoy catered lunches every week in our NYC office. Great food, better company.
- 401(k) Retirement Plan - Helping you invest in your future with smart saving options.
- Team-Building Events - Regular opportunities to connect, collaborate, and celebrate as a team.
- Performance-Based Annual Bonuses - Rewards for high-impact results and contributions that move the needle.
- Wellness Support - Monthly Wellness Stipend and full access to Wellhub, Talkspace, & Teladoc for your physical and mental well-being.
- Unlimited PTO - Flexible time off so you can recharge, travel, or take care of life as needed.
The Company
About Greenlite
- Combining proprietary software with expert multidisciplinary review teams, Greenlite carves weeks or months off project timelines.
- Typical projects include private plan review for public retailers, production home builders, EV-charging operators, and quick-service restaurant brands.
- Headquartered in Austin, TX and New York City, it partners with forward-thinking municipalities across the U.S.
- Its standout Private Plan Review ('X-Lane') service blends tech automation with human oversight for faster, transparent permitting.
- By acting as the connective tissue between builders, developers, and city agencies, it removes black-box inefficiencies in pre-construction.
Sector Specialisms
Residential
Small Business
Large Business
Energy Efficiency
Green Vehicles
Home Automation
Sustainable Water Products
QSRs (Quick Service Restaurants)
Retailers
Healthcare Facilities
Schools
Banks
EV Infrastructure
Ground-up Construction
Tenant Improvements
Renovations
Multi-site Rollouts
Interview Process
- Intro with talent partner
- Values & architecture deep‑dive with head of engineering
- On-site: practical systems‑design exercise (real scenarios we face)
- On-site: on‑call simulation & retrospective with two engineers
- On-site: cross-functional panel interview
- Final exec conversation and offer discussion
