Position Summary
Our client is building a modern, cloud-native platform that powers connected, data-driven manufacturing operations. Their technology sits at the center of increasingly automated factories, integrating equipment, software systems, and real-time production data into a scalable SaaS platform used by global manufacturers.
To support rapid growth and platform scale, they are seeking a Senior Cloud Operations Engineer to own the reliability, performance, and operational excellence of their cloud infrastructure. This is a high-impact role responsible for keeping the platform highly available, secure, and scalable as adoption continues to grow.
This position is ideal for engineers who thrive in modern cloud environments, enjoy solving complex reliability challenges, and prefer to automate wherever possible. The right person will combine deep technical expertise with strong operational discipline, helping build a world-class cloud platform that supports real industrial environments.
Key Responsibilities
Cloud Operations & Reliability
Maintain and optimize production, staging, and development environments running in Kubernetes on AWS
Implement and manage monitoring, logging, alerting, and observability frameworks
Lead incident response efforts and drive post-incident reviews focused on continuous improvement
Own backup, disaster recovery, and business continuity processes
Perform system capacity planning and performance tuning
Automation & Infrastructure Management
Build and maintain Infrastructure-as-Code using tools such as Terraform or Pulumi
Automate provisioning, configuration management, and environment lifecycle processes
Identify and eliminate operational inefficiencies through automation
Manage secrets, environment configuration, and version control across infrastructure environments
Security & Compliance
Implement and maintain least-privilege access models and cloud security guardrails
Support vulnerability management, patching workflows, and dependency maintenance
Assist with compliance readiness efforts including SOC 2, ISO 27001, or similar frameworks
Ensure proper logging, retention, and audit practices across cloud environments
FinOps / Cost Optimization
Monitor and optimize cloud spend across services and environments
Implement tagging standards, budget alerts, and cost visibility frameworks
Recommend architectural improvements to balance performance and cost efficiency
Collaboration & Leadership
Partner closely with engineering teams to improve reliability, deployment pipelines, and system architecture
Mentor engineers on operational best practices and cloud platform management
Develop runbooks, documentation, and operational standards
Champion reliability engineering principles, operational maturity, and risk reduction practices
Technical Environment
Candidates should be comfortable working in modern cloud-native environments and familiar with:
Kubernetes clusters, autoscaling, Helm charts, and service mesh concepts
AWS cloud services including compute, networking, storage, and cost management
Infrastructure-as-Code frameworks such as Terraform
Observability platforms such as Datadog, CloudWatch, Prometheus, or New Relic
CI/CD tools such as GitHub Actions, Bitbucket Pipelines, or Bamboo
Linux systems administration and troubleshooting
SRE practices including SLIs, SLOs, MTTR, RTO/RPO, and incident management
If you are a high performer and would like to work for an equally high-performing company, and this position sounds like a fit for you, we invite you to apply to this job and email your resume to slewis@venteon.com.
We treat all resumes with strict confidentiality. We will always contact you first before submitting your resume to our client(s) for review. If you do not receive correspondence, you are not a fit for this position.
At Venteon, our talent acquisition team is proud to provide our clients with the most qualified Accounting & Finance, Engineering and IT talent in the industry today.