About the Role
We are seeking an experienced Senior Infrastructure Engineer to lead the design, deployment, and maintenance of our on-premises Kubernetes-based data center infrastructure. This is a hands-on role encompassing physical hardware, virtualization, and cluster administration. You will ensure high availability, performance, and scalability while enabling secure and efficient workloads for both developers and operations.
Key Responsibilities
Infrastructure Management (40%)
- Design and maintain server, storage, and networking infrastructure across 2–4 data center racks
- Manage rack layout, power distribution, and cooling systems
- Perform hardware lifecycle management and capacity planning
- Administer hypervisor environments (e.g., VMware vSphere, Proxmox, KVM)
- Implement and maintain software-defined networking (SDN) and storage platforms
Kubernetes Cluster Administration (30%)
- Deploy, monitor, and scale production Kubernetes clusters
- Implement RBAC, network policies, and secure baseline configurations
- Manage persistent storage using solutions such as Rook/Ceph or OpenEBS
- Configure container networking layers (e.g., Cilium, Calico)
- Ensure high availability and maintain backup and disaster recovery procedures
Monitoring & Performance Optimization (20%)
- Set up observability stacks (Prometheus, Grafana, ELK)
- Define alerting rules and incident response procedures, and track SLA compliance (target: 99.9%+ uptime)
- Conduct regular infrastructure performance tuning and cost/performance analysis
- Maintain up-to-date infrastructure documentation
Collaboration & Support (10%)
- Collaborate with development teams to optimize application performance and deployment
- Participate in on-call rotation and root cause analysis reviews
- Mentor junior engineers and promote operational best practices
Required Qualifications
Education & Experience
- Bachelor’s degree in Computer Science or Engineering, or equivalent experience
- 5+ years in infrastructure or systems engineering roles
- 2+ years managing Kubernetes in production environments
- Hands-on experience with data center hardware installation and operations
Technical Skills
- Operating Systems: Strong Linux administration skills (Ubuntu, RHEL, Rocky Linux)
- Virtualization: VMware, KVM, or Proxmox
- Containers: Docker, containerd, Kubernetes (CKA preferred)
- Networking: VLANs, firewalls, TCP/IP, BGP, load balancing
- Storage: SAN, NAS, Ceph, distributed storage systems
- Scripting/Automation: Python, Bash, Go
- IaC Tools: Terraform, Ansible
- Observability: Prometheus, Grafana, ELK
Soft Skills
- Analytical and methodical troubleshooting skills
- Clear documentation and communication
- Ability to independently manage projects and priorities
Preferred Qualifications
- Kubernetes certifications (CKA, CKS)
- Experience with bare-metal Kubernetes deployments
- Familiarity with GitOps (ArgoCD, Flux)
- Knowledge of service mesh architectures
- Hardware vendor certifications (Dell, HPE, etc.)
Compensation & Benefits
- Competitive salary based on experience and skillset
- Health, dental, and vision insurance
- Annual training and certification budget
- Flexible working hours and on-call compensation
- Opportunity to work hands-on with leading open-source tools and infrastructure
Working Conditions
- On-site presence required for hardware maintenance and installation
- Participation in on-call rotation (approximately 1 week per month)
- Occasional after-hours maintenance windows
- Physical ability to lift and rack equipment (up to 50 lbs)
How to Apply
Submit your resume and a cover letter outlining your experience in Kubernetes, infrastructure, or platform engineering. Highlight any relevant projects, open-source contributions, or certifications.