About the Role

We are seeking an experienced Senior Infrastructure Engineer to lead the design, deployment, and maintenance of our on-premises, Kubernetes-based data center infrastructure. This hands-on role spans physical hardware, virtualization, and cluster administration. You will ensure high availability, performance, and scalability while supporting secure, efficient workloads for both development and operations teams.

Key Responsibilities

Infrastructure Management (40%)

  • Design and maintain server, storage, and networking infrastructure across 2–4 data center racks
  • Manage rack layout, power distribution, and cooling systems
  • Perform hardware lifecycle management and capacity planning
  • Administer hypervisor environments (e.g., VMware vSphere, Proxmox, KVM)
  • Implement and maintain software-defined networking (SDN) and storage platforms

Kubernetes Cluster Administration (30%)

  • Deploy, monitor, and scale production Kubernetes clusters
  • Implement RBAC, network policies, and secure baseline configurations
  • Manage persistent storage using solutions such as Rook/Ceph or OpenEBS
  • Configure container networking (CNI) plugins (e.g., Cilium, Calico)
  • Ensure high availability and maintain backup and disaster recovery procedures

Monitoring & Performance Optimization (20%)

  • Set up observability stacks (Prometheus, Grafana, ELK)
  • Define alerting and incident response processes, and track SLA compliance (target: 99.9%+ uptime)
  • Conduct regular infrastructure performance tuning and cost/performance analysis
  • Maintain up-to-date infrastructure documentation

Collaboration & Support (10%)

  • Collaborate with development teams to optimize application performance and deployment workflows
  • Participate in the on-call rotation and root cause analysis (RCA) reviews
  • Mentor junior engineers and promote operational best practices

Required Qualifications

Education & Experience

  • Bachelor’s degree in Computer Science, Engineering, or equivalent experience
  • 5+ years in infrastructure or systems engineering roles
  • 2+ years managing Kubernetes in production environments
  • Hands-on experience with data center hardware installation and operations

Technical Skills

  • Operating Systems: Strong Linux administration skills (Ubuntu, RHEL, Rocky Linux)
  • Virtualization: VMware, KVM, or Proxmox
  • Containers: Docker, containerd, Kubernetes (CKA preferred)
  • Networking: VLANs, firewalls, TCP/IP, BGP, load balancing
  • Storage: SAN, NAS, Ceph, distributed storage systems
  • Scripting/Automation: Python, Bash, Go
  • IaC Tools: Terraform, Ansible
  • Observability: Prometheus, Grafana, ELK

Soft Skills

  • Analytical and methodical troubleshooting skills
  • Clear documentation and communication
  • Ability to independently manage projects and priorities

Preferred Qualifications

  • Kubernetes certifications (CKA, CKS)
  • Experience with bare-metal Kubernetes deployments
  • Familiarity with GitOps (ArgoCD, Flux)
  • Knowledge of service mesh architectures
  • Hardware vendor certifications (Dell, HPE, etc.)

Compensation & Benefits

  • Competitive salary based on experience and skill set
  • Health, dental, and vision insurance
  • Annual training and certification budget
  • Flexible working hours and on-call compensation
  • Opportunity to gain hands-on experience with leading open-source tools and infrastructure

Working Conditions

  • On-site presence required for hardware maintenance and installation
  • Participation in the on-call rotation (approximately one week per month)
  • Occasional after-hours maintenance windows
  • Physical ability to lift and rack equipment (up to 50 lbs)

How to Apply

Submit your resume and a cover letter outlining your experience in Kubernetes, infrastructure, or platform engineering. Highlight any relevant projects, open-source contributions, or certifications.