Infrastructure Management Engineer

Job Description
DevOps & Infrastructure Management
At TensorWave, we’re leading the charge in AI compute, building a versatile cloud platform that’s driving the next generation of AI innovation. We’re focused on creating a foundation that empowers cutting-edge advancements in intelligent computing, pushing the boundaries of what’s possible in the AI landscape.
About the Role
We are seeking a highly skilled DevOps & Infrastructure Management Engineer to join our growing infrastructure team. This role is ideal for someone who thrives in hardware-centric environments, enjoys hands-on datacenter and system administration work, and can build reliable automation around large-scale infrastructure. You will be responsible for managing enterprise hardware, monitoring systems, network operations, infrastructure automation, and supporting our compute clusters across multiple data centers.
This role touches every layer of modern infrastructure, from bare metal provisioning to OS and Kubernetes management to hardware monitoring and troubleshooting. If you are detail-oriented, resourceful, and comfortable working with both low-level hardware systems and higher-level DevOps tooling, we’d love to talk.
Key Responsibilities
Hardware & Infrastructure Management
Manage and maintain enterprise-grade server hardware and infrastructure components
Utilize out-of-band management systems (iLO, iDRAC, IPMI, Redfish, etc.) for remote operations
Use automated hardware management tools (BMC/Redfish-based) to streamline provisioning and maintenance
Perform hardware diagnostics and troubleshooting (CPU, memory, disks, PSUs, NICs, etc.)
Handle vendor interactions, including RMAs, part replacements, and inventory tracking
Oversee datacenter hardware operations, including racking, cabling, PDU installation, and physical layout
Datacenter & DCIM
Use Data Center Infrastructure Management (DCIM) tools for inventory, capacity planning, and environmental tracking
Manage power delivery and consumption across racks and nodes
Configure and monitor managed PDU systems for power cycling, monitoring, and alerts
Collaborate with colocation providers on connectivity, power, security, and maintenance tasks
Monitoring & Observability
Build and maintain infrastructure monitoring and alerting using tools such as Prometheus/Grafana, SNMP, Nagios, CheckMK, or similar platforms
Implement automated alerting for hardware health, network status, power issues, and service-level metrics
Create dashboards to give internal teams visibility into system performance and reliability
Network Operations
Manage and configure firewalls, routing, and network segmentation
Configure and troubleshoot VPN technologies (IPsec, OpenVPN, WireGuard)
Oversee subnetting, IP address allocation, and network architecture planning
Configure managed switches, VLANs, port settings, and trunking
Manage NAT, port forwarding, and related gateway/edge network configurations
System Administration (Linux)
Install, configure, and manage Linux servers (Ubuntu/Debian preferred)
Perform system-level troubleshooting (boot issues, login problems, service failures)
Manage networking configuration (static IPs, DHCP)
Configure and maintain filesystems: partitioning, MD RAID, ext4/XFS, LVM, resizing/growing volumes
Implement secure access using public key authentication and proper SSH hardening
Manage certificates for internal systems, including issuance, revocation, HTTPS installation, and rotation
Handle basic BIOS configuration relevant to bare metal provisioning or system bring-up
Bare Metal Provisioning
Deploy and manage hardware provisioning tools such as MAAS, Foreman, or similar systems
Configure and troubleshoot network boot mechanisms (PXE, UEFI Boot, HTTP Boot)
Automate provisioning pipelines to rapidly bring new nodes online
Containerization & Orchestration
Work with Kubernetes clusters at a foundational level (cluster access, basic resource troubleshooting)
Deploy workloads using Helm charts and maintain cluster application lifecycle
Assist with cluster scaling, node replacements, and security hardening
Automation & Scripting
Write shell scripts (bash) for automation of system tasks, monitoring, or provisioning
Use CLI tooling such as jq, sed, awk, grep, and rsync
Optionally automate workflows using languages like Python, Go, PHP, or Perl
Required Qualifications
Proven experience managing enterprise-grade hardware at scale
Strong understanding of out-of-band management systems (IPMI/BMC/Redfish)
Hands-on expertise with monitoring systems (Prometheus, Grafana, SNMP, Nagios, CheckMK, or similar)
Solid knowledge of network administration, including firewalls, routing, VPNs, NAT, and managed switches
Linux system administration experience (installation, configuration, troubleshooting)
Experience with filesystems, RAID, partitioning, and general storage management
Familiarity with certificate management, key-based auth, and basic cryptographic functions
Experience with bare metal provisioning (MAAS, Foreman, or similar)
Understanding of PXE/UEFI/HTTP boot systems
Ability to write functional, maintainable bash scripts for automation
Nice to Have
Experience with Kubernetes beyond the basics (operators, cluster scaling, CRDs)
Experience with Helm chart customization
Familiarity with automation languages such as Python, Go, PHP, or Perl
Previous datacenter operations or colocation management experience
Exposure to high-availability or distributed compute environments
Knowledge of infrastructure security and hardening practices
What We Bring
Stock Options
100% paid Medical, Dental, and Vision insurance
Life and Voluntary Supplemental Insurance
Short Term Disability Insurance
Flexible Spending Account
401(k)
Flexible PTO
Paid Holidays
Parental Leave
Mental Health Benefits through Spring Health
