Data Center Operations Engineer

AI company • Stockton, CA 95202 • Posted 1 day ago via LinkedIn

In-person • Full-time • Senior Level

Job Highlights

We are seeking a highly skilled Data Center Operations Engineer to join our production operations team. The role involves maintaining the operational integrity of production data centers, coordinating with various departments, and ensuring seamless integration and optimal performance of systems. This position requires a deep understanding of data center operations, strong technical acumen, and the ability to manage critical infrastructure, troubleshoot issues, and proactively identify areas for improvement.

Responsibilities

  • Actively engage in all aspects of the data center lifecycle, including design, build, secure, operate, improve, and maintain processes.
  • Coordinate data center builds, expansions, and modifications with internal teams and external partners.
  • Lead root-cause analysis and postmortem reviews to identify underlying issues and drive long-term operational improvements.
  • Analyze process gaps and implement automation solutions to accelerate execution and minimize manual intervention.
  • Participate in a 24/7 on-call rotation, providing critical support during off-hours and responding to emergencies.
  • Oversee the tracking of spare parts inventory and manage data center capacity planning.
  • Ensure strict adherence to company policies and procedures, maintaining compliance with industry standards and regulatory requirements.
  • Investigate and resolve technical issues, analyzing data to identify trends and systemic problems.
  • Contribute to the development and expansion of the global data center knowledge base.
  • Lead teams in deploying new data center infrastructure to support organizational growth.

Qualifications

Required

  • 6+ years of experience in operating technical production environments.
  • 6+ years of direct experience in data center operations.
  • Expertise in managing tasks and priorities through a ticketing system.
  • Proven experience in managing complex projects.
  • Skilled in hardware troubleshooting, component replacement, and cabling.
  • Solid understanding of storage devices, Linux, and networking concepts.
  • Ability to lift up to 75 lbs and understand electrical, mechanical, and HVAC systems.
  • Willingness to travel both domestically and internationally as needed.
  • Capable of monitoring repair costs, technician efficiency, and operational metrics.
  • Experience in technical writing and documentation.
  • Experience in managing parts inventory and stock levels.

Preferred

  • Network+, Server+, Linux+, A+, and CDCTP certifications.
  • Basic scripting skills in at least one programming language (e.g., Bash, Perl, Python).

Full Job Description

Note: This role is 100% onsite in Stockton, CA, and requires some travel (30%) to the Santa Clara location.

Job Description: Data Center Operations Engineer

Position Overview: We are seeking highly skilled professionals to join our expanding production operations team, where you'll play a crucial role in maintaining the operational integrity of our production data centers. The successful candidate will work closely with internal operations teams and coordinate with various departments across the organization to ensure seamless integration and optimal performance of our systems.

This position demands a deep understanding of data center operations and a strong technical acumen. You will be responsible for executing complex technical instructions with precision, ensuring that all processes adhere to best practices and industry standards. Your role will encompass the management of critical infrastructure, requiring meticulous attention to detail, as well as the ability to troubleshoot and resolve issues efficiently in a high-pressure environment.

Moreover, you will be expected to proactively identify areas for improvement, contribute to the development of operational procedures, and collaborate with cross-functional teams to implement solutions that enhance system reliability and performance. This role is ideal for a technically proficient individual who thrives in a dynamic environment and is committed to maintaining the highest levels of operational excellence within a production data center setting.

Key Responsibilities:

  • Data Center Domain Management: Actively engage in all aspects of the data center lifecycle, including design, build, secure, operate, improve, and maintain processes, ensuring optimal performance and scalability.
  • Project Coordination: Coordinate data center builds, expansions, and modifications with internal teams and external partners, ensuring seamless integration and adherence to project timelines and specifications.
  • Incident Management: Lead root-cause analysis and postmortem reviews to identify underlying issues and drive long-term operational improvements, reducing the likelihood of recurrence.
  • Process Optimization: Analyze process gaps and implement automation solutions to accelerate execution and minimize manual intervention, thereby improving efficiency and reducing operational toil.
  • 24/7 On-Call Support: Participate in a 24/7 on-call rotation, providing critical support during off-hours and responding to emergencies to maintain continuous data center operations.
  • Inventory and Capacity Management: Oversee the tracking of spare parts inventory, ensuring availability and readiness for all hardware components. Manage data center capacity planning, including space, power, and cooling, to optimize resource utilization.
  • Compliance and Policy Adherence: Ensure strict adherence to company policies and procedures, maintaining compliance with industry standards and regulatory requirements.
  • Issue Resolution and Trend Analysis: Investigate and resolve technical issues, while analyzing data to identify trends and systemic problems, providing actionable insights for ongoing improvements.
  • Knowledge Base Contribution: Contribute to the development and expansion of the global data center knowledge base, and lead teams in deploying new data center infrastructure to support organizational growth.

Skills and Qualifications:

  • Technical Expertise: 6+ years of experience operating technical production environments, with extensive hands-on knowledge of data centers and their critical systems, including management of devices such as servers, switches, routers, storage, and other hardware that supports infrastructure services.
  • Data Center Experience: 6+ years of direct experience in data center operations, with hands-on expertise in supporting multi-vendor hardware platforms (Dell, HP, Supermicro, Cisco, Arista, Juniper).
  • Collaboration and Communication: Demonstrated strength in teamwork, communication, and strategic planning, with the ability to effectively collaborate with cross-functional teams.
  • Time Management: Exceptional time management skills, capable of prioritizing and balancing dynamic workloads in complex, fast-paced environments.
  • Customer Focus: Unwavering commitment to customer success, ensuring that all operational activities align with client needs and expectations. Our goal is 100% site uptime.
  • Ticketing System Proficiency: Expertise in managing tasks and priorities through a ticketing system (Jira preferred), consistently meeting or exceeding SLA targets.
  • Project Management: Proven experience in managing complex projects, from conception through completion, ensuring alignment with business objectives and timelines.
  • Hardware and Infrastructure Knowledge: Skilled in hardware troubleshooting, component replacement, power distribution units, CDUs, racking, stacking, and cabling. Solid understanding of storage devices, Linux, and networking concepts.
  • Physical and Facility Technical Requirements: Ability to lift up to 75 lbs, with a strong understanding of electrical, mechanical, and HVAC systems, which is essential for maintaining data center infrastructure.
  • Global Operations: Willingness to travel both domestically and internationally as needed, with experience in managing physical site locations to ensure operational readiness.
  • Continuous Improvement: Capable of monitoring repair costs, technician efficiency, and operational metrics, providing data-driven recommendations for continuous improvement.
  • Capacity and Layout Planning: Assist with co-location capacity planning and growth initiatives, including the design and optimization of rack and server layouts.
  • Dynamic and Adaptable: You thrive in dynamic, fast-paced environments, and have a proven track record of supporting and executing complex projects in high-stakes settings.
  • Technical Writing and Documentation: Experience in technical writing, with the ability to generate clear, concise documentation and standard operating procedures.
  • Inventory Management: Experience in managing parts inventory and stock levels is highly desirable, contributing to the efficiency of data center operations.
  • Certifications and Technical Skills: Preferred certifications include Network+, Server+, Linux+, A+, and CDCTP. Basic scripting skills in at least one programming language (e.g., Bash, Perl, Python) are an added advantage.