SR. SITE RELIABILITY ENGINEER II

Smarsh • Pleasanton, CA 94566 • Posted 4 days ago

Hybrid • Full-time • $140,000-$160,000/yr • Senior Level

Job Highlights

Using AI ⚡ to summarize the original job post

As a Senior Site Reliability Engineer II at Smarsh, you will play a crucial role in ensuring the resilience of our petabyte-scale, Kubernetes-centric ProArchive application. This position involves coordinating with multiple teams to develop migration plans, implementing best practices for our tech stack, and working closely with various engineering groups to manage on-prem and cloud-native infrastructures. The role requires a passion for automation, CI/CD, and infrastructure components, and offers the opportunity to define technology choices, develop new tools, and act as a subject matter expert.

Responsibilities

Help define technology choices, best practices, and process for the team.
Develop and maintain documentation standards for the team.
Develop new tools and libraries for broader use by SaaS Operations and Engineering teams.
Enable engineering teams to discover and understand problems more quickly.
Work with product architects and suggest architectural changes and design platform component roadmaps.
Act as a subject matter expert (SME) for components and functions.
Assist engineering teams in deep troubleshooting and application code review.
Work closely with Engineering and peer SRE teams to design and use Smarsh coding standards and best practices.
Respond to incidents coordinated by SRE and Incident Response teams. Act as an Incident Commander during incidents.
Participate in escalation and off-hours on-call schedule.
Adopt and embrace qualities of an SRE as defined in the team charter. Help set them for the rest of the team.
Mentor and train junior members of the team. Design training curriculum for the team.

Qualifications

Required

Minimum of 7 years of industry experience.
BS in CS or equivalent combination of education and experience.
Strong experience operating Kubernetes in production environments – EKS Anywhere is preferred.
Experience with middleware systems (Kafka, AMQ, Redis, Memcache, etcd).
Experience managing CI/CD systems (Flux, Concourse).
Experience deploying and/or operating Observability stack (Splunk, Datadog, Grafana).
Experience with large scale systems.
Familiarity with working with PostgreSQL and MongoDB.
Background working in a multi-platform environment (Linux, Windows).
Familiarity with programming/scripting languages (e.g., Python, Bash, PowerShell, Go).
Familiarity with Agile/Scrum/Kanban methodologies.
Strong interpersonal skills with a can-do attitude and sense of urgency for a high growth/fast paced environment.
Curious mind, wanting to learn new technologies and share with others.
The ability to think outside of the box to resolve issues and create solutions.

Full Job Description

Portland / Pleasanton / Atlanta / New York
Divisions – Corporate Engineering / Full-Time / Hybrid

Summary

As a Sr. Site Reliability Engineer II, you are instrumental in helping make our petabyte-scale, Kubernetes-centric ProArchive application resilient. This position will coordinate with multiple teams to develop a migration plan for various components and services as well as implement best practices for our tech stack. A person in this position will have a passion for getting things done for various functions, including automation, CI/CD, infra components, middleware, etc. You’ll work closely with our Dev Engineering, QA, and Platform Engineering groups to manage our current on-prem deployments and on-prem & cloud-native infrastructures.

How will you contribute?

Help define technology choices, best practices, and process for the team.
Develop and maintain documentation standards for the team.
Develop new tools and libraries for broader use by SaaS Operations and Engineering teams.
Enable engineering teams to discover and understand problems more quickly.
Work with product architects and make suggestions for architectural changes and design platform component roadmaps.
Act as a subject matter expert (SME) for components and functions desired. Develop skills as required to become the SME for components in need.
Assist engineering teams in deep troubleshooting and application code review to find opportunities to improve performance and scalability.
Work closely with Engineering and peer SRE teams to design and use Smarsh coding standards and best practices.
Respond to incidents coordinated by SRE and Incident Response teams. Act as an Incident Commander during incidents.
Participate in escalation and off-hours on-call schedule.
Adopt and embrace qualities of an SRE as defined in the team charter. Help set them for the rest of the team.
Mentor and train junior members of the team. Design training curriculum for the team.

What will you bring?

Minimum of 7 years of industry experience.
BS in CS or equivalent combination of education and experience.
Strong experience operating Kubernetes in production environments – EKS Anywhere is preferred.
Experience with middleware systems (Kafka, AMQ, Redis, Memcache, etcd).
Experience managing CI/CD systems (Flux, Concourse).
Experience deploying and/or operating Observability stack (Splunk, Datadog, Grafana).
Experience with large scale systems.
Familiarity with working with PostgreSQL and MongoDB.
Background working in a multi-platform environment (Linux, Windows).
Familiarity with programming/scripting languages (e.g., Python, Bash, PowerShell, Go).
Familiarity with Agile/Scrum/Kanban methodologies.
Strong interpersonal skills with a can-do attitude and sense of urgency for a high growth/fast paced environment.
Curious mind, wanting to learn new technologies and share with others.
The ability to think outside of the box to resolve issues and create solutions.

$140,000 - $160,000 a year

The above salary range represents Smarsh's good faith and reasonable estimate of the range of possible base compensation at the time of posting. Any applicable bonus programs will be discussed during the recruiting process. The salary for this role will be set based on a variety of factors, including but not limited to, internal equity, experience, education, location, specialty and training. Local cost of living assessments are done for each new hire at the time of offer.