Member of Technical Staff, Pretraining Software Engineer

Acceler8 Talent • Palo Alto, CA 94306 • Posted 4 days ago via LinkedIn

In-person • Full-time • Mid Level

Job Highlights

We are seeking a Member of Technical Staff, Pretraining Software Engineer, to help develop AI models by collecting and preparing the data used to pretrain them. The role centers on designing and implementing web crawling and data processing systems that produce high-quality datasets for training state-of-the-art AI models.

Responsibilities

  • Design and implement web crawling systems to collect large-scale datasets.
  • Develop data processing pipelines to clean, organize, and annotate collected data.
  • Ensure the quality and relevance of data for pretraining AI models.
  • Work with various data sources and formats to gather diverse and comprehensive datasets.
  • Collaborate with machine learning engineers to understand data requirements and optimize data collection strategies.
  • Utilize technologies such as Python, Scrapy, Hadoop, and Spark for data collection and processing.

Qualifications

Required

  • Strong background in data collection and processing.
  • Passion for preparing high-quality datasets for AI model training.
  • Experience in web crawling and data pipeline development.
  • Ability to work in a dynamic environment.

Full Job Description

Member of Technical Staff, Pretraining Software Engineer


Introduction: We are seeking a Member of Technical Staff, Pretraining Software Engineer, who is eager to contribute to the development of our AI models by building the datasets they are pretrained on. This role offers the opportunity to work on the collection and preparation of data essential for training state-of-the-art AI models.


About the Company: Join a small, interdisciplinary AI studio that has trained several state-of-the-art language models and developed a personal assistant. This company is at the forefront of AI technology, focusing on finetuning and deploying models for commercial partners. With a commitment to prioritizing the well-being and happiness of partners, users, and stakeholders, this organization represents a transformative era in AI development.


About the Role: As a Member of Technical Staff, Pretraining Software Engineer, you will play a crucial role in the data collection and preparation process for training our advanced AI models. This involves designing and implementing web crawling and data processing systems to gather high-quality datasets, ensuring the effectiveness of our pretraining efforts.


What We Can Offer You:

  • Competitive salary range.
  • Unlimited paid time off.
  • Parental leave and flexibility for all parents and caregivers.
  • Generous medical, dental, and vision plans for US employees.
  • Benefits for non-US employees that comply with country-specific requirements.
  • Visa sponsorship for new hires.
  • Avenues for personal growth such as coaching, conference attendance, or specific trainings.



This position is ideal for those with a strong background in data collection and processing and a passion for preparing high-quality datasets for AI model training. If you have experience in web crawling and data pipeline development and enjoy working in a dynamic environment, this role provides a unique opportunity to contribute to a leading AI platform.


Keywords: pretraining, data collection, web crawling, data processing, AI models, Python, Scrapy, Hadoop, Spark.
