Senior MLOps Engineer Job at DeepRec.ai, San Jose, CA

  • DeepRec.ai
  • San Jose, CA

Job Description

Senior MLOps Engineer

We are hiring an MLOps Engineer for a fast-moving AI startup that is building a world-class AI-powered video platform.

We are looking for a skilled and hands-on MLOps Engineer to join their growing team. You will play a critical role in deploying, scaling, and maintaining their machine learning infrastructure, supporting a range of tools that enable the controlled generation of high-quality animated videos.

Key Responsibilities

  • Design, deploy, and maintain scalable training and data-processing pipelines on distributed compute clusters (e.g., Slurm, Kubernetes, or cloud-native equivalents).
  • Optimize inference systems for latency and cost in a production setting.
  • Collaborate closely with ML researchers and engineers to productionize deep learning models.
  • Implement robust monitoring, logging, and alerting systems for model performance and infrastructure reliability.
  • Automate model testing, validation, and deployment processes across staging and production environments.
  • Ensure efficient usage of compute resources, including GPU clusters, and help identify bottlenecks or cost-saving opportunities.

Requirements

  • Proven experience in MLOps, ML infrastructure, or related roles.
  • Deep expertise in deploying and maintaining ML training pipelines on distributed systems.
  • Strong knowledge of inference optimization techniques, especially in reducing latency and cost at scale.
  • Proficiency with cloud platforms (AWS, GCP, Azure) and container and orchestration tools (Docker, Kubernetes).
  • Experience working with GPU scheduling, distributed training (e.g., PyTorch DDP), and model serving frameworks (e.g., Triton, TorchServe).
  • Familiarity with CI/CD for ML workflows.
  • Strong Python skills and experience with ML/DL frameworks like PyTorch or TensorFlow.

Bonus Points

  • Experience working in the creative media or animation industry.
  • Exposure to video processing, generative AI, or large-scale content production systems.
  • Experience collaborating with research teams or integrating research code into production pipelines.

Please apply for more information.
