Research Scientist in Large Model System – 2915

Seattle, United States
Posted 1 month ago
About The Company

This company pioneers short-form video creation and social engagement and has a vast, engaged user base. Its platform gives users creative tools, filters, and effects, and its diverse content ecosystem makes it a hub of creativity and expression. A proprietary algorithm delivers personalized content feeds, improving user engagement and satisfaction. The company has significant influence on digital media, which makes it a valuable partner for innovative collaborations and marketing initiatives.


Job Description

Responsible for developing the machine learning systems behind the company's large-scale models, and for researching new applications of and solutions built on related technologies in areas such as search, recommendation, advertising, content creation, conversation, and customer service. The goal is to meet users' growing demand for intelligent interaction and to broadly improve how people live and communicate in the future.

The main areas of work include:
1. Designing and developing the architecture of large-scale machine learning systems, and solving technical challenges such as high concurrency, high reliability, and high scalability.
2. Working across the sub-areas of machine learning systems, including resource scheduling, model training, model inference, data management, and workflow orchestration.
3. Researching and adopting advanced machine learning system technologies, such as the latest hardware architectures, heterogeneous computing systems, and compiler-based optimization.
4. Working closely with the algorithm teams to jointly optimize algorithms and systems.


Qualifications

1. Excellent coding skills, with a solid foundation in data structures and algorithms and proficiency in C/C++ or Python. Winners of ACM/ICPC, NOI/IOI, or similar competitions are preferred.
2. Familiarity with at least one mainstream machine learning framework (TensorFlow or PyTorch).
3. A solid grasp of distributed systems principles, with hands-on experience designing, developing, and maintaining large-scale distributed systems.
4. Strong sense of responsibility, quick learning ability, and self-motivation.
5. Good communication and collaboration skills; able to explore new technologies with the team and drive technical progress.
6. The following experience is a strong plus:
– Prior involvement in influential large-scale projects or papers in the field of large models.
– Familiarity with NLP and CV algorithms and technologies, and experience with large model training and RL (reinforcement learning) algorithms.
– Experience in at least one of the following fields: CUDA, RDMA, AI Infrastructure, HW/SW Co-Design, High-Performance Computing, ML Hardware Architecture (GPU, Accelerators, Networking), ML for Systems, and Distributed Storage.

Job Features

Job Category: AI Engineering
Seniority: Junior / Mid IC
Base Salary: $180,000 - $330,000
Recruiter: yuxuan.sheng@ocbridge.ai
