Position: Principal Machine Learning Engineer
Institution: Oracle
Location: United States
Duties: This role is available in the OCI AI Services and Data org. We are addressing exciting challenges at the intersection of artificial intelligence and cutting-edge cloud infrastructure. We are building a state-of-the-art data processing, model training, and benchmarking platform. As a Software Engineer on our team, you will build services and tools to manage the model lifecycle, model provenance, model catalog, and model training. We are building a shared GPU super cluster that enables customers to easily onboard, run, monitor, and manage AI models at scale with OCI. You will have the opportunity to work on LLM accelerators and to set up ML training and benchmarking platforms, runtimes, and libraries in open-source projects that enable low-friction, performance-optimized, large-scale training and inference of the world's most advanced AI models.
Requirements: Bachelor’s degree in computer science, engineering, or an equivalent highly technical field; 6+ years of software engineering experience and a proven track record of successfully architecting and shipping high-performance, low-latency AI/ML-enabled products and services; Strong technical understanding of building complex, scalable, low-latency streaming/batch processing AI/ML cloud services; Proven track record of running operations for a cloud service; Deep knowledge of large-scale compute, network, and storage systems; Experience working with distributed systems
   