NVIDIA is seeking an outstanding Solutions Architect for AI Factory to support customers building solutions with our newest AI technology. At NVIDIA, Solutions Architects work across teams and enjoy helping customers adopt the latest Accelerated Computing and Deep Learning software and hardware platforms. We're looking to grow our company and build our teams with the smartest people in the world. Would you like to join us at the forefront of technological advancement?
You will become a trusted technical advisor to our customers, working on exciting AI Factory projects and proofs of concept. This role is an excellent opportunity to work on an interdisciplinary team with the latest technologies at NVIDIA!
Key Responsibilities
• Maintain an up-to-date understanding of the philosophy, architecture, and deployment methods of various evolving NVIDIA Reference Architectures—e.g., NVIDIA DGX SuperPOD Reference Architecture, NVIDIA Cloud Partner Reference Architecture, and NVIDIA Enterprise Reference Architecture.
• Analyze and understand customer requirements for the AI training or inference clusters they initiate.
• Identify the NVIDIA Reference Architecture that best matches customer needs and effectively communicate its value proposition to collaborators.
• Facilitate seamless communication between NVIDIA's internal deployment teams and customers during the implementation of AI clusters based on Reference Architectures.
• Provide hands-on technical support to developers after the AI Factory has been deployed, ensuring that AI training and inference workloads run effectively on the infrastructure.
Qualifications
• Bachelor’s degree or higher in Computer Science, Computer Engineering, or a related technical field.
• Solid understanding of basic principles behind cluster orchestration, such as compute resource provisioning and dynamic prioritization based on user demand.
• Minimum of 3 years of hands-on experience operating AI training or inference clusters that leverage Kubernetes with NVIDIA GPUs.
• Proficiency in key technologies including: Container Runtime Interface (CRI), Container Network Interface (CNI), Calico, NVIDIA GPU Operator, NVIDIA Network Operator, and Kubeflow Training Operator.
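To give a concrete sense of the day-to-day work these technologies imply, below is a minimal sketch of a Kubernetes pod manifest that requests a GPU through the device plugin provided by the NVIDIA GPU Operator. The pod name and container image tag are illustrative assumptions, not part of this posting:

```yaml
# Minimal sketch: a smoke-test pod requesting one GPU.
# Assumes the NVIDIA GPU Operator is installed and exposes the
# nvidia.com/gpu extended resource on the cluster's GPU nodes.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test          # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvcr.io/nvidia/cuda   # illustrative; pin a specific tag in practice
      command: ["nvidia-smi"]      # prints visible GPUs if scheduling succeeded
      resources:
        limits:
          nvidia.com/gpu: 1        # request a single GPU via the device plugin
```

Candidates for this role would be expected to diagnose why a pod like this stays `Pending` (e.g., missing device plugin, taints, or exhausted GPU capacity).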
Preferred Qualifications
• Foundational knowledge and experience with network technologies—such as InfiniBand and Ethernet—in AI cluster environments, including compute fabric interconnects between GPU servers, storage fabric integration, and in-band networks for system administration.
• Familiarity with the role of storage in AI training/inference clusters, including hands-on experience with vector databases and leading commercial storage solutions.
• Experience integrating MLOps platforms into Kubernetes environments, such as deploying Airflow for orchestrating distributed training workloads.
Hiring Process
For more details, please refer to the company website.