Paige is hiring a

Senior Systems Engineer (High Performance Computing)

New York, United States

Paige is on a mission to accelerate and transform the diagnosis and treatment of cancer. Paige is creating a digital platform for pathologists to transform their workflow and is developing a new class of computational diagnostics positioned to drive the future of pathology.

A career at Paige is deeply mission-driven: you will work with state-of-the-art technologies alongside leaders in the field to improve cancer care every day. We reach high, help each other succeed, and believe in creativity, curiosity, and creating amazing products.

Paige employees receive great benefits, including health insurance, 401(k), paid parental leave, and a generous health and wellness stipend. We recently raised $70M in Series B investment to ensure we’re able to continue our mission and expand our impact. Join us!

We’re seeking an experienced Senior Systems Engineer (HPC) to administer and support our High Performance Computing cluster. You will work closely with engineering and data management teams on cutting-edge technologies.

This is an extraordinary opportunity to be part of a high-performing team and to pursue a life-changing mission with unique technical challenges!

Responsibilities:

  • Design, plan, test and implement innovative hardware designs for an HPC environment
  • Implement, support, and provide technical guidance for engineering team initiatives and projects
  • Build automation for infrastructure provisioning, configuration management, and account access (emphasis on SaltStack)
  • Install, provision, and support a complex Cisco Nexus HPC switching environment (RoCE)
  • Own the design, structure, and maintenance of a Pure Storage and Qumulo enterprise network-attached storage (NAS) system
  • Regularly evaluate and recommend new tools and technologies for use in existing and future clusters
  • Deploy patches and updates to operating systems and application software

Required Skills and Experience

  • Master’s in computer science, engineering, information systems, or a related field, or equivalent years of experience
  • 8+ years’ experience in a systems engineering role
  • Deep knowledge of server components (CPU, SSD, GPU, networking)
  • Deep knowledge of High Performance Computing (HPC) / cluster technologies with high-speed interconnect fabrics using Ethernet/RoCE and InfiniBand
  • Expert knowledge of SAN and NAS services (iSCSI, NFS, CIFS)
  • Expert knowledge of TCP/IP networking, network security, and DNS (BIND, Windows)
  • Expert knowledge of Linux (Ubuntu, CentOS), common UNIX services, and shell scripting
  • Strong understanding of high-speed HPC interconnects
  • Strong knowledge of parallel GPU computing, MPI, and RDMA within containerized environments
  • Strong knowledge of the NVIDIA software environment, including NCCL, NGC, and GPU tools
  • Strong experience operating and administering workload schedulers such as Slurm, LSF, or PBS
  • Strong knowledge of virtualization technologies such as KVM/libvirt/QEMU
  • Experience working with configuration management tools like SaltStack, Chef, or Puppet

Desired Skills

  • Working knowledge of Kubernetes and Docker containers within an on-prem HPC cluster
  • Understanding of data pipelines, including ETL and streaming of data such as log data or tool/sensor data to indexes (EMR)
  • Understanding of cloud platforms and services, particularly AWS
  • Understanding of Jupyter Notebook technology
  • Understanding of CI/CD pipelines
  • Understanding of Agile development methodologies