Site Reliability Engineer - Research Computing

Carter Wellington Limited


 

The Research Infrastructure Cloud HPC team is a group of experts solving computing problems in the critical path of Research within the business. We work directly with Research and Model Implementation teams and provide them with tools and compute resources to take their ideas from inception to real tradable products. We are looking for an ambitious and operationally minded software engineer to join our team as we mature and scale our cloud HPC platform from a successful strategy-specific offering to the next iteration of our firm-wide Research platform.  

 

Why join? Our client has a stellar 25-year track record and a reputation for excellence. Our goal is to be the best quantitative investment manager in the world, measured by the quality of our products, not their size. The company's very high employee-retention rate speaks for itself. Our people are intellectually extraordinary, and our community is close-knit, down-to-earth, and diverse.

 

Responsibilities 

We are a small, flat team sitting at the intersection of research, implementation, and systems infrastructure. Our team's responsibilities span many areas. Below is a sampling of the types of work you will be expected to do:

  • Design and implementation of cloud-based HPC systems. Our projects typically involve equal parts engineering and operations for success in our fast-moving environment. You will be expected to do both for projects small and large. 
  • Running our HPC plant day-to-day. Our research environment is up 24/7, and we want to keep it that way. Everybody on the team contributes to the support of our plant, which thankfully is light because of our automation and quality work. 
  • Implementing automation. We will always choose to work smart over working hard. You will be responsible for conception and implementation of automation from CI/CD pipelines to production metrics and monitoring of our cloud HPC platform. 
  • Capacity management and benchmark optimization. Our demand for compute is constant and involves challenging problems focused on scaling our compute and optimizing it for research-critical workloads. 
  • Obsessive user focus. All members of the team are expected to partner with researchers and engineers to deliver high-quality cloud HPC systems that are efficient and reliable. This includes leading projects to evolve the platform as our needs change.

 

Qualifications 

  • 5+ years of software engineering and/or systems programming experience 
  • 2+ years of experience working with a public cloud, AWS preferred 
  • Mastery of at least one programming language for building production systems, Python preferred
  • Experience with a production configuration management tool, Salt/SaltStack preferred 
  • Experience with a cloud-based infrastructure-as-code tool, Terraform preferred 
  • Excellent written and verbal communication skills 
  • Past experience working with or supporting researchers and/or other developers is a plus  
  • Knowledge of Slurm or similar HPC schedulers and resource managers is a plus 

 

Education

Bachelor’s degree in computer science, engineering, or a related field from a strong academic program. 

 

 



Carter Wellington Limited



3 June 2020
Location: New York, New York, United States
Salary: $150,000 to $200,000 USD per year
Work type: Full time
Sector: Information and Communication Technology
Profession: