OKX is hiring a

Staff Site Reliability Engineer

San Jose, United States

Who We Are

OKX is revolutionising world systems through our cutting-edge digital asset exchange, Web3 portal and blockchain ecosystems. We are deeply committed to shaping a fairer, more transparent and accessible society through blockchain technology, and to date we have 50+ million users, 3000+ employees and 180+ countries believing in the same vision as us. We are safe and reliable, backed by our Proof of Reserves. As strong supporters of the Arts and Sports, we are proud partners of @McLarenF1 @ManCity @Tribeca.

About the Team 

Site Reliability Engineering is a critical engineering discipline and a job function in the company. Its charter is to build tools and infrastructure that promote early detection of production failures, leading to a stellar customer experience.

Our work is to drive the safety, health and uptime of our platform, and the ability to remedy unforeseen problems. By removing some of the complex burdens of scaling and maintaining uptime in distributed systems, SRE allows development teams to focus on feature development instead of the nuances of achieving and maintaining service level commitments.

About the Opportunity:

We’re looking for a creative and driven individual who can spearhead our effort to push “outside the box” infrastructure implementations that will have a tremendous impact on our platform’s stability and scalability.

What You’ll Be Doing:

  • Responsible for the maintenance and configuration of AWS/Alicloud products and services
  • Responsible for the research, architecture and implementation of solutions based on AWS/Alicloud products
  • Responsible for the daily maintenance of each AWS cloud environment
  • Responsible for the preparation of documents related to AWS Cloud O&M and the formulation of O&M specifications

What We Look For In You:

  • Proficient in AWS and Alicloud distributed management, large-scale clustering, fault tolerance, backup, load balancing and other technologies
  • Have a deep understanding of high-availability architecture and capacity planning, and rich experience in handling complex problems
  • Have solid Linux operation, maintenance and debugging capabilities, and be proficient in troubleshooting, configuration tuning, and performance analysis
  • Familiar with the functional features of AWS/Alicloud products and core products, and have rich practical experience in deployment and tuning of EC2, EKS, VPC, or big data products
  • Experience in monitoring, O&M and management of AWS/Alicloud large-scale servers and containers
  • Familiar with Internet company architecture and with the configuration of Nginx, Redis, MySQL, Kafka, Elasticsearch (ES) and similar components
  • Familiar with the deployment, configuration and maintenance of Nginx, Kong and other software
  • Proficient in using Python/shell for development
  • Strong engineering skills, proficient in at least one O&M or infrastructure sub-area, such as public cloud networking, SRE, DevOps or cloud-native
  • Excellent business analysis ability, system architecture ability, and problem-solving ability, and strong self-drive

Nice to Haves:

  • Bilingual in English and Mandarin
  • Familiar with the operation and maintenance management of Alibaba Cloud, Google Cloud, Microsoft Cloud and other cloud providers

Highlights of Perks and Benefits:

  • Competitive total compensation package
  • L&D programs and education subsidy for employees' growth and development
  • Various team building programs and company events
  • Wellness and meal allowances
  • Comprehensive healthcare schemes for employees and dependants 
  • More that we would love to tell you about along the way!

OKX Statement:

OKX is committed to equal employment opportunities regardless of race, color, genetic information, creed, religion, sex, sexual orientation, gender identity, lawful alien status, national origin, age, marital status, and non-job related physical or mental disability, or protected veteran status. Pursuant to the San Francisco Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.

The salary range for this position is $175,000 to $262,000. The salary offered depends on a variety of factors, including job-related knowledge, skills, experience, and market location. In addition to the salary, a performance bonus and long-term incentives may be provided as part of the compensation package, as well as a full range of medical, financial, and/or other benefits, dependent on the position offered. Applicants should apply via the OKX internal or external careers site.

