Ekata provides global identity verification via enterprise-grade APIs and a SaaS solution. Our product suite is powered by Ekata Identity Engine, the first and only cross-border identity engine of its kind. It uses complex machine learning algorithms across the five consumer attributes of email, phone, name, physical address, and IP to derive unique data links and features from billions of real-time transactions within our customer network and the globally sourced data of our graph. Businesses around the world including Alipay, Stripe, Airbnb, and Microsoft leverage our solutions to approve more good transactions, reduce friction, and find fraud.
We're looking for an exceptional Site Reliability Engineer to join our team and work in an environment where you can fit the best technology to the challenge. As a member of the Ekata Infrastructure team, you will work with top-notch engineers to design and implement best-in-class technology to solve problems.
The Site Reliability Engineer manages our production environment, providing a highly available and scalable platform for Ekata to serve our customers. The Infrastructure team serves as a resource for Engineering, helping diagnose production issues and providing guidance on improving the availability and performance of our applications. This position also develops systems, automation, and tools that make it easier for Engineering teams to deploy services in a fast, automated, and reliable fashion.
In the Site Reliability Engineer role you will:
- Build, scale and support high-availability Ubuntu Linux production and development systems in a public cloud environment
- Work with tools such as Ansible, ArgoCD, Terraform, CloudFormation, Resource Manager and many more to ensure that our stack is well represented as Infrastructure as Code
- Manage and improve security and availability monitoring for all services, and ensure defined security policies are consistently implemented across all environments
- Deploy workloads to multiple cloud environments, drawing on proven experience with the core services of AWS, Azure or GCP, including instance management, IAM configuration, databases, caching and general support/troubleshooting
- Have a developed understanding of the core components required to run Kubernetes and be able to build a cluster from scratch if needed
- Have mastered the fundamentals of load balancing and always look for ways to improve availability and uptime
- Maintain quality documentation for systems owned by the Infrastructure team
- Use monitoring tools to identify and address emerging issues before they become incidents
- Help other teams troubleshoot and solve failures and performance problems, and participate in on-call rotations
- Have a passion for working with Go, Python, Rust or even Bash to build custom tools and improve system integration. Take code ownership to the next level and advocate for writing code that aligns with industry best practices
- Have a solid grasp of networking fundamentals and be able to explain clearly how DNS, DHCP and routing work in most environments