We are experiencing hockey-stick growth in scale, so Tradeshift is looking to establish a new site reliability engineering team to help keep our platform running smoothly and efficiently.
The Tradeshift platform is hosted on AWS EC2 and is composed of a number of Java applications (with a dash of Node.js and Scala/Akka), HornetQ, Elasticsearch, Riak and Postgres. All of our current infrastructure is highly automated, with Jenkins, Puppet, Docker, etc. playing key roles. We need someone who wants to solve site reliability challenges on a global level and in this type of environment. On the engineering side of things, we subscribe to continuous integration (with a lot of automated testing), automation of the build and release processes, test/behavior-driven development, etc.
We expect that you know how to do load balancing, identify bottlenecks in large distributed systems, choose which metrics to monitor for overall platform health, do DB tuning, etc.: all the jazz that comes with making a global, multi-tenant, always-on platform scale under very rapid growth. We will give you ample opportunity to provide a ton of value, solve real problems and, if you are good enough, articulate longer-term thinking for the site reliability effort.