Since our launch in 2012, we’ve been on a mission: to make digital identification simple and secure for everyone, and everything.
In that time, we’ve expanded constantly, and been joined by over 150 incredible people, all with the same vision. We’ve grown in other ways too – we raised $35 million in our Series A funding round, and launched our game-changing authentication platform.
Our technology is now being used by hundreds of thousands of users worldwide, including some of the world’s leading financial institutions.
And this is just the beginning.
Over the next year, Callsign will double in size as we continue our mission to make every web, mobile and physical interaction seamless and secure.
We can’t do that alone, though. That’s why we’re looking to hire the brightest, most inquisitive minds out there: the people who want to help us change the rules of identity – and have the skills and passion to make this mission a reality.
Does that sound like you? Let’s talk.
Site Reliability Engineering (SRE)
Platform Operations/Customer support - London/Manchester, London, City of London
About the job
Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Callsign's services, both our internally critical and our externally visible systems, have reliability and uptime appropriate to users' needs and a fast rate of improvement, while keeping an ever-watchful eye on capacity and performance.
SRE is also a mindset and a set of engineering approaches to running better production systems: we build our own creative engineering solutions to operations problems. Much of our software development focuses on optimising existing systems, building infrastructure and eliminating work through automation. Practices such as blameless post-mortems and proactive identification of potential outages are key to both product quality and interesting, dynamic day-to-day work. As with any engineer we hire, the ability to communicate well is crucial, and you will manage multiple initiatives with multiple engineers, potentially across multiple time zones, in order to achieve Callsign's reliability and efficiency goals.
Behind everything our users see online is the architecture built by the Platform Operations team to keep it running. We're striving to keep our systems up and running, ensuring our users have the best and fastest experience possible.
· Experience in one or more of the following: C, C++, Java, Python, Go, Ruby or shell scripting
· Experience with Unix/Linux operating systems internals and administration (e.g. filesystems, system calls) or networking (e.g. TCP/IP, routing, network topologies and hardware)
· Experience with containers and container orchestration (e.g. Docker, Kubernetes)
· Extensive knowledge of AWS
· Experience with cloud hosted application-monitoring tools such as Kibana
· Excellent communication skills with the ability to present complex technical information in a clear and concise manner to a variety of audiences, both technical and non-technical
· Comfortable working in a fast-paced, multi-tasking, dynamic environment
· Experience with deployment automation, working with platforms for configuration management, provisioning and artifact repositories
· Experience in improving internal processes and good understanding of security engineering
· Capable of grasping, modifying and maintaining systems and code developed by others.
· Working knowledge of build tools, such as Make, Maven and CMake
· Expertise in designing, analysing and troubleshooting large-scale distributed systems.
· Knowledge of and familiarity with CI/pipeline tooling such as GitLab, Ansible and other similar tools
· Ability to debug and optimise code and automate routine tasks
· Systematic problem-solving approach, coupled with a strong sense of ownership, drive and determination.
· Ability to think outside the box and find innovative solutions to complex problems.
· Interest in new technologies and a hunger to continue learning
· Proactively maintain services once they are live by measuring and monitoring availability, latency and overall system health.
· Respond quickly to issues and mobilise the responsible individuals to achieve the fastest possible resolution
· Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews
· Scale systems and services sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and speed of service resolution
· Continually analyse service to end customers with a view to enhancing customer experience, eradicating issues, fixing root causes and driving quality into everything we do
· Educate customer support operations and customer help desks to increase skills and knowledge
Please note: successful candidates are subject to a background check for this role