Airbnb is hiring an

Observability Architect

San Francisco, United States
Founded in August of 2008 and based in San Francisco, California, Airbnb is a trusted community marketplace for people to list, discover, and book unique travel experiences around the world. Whether an apartment for a night, a castle for a week, or a villa for a month, Airbnb allows people to Belong Anywhere through unique travel experiences at any price point, in more than 34,000 cities and over 190 countries. We promote a culture of curiosity, humanity, and creativity through our product, brand, and, most importantly, our people.
 

Role & Responsibilities

The IT Experience (ITX) Core Operations team is looking for an Observability Engineer, ideally with expertise in developing monitoring and logging for enterprise networks, systems, and applications.

  • Develop and improve instrumentation for monitoring and logging the health and availability of services. 
  • Proactively monitor systems, networks, and applications to provide input on improving the stability, security, efficiency, and scalability of systems.
  • Develop and maintain monitoring and logging frameworks for all of ITX.
  • Take personal responsibility for the quality, reliability and availability of global IT corporate infrastructure.
  • Own operations documentation of monitoring and logging for global IT production infrastructure.
  • Participate in a rotating on-call incident response schedule on weekdays and weekends.
  • Improve operational efficiencies via scripting, bots and integrations.
  • Collaborate cross-functionally with vendors and other IT engineering teams to ensure smooth service delivery.
  • Perform network and systems troubleshooting, fault analysis, and resolution.

Requirements

  • In-depth experience designing monitoring and logging at scale for corporate infrastructure services.
  • Expert-level experience in monitoring and logging technologies, both open source and closed source (e.g. LogicMonitor, SumoLogic, ELK).
  • Experience in RBAC and user-based security services such as ISE, RADIUS, LDAP, and AD.
  • Must have strong automation/scripting skills; proficiency in Python or Ruby is a plus.
  • Proficient in developing and maintaining technical documentation, runbooks, and procedures.
  • A working knowledge of networking is needed. Fundamental knowledge of the TCP/IP stack, application protocols (DHCP/DNS/HTTPS), and networking concepts (HSRP/NAT/VPN/VLANs/802.1x/Wireless/Clustering/High Availability/Load Balancing).
    • Understanding of enterprise networks using Cisco IOS/NX-OS, with a working knowledge of IP protocols (TCP/UDP/ICMP) and routing protocols (BGP/OSPF/IS-IS).
    • Understanding of Palo Alto firewalls, including firewall policy rules, URL filtering, App-ID, User-ID, etc.
    • Experience interacting with Telco and Global ISPs (WAN/DIA) and the monitoring of those services.
  • A working knowledge of systems is needed. Fundamental knowledge of Configuration Management and Automation tools, with experience in:
    • Terraform, Ansible, Chef, Puppet, Jenkins
    • Designing and implementing CI/CD pipelines
    • Infrastructure provisioning and management
  • Strong skills in troubleshooting incidents in production environments.
  • A strong ownership attitude and a track record of taking responsibility for problems and pushing through to resolution.
  • Bachelor's degree in Computer Science or EE, or relevant industry experience is required.
  • Ability to communicate and coordinate with cross-functional engineering teams across multiple geographic regions.

Nice To Have:

  • Ability to take the lead in an operations environment.
  • Contributed to Open Source - your public Git repos/contributions show good examples of giving back to the community.
  • Architected a monitoring and logging infrastructure that was technology agnostic for a production infrastructure environment.
  • Knowledge of revision control software such as Git.
  • Familiarity with REST API scripting, e.g. with the PAN-OS API or Infoblox WAPI.
  • Experience with integrations in Google Admin, Duo, OneLogin, Slack, and PagerDuty.
  • Administrative experience with 1Password, LastPass, or other cloud-based password managers.
