Founded in August of 2008 and based in San Francisco, California, Airbnb is a trusted community marketplace for people to list, discover, and book unique travel experiences around the world. Whether an apartment for a night, a castle for a week, or a villa for a month, Airbnb allows people to Belong Anywhere through unique travel experiences at any price point, in more than 34,000 cities and over 190 countries. We promote a culture of curiosity, humanity, and creativity through our product, brand, and, most importantly, our people.
Role & Responsibilities
The IT Experience (ITX) Core Operations team is looking for an Observability Engineer, ideally with expertise in developing monitoring and logging for enterprise networks, systems, and applications.
- Develop and improve instrumentation for monitoring and logging the health and availability of services.
- Proactively monitor systems, networks, and applications to provide input on improving the stability, security, efficiency, and scalability of systems.
- Develop and maintain monitoring and logging frameworks for all of ITX.
- Take personal responsibility for the quality, reliability and availability of global IT corporate infrastructure.
- Own operations documentation of monitoring and logging for global IT production infrastructure.
- Participate in a rotating on-call incident response schedule on weekdays and weekends.
- Improve operational efficiencies via scripting, bots and integrations.
- Collaborate cross-functionally with vendors and other IT engineering teams to ensure smooth service delivery.
- Perform network and systems troubleshooting, fault analysis, and resolution.
Requirements:
- In-depth experience designing monitoring and logging at scale for corporate infrastructure services.
- Expert-level experience with monitoring and logging technologies, both open source and closed source (e.g. LogicMonitor, SumoLogic, ELK).
- Experience with RBAC and user-based security services such as ISE, RADIUS, LDAP, and AD.
- Strong automation and scripting skills are required; proficiency in Python or Ruby is a plus.
- Proficient in developing and maintaining technical documentation, runbooks, and procedures.
- A working knowledge of networking is needed: fundamental knowledge of the TCP/IP stack, application protocols (DHCP/DNS/HTTPS), and networking concepts (HSRP/NAT/VPN/VLANs/802.1x/Wireless/Clustering/High Availability/Load Balancing).
- Understanding of enterprise networks using Cisco IOS/NX-OS, with a working knowledge of IP protocols (TCP/UDP/ICMP) and routing protocols (BGP/OSPF/IS-IS).
- Understanding of Palo Alto firewalls, including firewall policy rules, URL filtering, App-ID, User-ID, etc.
- Experience interacting with telcos and global ISPs (WAN/DIA) and monitoring those services.
- A working knowledge of systems is needed. Fundamental knowledge of configuration management and automation tools, with experience in:
  - Terraform, Ansible, Chef, Puppet, Jenkins
  - Designing and implementing CI/CD pipelines
  - Infrastructure provisioning and management
- Strong skills in troubleshooting incidents in production environments.
- A strong ownership attitude and a track record of taking responsibility for problems and pushing through to resolution.
- Bachelor's degree in Computer Science or EE, or relevant industry experience is required.
- Ability to communicate and coordinate with cross-functional engineering teams across multiple geographic regions.
Nice To Have:
- Ability to take the lead in an operations environment.
- Contributions to open source: your public Git repos and contributions show good examples of giving back to the community.
- Architected a monitoring and logging infrastructure that was technology agnostic for a production infrastructure environment.
- Knowledge of revision control software such as Git.
- Familiarity with scripting against REST APIs, e.g. the PAN-OS API or Infoblox WAPI.
- Experience with integrations in Google Admin, Duo, OneLogin, Slack, and PagerDuty.
- Administrative experience with 1Password, LastPass, or other cloud-based password managers.