if(we) is hiring a

Senior Data Engineer

San Francisco, United States

About if(we)

if(we), formerly known as Tagged Inc., is a company building social products to enable meaningful connections between people. We’re a profitable startup growing our existing successful products, Tagged & hi5, and building new social products on mobile and other key platforms. We're constantly exploring new ideas and technologies to build the next massive social product and realize our mission of connecting people.

Created by the team that brought Tagged to the world in 2004, if(we) was founded in 2014. if(we) was launched to build on Tagged’s earlier experience and success, tap into market growth, and create the next billion-user social product.

About the Team

Our team supports Data Engineering at if(we). We manage a diverse ecosystem consisting of 400 TB of relational databases and a 2 PB Hortonworks Hadoop platform. Your role will be to engineer and maintain the Hadoop ecosystem.

About the Job

Our team is looking for an experienced Data Engineer with a passion for building data products and data systems. As a key member of our Data team, you will be responsible for designing and developing major components of big data stream and batch processing applications. You should have hands-on experience with all aspects of big data engineering, from ingesting data from various types of sources to common data cleansing and transformation techniques. You should have proven expertise in developing a publish-subscribe distributed logging system using Kafka, with the Camus data ingestion framework and the Avro serialization format. You should be able to write scalable MapReduce jobs that load data from HDFS into Hive, HBase and Amazon Redshift as necessary. You should be proficient in Python and able to build a scheduler pipeline using Airflow or similar technologies. Agility and innovativeness are the keys to success in this role.


  • Build and maintain code to populate Hadoop (HDFS) with log events from Kafka or from SQL production systems

  • Design, build and support a pipeline for data transformation, conversion and validation

  • Design and support effective storage and retrieval for a 2 PB big data ecosystem

  • Design and support an Avro schema repository, and use Hive or Spark as appropriate for different use cases

  • Support and tune a big data pipeline running from Kafka to HDFS, HBase and Amazon Redshift

  • Lead the effort of building a unified Kafka cluster to support multiple consumers

  • Participate in the Kafka upgrade to the latest version

  • Lead the company initiative to migrate from the Hortonworks Data Platform (Hadoop) to Amazon Elastic MapReduce

About You

  • Experience with the Hadoop stack (Hive, Spark, HBase, Hadoop Streaming) and MapReduce

  • Familiarity with different data formats and serialization, such as JSON and Avro

  • Strong grasp of algorithms and data structures

  • Database experience with MySQL and Postgres

  • Proficiency in Java and Python

  • Experience with test-driven development and SCM tools such as Git

  • Good familiarity with Linux/Unix scripting

  • MS in Computer Science/Engineering is required

  • Strong communication skills
