if(we), formerly known as Tagged Inc., is a company building social products to enable meaningful connections between people. We’re a profitable startup growing our existing successful products, Tagged & hi5, and building new social products on mobile and other key platforms. We're constantly exploring new ideas and technologies to build the next massive social product and realize our mission of connecting people.
Created by the team that brought Tagged to the world in 2004, if(we) was founded in 2014 to build on Tagged’s earlier experience and success, tap into market growth, and create the next billion-user social product.
About the Team
Our team supports Data Engineering at if(we). We run a very diverse ecosystem consisting of 400 TB of relational databases and a 2 PB Hortonworks Hadoop platform. Your role will be to engineer and maintain the Hadoop ecosystem.
About the Job
Our team is looking for an experienced Data Engineer with a passion for building data products and data systems. As a key member of our Data team, you will be responsible for designing and developing major components of big data stream- and batch-processing applications. As a Data Engineer, you should have hands-on experience with all aspects of big data engineering, from ingesting data from various types of sources to common data cleansing and transformation techniques. You should have proven expertise in developing a publish-subscribe distributed logging system using Kafka and the Camus data-ingestion framework with Avro serialization. You should be able to write scalable MapReduce jobs to extract data from HDFS into Hive, HBase and Amazon Redshift as necessary, and be proficient in Python to build a scheduler pipeline using Airflow or similar technologies. Agility and innovativeness are the keys to success in this role.
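The publish-subscribe logging system mentioned above can be sketched in miniature with plain Python — an in-process stand-in for Kafka's append-only topics and independent per-consumer offsets (all names here are illustrative, standard library only; a real deployment would use Kafka itself):

```python
from collections import defaultdict

class Topic:
    """Toy stand-in for a Kafka topic: an append-only log with
    per-consumer offsets, so each consumer reads independently."""
    def __init__(self):
        self.log = []                    # append-only record log
        self.offsets = defaultdict(int)  # consumer name -> next index to read

    def publish(self, record):
        self.log.append(record)

    def consume(self, consumer):
        """Return every record this consumer has not yet seen."""
        start = self.offsets[consumer]
        records = self.log[start:]
        self.offsets[consumer] = len(self.log)
        return records

events = Topic()
events.publish({"user": 1, "action": "login"})
events.publish({"user": 2, "action": "view"})

print(events.consume("hdfs-ingest"))    # this consumer sees both records
events.publish({"user": 1, "action": "logout"})
print(events.consume("hdfs-ingest"))    # only the record published since
print(events.consume("redshift-load"))  # a new consumer replays the full log
```

The key property, which Kafka provides at scale, is that publishing is decoupled from consumption: each downstream system (HDFS ingestion, Redshift loading) tracks its own position in the same log.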
Responsibilities
Build and maintain code to populate Hadoop (HDFS) with log events from Kafka and from SQL production systems
Design, build and support a pipeline of data transformation, conversion and validation
Design and support effective storage and retrieval for a 2-petabyte big data ecosystem
Design and support an Avro schema repository, using Hive or Spark as appropriate for different use cases
Support and tune the big data pipeline from Kafka to HDFS, HBase and Amazon Redshift
Lead the effort of building a unified Kafka cluster to support multiple consumers
Participate in upgrading Kafka to the latest version
Lead the company initiative to migrate from the Hortonworks Data Platform to Amazon Elastic MapReduce
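The transformation, conversion and validation work above typically reduces to small, testable steps over a stream of records. A minimal sketch (the record fields and rules are hypothetical, standard library only):

```python
import json
from datetime import datetime, timezone

REQUIRED_FIELDS = {"user_id", "event", "ts"}

def validate(record):
    """Reject records missing required fields or with a bad timestamp."""
    if not REQUIRED_FIELDS <= record.keys():
        return False
    try:
        datetime.fromtimestamp(record["ts"], tz=timezone.utc)
    except (TypeError, ValueError, OSError):
        return False
    return True

def transform(record):
    """Normalize a raw record into the target schema."""
    return {
        "user_id": int(record["user_id"]),
        "event": record["event"].strip().lower(),
        "event_date": datetime.fromtimestamp(
            record["ts"], tz=timezone.utc).date().isoformat(),
    }

def run_pipeline(raw_lines):
    """Parse, validate and transform newline-delimited JSON events,
    counting records that fail either step."""
    good, bad = [], 0
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            bad += 1
            continue
        if validate(record):
            good.append(transform(record))
        else:
            bad += 1
    return good, bad
```

Keeping validate and transform as pure functions makes the same logic reusable whether the pipeline runs in a batch MapReduce job or a streaming consumer.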
Requirements
Experience with the Hadoop stack (Hive, Spark, HBase, Hadoop streaming) and MapReduce
Familiarity with different data formats and serialization, such as JSON and Avro
Strong grasp of algorithms and data structures
Database experience with MySQL and Postgres
Proficiency in Java and Python
Experience with test-driven development and SCM tools such as Git
Good familiarity with Linux/Unix scripting
MS in Computer Science/Engineering is required
Strong communication skills
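On the Hadoop streaming requirement above: streaming jobs are simply scripts that read lines on stdin and write lines to stdout, so they can be written and tested in plain Python. A classic word-count sketch (illustrative only; the sort between the steps simulates Hadoop's shuffle):

```python
from itertools import groupby

def mapper(lines):
    """Map step of a Hadoop streaming job: read raw text lines,
    emit tab-separated (word, 1) pairs, one per line."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"

def reducer(lines):
    """Reduce step: Hadoop's shuffle delivers mapper output sorted
    by key, so equal words are adjacent; sum their counts."""
    pairs = (line.rsplit("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

# Simulate the map -> shuffle (sort) -> reduce flow locally:
shuffled = sorted(mapper(["to be or not to be"]))
for line in reducer(shuffled):
    print(line)   # be 2, not 1, or 1, to 2 (tab-separated)
```

In production the two functions would be split into mapper and reducer scripts and passed to the hadoop-streaming jar, which handles the sorting and distribution.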