Implementing Elasticsearch core search capabilities and its observability features using Filebeat, Metricbeat, and Logstash pipelines alongside custom code.
Exploring new product features that support various CDC database replication sources feeding different target systems.
Installing and configuring the Striim product in production environments and showcasing client demos of various customer-specific use cases.
Developing Python utilities to cover custom use cases for customers who struggle to automate their replication pipelines (see the automation sketch after this list).
Replicating data from on-premises databases to cloud targets such as BigQuery, Pub/Sub, Azure ADLS, and Azure SQL Server, and to Hadoop stacks such as HDFS, HBase, and Hive.
Installing MySQL, PostgreSQL, and Oracle databases to configure CDC (log-based replication) and writing various custom and TPC-C workloads to benchmark the product and the databases (a benchmark sketch follows this list).
Writing Java/Python custom components and UDFs (User-Defined Functions) to handle customer-specific use cases the product does not support by default.
Building simple-to-complex replication pipelines with transformations and optimizing pipelines that underperform in customer environments.
Building and supporting the development of internal data migration and data validation frameworks that use Spark at their core to handle huge datasets.
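A minimal sketch of the kind of pipeline-automation utility referenced above, assuming a hypothetical REST management API (the base URL, path, payload fields, and credentials are illustrative placeholders, not the product's actual interface):

```python
import requests

# Hypothetical management endpoint; a real deployment would substitute
# its own host, port, path, and authentication scheme.
BASE_URL = "http://replication-host:9080/api/pipelines"  # placeholder
AUTH = ("admin", "changeme")                             # placeholder

def deploy_pipeline(name: str, source: str, target: str) -> dict:
    """Create and start a source->target replication pipeline."""
    payload = {"name": name, "source": source, "target": target}
    resp = requests.post(BASE_URL, json=payload, auth=AUTH, timeout=30)
    resp.raise_for_status()
    return resp.json()

def pipeline_status(name: str) -> str:
    """Poll a pipeline's state so failures can be alerted on."""
    resp = requests.get(f"{BASE_URL}/{name}", auth=AUTH, timeout=30)
    resp.raise_for_status()
    return resp.json().get("status", "UNKNOWN")

if __name__ == "__main__":
    deploy_pipeline("orders_cdc", "oracle_orders", "bigquery_orders")
    print(pipeline_status("orders_cdc"))
```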
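And a minimal sketch of a custom benchmark workload, assuming a local MySQL instance and the mysql-connector-python driver (schema, table, and credentials are placeholders):

```python
import time
import mysql.connector  # pip install mysql-connector-python

# Placeholder connection details for a local test database.
conn = mysql.connector.connect(
    host="localhost", user="bench", password="bench", database="benchdb"
)
cur = conn.cursor()
cur.execute(
    "CREATE TABLE IF NOT EXISTS bench_orders ("
    "  id INT AUTO_INCREMENT PRIMARY KEY,"
    "  customer_id INT, amount DECIMAL(10,2))"
)

# Time a burst of single-row inserts to estimate write throughput.
N = 10_000
start = time.perf_counter()
for i in range(N):
    cur.execute(
        "INSERT INTO bench_orders (customer_id, amount) VALUES (%s, %s)",
        (i % 100, i * 0.01),
    )
conn.commit()
elapsed = time.perf_counter() - start
print(f"{N} inserts in {elapsed:.2f}s ({N / elapsed:.0f} rows/s)")
cur.close()
conn.close()
```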
Big Data Engineer
Noah Data (Division of Indium Software)
10.2016 - 11.2018
Set up the Hadoop cluster using the Cloudera distribution, tuned cluster and HBase performance, and provided maintenance support.
Prepared VMs for adding nodes to the Hadoop cluster and commissioned/decommissioned data nodes as required.
Set up high availability (HA) for HDFS, HBase, Hive, Oozie, Hue, and Cloudera Manager, and externalized the metastore for Cloudera Manager, Hive, and other services.
Managed and tuned Hadoop services based on resource availability and load to keep capacity free for other ETL jobs and services.
Set up Sqoop import and export jobs using Oozie, with Hue as the ETL development tool; created and managed 20+ ETL flows scheduled to run daily.
Created Hive internal and external tables to maintain historical data and tuned YARN to run parallel queries across varied workloads in the Hadoop cluster.
Populated real-time data into Apache Phoenix using Phoenix views, which run SQL queries directly on top of NoSQL HBase tables (see the Phoenix sketch after this list).
Optimized real-time reporting Phoenix SQL queries to ensure quick response times and enforced effective use of the composite row key used for HBase table modeling.
Automated ETL processes with Oozie workflows in Hue, making data wrangling easier and speeding it up by as much as 300%.
Ran several POCs to replace the MySQL-based source path (Licensed Queue -> MySQL -> HDFS) with Kafka as a centralized source of truth, landing data directly into HBase tables; this was delivered in the Phase 2 enhancement (a Kafka-to-HBase sketch follows this list).
Handled ETL failures by analyzing Hadoop/MapReduce logs and fixed service failures by examining their resource utilization metrics.
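A minimal sketch of the Phoenix patterns above, assuming a Phoenix Query Server reachable over HTTP and the phoenixdb driver (view, table, and column names are illustrative):

```python
import phoenixdb  # pip install phoenixdb

# Placeholder Phoenix Query Server URL.
conn = phoenixdb.connect("http://phoenix-host:8765/", autocommit=True)
cur = conn.cursor()

# A Phoenix view over an existing HBase table ("events", column family "d")
# lets plain SQL run directly against the NoSQL rows.
cur.execute(
    'CREATE VIEW IF NOT EXISTS "events" ('
    '  pk VARCHAR PRIMARY KEY,'
    '  "d"."status" VARCHAR)'
)

# A composite primary key (region, metric, ts) becomes the HBase row key,
# so queries leading with those columns turn into cheap range scans.
cur.execute(
    "CREATE TABLE IF NOT EXISTS metrics ("
    "  region VARCHAR NOT NULL, metric VARCHAR NOT NULL,"
    "  ts TIMESTAMP NOT NULL, value DOUBLE"
    "  CONSTRAINT pk PRIMARY KEY (region, metric, ts))"
)
cur.execute(
    "SELECT metric, MAX(value) FROM metrics "
    "WHERE region = ? GROUP BY metric", ("us-east",)
)
print(cur.fetchall())
conn.close()
```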
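And a minimal sketch of the Kafka-to-HBase landing path from the POC, assuming the kafka-python and happybase libraries with placeholder broker, topic, and table names (happybase requires the HBase Thrift server to be running):

```python
import json

import happybase                 # pip install happybase
from kafka import KafkaConsumer  # pip install kafka-python

# Placeholder broker, topic, and HBase table names.
consumer = KafkaConsumer(
    "orders",
    bootstrap_servers=["kafka-host:9092"],
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
hbase = happybase.Connection("hbase-host")  # HBase Thrift server host
table = hbase.table("orders")

for msg in consumer:
    order = msg.value
    # Composite row key (customer id + order id) keeps a customer's
    # orders adjacent, so per-customer scans stay cheap.
    row_key = f"{order['customer_id']:08d}|{order['order_id']}".encode()
    table.put(row_key, {
        b"d:amount": str(order["amount"]).encode(),
        b"d:status": order["status"].encode(),
    })
```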
Junior Web Data Analyst
SineQure Software
08.2014 - 07.2016
Wrote basic Python scripts to scrape web pages and extract usage history and other product-related information (see the scraping sketch after this list).
Ran various pre-built machine learning models against sample data and reported results to the team lead.
Created MySQL schemas to store reporting data for further analysis and ran SQL queries to generate summarized reports.
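A minimal sketch of the scraping-and-reporting flow above, assuming the requests, beautifulsoup4, and mysql-connector-python libraries with placeholder URL, selectors, and schema:

```python
import requests
from bs4 import BeautifulSoup  # pip install beautifulsoup4
import mysql.connector         # pip install mysql-connector-python

# Placeholder page; the real scraper targeted product/usage pages.
html = requests.get("https://example.com/products", timeout=30).text
soup = BeautifulSoup(html, "html.parser")

rows = []
for item in soup.select(".product"):  # placeholder CSS selector
    name = item.select_one(".name").get_text(strip=True)
    usage = item.select_one(".usage").get_text(strip=True)
    rows.append((name, usage))

# Land the scraped fields in a reporting table for later SQL summaries.
conn = mysql.connector.connect(
    host="localhost", user="report", password="report", database="reports"
)
cur = conn.cursor()
cur.executemany(
    "INSERT INTO product_usage (name, usage_info) VALUES (%s, %s)", rows
)
conn.commit()
conn.close()
```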
Education
M.Tech - Computer Science and Engineering
Bharath University
Chennai, India
08.2011 - 05.2013
B.Tech - Information Technology
Annai Teresa College of Engineering - Anna University
Villupuram, India
07.2005 - 05.2009
Skills
Hadoop Developer
Accomplishments
Client Appreciation - Cash Reward.
Karix Mobile Private Limited, Chennai
Received a client appreciation email, sent to top managers and the CEO, on completing the Karix project, along with a cash reward.
Won the customer's Pride of the Quarter award twice.
Received client appreciation for handling ETL and implementing a secured (Kerberized) Hadoop cluster.
Timeline
Senior Project Lead
Indium Software Private Limited
11.2018 - Current
Big Data Engineer
Noah Data (Division of Indium Software)
10.2016 - 11.2018
Junior Web Data Analyst
SineQure Software
08.2014 - 07.2016
M.Tech - Computer Science and Engineering
Bharath University
08.2011 - 05.2013
B.Tech - Information Technology
Annai Teresa College of Engineering - Anna University