
Sai Prasad Nayani

Hyderabad

Summary

Big Data & Hadoop Developer with 8+ years of experience in building scalable data engineering solutions. Skilled in Hadoop ecosystem (Hive, Pig, HBase, Sqoop, Spark, PySpark, Spark SQL) with strong expertise in Python, SQL, and ETL pipeline development. Proven track record in data ingestion, transformation, and workflow automation using HDFS, AWS, Azure, and BMC Control-M to deliver high-performance, large-scale analytics solutions.

Overview

9 years of professional experience
1 Certification

Work History

Sr Developer

Cognizant
Hyderabad
11.2020 - Current

Data Engineer at Blue Cross Blue Shield.

  • Developed and executed ETL processes for integrating data lake assets into ORMB and multi-cloud environments (AWS, Azure).
  • Implemented data transformation workflows on Hive tables, consolidating data from various sources for analysis.
  • Developed Talend jobs for file transfers between servers, leveraging Talend FTP components.
  • Played a key role in developing scalable data lake architecture using Hadoop technologies (Hive, HBase, PySpark).
  • Developed HQL pipelines for event joins and pre-aggregations, reducing downstream query latency and improving analytics efficiency.
  • Streamlined data processing by developing Control-M workflows for automated batch scheduling, reducing manual effort and errors.
  • Implemented real-time data ingestion pipelines by integrating Kafka with Spark Streaming and PySpark for storage in HDFS.
  • Processed JSON data with PySpark by encoding and decoding objects to build and modify Spark DataFrames.
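A minimal stdlib-only sketch of the JSON encode/decode pattern described above (record fields are hypothetical; in PySpark the decoded dicts would feed `spark.createDataFrame` to build DataFrames):

```python
import json

def decode_events(raw_lines):
    """Parse newline-delimited JSON strings into row dicts."""
    return [json.loads(line) for line in raw_lines]

def encode_events(rows):
    """Serialize row dicts back to compact JSON strings."""
    return [json.dumps(row, sort_keys=True) for row in rows]

# Hypothetical sample records
raw = ['{"member_id": 1, "claim": 120.5}', '{"member_id": 2, "claim": 80.0}']
rows = decode_events(raw)

# Add a derived column before re-encoding, as a DataFrame transform would
for row in rows:
    row["claim_cents"] = int(row["claim"] * 100)

encoded = encode_events(rows)
```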

Big Data Hadoop Developer

Blue Cross Blue Shield
Chicago
03.2018 - 09.2020
  • Designed and optimized PySpark and Spark SQL code for high-performance data processing on Apache Spark.
  • Designed and implemented HQL queries to perform transformations, event-based joins, and pre-aggregations, ensuring optimized data storage in HDFS.
  • Designed and implemented Hive tables on HDFS, leveraging HiveQL for data querying and processing.
  • Developed and deployed AWS Lambda functions in Python to handle scalable computation workloads.
  • Aggregated large-scale datasets using Spark and staged them in HDFS to enable downstream data analysis.
  • Redesigned Hive/SQL workflows into efficient Spark transformations with PySpark DataFrames and RDDs, reducing processing time and improving maintainability.
  • Partnered with the QA lead to create test plans and cases and to streamline defect identification and resolution.
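The Hive-to-Spark rewrites above center on replacing SQL GROUP BY passes with in-memory aggregations; a stdlib-only sketch of that pre-aggregation step (field names are hypothetical, and a PySpark version would use `df.groupBy(...).agg(...)` instead):

```python
from collections import defaultdict

def pre_aggregate(events, key_field, value_field):
    """Group event dicts by a key and sum a numeric value,
    mirroring a SQL GROUP BY ... SUM(...) pass."""
    totals = defaultdict(float)
    for event in events:
        totals[event[key_field]] += event[value_field]
    return dict(totals)

# Hypothetical claim events keyed by plan type
events = [
    {"plan": "PPO", "amount": 100.0},
    {"plan": "HMO", "amount": 50.0},
    {"plan": "PPO", "amount": 25.0},
]
summary = pre_aggregate(events, "plan", "amount")
```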

Hadoop Developer

Walmart
Bentonville
07.2017 - 02.2018
  • Tracked processing time for multiple fill indicators in the health and wellness project.
  • Developed reports using BI tools to highlight trends and key performance indicators.
  • Performed data analysis, transformations, and aggregations on source datasets, loading results into Hive tables for downstream use.
  • Delivered aggregated data into SQL for downstream consumption in BI tools, supporting business decision-making.
  • Participated in daily Agile Scrum meetings, contributing to system architecture design and use case development.
  • Analyzed data sources, designed source-to-target mappings, and projected storage capacity requirements for Hadoop environments.
  • Collaborated with clients to capture requirements and scope timelines for developing advanced Hive queries in logistics systems.
  • Delivered scalable solutions using Microsoft Azure, managing project timelines and deliverables effectively.
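The source-to-target mapping work above can be sketched as a small mapping table applied per record (column names and transforms here are hypothetical illustrations, not the actual Walmart schema):

```python
# Hypothetical source-to-target mapping: source column -> (target column, transform)
MAPPING = {
    "cust_nm": ("customer_name", str.strip),
    "ord_amt": ("order_amount", float),
}

def apply_mapping(source_row, mapping):
    """Project one raw source record onto the target schema,
    renaming columns and applying each column's transform."""
    return {tgt: fn(source_row[src]) for src, (tgt, fn) in mapping.items()}

row = apply_mapping({"cust_nm": "  Acme ", "ord_amt": "42.50"}, MAPPING)
```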

Hadoop Developer

Blue Cross Blue Shield
Phoenix
11.2016 - 07.2017
  • Created an Enterprise Data Hub to enable data analytics across business units using Cloudera Hadoop.
  • Defined, designed, and developed Java applications, leveraging Hadoop frameworks, including Cascading and Hive.
  • Coordinated with the offshore development team for application development and unit testing.
  • Developed workflows using Oozie to execute MapReduce jobs and Hive queries.
  • Loaded log data into HDFS directly using Flume for efficient data processing.
  • Migrated applications from on-premises data centers to AWS Public Cloud after architecture analysis.
  • Built reusable Hive UDF libraries to support complex business requirements in querying.
  • Provisioned Azure Data Lake Store and Analytics while leveraging U-SQL for cross-service queries.
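Reusable Hive UDF libraries in this stack are typically written in Java, but the same per-row logic can also be expressed as a Hive TRANSFORM streaming script; a minimal Python sketch of that pattern (the tab-separated column layout is hypothetical):

```python
def transform_line(line):
    """Uppercase the second tab-separated field of one Hive row,
    leaving the other fields intact."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) >= 2:
        fields[1] = fields[1].upper()
    return "\t".join(fields)

def run(rows):
    """Apply the transform to each streamed row. In production, Hive's
    TRANSFORM clause would pipe rows in on stdin and read stdout."""
    return [transform_line(row) for row in rows]
```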

Education

Master of Science - Information Systems

Stratford University
Fairfax, VA, US
07.2016

Bachelor's - Information Technology

Vardaman College of Engineering
Telangana, India
03.2013

Skills

Core big data and Hadoop ecosystem

  • Hadoop Distributed File System (HDFS)
  • MapReduce concepts
  • Hive (HiveQL, Hive tables, partitioning, bucketing)
  • HBase (NoSQL storage and retrieval)
  • Pig (data flow scripting)
  • Sqoop / Flume (for data ingestion)

Spark and PySpark

  • PySpark (RDDs, DataFrames, Datasets, Spark SQL, UDFs)
  • Spark Streaming (real-time data ingestion and processing)
  • Performance tuning (partitioning, caching, broadcast variables)
  • Integration with Kafka for streaming pipelines

ETL and data engineering

  • Data ingestion, transformation, and aggregation
  • Source to target mapping and schema design
  • Pre-aggregations and event joins
  • Building batch and streaming data pipelines
  • Workflow automation using BMC Control-M, Oozie, or Airflow
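The workflow automation skills above come down to running batch jobs in dependency order, which Control-M and Oozie manage declaratively; a stdlib-only sketch of that scheduling idea using `graphlib` (job names and dependencies are hypothetical):

```python
from graphlib import TopologicalSorter

# Hypothetical batch jobs mapped to their upstream dependencies,
# as a Control-M or Oozie workflow definition would declare them
jobs = {
    "ingest": set(),
    "transform": {"ingest"},
    "aggregate": {"transform"},
    "load": {"aggregate"},
}

# Resolve a valid execution order that respects every dependency
order = list(TopologicalSorter(jobs).static_order())
```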

Certification

Databricks Certified Associate Data Engineer

Data engineering on Microsoft Azure

Timeline

Sr Developer

Cognizant
11.2020 - Current

Big Data Hadoop Developer

Blue Cross Blue Shield
03.2018 - 09.2020

Hadoop Developer

Walmart
07.2017 - 02.2018

Hadoop Developer

Blue Cross Blue Shield
11.2016 - 07.2017

Master of Science - Information Systems

Stratford University

Bachelor's - Information Technology

Vardaman College of Engineering