
Sumana Devireddy

Mount Juliet, TN

Summary

AWS Data Engineer with around 9 years of experience implementing data warehouse and database applications with Informatica ETL, along with data modeling and reporting tools, across the Apache Hadoop ecosystem, Teradata, Oracle, DB2, SQL Server, and Hadoop MapReduce. Automates data workflows using Python, Airflow, and AWS services; manages data infrastructure with Terraform for cloud-native solutions; and builds and optimizes Spark jobs in Databricks for large-scale data processing, streamlining deployment and monitoring through CI/CD tools.

Overview

9 years of professional experience

Work History

DataOps Engineer

Asurion
06.2023 - Current
  • Work with Apache Spark for distributed data processing and transformation within Databricks, writing optimized Spark jobs in Python (PySpark), Scala, and SQL for large-scale data transformations.
  • Set up monitoring and alerting for Databricks clusters and jobs, integrating with tools such as AWS CloudWatch for real-time insight into cluster and job performance.
  • Manage Databricks logs to ensure visibility into job failures, resource usage, and performance bottlenecks.
  • Orchestrate complex ETL/ELT pipelines with Apache Airflow, ensuring reliable data flow and task dependencies.
  • Automate infrastructure provisioning using Terraform, managing cloud resources such as AWS S3, EC2, RDS, and Lambda in data pipelines.
  • Perform cost analysis on AWS resources.
  • Provide Starburst Presto support and set up a new Hive metastore using Apache Spark.
  • Implement CI/CD pipelines for data workflows, automating deployment and testing with Python-based scripts in Jenkins and GitHub Actions.

AWS Data Engineer / Databricks

Cigna
08.2022 - 05.2023
  • Set up monitoring tools like CloudWatch alarms to ensure optimal performance of the cloud environment.
  • Developed real-time streaming solutions using Kinesis Firehose, Streams and Analytics along with Apache Kafka and Flink on EMR clusters.
  • Developed and maintained AWS EC2, S3, EMR clusters with Spark and Hive for data processing.
  • Designed and implemented ETL pipelines using Glue, Athena, and other AWS services.
  • Created ETL pipelines in Databricks and automated them with a Terraform framework within the CI/CD pipeline.
  • Created external tables in AWS Glue and accessed the data through Databricks.
  • Identified and corrected bugs for maintaining cloud stack functionality.
  • Traced and corrected potential network issues by analyzing systematic vulnerabilities.
  • Automated Delta data deletes and inserts in the RTIM tool.

AWS Data Engineer

Capital One
08.2019 - 07.2022
  • Developed Spark applications with Scala and Python and implemented Apache Spark for data processing from various streaming sources.
  • Worked with different file formats, reading and writing CSV, Parquet, JSON, and Avro data for analytics and loading it into Spark for ETL transformations.
  • Created Lambda functions for SNS alerts and EMR instances to run applications on a cron schedule.
  • Loaded and maintained data in DB2, Aurora MySQL, Snowflake, and Postgres databases.
  • Created AWS infrastructure, events, S3 bucket policies, IAM roles, EC2 instances, and SNS topics.
  • Migrated an existing on-premises application to AWS, using EC2 and S3 for small data set processing and storage, and maintained the Hadoop cluster on AWS EMR.
  • Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs.
  • Created Lambda functions to automate scheduled jobs in the EMR cluster with Step Functions.
  • Participated in AWS TREx activities.

Database Analyst

Wells Fargo
08.2015 - 07.2019
  • Processed data into HDFS by developing solutions that analyzed it with MapReduce, Pig, and Hive and produced summary results from Hadoop for downstream systems.
  • Developed a complete ETL process in Teradata by writing stored procedures and complex SQL queries.
  • Developed various ETL transformation scripts using Hive to create refined datasets for analytics use cases.
  • Improved the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Imported and exported data using stream processing platforms such as Flume and Kafka.
  • Developed UNIX shell scripts to load large numbers of files into HDFS from the local file system.

Education

Master's - Information Technology

VIU
05.2015

Bachelor's - Electronics and Communications

JNTU
05.2013

Skills

  • Spark SQL, PySpark
  • Python, Scala, Java
  • Databricks
  • MySQL, Cassandra
  • PL/SQL, Snowflake
  • Control-M
  • GitHub
  • Linux/Unix
  • Agile, SDLC
  • AWS Cloud Services: S3, EC2, EMR, SNS, Lambda, CloudWatch, Athena
  • Teradata Admin
