Vishva Mahadevan

Software Developer

Hey, I'm Vishva Mahadevan, a passionate Software Engineer. Welcome to my personal space on the web!

Software Developer 2 at Gupshup

- Present · Bangalore, India

Role Overview

As a Software Developer 2 at Gupshup, I transitioned to the analytics team, where I focus on building and optimizing large-scale data processing pipelines. The role has deepened my understanding of distributed systems and data analytics, and it presents challenges quite different from traditional application development.

Key Achievements

  • Architected and implemented analytics pipelines capable of processing 100 million events per day using Apache Flink (a minimal pipeline sketch follows this list)
  • Developed both streaming and batch processing solutions to handle diverse analytical workloads
  • Optimized data storage and processing patterns through careful consideration of serialization and compression techniques
  • Implemented efficient data storage solutions using Apache Parquet and query capabilities with Amazon Athena
  • Gained deep insights into distributed systems and their specific challenges in analytics contexts
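
To give a flavour of what such a pipeline looks like, here is a deliberately tiny, hypothetical sketch of a Flink streaming job that counts events per channel in one-minute windows. The class name, channel values, and windowing choices are illustrative only, not Gupshup's actual logic.

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

// Hypothetical sketch: count events per channel in one-minute windows.
public class EventCountJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // A production job would read from a real source such as Kafka;
        // a handful of in-memory elements keeps this sketch runnable.
        DataStream<String> channels = env.fromElements("whatsapp", "sms", "whatsapp");

        channels
                .map(channel -> Tuple2.of(channel, 1L))
                .returns(Types.TUPLE(Types.STRING, Types.LONG)) // lambdas need an explicit result type
                .keyBy(pair -> pair.f0)
                .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
                .sum(1)
                .print();

        env.execute("event-count-sketch");
    }
}
```

The real pipelines are, of course, far more involved, but the shape is the same: source, keyed transformation, windowed aggregation, sink.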

Technical Challenges Overcome

  • Adapted Java development practices to meet the unique requirements of distributed processing
  • Optimized object creation and management for better performance in Flink pipelines
  • Mastered the complexities of stream processing and state management
  • Implemented efficient serialization and compression strategies for large-scale data handling

Technologies Used

  • Processing Framework: Apache Flink
  • Programming: Java for Distributed Systems
  • Storage: Apache Parquet
  • Analytics: Amazon Athena
  • Cloud Infrastructure: AWS Services
  • Data Processing: Batch and Stream Processing Pipelines

Impact

My work has enabled efficient processing of massive data volumes, providing valuable insights for business decisions while maintaining system performance and reliability in a distributed environment.

Diving Deep into Analytics: My Journey from Services to Streams

When I got promoted to Software Developer 2 at Gupshup and moved to the analytics team, I quickly realized that this wasn't just another project switch – it was a step into a completely different realm of distributed computing. The transition from traditional service-based architecture to stream processing opened my eyes to new ways of thinking about data and systems at scale.

The Paradigm Shift

The first eye-opening moment came when I realized that my usual Java coding patterns weren't going to work in analytics. In the world of Apache Flink and stream processing, every line of code needs to be thought through differently. The same structured code that worked perfectly in REST APIs could become a performance bottleneck or even a show-stopper in a streaming environment.
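
A concrete example of that rethinking is something as mundane as how a record type is declared. The sketch below is hypothetical (MessageEvent and its fields are illustrative, not our real schema), but it shows the kind of flat, public POJO that Flink's built-in POJO serializer handles efficiently; a richer domain object with nested types and no default constructor quietly falls back to generic Kryo serialization, which you pay for on every one of those millions of records.

```java
// Illustrative only: a flat event type shaped so Flink recognises it as a POJO
// (public class, public no-arg constructor, fields that are public or exposed
// via getters/setters). Types that break these rules fall back to slower
// generic serialization.
public class MessageEvent {
    public String appId;
    public String channel;
    public long timestampMillis;
    public int messageCount;

    public MessageEvent() {
        // required for Flink's POJO type extraction
    }
}
```

In a REST service I would never have thought twice about this; in a Flink job, the difference between the POJO serializer and a Kryo fallback shows up directly in throughput.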

Scaling to 100 Million Events

Our primary challenge was handling 100 million events per day efficiently. This required a deep understanding of:

Data Processing Fundamentals

  • The importance of proper serialization and compression
  • Memory management in distributed systems
  • Stateful vs. stateless processing (a small keyed-state sketch follows this list)
  • The critical difference between streaming and batch pipelines
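
As promised above, here is a minimal, hypothetical sketch of the stateful side of that distinction: a keyed counter that keeps managed state so Flink can checkpoint and restore it, unlike a stateless map or filter that remembers nothing between events. It reuses the illustrative MessageEvent type from the earlier sketch.

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Hypothetical sketch: a per-key running count. The count lives in Flink-managed
// state, so it survives failures and restarts via checkpoints, unlike a plain field.
public class PerKeyCounter extends KeyedProcessFunction<String, MessageEvent, Long> {

    private transient ValueState<Long> count;

    @Override
    public void open(Configuration parameters) {
        count = getRuntimeContext().getState(
                new ValueStateDescriptor<>("event-count", Long.class));
    }

    @Override
    public void processElement(MessageEvent event, Context ctx, Collector<Long> out) throws Exception {
        Long current = count.value();
        long updated = (current == null ? 0L : current) + 1;
        count.update(updated);
        out.collect(updated);
    }
}
```

It would sit after a keyBy, for example events.keyBy(e -> e.appId).process(new PerKeyCounter()).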

Technical Implementation

We implemented both streaming and batch pipelines using Apache Flink with Java. Some key learning points were:

  1. Object Creation: Every Java object needs careful consideration. In a high-throughput environment, even small inefficiencies get magnified millions of times.

  2. Simplicity is Key: Complex objects and processing patterns that work fine in traditional applications can become bottlenecks in stream processing. We learned to keep things as simple as possible.

  3. Storage Optimization: Working with Apache Parquet and Amazon Athena taught us the importance of proper data storage formats and query optimization.
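
To make the storage point concrete, here is a hedged sketch of writing a stream out as Parquet files that Athena can then query from S3. It assumes Flink's flink-parquet module (in recent releases the writer factory is AvroParquetWriters; older ones call it ParquetAvroWriters), reuses the illustrative MessageEvent type from earlier, and the bucket path is made up.

```java
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.AvroParquetWriters;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Hypothetical sketch: sink a stream to columnar Parquet files for Athena to query.
public class ParquetSinkJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Bulk formats like Parquet roll files on checkpoints, so checkpointing is mandatory.
        env.enableCheckpointing(60_000);

        // Placeholder input; a real job would read from Kafka or another source.
        DataStream<MessageEvent> events = env.fromElements(new MessageEvent());

        FileSink<MessageEvent> parquetSink = FileSink
                .forBulkFormat(
                        new Path("s3://analytics-bucket/events/"), // hypothetical bucket
                        AvroParquetWriters.forReflectRecord(MessageEvent.class))
                .build();

        events.sinkTo(parquetSink);
        env.execute("parquet-sink-sketch");
    }
}
```

On the Athena side, most of the win comes from partitioning the S3 layout (for example by date) and letting the columnar format prune what each query actually reads.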

The Infrastructure Challenge

One of the most challenging aspects was setting up the environment with Kubernetes and the Flink Kubernetes operator. This required:

  • Understanding how Flink jobs deploy and scale on Kubernetes
  • Managing state backup and recovery (a checkpointing sketch follows this list)
  • Handling job upgrades without data loss
  • Ensuring proper resource allocation and utilization
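
Much of that list comes down to getting checkpointing and state right before worrying about the operator itself. Here is a minimal sketch of the job-side settings, assuming the RocksDB state backend and an S3 checkpoint bucket; the paths and intervals are illustrative, not our production values.

```java
import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Hypothetical sketch: checkpoint and state-backend settings that make
// recovery and job upgrades workable on Kubernetes.
public class CheckpointingSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // RocksDB keeps large keyed state off-heap; incremental checkpoints
        // upload only what changed since the last one.
        env.setStateBackend(new EmbeddedRocksDBStateBackend(true));

        // Periodic, exactly-once checkpoints written to durable storage.
        env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);
        env.getCheckpointConfig().setCheckpointStorage("s3://analytics-checkpoints/flink/"); // hypothetical bucket
        env.getCheckpointConfig().setMinPauseBetweenCheckpoints(30_000);

        // ... build the actual pipeline here, then call env.execute(...)
    }
}
```

With durable checkpoints and savepoints in place, the Flink Kubernetes operator's savepoint-based upgrade mode can stop a job, roll out a new image, and restore from the savepoint, which is what makes upgrades without data loss possible.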

Cost Optimization Success

A significant achievement was participating in a cost optimization initiative that resulted in a 40% reduction in infrastructure costs. This involved:

  • Analyzing resource usage patterns
  • Right-sizing our Kubernetes clusters
  • Optimizing data storage and processing patterns
  • Implementing efficient scaling policies

Documentation: The Unsung Hero

One learning that stands out is the importance of documentation. In the world of analytics, where problems can be complex and solutions non-obvious, maintaining detailed documentation of issues and solutions became crucial. This helped:

  • Speed up problem resolution
  • Share knowledge across the team
  • Maintain system reliability
  • Reduce operational overhead

Key Learnings

  1. Think Distributed: Every piece of code needs to be thought of in terms of how it will behave when distributed across multiple nodes.

  2. Performance is Key: In analytics, performance isn't just about response time – it's about processing massive amounts of data efficiently.

  3. Resource Awareness: Understanding resource utilization is crucial when dealing with big data processing.

  4. Simplicity Wins: The simpler your code and architecture, the easier it is to maintain and scale.

Looking Forward

This transition to analytics has been a transformative experience. It has shown me how many aspects of software engineering – from code organization to infrastructure management – need to be approached differently when dealing with big data and stream processing.

The challenges of handling 100 million events daily have taught me the importance of:

  • Thinking at scale from day one
  • Understanding the entire data pipeline
  • Keeping performance in mind at every step
  • Maintaining robust documentation

For any developer looking to move into analytics, my advice is simple: be prepared to challenge your existing assumptions about software development. The rules are different here, but the opportunities to learn and grow are immense.