Streaming Demo with Redpanda using NY Taxi Dataset
Table of Contents
- Redpanda Demo Project
- Architecture
- Session Terminal 1 (Preparations)
- Session Terminal 2 (Kafka Producer)
- Session Terminal 3 (Kafka Consumer)
- Check (Monitor) Output

Overview
This repository contains some homework solutions from Module 6 (Streaming) of the DTC Data Engineering Zoomcamp 2024. Instead of Kafka, we will use Redpanda, which is a drop-in replacement for Kafka.

Make sure the following are set up:
- Docker (module 1)
- PySpark (module 5)

For this homework we will use the file from the Module 5 homework, i.e.:
- Green 2019-10 data from here

Note: Do not run these steps in a Jupyter Notebook; otherwise the .ipynb file will grow very quickly while the producer and consumer steps are running. Run them in a terminal instead. All scripts are written in Python (.py extension).

Redpanda Streaming Demo
Architecture
Session Terminal 1 (Preparations)
Open the operating system terminal
Creat