Reading streams of Twitter data, saving them to Kafka, then processing them with the Kafka Streams API and Spark Streaming

Last update: Dec 06, 2021

Related tags

Data Analysis kafka-to-spark-streaming

Overview

Using Streaming Twitter Data with Kafka and Spark

Read streams of Twitter data, publish them to a Kafka topic, and process the messages using the Kafka Streams API and Spark Streaming.

Twitter is blocked in some countries. If that applies to you, make sure your VPN is switched on so that you can reach Twitter.

You also need your own consumer_key, consumer_secret, and access_token with its secret, stored inside the config.py file.
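
As a rough sketch, config.py could look like the following. The text only says that the file holds consumer_key, consumer_secret, and access_token with its secret; the name access_token_secret, the environment variable names, and the missing_credentials helper are assumptions made for illustration, not the project's actual file.

```python
import os

# config.py (sketch): credentials are read from environment variables so that
# they never have to be committed. The variable names are hypothetical.
consumer_key = os.environ.get("TWITTER_CONSUMER_KEY", "")
consumer_secret = os.environ.get("TWITTER_CONSUMER_SECRET", "")
access_token = os.environ.get("TWITTER_ACCESS_TOKEN", "")
access_token_secret = os.environ.get("TWITTER_ACCESS_TOKEN_SECRET", "")  # assumed name


def missing_credentials():
    """Return the names of the credentials that are still empty."""
    values = {
        "consumer_key": consumer_key,
        "consumer_secret": consumer_secret,
        "access_token": access_token,
        "access_token_secret": access_token_secret,
    }
    return [name for name, value in values.items() if not value]
```

Calling missing_credentials() before starting the stream gives a quick check that all four values are set.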

Create a conda environment with Python 3.8:
- conda create -n python38 python=3.8
- conda activate python38
- Check the requirements in requirements.txt and install them using conda:
  - conda install -c conda-forge tweepy==4.4.0
  - conda install -c conda-forge kafka-python==2.0.2
Kafka should be installed on your machine; check the Kafka documentation for installation instructions. If you use brew on a Mac, you can run brew install kafka
Start ZooKeeper (port 2181): zookeeper-server-start /usr/local/etc/kafka/zookeeper.properties
In another terminal window, start the broker (port 9092): kafka-server-start /usr/local/etc/kafka/server.properties
In a third terminal window, list the topics you have: kafka-topics --list --bootstrap-server localhost:9092
Create the Kafka topic "tweeter" with 1 partition and a replication factor of 1 (no extra replicas, because we use a single local broker): kafka-topics --create --topic tweeter --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
Now list the topics again: kafka-topics --list --bootstrap-server localhost:9092
Let's see what we have inside the "tweeter" topic: kafka-console-consumer --bootstrap-server localhost:9092 --topic tweeter --from-beginning. There is absolutely nothing yet, but once we start streaming, data will be generated.
Now run python kafka_producer.py to start streaming from Twitter and push the messages to the topic.
Now check that the data is inside the topic: kafka-console-consumer --bootstrap-server localhost:9092 --topic tweeter --from-beginning
Congrats! You have done it!
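
The contents of kafka_producer.py are not shown here, so the following is only a minimal sketch of what such a script might look like, assuming the tweepy 4.4.0 Stream class and the kafka-python KafkaProducer installed above. The names tweet_to_message, run_stream and TweetForwarder, the choice of fields to keep, and the config attribute access_token_secret are all hypothetical.

```python
import json

TOPIC = "tweeter"  # the topic created in the steps above


def tweet_to_message(raw_data):
    """Reduce a raw tweet payload to a small UTF-8 JSON message.

    Returns None for stream events that are not tweets (for example limit notices).
    """
    tweet = json.loads(raw_data)
    if "text" not in tweet:
        return None
    slim = {
        "id": tweet.get("id_str"),
        "text": tweet["text"],
        "created_at": tweet.get("created_at"),
    }
    return json.dumps(slim).encode("utf-8")


def run_stream(keywords):
    """Stream tweets matching the keywords and push them to the Kafka topic."""
    # Imported here so the helper above can be used without these packages.
    import tweepy
    from kafka import KafkaProducer
    import config  # assumed to define the four credential names used below

    producer = KafkaProducer(bootstrap_servers="localhost:9092")

    class TweetForwarder(tweepy.Stream):
        def on_data(self, raw_data):
            # Called by tweepy for every raw line received from the stream.
            message = tweet_to_message(raw_data)
            if message is not None:
                producer.send(TOPIC, value=message)

    stream = TweetForwarder(
        config.consumer_key,
        config.consumer_secret,
        config.access_token,
        config.access_token_secret,  # assumed attribute name
    )
    try:
        stream.filter(track=keywords)  # blocks until the stream stops
    finally:
        producer.flush()
        producer.close()
```

Calling run_stream(["kafka"]) with the broker running would start forwarding matching tweets; the keyword is only an example.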

So what's next?

You can use the generated data with Kafka Streams and Spark Streaming, and practice more!
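
One possible next step is sketched below with PySpark. Note that it uses Structured Streaming (the DataFrame-based API) rather than the older DStream-based Spark Streaming, and it assumes that pyspark is installed and that the spark-sql-kafka-0-10 connector package matching your Spark version is supplied when the job is submitted. The function names and the application name are hypothetical.

```python
def kafka_source_options(topic, servers="localhost:9092"):
    """Build the options for Spark's Kafka source."""
    return {
        "kafka.bootstrap.servers": servers,
        "subscribe": topic,
        "startingOffsets": "earliest",  # read the topic from the beginning
    }


def print_tweets_from_kafka(topic="tweeter"):
    """Read the topic as a stream and print each message to the console."""
    # Imported here so the helper above can be used without pyspark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("tweeter-reader").getOrCreate()
    raw = (
        spark.readStream.format("kafka")
        .options(**kafka_source_options(topic))
        .load()
    )
    # Kafka delivers the message value as bytes; cast it to a readable string.
    tweets = raw.selectExpr("CAST(value AS STRING) AS tweet_json")
    query = (
        tweets.writeStream.format("console")
        .option("truncate", "false")
        .start()
    )
    query.awaitTermination()  # blocks until the query is stopped
```

From here, the tweet_json column could be parsed and aggregated, for example to count tweets per time window.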

Owner

Rustam Zokirov
