For the full writeup of this project, click here.

If you prefer the GitHub version of the writeup, click here.

For the GitHub repository of this project, click here.


Summary

Data is all around us, but collecting it and processing it into a readable format is usually the most time-consuming part of any data pipeline. In this project, I focused on web scraping with BeautifulSoup. The flowchart below shows a process that can be applied to most websites, turning the internet itself into a valuable data source.

[Image: summary flowchart]
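
As a rough illustration of the parsing step in a process like this, here is a minimal sketch using BeautifulSoup. It is not the project's actual code: the HTML snippet, the `listings` table id, the `parse_listings` function, and the `title` and `price` fields are all hypothetical, chosen only to show how markup can be turned into records that are ready to load into a database. The HTML is an inline string so the example runs without a network request.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a page that has already been downloaded.
SAMPLE_HTML = """
<table id="listings">
  <tr><th>Title</th><th>Price</th></tr>
  <tr><td>Widget</td><td>$4.50</td></tr>
  <tr><td>Gadget</td><td>$12.00</td></tr>
</table>
"""


def parse_listings(html):
    """Extract one dict per data row from the (hypothetical) listings table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="listings")
    if table is None:
        return []  # page layout differs from what we expect

    rows = []
    for tr in table.find_all("tr"):
        cells = tr.find_all("td")
        if len(cells) != 2:
            continue  # skip the header row, which uses <th> cells
        title = cells[0].get_text(strip=True)
        price = float(cells[1].get_text(strip=True).lstrip("$"))
        rows.append({"title": title, "price": price})
    return rows


records = parse_listings(SAMPLE_HTML)
for record in records:
    print(record)
```

On a real site, the HTML would come from an HTTP request rather than a string, and the selectors would have to match that site's markup. Returning plain dictionaries keeps the parsing step separate from whatever storage step follows.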


Database Schema

[Image: database schema]


Sample Visualization

[Image: sample visualization]