Jul 30, 2020
How to use google storage with PySpark
This is not a guide on how to set up Spark or create a bucket on Google Cloud Platform. The documentation for setting up the Cloud Storage connector is lacking, so I decided to write this quick guide to accessing your Google Cloud Storage files with PySpark. Go to…
Pyspark · 2 min read

Jun 3, 2020
One-Stop Guide for Plotly and Dash Text Dataset Visualization Using Big Query and Flask: Good to Great
Background: In the previous tutorial, we built an advanced data extraction pipeline with Airflow and discussed the different types of data engineering frameworks. If you are interested in using Google Cloud Platform with Airflow, check out the first and second posts of this blog series. In this tutorial, we will…
Plotly · 13 min read

May 22, 2020
Airflow with Twitter Scraper, Google Cloud Storage, Big Query — tweets relating to Covid19
Background: In Part I, we learned how to set up Airflow with Google Cloud Platform (GCP) using Docker. We then applied the standard operator and sensor concepts to our Google Cloud Storage bucket and followed up with a file clean-up procedure. In Part II of this 4-part blog series, we will go over…
Airflow · 8 min read

May 13, 2020
Get Started with Airflow + Google Cloud Platform + Docker
My Motivation: Data engineering is the foundation of every data scientist's toolbox. After all, before we can produce any meaningful analysis that adds business value, data must be obtained, cleaned, and shaped. …
Airflow · 6 min read