Python Charting Application Using Johns Hopkins Data Tracking #Covid19

  • Writer: Will Regan
  • Jun 24, 2020
  • 3 min read


So you want to track COVID-19 data using the Johns Hopkins data set (located here). To start, you'll need a Python interpreter and several libraries. I personally use PyCharm to install my libraries because it manages the interpreter well and gives easy access to the terminal, which you'll need to pull data from the Johns Hopkins GitHub repository daily, or whenever you decide to run the project.


To start, create a file structure like so:


COVID-19 is where I stored the COVID data from Johns Hopkins, and Covid19_US_County is where I've written the code.


Make sure you've pulled the data from GitHub so you have the files. You'll need that local copy on your machine for this to work (get it here).
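
If you'd rather refresh the data from Python than from the terminal, here is a minimal sketch. It assumes git is installed and on your PATH, and that the data lives in the public CSSEGISandData/COVID-19 repository; the REPO_URL constant and both function names are made up for this example.

```python
import os
import subprocess

# Assumed URL of the public Johns Hopkins CSSE repository; verify before use.
REPO_URL = "https://github.com/CSSEGISandData/COVID-19.git"


def git_command(repo_dir):
    """Return the git command that creates or refreshes the local copy."""
    if os.path.isdir(os.path.join(repo_dir, ".git")):
        # A clone already exists, so only fetch the newest daily files.
        return ["git", "-C", repo_dir, "pull"]
    # No clone yet: download the whole repository into repo_dir.
    return ["git", "clone", REPO_URL, repo_dir]


def update_data(repo_dir="COVID-19"):
    """Run the command; raises CalledProcessError if git fails."""
    subprocess.run(git_command(repo_dir), check=True)
```

Calling update_data() before each run keeps the COVID-19 folder current. The first call downloads the full repository, which can take a while.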


The application I wrote is based on two files, DataInit.py and covid_example.py. The data file creates and retrieves the SQL database. The covid_example.py file uses the retrieved database to build the data series and chart them. Depending on your style, you can create multiple covid files that each produce a different type of chart, or you can cram them all into one file, as I did with covid.py. That file creates 7 different charts!


Let's go through the code. We'll start with the DataInit.py file, beginning with the import statements.

If any of these libraries are not already available in your environment, you'll need to install them. NumPy is extremely handy.


Next, create a class header. This is where all the code goes.

Next, create a static function which we'll utilize from outside the class to retrieve the data.
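
As a rough sketch of that shape: the class name DataInit and the function name init are guesses based on the file name, and the body is only a placeholder so the example runs.

```python
import sqlite3


class DataInit:
    """Hypothetical container for the data-loading code."""

    @staticmethod
    def init():
        """Build and return the database; no DataInit instance is needed."""
        # Placeholder body: an empty in-memory database stands in for the
        # real one that the later steps fill with data.
        return sqlite3.connect(":memory:")


# Called from outside the class, for example from covid_example.py:
conn = DataInit.init()
```

Because the function is static, the calling file never has to construct a DataInit object.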

Below the function header, fill in the definition. We'll start with a few key variables.


Depending on your coding ability, you may be able to auto-generate these dates (which are also file names), but I was not really motivated to do this at this point. The key refers to a specific county. You can also use a specific country, such as "Brazil" or "Sweden." There are limitations because of the datasets we are targeting, but I'll let you find those on your own.
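
For anyone who does want to generate the dates, here is one possible sketch. It assumes the daily report files are named MM-DD-YYYY.csv; the function name daily_file_names is invented for this example.

```python
from datetime import date, timedelta


def daily_file_names(first, last):
    """Return one 'MM-DD-YYYY' string per day from first to last, inclusive."""
    days = (last - first).days
    return [(first + timedelta(days=n)).strftime("%m-%d-%Y")
            for n in range(days + 1)]


dates = daily_file_names(date(2020, 3, 22), date(2020, 3, 25))
print(dates)  # ['03-22-2020', '03-23-2020', '03-24-2020', '03-25-2020']
```

Passing date.today() as the second argument extends the list to the current day, although the newest file may not have been published yet.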


Next, we are going to create an empty database with a table and columns for the data we are pulling.
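
A minimal sketch of this step using Python's built-in sqlite3 module. The table name covid and its five columns are assumptions chosen for illustration, not necessarily the schema used in the final code.

```python
import sqlite3


def create_database():
    """Return an in-memory SQLite database holding one empty table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE covid (
            date      TEXT,
            confirmed INTEGER,
            deaths    INTEGER,
            recovered INTEGER,
            active    INTEGER
        )
        """
    )
    return conn


conn = create_database()
```

An in-memory database disappears when the program exits, which is fine if the table is rebuilt from the CSV files on every run. Pass a file name to sqlite3.connect instead if you want it to persist.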

Next, we'll create a for loop to iterate over the dates, and populate the database with each file's data.

At this point, you may have to change the file paths depending on what you named your folders and where you placed them in your dev environment. If you aren't getting any errors, proceed onward to return the data.
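
Here is a hedged sketch of what such a loop could look like. It assumes a table named covid with five columns (date, confirmed, deaths, recovered, active) and CSV headers named Combined_Key, Confirmed, Deaths, Recovered and Active. The header names changed over the life of the data set, so check them against your files. The helper names to_int and load_reports are invented here.

```python
import csv
import os
import sqlite3


def to_int(text):
    """Convert a CSV cell to int, treating blank or missing cells as 0."""
    return int(float(text)) if text else 0


def load_reports(conn, folder, dates, key):
    """Insert the row matching key from each daily file into table covid."""
    for day in dates:
        path = os.path.join(folder, day + ".csv")
        if not os.path.exists(path):
            continue  # skip days that have not been pulled yet
        with open(path, newline="", encoding="utf-8-sig") as f:
            for row in csv.DictReader(f):
                if row.get("Combined_Key") != key:
                    continue  # not the county or country we are tracking
                conn.execute(
                    "INSERT INTO covid VALUES (?, ?, ?, ?, ?)",
                    (day,
                     to_int(row.get("Confirmed")),
                     to_int(row.get("Deaths")),
                     to_int(row.get("Recovered")),
                     to_int(row.get("Active"))),
                )
    conn.commit()
```

The folder argument is where the path changes go. Point it at the daily reports directory inside your local COVID-19 folder.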

Here's a link to the final code for the DataInit.py.

So far, the code really does nothing, because nothing calls our function yet. So the next step is to create covid_example.py. This file will call your init function, retrieve the data, display it, and save the chart for future use.

Matplotlib is probably the hardest import to get here, but if you don't have it, the data you download will be difficult to display. Matplotlib is an invaluable library.


Next, write some preparation code and declare some variables.

This initial code does a number of things. First, it uses your function from DataInit.py. Then it creates the folder where you will save the data. The g_colors variable is for charting purposes. The firstDate and lastDate variables are for displaying the date range on a chart. The last 3 variables are populated as the code traverses the data, and they become the data points on the chart.
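
A rough, self-contained sketch of that preparation step. The folder name, the color values, and the three list names are placeholders. Only g_colors, firstDate and lastDate come from the description above, and the call into DataInit.py is left out so the snippet runs on its own.

```python
import os
import tempfile

# Hypothetical output folder, placed in the system temp directory so the
# sketch does not clutter a project; use your own path in practice.
out_dir = os.path.join(tempfile.gettempdir(), "covid_charts")
os.makedirs(out_dir, exist_ok=True)  # no error if the folder already exists

g_colors = ["tab:blue", "tab:red", "tab:green"]  # one color per plotted series

dates = ["03-22-2020", "03-23-2020", "03-24-2020"]  # stand-in date list
firstDate, lastDate = dates[0], dates[-1]  # date range shown on the chart

# Filled in later while looping over the query results.
confirmed, deaths, active = [], [], []
```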

These three commands work as follows. First, we create a cursor from the database we returned earlier. Next, we execute an SQL statement that gets the data we'd like to use. Finally, we create a multidimensional array from the data we fetched. That's what we'll use next.
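
The three calls might look roughly like this. The table and column names are assumptions, and a tiny stand-in database with made-up numbers is built first so the sketch runs without DataInit.py.

```python
import sqlite3

# Stand-in database with made-up numbers; in the real project the
# connection comes from DataInit.py instead.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE covid (date TEXT, confirmed INTEGER, deaths INTEGER, active INTEGER)"
)
conn.executemany(
    "INSERT INTO covid VALUES (?, ?, ?, ?)",
    [("03-22-2020", 10, 1, 0), ("03-23-2020", 14, 1, 13)],
)

cur = conn.cursor()  # 1. create a cursor from the database
cur.execute(         # 2. run the SQL statement that selects our data
    "SELECT date, confirmed, deaths, active FROM covid ORDER BY rowid"
)
rows = cur.fetchall()  # 3. a list of tuples, one tuple per day

print(rows[0])  # ('03-22-2020', 10, 1, 0)
```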

This is a basic loop which appends the data to the arrays. The if statement is there to compensate for the first 23 days, when "active" was not populated in our local area and had to be computed by subtracting deaths from total cases. If you delete the data correction line, you'll notice a jump in active cases in some areas, but that's it.
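
A small sketch of such a loop with the correction applied. The rows are made-up sample data, and treating an active count of 0 as "not populated" is an assumption made for this example, so the actual condition in your code may differ.

```python
# Made-up sample rows: (date, confirmed, deaths, active).
rows = [
    ("03-22-2020", 10, 1, 0),   # active missing, stored as 0
    ("03-23-2020", 14, 1, 13),  # active reported
]

confirmed, deaths, active = [], [], []

for day, n_confirmed, n_deaths, n_active in rows:
    if n_active == 0 and n_confirmed > 0:
        # Data correction: estimate active cases as total cases minus deaths.
        n_active = n_confirmed - n_deaths
    confirmed.append(n_confirmed)
    deaths.append(n_deaths)
    active.append(n_active)

print(active)  # [9, 13]
```

Without the correction, the first value would stay at 0 and the chart would show the jump described above.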


Finally, we display the data. This is mostly matplotlib stuff, so I'll post it all at once.
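
For reference, a stripped-down sketch of a chart like this. It uses made-up numbers and saves to the system temp folder; it is an illustration under those assumptions, not the code from the repository.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # draw to a file without needing a display window
import matplotlib.pyplot as plt

# Made-up sample data standing in for the lists built from the database.
dates = ["03-22", "03-23", "03-24"]
confirmed = [10, 14, 20]
active = [9, 13, 18]
deaths = [1, 1, 2]
g_colors = ["tab:blue", "tab:green", "tab:red"]

fig, ax = plt.subplots(figsize=(8, 4))
series = (confirmed, active, deaths)
labels = ("confirmed", "active", "deaths")
for values, label, color in zip(series, labels, g_colors):
    ax.plot(dates, values, label=label, color=color)

ax.set_title(f"COVID-19 cases, {dates[0]} to {dates[-1]}")
ax.set_xlabel("date")
ax.set_ylabel("cases")
ax.legend()
fig.autofmt_xdate()  # tilt the date labels so they do not overlap

# Hypothetical output location; replace with your own folder.
out_path = os.path.join(tempfile.gettempdir(), "covid_example.png")
fig.savefig(out_path)
```

Remove the matplotlib.use("Agg") line and call plt.show() if you'd rather see the chart in a window.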

That's it. If it worked, you should get something like this. Note that because my local area doesn't report recoveries yet, "active" cases are still high.

Good luck with the data. If you have any questions, please contact me using any of the methods displayed on this site. There should be plenty of code on the Git repository if you want to branch out and do other things. Polynomials are what I was working on last, but they are a bit tricky to get to display right, and can be misleading. Happy plotting and #StayHomeStaySafe!




 
 
 


541-633-5836

©2020 by Will Regan - Game Developer. Proudly created with Wix.com
