Analyzing and Predicting Pakistan Covid-19 Cases

In this blog post we will work with Covid-19 data for Pakistan. The post is broken into two parts: first we analyze the data, and then we predict cases for future days.

Before we begin, I just want to mention that if you want more insights, check out the video tutorials as well.

Pakistan Covid-19 Prediction Tutorial: 
https://youtu.be/Pt76F5rHsRE
Global Covid-19 Prediction Tutorial:
https://www.youtube.com/watch?v=ZNi_3bcutkY

Let's first explore some key things about this pandemic, so that we understand the problem we are trying to solve.

WHAT IS CORONAVIRUS

The name “coronavirus” comes from the crown-like projections on their surfaces. “Corona” in Latin means “halo” or “crown.” Among humans, coronavirus infections most often occur during the winter months and early spring. People regularly become ill with a cold due to a coronavirus and may catch the same one about 4 months later. This is because coronavirus antibodies do not last for a long time. Also, the antibodies for one strain of coronavirus may be ineffective against another one.

The coronavirus pandemic is an ongoing pandemic of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The outbreak was first identified in Wuhan, Hubei, China, in December 2019, and was recognised as a pandemic by the World Health Organization (WHO) on 11 March 2020.

Symptoms

Common symptoms:
fever, tiredness, dry cough
Some people may experience:
aches and pains, nasal congestion, runny nose, sore throat, diarrhoea.

On average it takes 5–6 days from when someone is infected with the virus for symptoms to show; however, it can take up to 14 days.

COVID-19 IN PAKISTAN

The COVID-19 pandemic was confirmed to have reached Pakistan on 26 February 2020, when a student in Karachi tested positive upon returning from Iran. By 18 March, cases had been registered in all four provinces, the two autonomous territories, and the federal territory of Islamabad.
As of 3 May 2020, there have been over 19,100 confirmed cases with 4,315 recoveries and 385 deaths in the country. Punjab has recorded the most cases at over 7,100, while Khyber Pakhtunkhwa has reported the most deaths in the country, a total of 172. The country has been put under a nation-wide lockdown until 9 May, which was initiated on 1 April and later extended twice.

TIME TO CODE…

The goal is to build a machine learning model that can forecast new cases, deaths and recoveries from Pakistan's past data, both overall and at the province/state and city level.

I have already done the data extraction and cleaning, so let's jump straight to analyzing the data.

You can get the data from my GitHub repository:
https://github.com/uzairaj/Covid19-Pakistan-Prediction-Model

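#importing libraries (the original import cell is not shown; these are the ones the code below relies on)
import datetime
import pandas as pd
import matplotlib.pyplot as plt
import plotly.graph_objects as go
from prophet import Prophet  # on older installs: from fbprophet import Prophet
from pmdarima import auto_arima
from statsmodels.tsa.statespace.sarimax import SARIMAX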

#reading data
covid_data = pd.read_excel('../input/pak_data-v2.xlsx')
#printing head of our data, to know about columns
covid_data.head()
It looks like we have 6 columns in total.
#printing data types of each column
covid_data.dtypes

#start date of covid in Pakistan
covid_data.Date.min()

#max date
covid_data.Date.max()
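
Before computing running totals it can help to confirm that the Date column is a real datetime and that the rows are in chronological order. Here is a minimal sketch on a toy frame. The values are made up, and only the column names Date, New Cases, Cured Cases and Death Cases are taken from the code in this post:

```python
import pandas as pd

# Toy stand-in for covid_data; the numbers are invented for illustration
toy = pd.DataFrame({
    'Date': ['2020-02-28', '2020-02-26', '2020-02-27'],
    'New Cases': [1, 2, 0],
    'Cured Cases': [0, 0, 0],
    'Death Cases': [0, 0, 0],
})

# Parse the dates and put the rows in chronological order
toy['Date'] = pd.to_datetime(toy['Date'])
toy = toy.sort_values('Date').reset_index(drop=True)

first_day = toy['Date'].min()
last_day = toy['Date'].max()
print(first_day.date(), last_day.date())
```

Sorting first matters because every running total later in the post assumes one row per day, oldest first.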

#creating a copy of dataframe

pak_data = covid_data.copy()

### Now creating new columns like Total Confirmed, Total Recovered, Total Deaths and Active cases based on New, Deaths, Cured cases

# running totals of the daily counts (rows must be in date order)
pak_data['Total Confirmed Cases'] = pak_data['New Cases'].cumsum()
pak_data['Total Recovered'] = pak_data['Cured Cases'].cumsum()
pak_data['Total Deaths'] = pak_data['Death Cases'].cumsum()
# active = confirmed so far, minus those who recovered or died
pak_data['Active Cases'] = pak_data['Total Confirmed Cases'] - pak_data['Total Recovered'] - pak_data['Total Deaths']

#printing head after adding new column in data
pak_data.head()
Updated Dataframe
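
The running totals can be checked by hand on a tiny example. This is only a sketch with invented numbers, using pandas' cumsum as one way to get the same totals:

```python
import pandas as pd

# Invented daily counts for four days (not real data)
toy = pd.DataFrame({
    'New Cases':   [2, 3, 5, 4],
    'Cured Cases': [0, 1, 0, 2],
    'Death Cases': [0, 0, 1, 0],
})

# Running totals of each daily series
toy['Total Confirmed Cases'] = toy['New Cases'].cumsum()
toy['Total Recovered'] = toy['Cured Cases'].cumsum()
toy['Total Deaths'] = toy['Death Cases'].cumsum()

# Active cases = everyone confirmed so far who has neither recovered nor died
toy['Active Cases'] = (toy['Total Confirmed Cases']
                       - toy['Total Recovered']
                       - toy['Total Deaths'])
print(toy)
```

On day four, for example, 14 people are confirmed in total, 3 have recovered and 1 has died, which leaves 10 active cases.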
### This code will display cumulative records of Active, Recovered and Death cases for Pakistan

confirmed = pak_data['New Cases'].sum()
recovered = pak_data['Cured Cases'].sum()
deaths = pak_data['Death Cases'].sum()
# active cases are the confirmed cases that have neither recovered nor died
active = confirmed - (recovered + deaths)

print(confirmed, recovered, deaths, active)

labels = ['Active Cases','Recovered Cases','Death Cases']
sizes = [active,recovered,deaths]
color= ['#66b3ff','green','red']
explode = []

for i in labels:
    explode.append(0.1)
    
plt.figure(figsize= (15,10))
plt.pie(sizes, labels=labels, autopct='%3.1f%%', startangle=9, explode =explode,colors = color)
centre_circle = plt.Circle((0,0),0.60,fc='white')

fig = plt.gcf()
fig.gca().add_artist(centre_circle)
plt.title('Pakistan COVID-19 Cases',fontsize = 24)
plt.axis('equal')  
plt.tight_layout()
### This shows the spread of cases in Pakistan as a scatter plot, with dates on the x-axis and cases on the y-axis

fig = go.Figure()
fig.add_trace(go.Scatter(x=pak_data['Date'], y=pak_data['Total Confirmed Cases'],
                    mode='lines+markers',marker_color='blue',name='Confirmed Cases'))

fig.add_trace(go.Scatter(x=pak_data['Date'], y=pak_data['Total Recovered'],
                mode='lines+markers',marker_color='green',name='Recovered'))
fig.add_trace(go.Scatter(x=pak_data['Date'], y=pak_data['Total Deaths'], 
                mode='lines+markers',marker_color='red',name='Deaths'))
fig.update_layout(title_text='Coronavirus Cases in Pakistan',plot_bgcolor='rgb(275, 270, 273)',width=600, height=600)
fig.show()

PREDICTION TIME

PROPHET MODEL

Prophet is a forecasting tool that provides a practical approach to forecasting “at scale”. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.
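
Prophet expects its input as a two-column frame: ds (the dates) and y (the values to forecast). Here is a small sketch of building that frame from toy data, without fitting any model. The numbers are invented:

```python
import pandas as pd

# Toy stand-in for pak_data (invented numbers)
toy = pd.DataFrame({
    'Date': pd.date_range('2020-02-26', periods=4, freq='D'),
    'Total Confirmed Cases': [2, 5, 10, 14],
})

# Prophet looks for exactly these two column names: 'ds' and 'y'
data = toy.rename(columns={'Date': 'ds', 'Total Confirmed Cases': 'y'})[['ds', 'y']]
print(data)
```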

# This is predicting Total number of Confirmed Cases in Pakistan

confirmed = pak_data['Total Confirmed Cases'].values.tolist()
data = pd.DataFrame(columns = ['ds','y'])
data['ds'] = list(pak_data['Date'])
data['y'] = confirmed

prop=Prophet()
prop.fit(data)
future=prop.make_future_dataframe(periods=15)
prop_forecast=prop.predict(future)
forecast = prop_forecast[['ds','yhat']].tail(15)
#print(forecast)


fig = go.Figure()
fig.add_trace(go.Scatter(x=pak_data['Date'], y=pak_data['Total Confirmed Cases'],
                    mode='lines+markers',marker_color='green',name='Actual'))
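# note: yhat_upper is the upper bound of Prophet's forecast interval; the point forecast is in yhat
fig.add_trace(go.Scatter(x=prop_forecast['ds'], y=prop_forecast['yhat_upper'],
                    mode='lines+markers',marker_color='Orange',name='Predicted'))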
fig.update_layout(title_text = 'Confirmed Cases (Predicted vs Actual) using Prophet')
fig.update_layout(plot_bgcolor='rgb(275, 270, 273)',width=600, height=600)
fig.show()

It looks like on April 29 the actual cases were around 14.278K and our model predicted 14.486K confirmed cases.
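
To put that gap in perspective, here is a quick calculation of the error using the two figures quoted above, rounded to whole cases:

```python
# Figures quoted above for April 29 (in cases)
actual = 14278
predicted = 14486

abs_error = predicted - actual
pct_error = abs_error / actual * 100
print(f"off by {abs_error} cases ({pct_error:.2f}%)")
```

So on that day the prediction was about 208 cases too high, an error of roughly 1.5%.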

ARIMA & SARIMA MODELS

Two important time series models are ARIMA and SARIMA; both build on the ARMA model.

For reference:
ARMA: https://www.youtube.com/watch?v=HhvTlaN06AM 
ARIMA: https://www.youtube.com/watch?v=3UmyHed0iYE 
SARIMA: https://www.youtube.com/watch?v=WjeGUs6mzXg
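
The "I" in ARIMA stands for "integrated": the model differences the series d times before fitting the ARMA part. For a cumulative series like Total Confirmed Cases, one round of differencing gives back the daily new cases. Here is a small numpy sketch with invented numbers:

```python
import numpy as np

# Invented cumulative totals (not real data)
total_confirmed = np.array([2, 5, 10, 14, 21])

# d = 1: first difference, i.e. the daily increase
daily_new = np.diff(total_confirmed, n=1)

# d = 2: difference of the differences, i.e. the change in the daily increase
daily_change = np.diff(total_confirmed, n=2)

print(daily_new)
print(daily_change)
```

This is the d that auto_arima picks in the code below, alongside p and q.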

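######  ARIMA MODEL

# ARIMA from the current statsmodels API (the older arima_model.ARIMA has been removed)
from statsmodels.tsa.arima.model import ARIMA

cc = pak_data['Total Confirmed Cases'].values

# let auto_arima choose the (p, d, q) order
p,d,q = auto_arima(cc).order
print(p,d,q)

# fit model
model = ARIMA(cc, order=(p,d,q))
arima = model.fit()
# forecast the next 15 days
pred = list(arima.forecast(steps=15))
print(pred)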
start_date = pak_data['Date'].max()
prediction_dates = []
for i in range(15):
    date = start_date + datetime.timedelta(days=1)
    prediction_dates.append(date)
    start_date = date

fig = go.Figure()
fig.add_trace(go.Scatter(x=pak_data['Date'], y=pak_data['Total Confirmed Cases'],
                    mode='lines+markers',marker_color='green',name='Actual'))
fig.add_trace(go.Scatter(x=prediction_dates, y=pred,
                    mode='lines+markers',marker_color='Orange',name='Predicted'))
fig.update_layout(title_text = 'Confirmed cases Predicted vs Actual using ARIMA')
fig.update_layout(plot_bgcolor='rgb(275, 270, 273)',width=600, height=600)
fig.show()
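
The loop that builds prediction_dates can also be written with pandas' date_range. This is a sketch, assuming the last observed date is a pandas Timestamp. The date below is only a placeholder:

```python
import pandas as pd

# Placeholder for pak_data['Date'].max()
last_date = pd.Timestamp('2020-05-03')

# 15 consecutive days, starting the day after the last observation
prediction_dates = pd.date_range(start=last_date + pd.Timedelta(days=1),
                                 periods=15, freq='D')
print(prediction_dates[0].date(), prediction_dates[-1].date())
```
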
######  SARIMA MODEL

cc = pak_data['Total Confirmed Cases'].values

# fit model
p,d,q = auto_arima(cc).order
print(p,d,q)

model = SARIMAX(cc, order=(p,d,q), seasonal_order=(0,0,0,0),measurement_error=True)#seasonal_order=(1, 1, 1, 1))
model_fit = model.fit(disp=False)
        
# make prediction: 15 steps ahead, to match the 15 prediction dates below
pred = model_fit.predict(len(cc), len(cc)+14)
print(pred)
        
start_date = pak_data['Date'].max()
prediction_dates = []
for i in range(15):
    date = start_date + datetime.timedelta(days=1)
    prediction_dates.append(date)
    start_date = date

fig = go.Figure()
fig.add_trace(go.Scatter(x=pak_data['Date'], y=pak_data['Total Confirmed Cases'],
                    mode='lines+markers',marker_color='green',name='Actual'))
fig.add_trace(go.Scatter(x=prediction_dates, y=pred,
                    mode='lines+markers',marker_color='Orange',name='Predicted'))
fig.update_layout(title_text = 'Confirmed cases Predicted vs Actual using SARIMA')
fig.update_layout(plot_bgcolor='rgb(275, 270, 273)',width=600, height=600)
fig.show()

The experiments above cover confirmed cases only. For the other series, such as Recovered and Death cases, check out the video tutorial or the Kaggle notebook, or refer to the GitHub code and try it yourself.

Pakistan Covid-19 Prediction Tutorial: 
https://youtu.be/Pt76F5rHsRE
Global Covid-19 Prediction Tutorial:
https://www.youtube.com/watch?v=ZNi_3bcutkY
Github: 
https://github.com/uzairaj
Blog: 
http://uzairadamjee.com/blog/
Kaggle:
https://www.kaggle.com/uzairadamjee/covid-19-pakistan-data-analysis-and-prediction
https://www.kaggle.com/uzairadamjee/analyze-forecasting-covid-19-data/

In the end I just want to say that these are only numbers, which we used to predict the possible number of cases that may occur in the future.

What will happen, GOD knows best.

Let us STAY POSITIVE in this time, maintain SOCIAL DISTANCE and PRAY for a better future.

“SO VERILY, WITH EVERY DIFFICULTY, THERE IS RELIEF.”
THANK YOU 🙂

