Fill the missing date values in a Pandas Dataframe column

I’m using Pandas to store stock prices data using Data Frames. There are 2940 rows in the dataset. The Dataset snapshot is displayed below:

enter image description here

The time series data does not contain the values for Saturday and Sunday. Hence missing values have to be filled.

Here is the code I’ve written but it is not solving the problem:

import pandas as pd
import numpy as np
import os
os.chdir('C:/Users/Admin/Analytics/stock-prices')

data  = pd.read_csv('stock-data.csv')

# PriceDate Column - Does not contain Saturday and Sunday stock entries
data['PriceDate'] =  pd.to_datetime(data['PriceDate'], format='%m/%d/%Y')
data = data.sort_index(by=['PriceDate'], ascending=[True])


# Starting date is Aug 25 2004
idx = pd.date_range('08-25-2004',periods=2940,freq='D')


data = data.set_index(idx)
data['newdate']=data.index
newdate=data['newdate'].values   # Create a time series column   


data = pd.merge(newdate, data, on='PriceDate', how='outer')

How to fill the missing values for Saturday and Sunday?

Answer

I think you can use resample with ffill or bfill, but before set_index from column PriceDate:

print (data)
   ID  PriceDate  OpenPrice  HighPrice
0   1  6/24/2016          1          2
1   2  6/23/2016          3          4
2   2  6/22/2016          5          6
3   2  6/21/2016          7          8
4   2  6/20/2016          9         10
5   2  6/17/2016         11         12
6   2  6/16/2016         13         14
data['PriceDate'] =  pd.to_datetime(data['PriceDate'], format='%m/%d/%Y')
data = data.sort_values(by=['PriceDate'], ascending=[True])
data.set_index('PriceDate', inplace=True)
print (data)
            ID  OpenPrice  HighPrice
PriceDate                           
2016-06-16   2         13         14
2016-06-17   2         11         12
2016-06-20   2          9         10
2016-06-21   2          7          8
2016-06-22   2          5          6
2016-06-23   2          3          4
2016-06-24   1          1          2

data = data.resample('D').ffill().reset_index()
print (data)
   PriceDate  ID  OpenPrice  HighPrice
0 2016-06-16   2         13         14
1 2016-06-17   2         11         12
2 2016-06-18   2         11         12
3 2016-06-19   2         11         12
4 2016-06-20   2          9         10
5 2016-06-21   2          7          8
6 2016-06-22   2          5          6
7 2016-06-23   2          3          4
8 2016-06-24   1          1          2

data = data.resample('D').bfill().reset_index()
print (data)
   PriceDate  ID  OpenPrice  HighPrice
0 2016-06-16   2         13         14
1 2016-06-17   2         11         12
2 2016-06-18   2          9         10
3 2016-06-19   2          9         10
4 2016-06-20   2          9         10
5 2016-06-21   2          7          8
6 2016-06-22   2          5          6
7 2016-06-23   2          3          4
8 2016-06-24   1          1          2

Attribution
Source : Link , Question Author : User456898 , Answer Author : jezrael

Leave a Comment