Creating lagged time series in Pandas. Time-series forecasting methods such as the Simple Moving Average (SMA) are a simple way to forecast: the mean of the preceding values is used to predict, or smooth, the next value.
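Written out as a formula, the idea looks roughly like this. The symbols are notation introduced here for illustration and do not come from the original post: y_t is the observed value at step t, ŷ_t is the smoothed or predicted value, and n is the window size.

```latex
\hat{y}_{t} = \frac{1}{n} \sum_{i=1}^{n} y_{t-i},
\qquad \text{for } n = 5:\quad
\hat{y}_{6} = \frac{y_{1} + y_{2} + y_{3} + y_{4} + y_{5}}{5}
```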
Suppose we want to smooth the value for day 6: we first compute the mean of days 1 through 5. But how can we easily arrange the data this way in pandas? We will use the shift function to move the values.
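Before applying it to real data, it may help to see what shift does on a tiny series. This is only a sketch: the values and the names s, next_value, previous_value and demo are made up for illustration.

```python
import pandas as pd

# Toy series standing in for six daily observations (values are made up).
s = pd.Series([10, 20, 30, 40, 50, 60], name="t1")

# shift(-1) pulls each value up one row, so row i now holds the value from row i + 1.
# The last row has no successor and becomes NaN.
next_value = s.shift(-1)

# shift(1) pushes values down one row, so row i holds the value from row i - 1.
# The first row has no predecessor and becomes NaN.
previous_value = s.shift(1)

demo = pd.DataFrame({"t1": s, "next": next_value, "previous": previous_value})
print(demo)
```

A negative shift is what the rest of this post relies on: it places future values next to the current row, so each row ends up holding a window of consecutive observations.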
As an example, suppose we have data from 1 to 28 February 2023. Let's simply generate it, in a column named t1.
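from datetime import datetime as dt
from datetime import timedelta

import numpy as np
import pandas as pd

# Daily dates from 2023-02-01 up to (but not including) 2023-03-01,
# converted from numpy datetime64 values to Python datetime objects
t = np.arange(dt(2023, 2, 1), dt(2023, 3, 1), timedelta(days=1)).astype(dt)
b = pd.DataFrame(t, columns=['t1'])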
The result:
           t1
0  2023-02-01
1  2023-02-02
2  2023-02-03
3  2023-02-04
4  2023-02-05
5  2023-02-06
6  2023-02-07
7  2023-02-08
8  2023-02-09
9  2023-02-10
10 2023-02-11
11 2023-02-12
12 2023-02-13
13 2023-02-14
14 2023-02-15
15 2023-02-16
16 2023-02-17
17 2023-02-18
18 2023-02-19
19 2023-02-20
20 2023-02-21
21 2023-02-22
22 2023-02-23
23 2023-02-24
24 2023-02-25
25 2023-02-26
26 2023-02-27
27 2023-02-28
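The same column can probably be built more directly with pandas' own date_range. This is an alternative sketch, not the approach used in this post, and the names dates and b_alt are hypothetical.

```python
import pandas as pd

# date_range includes both endpoints by default, so this yields 28 daily timestamps.
dates = pd.date_range(start="2023-02-01", end="2023-02-28", freq="D")
b_alt = pd.DataFrame({"t1": dates})

print(b_alt.shape)  # (28, 1)
```

One difference worth noting: here the end date is inclusive, whereas np.arange stops before its end value, which is why the original code passes 1 March as the upper bound.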
We will shift with a target of 6, which means:
- days 1 through 5 are the inputs
- day 6 is the target
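b2 = b.copy()
target = 6
# Add columns t2..t6, where column t(i+1) holds t1 shifted up by i rows
for i in range(1, target):
    buffer = b2['t1'].shift(-i)
    b2['t' + str(i + 1)] = buffer
# The last 5 rows have no complete window, so drop them
b2 = b2.dropna()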
Take a look at b2:
           t1         t2         t3         t4         t5         t6
0  2023-02-01 2023-02-02 2023-02-03 2023-02-04 2023-02-05 2023-02-06
1  2023-02-02 2023-02-03 2023-02-04 2023-02-05 2023-02-06 2023-02-07
2  2023-02-03 2023-02-04 2023-02-05 2023-02-06 2023-02-07 2023-02-08
3  2023-02-04 2023-02-05 2023-02-06 2023-02-07 2023-02-08 2023-02-09
4  2023-02-05 2023-02-06 2023-02-07 2023-02-08 2023-02-09 2023-02-10
5  2023-02-06 2023-02-07 2023-02-08 2023-02-09 2023-02-10 2023-02-11
6  2023-02-07 2023-02-08 2023-02-09 2023-02-10 2023-02-11 2023-02-12
7  2023-02-08 2023-02-09 2023-02-10 2023-02-11 2023-02-12 2023-02-13
8  2023-02-09 2023-02-10 2023-02-11 2023-02-12 2023-02-13 2023-02-14
9  2023-02-10 2023-02-11 2023-02-12 2023-02-13 2023-02-14 2023-02-15
10 2023-02-11 2023-02-12 2023-02-13 2023-02-14 2023-02-15 2023-02-16
11 2023-02-12 2023-02-13 2023-02-14 2023-02-15 2023-02-16 2023-02-17
12 2023-02-13 2023-02-14 2023-02-15 2023-02-16 2023-02-17 2023-02-18
13 2023-02-14 2023-02-15 2023-02-16 2023-02-17 2023-02-18 2023-02-19
14 2023-02-15 2023-02-16 2023-02-17 2023-02-18 2023-02-19 2023-02-20
15 2023-02-16 2023-02-17 2023-02-18 2023-02-19 2023-02-20 2023-02-21
16 2023-02-17 2023-02-18 2023-02-19 2023-02-20 2023-02-21 2023-02-22
17 2023-02-18 2023-02-19 2023-02-20 2023-02-21 2023-02-22 2023-02-23
18 2023-02-19 2023-02-20 2023-02-21 2023-02-22 2023-02-23 2023-02-24
19 2023-02-20 2023-02-21 2023-02-22 2023-02-23 2023-02-24 2023-02-25
20 2023-02-21 2023-02-22 2023-02-23 2023-02-24 2023-02-25 2023-02-26
21 2023-02-22 2023-02-23 2023-02-24 2023-02-25 2023-02-26 2023-02-27
22 2023-02-23 2023-02-24 2023-02-25 2023-02-26 2023-02-27 2023-02-28
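If the same layout is needed for other window sizes, the loop can be wrapped in a small helper. The sketch below is one possible way to do it: make_lag_frame is a hypothetical name, not a pandas function, and it builds the dates with pd.date_range for brevity.

```python
import pandas as pd


def make_lag_frame(series: pd.Series, window: int) -> pd.DataFrame:
    """Return columns t1..t{window}, where column t{k+1} is the series shifted up by k rows."""
    columns = {f"t{k + 1}": series.shift(-k) for k in range(window)}
    # Rows near the end lack enough future values and contain NaN/NaT, so drop them.
    return pd.DataFrame(columns).dropna()


dates = pd.Series(pd.date_range("2023-02-01", "2023-02-28", freq="D"))
lagged = make_lag_frame(dates, window=6)

print(lagged.shape)  # (23, 6)
print(lagged.tail(1))
```

With 28 rows and a window of 6, the last 5 rows cannot be completed, which leaves the 23 rows shown in the table above.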
We split the data into x as the input and y as the target.
x = b2.iloc[:, 0:5]
y = b2.iloc[:, 5:6]
The result:
x
Out[64]:
           t1         t2         t3         t4         t5
0  2023-02-01 2023-02-02 2023-02-03 2023-02-04 2023-02-05
1  2023-02-02 2023-02-03 2023-02-04 2023-02-05 2023-02-06
2  2023-02-03 2023-02-04 2023-02-05 2023-02-06 2023-02-07
3  2023-02-04 2023-02-05 2023-02-06 2023-02-07 2023-02-08
4  2023-02-05 2023-02-06 2023-02-07 2023-02-08 2023-02-09
5  2023-02-06 2023-02-07 2023-02-08 2023-02-09 2023-02-10
6  2023-02-07 2023-02-08 2023-02-09 2023-02-10 2023-02-11
7  2023-02-08 2023-02-09 2023-02-10 2023-02-11 2023-02-12
8  2023-02-09 2023-02-10 2023-02-11 2023-02-12 2023-02-13
9  2023-02-10 2023-02-11 2023-02-12 2023-02-13 2023-02-14
10 2023-02-11 2023-02-12 2023-02-13 2023-02-14 2023-02-15
11 2023-02-12 2023-02-13 2023-02-14 2023-02-15 2023-02-16
12 2023-02-13 2023-02-14 2023-02-15 2023-02-16 2023-02-17
13 2023-02-14 2023-02-15 2023-02-16 2023-02-17 2023-02-18
14 2023-02-15 2023-02-16 2023-02-17 2023-02-18 2023-02-19
15 2023-02-16 2023-02-17 2023-02-18 2023-02-19 2023-02-20
16 2023-02-17 2023-02-18 2023-02-19 2023-02-20 2023-02-21
17 2023-02-18 2023-02-19 2023-02-20 2023-02-21 2023-02-22
18 2023-02-19 2023-02-20 2023-02-21 2023-02-22 2023-02-23
19 2023-02-20 2023-02-21 2023-02-22 2023-02-23 2023-02-24
20 2023-02-21 2023-02-22 2023-02-23 2023-02-24 2023-02-25
21 2023-02-22 2023-02-23 2023-02-24 2023-02-25 2023-02-26
22 2023-02-23 2023-02-24 2023-02-25 2023-02-26 2023-02-27

y
Out[65]:
           t6
0  2023-02-06
1  2023-02-07
2  2023-02-08
3  2023-02-09
4  2023-02-10
5  2023-02-11
6  2023-02-12
7  2023-02-13
8  2023-02-14
9  2023-02-15
10 2023-02-16
11 2023-02-17
12 2023-02-18
13 2023-02-19
14 2023-02-20
15 2023-02-21
16 2023-02-22
17 2023-02-23
18 2023-02-24
19 2023-02-25
20 2023-02-26
21 2023-02-27
22 2023-02-28
Using the approach above, we can easily compute a simple moving average. Let's try it with random numbers.
from matplotlib import pyplot as plt

# Replace the dates with random numbers
t = np.random.random([29])
b = pd.DataFrame(t, columns=['t1'])

b2 = b.copy()
target = 6
for i in range(1, target):
    buffer = b2['t1'].shift(-i)
    b2['t' + str(i + 1)] = buffer
b2 = b2.dropna()

x = b2.iloc[:, 0:5]
y = b2.iloc[:, 5:6]

# Compute the Simple Moving Average: the mean of each row's 5 input values
sma = np.array(x).mean(axis=1)

plt.figure()
plt.plot(y['t6'].values)
plt.plot(sma)
plt.legend(['target', 'SMA'])
plt.show()
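As a cross-check, the same numbers should come out of pandas' built-in rolling window. The sketch below assumes a fixed seed and uses hypothetical names (values, lags, sma_shift, sma_rolling); it compares the shift-based SMA from this post with rolling(5).mean() moved down one row.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
values = pd.Series(rng.random(29), name="t1")
window = 5

# Shift-based version, following the approach in this post:
# columns t1..t5 are the inputs and t6 is the target.
lags = pd.DataFrame(
    {f"t{k + 1}": values.shift(-k) for k in range(window + 1)}
).dropna()
sma_shift = lags.iloc[:, :window].mean(axis=1).to_numpy()

# Rolling version: the mean of the 5 values ending at row i, moved down one row
# so that it lines up with the value it is meant to predict (row i + 1).
sma_rolling = values.rolling(window).mean().shift(1).dropna().to_numpy()

print(len(sma_shift), len(sma_rolling))
print(np.allclose(sma_shift, sma_rolling))
```

The rolling form is shorter, while the shift-based table has the advantage of keeping each input window visible as a row, which is convenient when the same x and y are later fed to another model.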