Creating Time Series Lags in Pandas

By | September 17, 2023

Time-series forecasting methods such as the Simple Moving Average (SMA) offer a simple way to forecast: the mean of the preceding values is used to predict (or smooth) the next value.

Suppose we want to smooth the value for day 6. We first compute the mean of days 1 through 5. But how do we arrange the data this way in pandas without much effort? We will use the shift function to slide the column.
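Before building the full table, here is a minimal sketch of what shift does to a single column. The toy values and the names next_value and previous_value are made up for illustration only.

```python
import pandas as pd

# Toy series; the values are illustrative only.
s = pd.Series([10, 20, 30, 40, 50])

# shift(-1) pulls each value up one row, so row i holds the value from row i + 1.
# The last row has no successor and becomes NaN.
next_value = s.shift(-1)

# shift(1) pushes values down instead, so row i holds the value from row i - 1.
# The first row has no predecessor and becomes NaN.
previous_value = s.shift(1)

print(pd.DataFrame({"s": s, "next": next_value, "previous": previous_value}))
```

A negative shift is what the rest of this post relies on: it places "future" values next to the current row.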

As an example, we have data from February 1 to February 28, 2023. Let's simply generate it in a column named t1.

from datetime import datetime as dt
from datetime import timedelta
import numpy as np
import pandas as pd

t = np.arange(dt(2023,2,1), 
              dt(2023,3,1),
              timedelta(days=1)).astype(dt)

b = pd.DataFrame(t,columns=['t1'])

The result:

           t1
0  2023-02-01
1  2023-02-02
2  2023-02-03
3  2023-02-04
4  2023-02-05
5  2023-02-06
6  2023-02-07
7  2023-02-08
8  2023-02-09
9  2023-02-10
10 2023-02-11
11 2023-02-12
12 2023-02-13
13 2023-02-14
14 2023-02-15
15 2023-02-16
16 2023-02-17
17 2023-02-18
18 2023-02-19
19 2023-02-20
20 2023-02-21
21 2023-02-22
22 2023-02-23
23 2023-02-24
24 2023-02-25
25 2023-02-26
26 2023-02-27
27 2023-02-28
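As a side note, the same column can probably be produced more directly with pd.date_range. This is only a sketch of an alternative, not the approach used in this post, and the name b_alt is hypothetical.

```python
import pandas as pd

# 28 consecutive days; both endpoints are included by default.
dates = pd.date_range(start="2023-02-01", end="2023-02-28", freq="D")

# b_alt is a hypothetical name, chosen so it does not clash with b above.
b_alt = pd.DataFrame({"t1": dates})

print(b_alt.head())
print(len(b_alt))
```

Unlike np.arange, date_range includes the end date, so there is no need to pass March 1 as the stop value.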

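We will shift the column to build a window of 6 consecutive days, which means:

  • days 1 through 5 as input
  • day 6 as target

b2 = b.copy()

target = 6
# add columns t2..t6, each one the t1 column shifted up by 1..5 rows
for i in range(1,target):
    buffer = b2['t1'].shift(-i)
    b2['t'+str(i+1)] = buffer
# the last 5 rows have no complete window, so drop them
b2 = b2.dropna()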

b2 now looks like this:

           t1         t2         t3         t4         t5         t6
0  2023-02-01 2023-02-02 2023-02-03 2023-02-04 2023-02-05 2023-02-06
1  2023-02-02 2023-02-03 2023-02-04 2023-02-05 2023-02-06 2023-02-07
2  2023-02-03 2023-02-04 2023-02-05 2023-02-06 2023-02-07 2023-02-08
3  2023-02-04 2023-02-05 2023-02-06 2023-02-07 2023-02-08 2023-02-09
4  2023-02-05 2023-02-06 2023-02-07 2023-02-08 2023-02-09 2023-02-10
5  2023-02-06 2023-02-07 2023-02-08 2023-02-09 2023-02-10 2023-02-11
6  2023-02-07 2023-02-08 2023-02-09 2023-02-10 2023-02-11 2023-02-12
7  2023-02-08 2023-02-09 2023-02-10 2023-02-11 2023-02-12 2023-02-13
8  2023-02-09 2023-02-10 2023-02-11 2023-02-12 2023-02-13 2023-02-14
9  2023-02-10 2023-02-11 2023-02-12 2023-02-13 2023-02-14 2023-02-15
10 2023-02-11 2023-02-12 2023-02-13 2023-02-14 2023-02-15 2023-02-16
11 2023-02-12 2023-02-13 2023-02-14 2023-02-15 2023-02-16 2023-02-17
12 2023-02-13 2023-02-14 2023-02-15 2023-02-16 2023-02-17 2023-02-18
13 2023-02-14 2023-02-15 2023-02-16 2023-02-17 2023-02-18 2023-02-19
14 2023-02-15 2023-02-16 2023-02-17 2023-02-18 2023-02-19 2023-02-20
15 2023-02-16 2023-02-17 2023-02-18 2023-02-19 2023-02-20 2023-02-21
16 2023-02-17 2023-02-18 2023-02-19 2023-02-20 2023-02-21 2023-02-22
17 2023-02-18 2023-02-19 2023-02-20 2023-02-21 2023-02-22 2023-02-23
18 2023-02-19 2023-02-20 2023-02-21 2023-02-22 2023-02-23 2023-02-24
19 2023-02-20 2023-02-21 2023-02-22 2023-02-23 2023-02-24 2023-02-25
20 2023-02-21 2023-02-22 2023-02-23 2023-02-24 2023-02-25 2023-02-26
21 2023-02-22 2023-02-23 2023-02-24 2023-02-25 2023-02-26 2023-02-27
22 2023-02-23 2023-02-24 2023-02-25 2023-02-26 2023-02-27 2023-02-28
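The loop above can also be wrapped in a small reusable helper. The sketch below is one possible way to do it. The function name make_lag_frame and its parameters are hypothetical, not part of pandas.

```python
import pandas as pd


def make_lag_frame(series: pd.Series, window: int) -> pd.DataFrame:
    """Return a frame with columns t1..t<window>; row i holds series[i : i + window]."""
    # shift(-k) moves the series up by k rows, so column t(k+1) is k steps ahead of t1.
    columns = {f"t{k + 1}": series.shift(-k) for k in range(window)}
    # Rows near the end lack a full window and contain NaN/NaT, so drop them.
    return pd.DataFrame(columns).dropna()


dates = pd.Series(pd.date_range("2023-02-01", "2023-02-28", freq="D"))
lagged = make_lag_frame(dates, window=6)
print(lagged.shape)  # (23, 6)
```

With 28 days and a window of 6, the last 5 rows are incomplete, which is why 23 rows remain, the same as in the table above.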

We split the frame into x as the input and y as the target:

x = b2.iloc[:,0:5]
y = b2.iloc[:,5:6]

The result:

x
Out[64]: 
           t1         t2         t3         t4         t5
0  2023-02-01 2023-02-02 2023-02-03 2023-02-04 2023-02-05
1  2023-02-02 2023-02-03 2023-02-04 2023-02-05 2023-02-06
2  2023-02-03 2023-02-04 2023-02-05 2023-02-06 2023-02-07
3  2023-02-04 2023-02-05 2023-02-06 2023-02-07 2023-02-08
4  2023-02-05 2023-02-06 2023-02-07 2023-02-08 2023-02-09
5  2023-02-06 2023-02-07 2023-02-08 2023-02-09 2023-02-10
6  2023-02-07 2023-02-08 2023-02-09 2023-02-10 2023-02-11
7  2023-02-08 2023-02-09 2023-02-10 2023-02-11 2023-02-12
8  2023-02-09 2023-02-10 2023-02-11 2023-02-12 2023-02-13
9  2023-02-10 2023-02-11 2023-02-12 2023-02-13 2023-02-14
10 2023-02-11 2023-02-12 2023-02-13 2023-02-14 2023-02-15
11 2023-02-12 2023-02-13 2023-02-14 2023-02-15 2023-02-16
12 2023-02-13 2023-02-14 2023-02-15 2023-02-16 2023-02-17
13 2023-02-14 2023-02-15 2023-02-16 2023-02-17 2023-02-18
14 2023-02-15 2023-02-16 2023-02-17 2023-02-18 2023-02-19
15 2023-02-16 2023-02-17 2023-02-18 2023-02-19 2023-02-20
16 2023-02-17 2023-02-18 2023-02-19 2023-02-20 2023-02-21
17 2023-02-18 2023-02-19 2023-02-20 2023-02-21 2023-02-22
18 2023-02-19 2023-02-20 2023-02-21 2023-02-22 2023-02-23
19 2023-02-20 2023-02-21 2023-02-22 2023-02-23 2023-02-24
20 2023-02-21 2023-02-22 2023-02-23 2023-02-24 2023-02-25
21 2023-02-22 2023-02-23 2023-02-24 2023-02-25 2023-02-26
22 2023-02-23 2023-02-24 2023-02-25 2023-02-26 2023-02-27

y
Out[65]: 
           t6
0  2023-02-06
1  2023-02-07
2  2023-02-08
3  2023-02-09
4  2023-02-10
5  2023-02-11
6  2023-02-12
7  2023-02-13
8  2023-02-14
9  2023-02-15
10 2023-02-16
11 2023-02-17
12 2023-02-18
13 2023-02-19
14 2023-02-20
15 2023-02-21
16 2023-02-22
17 2023-02-23
18 2023-02-24
19 2023-02-25
20 2023-02-26
21 2023-02-27
22 2023-02-28
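A quick sanity check can confirm that inputs and target are aligned. The sketch below rebuilds the lagged frame so it runs on its own, selects the columns by name instead of by position, and checks that every target falls exactly one day after the last input. The names frame and gap are hypothetical, and y is a Series here rather than the one-column DataFrame used above.

```python
import pandas as pd

dates = pd.Series(pd.date_range("2023-02-01", "2023-02-28", freq="D"))
frame = pd.DataFrame({f"t{k + 1}": dates.shift(-k) for k in range(6)}).dropna()

x = frame.drop(columns="t6")  # inputs t1..t5
y = frame["t6"]               # target, as a Series

# Each target should fall exactly one day after the last input in its row.
gap = y - x["t5"]
print((gap == pd.Timedelta(days=1)).all())
```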

With the approach above, we can easily compute a simple moving average. Let's try it with random numbers.

# replace the dates with random numbers
t = np.random.random([29])
b = pd.DataFrame(t,columns=['t1'])
b2 = b.copy()

target = 6
for i in range(1,target):
    buffer = b2['t1'].shift(-i)
    b2['t'+str(i+1)] = buffer
b2 = b2.dropna()

x = b2.iloc[:,0:5] 
y = b2.iloc[:,5:6]
# compute the Simple Moving Average: the row-wise mean of the 5 input columns
sma = np.array(x).mean(axis=1)
from matplotlib import pyplot as plt

plt.figure()
plt.plot(y['t6'].values)
plt.plot(sma)
plt.legend(['target','SMA'])
plt.show()
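For comparison, pandas also has a built-in rolling window. The sketch below assumes a fixed random seed so the run is repeatable, and it checks that a rolling mean shifted down by one row gives the same numbers as the lag-table approach. The names sma_lag and sma_roll are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
s = pd.Series(rng.random(29))

window = 5

# Lag-table version: row i holds s[i], ..., s[i + 5]; the first 5 columns are averaged.
lags = pd.DataFrame({f"t{k + 1}": s.shift(-k) for k in range(window + 1)}).dropna()
sma_lag = lags.iloc[:, :window].mean(axis=1).to_numpy()

# Rolling version: the mean of the 5 values ending at row j, moved down one row
# so that it lines up with the value it is meant to smooth (row j + 1).
sma_roll = s.rolling(window).mean().shift(1).dropna().to_numpy()

print(np.allclose(sma_lag, sma_roll))
```

The lag table is still useful when the individual columns are needed as model inputs. The rolling version is shorter when only the average is wanted.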

 
