YOLO (You Only Look Once) is a deep learning model family for real-time object detection. Since it was introduced by Joseph Redmon in 2015, YOLO has evolved rapidly through versions that are both faster and more accurate. One of the most popular models today is YOLOv5, released by Ultralytics in 2020. YOLOv5 is faster and lighter than its predecessors and comes with a rich set of features that make it easy to apply in many scenarios.
YOLOv5 Architecture
The YOLOv5 architecture is designed to be efficient in both inference speed and accuracy. Several key elements make up the architecture (a short inspection sketch follows the list):
- Backbone: responsible for extracting features from the image. YOLOv5 uses Cross Stage Partial Networks (CSPNet) as its backbone, which keeps computation efficient without a significant loss of accuracy.
- Neck: after the backbone extracts features, the "neck" gathers information from different image resolutions. YOLOv5 uses PANet (Path Aggregation Network) to combine multi-scale features and improve detection of objects of different sizes.
- Head: the final part of the architecture predicts the bounding boxes, object classes, and detection confidences. Each object is predicted with its corresponding location and class.
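To get a feel for these three parts, you can load a pretrained YOLOv5 model via torch.hub and print its module tree. This is only an inspection sketch of my own; it assumes torch is installed and that torch.hub can reach the Ultralytics repository.

import torch

# Download and build the pretrained YOLOv5m model through torch.hub
# (the first call clones ultralytics/yolov5 into the hub cache).
model = torch.hub.load("ultralytics/yolov5", "yolov5m", pretrained=True)

# Printing the model lists the Conv, C3 and SPPF blocks of the backbone,
# the Upsample/Concat layers of the neck, and the Detect head, in the same
# order as the yolov5m.yaml configuration discussed later in this post.
print(model)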
In this post, we will train a YOLOv5 model for object detection on a fairly simple case: face detection, using small images of 320 x 320 pixels.
The YOLOv5 code we will use comes from https://github.com/ultralytics/yolov5. The code is very complete, covering everything from training to testing; you can use git clone to download it, or read the documentation directly at https://docs.ultralytics.com/yolov5/#explore-and-learn
Download Dataset
First, download the dataset from https://universe.roboflow.com/rlggypface/face-detection-zspaa/dataset/1 with an image size of 320 x 320 pixels and place it in the following folder:
yolov5/datasets/face
Inside it there are images and labels folders.
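Each file in images/ needs a matching .txt file in labels/, where every line describes one box as "class x_center y_center width height", all normalized to the 0..1 range. The short check below is a sketch of my own (it assumes .jpg images; adjust the glob if yours differ):

from pathlib import Path

dataset = Path("datasets/face")  # dataset root used in this post
images = {p.stem for p in (dataset / "images").glob("*.jpg")}
labels = {p.stem for p in (dataset / "labels").glob("*.txt")}

# Images without a label file are treated as pure background samples,
# which is usually not what you want for this face dataset.
print("images without labels:", sorted(images - labels)[:10])
print("labels without images:", sorted(labels - images)[:10])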
Creating the Dataset Configuration
For the dataset configuration, we create a *.yaml file that we will name face_config_dataset.yaml:
yolov5/data/face_config_dataset.yaml
Its contents are as follows; we use the same set of images for both training and validation:
path: datasets/face # dataset root dir
train: images # train images
val: images # val images
test: # test images (optional)

# Classes
names:
  0: face
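To make sure the file parses the way you intend, you can load it with PyYAML. This is a quick sanity check of my own, run from inside the yolov5/ folder (PyYAML is already pulled in by the repo's requirements):

import yaml

with open("data/face_config_dataset.yaml") as f:
    cfg = yaml.safe_load(f)

# Expected output: datasets/face images images {0: 'face'}
print(cfg["path"], cfg["train"], cfg["val"], cfg["names"])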
Creating the Model Configuration
The repository already ships with several example model configurations. This flexibility, combined with the choice of model sizes (n, s, m, l, x), lets YOLOv5 be adapted to your needs, whether for resource-constrained devices or for powerful GPU systems:
- yolov5l.yaml
- yolov5m.yaml
- yolov5n.yaml
- yolov5s.yaml
- yolov5x.yaml
We will only use the one below:
yolov5/models/yolov5m.yaml
The contents of that model configuration are as follows. Note that nc is set to 1, since we are only detecting a single class (face):
# Ultralytics YOLOv5 🚀, AGPL-3.0 license

# Parameters
nc: 1 # number of classes
depth_multiple: 0.67 # model depth multiple
width_multiple: 0.75 # layer channel multiple
anchors:
  - [10, 13, 16, 30, 33, 23] # P3/8
  - [30, 61, 62, 45, 59, 119] # P4/16
  - [116, 90, 156, 198, 373, 326] # P5/32

# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [
    [-1, 1, Conv, [64, 6, 2, 2]], # 0-P1/2
    [-1, 1, Conv, [128, 3, 2]], # 1-P2/4
    [-1, 3, C3, [128]],
    [-1, 1, Conv, [256, 3, 2]], # 3-P3/8
    [-1, 6, C3, [256]],
    [-1, 1, Conv, [512, 3, 2]], # 5-P4/16
    [-1, 9, C3, [512]],
    [-1, 1, Conv, [1024, 3, 2]], # 7-P5/32
    [-1, 3, C3, [1024]],
    [-1, 1, SPPF, [1024, 5]], # 9
  ]

# YOLOv5 v6.0 head
head: [
    [-1, 1, Conv, [512, 1, 1]],
    [-1, 1, nn.Upsample, [None, 2, "nearest"]],
    [[-1, 6], 1, Concat, [1]], # cat backbone P4
    [-1, 3, C3, [512, False]], # 13

    [-1, 1, Conv, [256, 1, 1]],
    [-1, 1, nn.Upsample, [None, 2, "nearest"]],
    [[-1, 4], 1, Concat, [1]], # cat backbone P3
    [-1, 3, C3, [256, False]], # 17 (P3/8-small)

    [-1, 1, Conv, [256, 3, 2]],
    [[-1, 14], 1, Concat, [1]], # cat head P4
    [-1, 3, C3, [512, False]], # 20 (P4/16-medium)

    [-1, 1, Conv, [512, 3, 2]],
    [[-1, 10], 1, Concat, [1]], # cat head P5
    [-1, 3, C3, [1024, False]], # 23 (P5/32-large)

    [[17, 20, 23], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5)
  ]
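Before launching a long run, you can check that this configuration really builds a single-class model by constructing it directly from inside the cloned repository. This is a hedged sketch; it assumes a recent yolov5 checkout where models/yolo.py exposes DetectionModel (older versions name the same class Model):

# Run from the yolov5/ repository root so the local models/ package is importable.
from models.yolo import DetectionModel

model = DetectionModel(cfg="models/yolov5m.yaml", ch=3, nc=1)
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} parameters")  # should match the ~20.87M reported by train.py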
Training the Model
Once everything is ready, start training with the following command:
python train.py --data 'face_config_dataset.yaml' --weights '' --cfg 'yolov5m.yaml' --img 320 --rect --epochs 100
Here is the output of the training command above:
python train.py --data 'face_config_dataset.yaml' --weights '' --cfg 'yolov5m.yaml' --img 320 --rect --epochs 100
train: weights=, cfg=yolov5m.yaml, data=face_config_dataset.yaml, hyp=data/hyps/hyp.scratch-low.yaml, epochs=100, batch_size=2, imgsz=320, rect=True, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, evolve_population=data/hyps, resume_evolve=None, bucket=, cache=None, image_weights=False, device=, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=runs/train, name=exp, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, seed=0, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest, ndjson_console=False, ndjson_file=False
remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 5 (delta 0), reused 4 (delta 0), pack-reused 0 (from 0)
Unpacking objects: 100% (5/5), 4.22 KiB | 719.00 KiB/s, done.
From https://github.com/ultralytics/yolov5
   12b577c8..f7322921  master     -> origin/master
github: ⚠️ YOLOv5 is out of date by 1 commit. Use 'git pull' or 'git clone https://github.com/ultralytics/yolov5' to update.
YOLOv5 🚀 v7.0-365-g12b577c8 Python-3.11.5 torch-2.3.0 CPU
hyperparameters: lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0
Comet: run 'pip install comet_ml' to automatically track and visualize YOLOv5 🚀 runs in Comet
TensorBoard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/

                 from  n    params  module                                  arguments
  0                -1  1      5280  models.common.Conv                      [3, 48, 6, 2, 2]
  1                -1  1     41664  models.common.Conv                      [48, 96, 3, 2]
  2                -1  2     65280  models.common.C3                        [96, 96, 2]
  3                -1  1    166272  models.common.Conv                      [96, 192, 3, 2]
  4                -1  4    444672  models.common.C3                        [192, 192, 4]
  5                -1  1    664320  models.common.Conv                      [192, 384, 3, 2]
  6                -1  6   2512896  models.common.C3                        [384, 384, 6]
  7                -1  1   2655744  models.common.Conv                      [384, 768, 3, 2]
  8                -1  2   4134912  models.common.C3                        [768, 768, 2]
  9                -1  1   1476864  models.common.SPPF                      [768, 768, 5]
 10                -1  1    295680  models.common.Conv                      [768, 384, 1, 1]
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 12           [-1, 6]  1         0  models.common.Concat                    [1]
 13                -1  2   1182720  models.common.C3                        [768, 384, 2, False]
 14                -1  1     74112  models.common.Conv                      [384, 192, 1, 1]
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 16           [-1, 4]  1         0  models.common.Concat                    [1]
 17                -1  2    296448  models.common.C3                        [384, 192, 2, False]
 18                -1  1    332160  models.common.Conv                      [192, 192, 3, 2]
 19          [-1, 14]  1         0  models.common.Concat                    [1]
 20                -1  2   1035264  models.common.C3                        [384, 384, 2, False]
 21                -1  1   1327872  models.common.Conv                      [384, 384, 3, 2]
 22          [-1, 10]  1         0  models.common.Concat                    [1]
 23                -1  2   4134912  models.common.C3                        [768, 768, 2, False]
 24      [17, 20, 23]  1     24246  models.yolo.Detect                      [1, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [192, 384, 768]]
YOLOv5m summary: 291 layers, 20871318 parameters, 20871318 gradients, 48.2 GFLOPs

optimizer: SGD(lr=0.01) with parameter groups 79 weight(decay=0.0), 82 weight(decay=0.0005), 82 bias
WARNING ⚠️ --rect is incompatible with DataLoader shuffle, setting shuffle=False
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
train: Scanning /Users/mulkansyarif/Desktop/yolov5/datasets/face/labels... 356 images, 0 backgrounds, 0 corrupt: 100%|██████████| 356/356 [00:07<00:00, 45.28it/s]
train: New cache created: /Users/mulkansyarif/Desktop/yolov5/datasets/face/labels.cache
val: Scanning /Users/mulkansyarif/Desktop/yolov5/datasets/face/labels.cache... 356 images, 0 backgrounds, 0 corrupt: 100%|██████████| 356/356 [00:00<?, ?it/s]
AutoAnchor: 5.31 anchors/target, 1.000 Best Possible Recall (BPR). Current anchors are a good fit to dataset ✅
Plotting labels to runs/train/exp/labels.jpg...
Image sizes 320 train, 320 val
Using 2 dataloader workers
Logging results to runs/train/exp
Starting training for 100 epochs...

      Epoch    GPU_mem   box_loss   obj_loss   cls_loss  Instances       Size
       0/99         0G     0.0837    0.01134          0          2        320:  30%|███       | 54/178 [00:45<01:33,  1.32it/s]
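If you prefer staying inside Python, the same run can also be started programmatically. This is a small sketch of my own, assuming it is executed from the repository root and that train.py still exposes its run() helper:

import train  # yolov5/train.py

# Mirrors the CLI call above: train yolov5m from scratch on the face dataset at 320 px.
train.run(
    data="face_config_dataset.yaml",
    cfg="yolov5m.yaml",
    weights="",   # empty string = train from scratch, no pretrained weights
    imgsz=320,
    rect=True,
    epochs=100,
)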
The results of each training run are saved to an automatically incremented directory under runs/train (exp, exp2, exp3, and so on). To see all of the settings and hyperparameters used for a run, open its opt.yaml file:
weights: ''
cfg: /Users/mulkansyarif/Desktop/yolov5/models/yolov5m.yaml
data: /Users/mulkansyarif/Desktop/yolov5/data/face_config_dataset.yaml
hyp:
  lr0: 0.01
  lrf: 0.01
  momentum: 0.937
  weight_decay: 0.0005
  warmup_epochs: 3.0
  warmup_momentum: 0.8
  warmup_bias_lr: 0.1
  box: 0.05
  cls: 0.5
  cls_pw: 1.0
  obj: 1.0
  obj_pw: 1.0
  iou_t: 0.2
  anchor_t: 4.0
  fl_gamma: 0.0
  hsv_h: 0.015
  hsv_s: 0.7
  hsv_v: 0.4
  degrees: 0.0
  translate: 0.1
  scale: 0.5
  shear: 0.0
  perspective: 0.0
  flipud: 0.0
  fliplr: 0.5
  mosaic: 1.0
  mixup: 0.0
  copy_paste: 0.0
epochs: 100
batch_size: 2
imgsz: 320
rect: true
resume: false
nosave: false
noval: false
noautoanchor: false
noplots: false
evolve: null
evolve_population: data/hyps
resume_evolve: null
bucket: ''
cache: null
image_weights: false
device: ''
multi_scale: false
single_cls: false
optimizer: SGD
sync_bn: false
workers: 8
project: runs/train
name: exp
exist_ok: false
quad: false
cos_lr: false
label_smoothing: 0.0
patience: 100
freeze:
- 0
save_period: -1
seed: 0
local_rank: -1
entity: null
upload_dataset: false
bbox_interval: -1
artifact_alias: latest
ndjson_console: false
ndjson_file: false
save_dir: runs/train/exp
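Each run directory also contains a results.csv with the per-epoch losses and metrics, which is convenient for plotting the training curves yourself. A small sketch of my own (assumes pandas and matplotlib are installed; YOLOv5 pads the CSV headers with spaces, so they are stripped first):

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("runs/train/exp/results.csv")
df.columns = df.columns.str.strip()  # headers come padded with whitespace

# Plot the box/objectness losses and mAP@0.5 over the 100 epochs.
df.plot(x="epoch", y=["train/box_loss", "train/obj_loss", "metrics/mAP_0.5"])
plt.xlabel("epoch")
plt.show()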
Epoch 100
Here is the output once the 100th (final) epoch completes:
      Epoch    GPU_mem   box_loss   obj_loss   cls_loss  Instances       Size
      99/99         0G    0.01116   0.004289          0          2        320: 100%|██████████| 178/178 [02:25<00:00,  1.22it/s]
                 Class     Images  Instances          P          R      mAP50   mAP50-95: 100%|██████████| 89/89 [00:49<00:00,  1.81it/s]
                   all        356        356      0.997          1      0.995       0.87

100 epochs completed in 5.568 hours.
Optimizer stripped from runs/train/exp/weights/last.pt, 42.1MB
Optimizer stripped from runs/train/exp/weights/best.pt, 42.1MB

Validating runs/train/exp/weights/best.pt...
Fusing layers...
YOLOv5m summary: 212 layers, 20852934 parameters, 0 gradients, 47.9 GFLOPs
                 Class     Images  Instances          P          R      mAP50   mAP50-95: 100%|██████████| 89/89 [00:48<00:00,  1.84it/s]
                   all        356        356      0.999      0.997      0.995      0.871
Results saved to runs/train/exp
After training reaches 100 epochs, we can run detection with the resulting weights using the following command:
python detect.py --weights 'runs/train/exp/weights/best.pt' --source 'gambar.jpg' --imgsz 320 --conf-thres 0.8 --data 'data/face_config_dataset.yaml' --line-thickness 1
The output above is saved to runs/detect/exp.
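Instead of detect.py you can also load the trained weights straight into your own Python code through torch.hub, which is handy when you want the detections as data rather than annotated images. A short sketch, using the run directory from the training above:

import torch

# Load our fine-tuned face detector through the ultralytics/yolov5 hub entry point.
model = torch.hub.load("ultralytics/yolov5", "custom", path="runs/train/exp/weights/best.pt")
model.conf = 0.8  # same confidence threshold as the detect.py call above

results = model("gambar.jpg", size=320)
results.print()                   # per-image summary
print(results.pandas().xyxy[0])   # boxes as a DataFrame: xmin, ymin, xmax, ymax, confidence, class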
Besides an image file, you can also use a camera or several other sources:
$ python detect.py --weights yolov5s.pt --source 0                               # webcam
                                                  img.jpg                        # image
                                                  vid.mp4                        # video
                                                  screen                         # screenshot
                                                  path/                          # directory
                                                  list.txt                       # list of images
                                                  list.streams                   # list of streams
                                                  'path/*.jpg'                   # glob
                                                  'https://youtu.be/LNwODJXcvt4' # YouTube
                                                  'rtsp://example.com/media.mp4' # RTSP, RTMP, HTTP stream
Here is the result when using the webcam:
[Image: face detection result from the webcam]
One thing to keep in mind about the YOLO model is that it struggles when the input image is an extremely elongated rectangle; in that case you can use the trick below.
The original size was 300 x 1500 pixels,
padded out to 1500 x 1500 pixels.
This step works quite well; if we instead let YOLOv5 force a resize, the image would end up badly distorted, because the aspect ratio is quite large: 1500/300 = 5x.
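One way to do that padding programmatically is with OpenCV's copyMakeBorder. This is a small sketch of my own (assumes opencv-python is installed; the file name is just an example, and the gray value 114 matches YOLOv5's own letterbox padding):

import cv2

img = cv2.imread("tall_image.jpg")  # hypothetical 300 x 1500 example
h, w = img.shape[:2]
side = max(h, w)

# Pad the right and bottom edges up to a square canvas instead of
# letting the resize squash the image by a factor of 5.
padded = cv2.copyMakeBorder(
    img, 0, side - h, 0, side - w,
    borderType=cv2.BORDER_CONSTANT, value=(114, 114, 114),
)
cv2.imwrite("tall_image_square.jpg", padded)  # now square, e.g. 1500 x 1500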
Retraining
To continue the training later, you can simply pass in the previous weights:
python train.py --data 'face_config_dataset.yaml' --weights 'runs/train/exp/weights/best.pt' --cfg 'yolov5m.yaml' --img 320 --rect --epochs 100
The output:
python train.py --data 'face_config_dataset.yaml' --weights 'runs/train/exp/weights/best.pt' --cfg 'yolov5m.yaml' --img 320 --rect --epochs 100
train: weights=runs/train/exp/weights/best.pt, cfg=yolov5m.yaml, data=face_config_dataset.yaml, hyp=data/hyps/hyp.scratch-low.yaml, epochs=100, batch_size=2, imgsz=320, rect=True, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, evolve_population=data/hyps, resume_evolve=None, bucket=, cache=None, image_weights=False, device=, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=runs/train, name=exp, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, seed=0, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest, ndjson_console=False, ndjson_file=False
github: ⚠️ YOLOv5 is out of date by 1 commit. Use 'git pull' or 'git clone https://github.com/ultralytics/yolov5' to update.
YOLOv5 🚀 v7.0-365-g12b577c8 Python-3.11.5 torch-2.3.0 CPU
hyperparameters: lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0
Comet: run 'pip install comet_ml' to automatically track and visualize YOLOv5 🚀 runs in Comet
TensorBoard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/

                 from  n    params  module                                  arguments
  0                -1  1      5280  models.common.Conv                      [3, 48, 6, 2, 2]
  1                -1  1     41664  models.common.Conv                      [48, 96, 3, 2]
  2                -1  2     65280  models.common.C3                        [96, 96, 2]
  3                -1  1    166272  models.common.Conv                      [96, 192, 3, 2]
  4                -1  4    444672  models.common.C3                        [192, 192, 4]
  5                -1  1    664320  models.common.Conv                      [192, 384, 3, 2]
  6                -1  6   2512896  models.common.C3                        [384, 384, 6]
  7                -1  1   2655744  models.common.Conv                      [384, 768, 3, 2]
  8                -1  2   4134912  models.common.C3                        [768, 768, 2]
  9                -1  1   1476864  models.common.SPPF                      [768, 768, 5]
 10                -1  1    295680  models.common.Conv                      [768, 384, 1, 1]
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 12           [-1, 6]  1         0  models.common.Concat                    [1]
 13                -1  2   1182720  models.common.C3                        [768, 384, 2, False]
 14                -1  1     74112  models.common.Conv                      [384, 192, 1, 1]
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 16           [-1, 4]  1         0  models.common.Concat                    [1]
 17                -1  2    296448  models.common.C3                        [384, 192, 2, False]
 18                -1  1    332160  models.common.Conv                      [192, 192, 3, 2]
 19          [-1, 14]  1         0  models.common.Concat                    [1]
 20                -1  2   1035264  models.common.C3                        [384, 384, 2, False]
 21                -1  1   1327872  models.common.Conv                      [384, 384, 3, 2]
 22          [-1, 10]  1         0  models.common.Concat                    [1]
 23                -1  2   4134912  models.common.C3                        [768, 768, 2, False]
 24      [17, 20, 23]  1     24246  models.yolo.Detect                      [1, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [192, 384, 768]]
YOLOv5m summary: 291 layers, 20871318 parameters, 20871318 gradients, 48.2 GFLOPs

Transferred 480/481 items from runs/train/exp/weights/best.pt
optimizer: SGD(lr=0.01) with parameter groups 79 weight(decay=0.0), 82 weight(decay=0.0005), 82 bias
WARNING ⚠️ --rect is incompatible with DataLoader shuffle, setting shuffle=False
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
train: Scanning /Users/mulkansyarif/Desktop/yolov5/datasets/face/labels... 356 images, 0 backgrounds, 0 corrupt: 100%|██████████| 356/356 [00:07<00:00, 48.73it/s]
train: New cache created: /Users/mulkansyarif/Desktop/yolov5/datasets/face/labels.cache
val: Scanning /Users/mulkansyarif/Desktop/yolov5/datasets/face/labels.cache... 356 images, 0 backgrounds, 0 corrupt: 100%|██████████| 356/356 [00:00<?, ?it/s]
AutoAnchor: 5.31 anchors/target, 1.000 Best Possible Recall (BPR). Current anchors are a good fit to dataset ✅
Plotting labels to runs/train/exp3/labels.jpg...
Image sizes 320 train, 320 val
Using 2 dataloader workers
Logging results to runs/train/exp3
Starting training for 100 epochs...

      Epoch    GPU_mem   box_loss   obj_loss   cls_loss  Instances       Size
       0/99         0G     0.0115   0.004215          0          2        320: 100%|██████████| 178/178 [02:20<00:00,  1.27it/s]
                 Class     Images  Instances          P          R      mAP50   mAP50-95: 100%|██████████| 89/89 [00:46<00:00,  1.93it/s]
                   all        356        356      0.999      0.997      0.995      0.858

      Epoch    GPU_mem   box_loss   obj_loss   cls_loss  Instances       Size
       1/99         0G    0.01629   0.004533          0          2        320: 100%|██████████| 178/178 [02:41<00:00,  1.10it/s]
                 Class     Images  Instances          P          R      mAP50   mAP50-95:  83%|████████▎ | 74/89 [00:39<00:08,  1.87it/s]