If you need a dataset for OCR work, here are some you can use:
| Dataset | Train Samples | Test Samples | Information |
|---|---|---|---|
| FUNSD | 149 | 50 | |
| SROIE | 626 | 360 | |
| CORD | 800 | 100 | |
| IIIT5K | 2000 | 3000 | |
| SVT | 100 | 249 | |
| SVHN | 33402 | 13068 | Character Localization |
| SynthText | 772875 | 85875 | |
| IC03 | 246 | 249 | |
| IC13 | 229 | 233 | external resources |
| IMGUR5K | 7149 | 796 | Handwritten / external resources |
| WILDRECEIPT | 1268 | 472 | external resources |
| COCOTEXT | 13880 | 3261 | external resources / legible filtered |
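
As a rough sketch of how one of these datasets might be loaded, the snippet below uses docTR's `doctr.datasets` classes (the library from the reference link). It assumes that every dataset in the table that is not marked "external resources" can be fetched by docTR itself with `download=True`; the `DOWNLOADABLE` tuple and the `load_split` helper are hypothetical names made up for this example and are not part of docTR.

```python
# Datasets from the table that are NOT marked "external resources".
# Assumption: docTR can download these on its own.
DOWNLOADABLE = (
    "FUNSD", "SROIE", "CORD", "IIIT5K",
    "SVT", "SVHN", "SynthText", "IC03",
)


def load_split(name: str, train: bool = True):
    """Return the train or test split of a docTR dataset.

    Hypothetical convenience wrapper, not a docTR function.
    """
    if name not in DOWNLOADABLE:
        # IC13, IMGUR5K, WILDRECEIPT and COCOTEXT need files you fetch yourself.
        raise ValueError(f"{name} needs manually downloaded files")
    # Imported lazily so this module still imports when docTR is not installed.
    import doctr.datasets as datasets

    dataset_cls = getattr(datasets, name)  # e.g. doctr.datasets.FUNSD
    return dataset_cls(train=train, download=True)
```

For example, `load_split("FUNSD")` should return the FUNSD training split and `load_split("FUNSD", train=False)` the test split; `len()` on each lets you compare against the sample counts in the table. Keep in mind that SynthText is a very large download.
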

Comics Text Detection and Recognition:
https://github.com/gsoykan/comics_text_plus

Reference:
https://mindee.github.io/doctr/using_doctr/using_datasets.html