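When working on AI image tasks, you will often come across the MS COCO dataset.
The COCO image dataset provides three kinds of annotation files: object instances (for object detection), person keypoints (key points of the human body, for pose estimation), and image captions (five captions per image). Each annotation type has its own JSON file, and the annotation files are already split into a training set and a validation set.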
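As a rough illustration of how these files are usually laid out, the sketch below builds the path to one annotation file. It assumes the 2017 release naming (`instances_val2017.json`, `person_keypoints_val2017.json`, `captions_val2017.json`) inside an `annotations/` folder. The `annotation_path` helper and the `coco` root folder are hypothetical names chosen for this example, not part of any COCO tooling.

```python
from pathlib import Path

# File-name prefixes of the three annotation types (2017 release naming).
ANNOTATION_PREFIX = {
    "instances": "instances",
    "keypoints": "person_keypoints",
    "captions": "captions",
}


def annotation_path(root, kind, split, year="2017"):
    """Return the expected JSON path for one annotation type and split."""
    if kind not in ANNOTATION_PREFIX:
        raise ValueError(f"unknown annotation type: {kind}")
    if split not in ("train", "val"):
        raise ValueError(f"unknown split: {split}")
    file_name = f"{ANNOTATION_PREFIX[kind]}_{split}{year}.json"
    return Path(root) / "annotations" / file_name


# "coco" is an assumed local folder; adjust it to wherever the data was extracted.
annotation_file = annotation_path("coco", "instances", "val")
print(annotation_file.name)  # instances_val2017.json
```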
import json

# annotation_file: path to a COCO annotation JSON file
with open(annotation_file, "r") as f:
    data = json.load(f)

annotations = data["annotations"]
images = data["images"]
categories = data["categories"]

print(f"Number of annotations: {len(annotations)}")
print(f"Number of images: {len(images)}")
print(f"Number of categories: {len(categories)}")

Output (the first line is printed by a download step that is not shown here):

The COCO dataset has been downloaded and extracted successfully.
Number of annotations: 36781
Number of images: 5000
Number of categories: 80

images[60] => the image record at index 60:

{'license': 1, 'file_name': '000000360661.jpg', 'coco_url': 'http://images.cocodataset.org/val2017/000000360661.jpg', 'height': 480, 'width': 640, 'date_captured': '2013-11-18 21:33:43', 'flickr_url': 'http://farm4.staticflickr.com/3670/9709793032_f9ee4f0aa2_z.jpg', 'id': 360661}

annotations[60] => the annotation record at index 60:

{'segmentation': [[267.51, 222.31, 292.15, 222.31, 291.05, 237.02, 265.67, 237.02]], 'area': 367.89710000000014, 'iscrowd': 0, 'image_id': 525083, 'bbox': [265.67, 222.31, 26.48, 14.71], 'category_id': 72, 'id': 34096}

Each annotation record has the following structure:

annotation{
  "id": int,
  "image_id": int,
  "category_id": int,
  "segmentation": RLE or [polygon],
  "area": float,
  "bbox": [x,y,width,height],
  "iscrowd": 0 or 1,
}
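Every image has an id, which annotations refer to as image_id, and an image may contain more than one single object or group of objects. Each object, whether it is a single object or a group, is described by one annotation, so an image can have several annotations, one per object:

annotation{
  "id": int,                          ==> object id
  "image_id": int,                    ==> the image the object belongs to
  "category_id": int,                 ==> category id of the object
  "segmentation": RLE or [polygon],   ==> region description of the single object or group of objects
  "area": float,                      ==> total number of pixels in the object's region
  "bbox": [x,y,width,height],         ==> bounding box coordinates (x, y is the top-left corner)
  "iscrowd": 0 or 1,                  ==> 0: single object, 1: group of objects (e.g. a crowd of spectators)
}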
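Because annotations point to images and categories by id, the same list index in `images` and `annotations` does not refer to the same picture (in the output above, `images[60]` has id 360661 while `annotations[60]` belongs to image 525083). A minimal sketch of how the three lists could be joined is shown below. The `index_coco` helper and the tiny `sample` dictionary are made up for illustration; only the field names follow the COCO layout.

```python
from collections import defaultdict


def index_coco(data):
    """Build lookup tables from a loaded COCO instances dictionary."""
    images_by_id = {img["id"]: img for img in data["images"]}
    category_names = {cat["id"]: cat["name"] for cat in data["categories"]}
    annotations_by_image = defaultdict(list)
    for ann in data["annotations"]:
        annotations_by_image[ann["image_id"]].append(ann)
    return images_by_id, category_names, annotations_by_image


# Tiny made-up sample in COCO layout (not real COCO records).
sample = {
    "images": [{"id": 1, "file_name": "a.jpg"}, {"id": 2, "file_name": "b.jpg"}],
    "categories": [{"id": 10, "name": "apple"}, {"id": 11, "name": "person"}],
    "annotations": [
        {"id": 100, "image_id": 1, "category_id": 10, "iscrowd": 0},
        {"id": 101, "image_id": 1, "category_id": 11, "iscrowd": 0},
        {"id": 102, "image_id": 2, "category_id": 10, "iscrowd": 1},
    ],
}

images_by_id, category_names, annotations_by_image = index_coco(sample)
for image_id, anns in annotations_by_image.items():
    names = [category_names[a["category_id"]] for a in anns]
    print(images_by_id[image_id]["file_name"], names)
```

With the real file, passing the loaded `data` dictionary instead of `sample` should give one list of annotations per image id.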
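For "segmentation": if the annotation is a single object (iscrowd=0), the region is given as one or more polygons, each a flat list of vertex coordinates [x1, y1, x2, y2, ...] describing the area the object occupies.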
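To see how the polygon relates to the other fields, the sketch below recomputes the area and bounding box of the polygon from `annotations[60]` using the shoelace formula. The helper names are made up for this example. For this record the results come out close to the stored `area` and `bbox`; this is only a consistency check on one polygon, and it may not match exactly for every annotation.

```python
def polygon_area(flat):
    """Area of a polygon given as [x1, y1, x2, y2, ...] (shoelace formula)."""
    xs, ys = flat[0::2], flat[1::2]
    n = len(xs)
    twice_area = sum(
        xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n)
    )
    return abs(twice_area) / 2.0


def polygon_bbox(flat):
    """Axis-aligned box around the polygon as [x, y, width, height]."""
    xs, ys = flat[0::2], flat[1::2]
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]


# Polygon taken from annotations[60] above.
polygon = [267.51, 222.31, 292.15, 222.31, 291.05, 237.02, 265.67, 237.02]

print(round(polygon_area(polygon), 4))               # compare with 'area'
print([round(v, 2) for v in polygon_bbox(polygon)])  # compare with 'bbox'
```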
For a group of objects, for example a cluster of apples, the region is described with a mask instead, as shown in the figure. That is, when iscrowd=1, segmentation holds run-length-encoded (RLE) data such as:

{'counts': [671, 10, 2, 2, 4, 22, 6, 31, 1, 11, 1, 10, 379, 16, 1, 25, 5, 55, 378, 43, 4, 55, 378, 44, 3, 55, 378, 44, 3, 55, 378, 44, 3, 55, 378, 44, 3, 55, 378, 44, 4, 54, 379, 29, 1, 16, 1, 54, 380, 28, 2, 15, 2, 53, 382, 23, 6, 15, 1, 8, 21, 1, 3, 3, 5, 12, 384, 20, 8, 16, 40, 12, 384, 16, 14, 15, 40, 10, 386, 10, 21, 14, 40, 8, 388, 8, 22, 15, 41, 3, 393, 3, 25, 15, 465, 15, 465, 15, 466, 14, 467, 13, 468, 12, 469, 10, 471, 8, 474, 3, 983, 7, 472, 9, 470, 11, 454, 6, 1, 20, 452, 28, 451, 30, 449, 31, 448, 33, 447, 34, 446, 35, 445, 35, 445, 35, 445, 35, 445, 36, 445, 36, 445, 35, 447, 33, 450, 30, 450, 30, 450, 12, 1, 17, 451, 10, 3, 16, 452, 8, 6, 13, 455, 4, 12, 8, 474, 3, 50865, 6, 459, 6, 8, 11, 454, 27, 452, 29, 450, 31, 448, 32, 448, 32, 448, 32, 448, 32, 448, 32, 448, 32, 448, 31, 450, 29, 452, 20, 2, 4, 456, 7, 1, 3, 1, 4, 7174, 6, 2, 6, 4, 14, 447, 34, 445, 36, 443, 38, 442, 38, 442, 38, 442, 38, 442, 38, 442, 38, 442, 38, 443, 36, 445, 34, 446, 6, 19, 6, 450, 3, 478, 1, 42714, 6, 473, 8, 471, 10, 469, 15, 465, 18, 462, 19, 461, 21, 459, 22, 458, 24, 456, 26, 455, 25, 456, 24, 458, 22, 461, 18, 463, 16, 466, 3, 5, 3, 3840, 7, 463, 20, 459, 22, 457, 27, 448, 33, 446, 35, 437, 44, 435, 46, 433, 47, 432, 48, 432, 48, 432, 48, 432, 48, 432, 30, 4, 14, 432, 29, 7, 12, 432, 29, 8, 10, 433, 29, 9, 8, 435, 13, 1, 1, 2, 10, 12, 3, 439, 12, 6, 7, 456, 10, 471, 3, 3, 2, 474, 1, 478, 2, 477, 3, 476, 4, 476, 9, 470, 11, 469, 12, 468, 13, 467, 14, 7, 1, 458, 23, 457, 23, 457, 23, 458, 22, 459, 21, 461, 19, 463, 18, 462, 19, 461, 20, 1, 9, 450, 33, 447, 34, 446, 36, 444, 37, 443, 38, 443, 38, 443, 37, 445, 35, 450, 7, 2, 21, 459, 21, 460, 20, 461, 19, 463, 3, 3, 10, 471, 8, 474, 3, 18209, 1, 479, 2, 478, 3, 477, 4, 476, 5, 475, 6, 474, 7, 474, 10, 471, 8, 474, 4, 5302, 7, 447, 4, 26, 4, 445, 6, 1, 15, 11, 3, 443, 29, 6, 3, 442, 32, 4, 2, 442, 38, 442, 38, 442, 38, 442, 38, 442, 38,
442, 37, 444, 16, 1, 18, 446, 8, 1, 3, 6, 3, 4, 8, 449, 3, 22, 4, 46076, 6, 468, 13, 466, 15, 461, 20, 459, 21, 458, 22, 457, 23, 457, 23, 457, 23, 457, 23, 457, 22, 458, 21, 459, 20, 460, 20, 461, 19, 462, 18, 463, 17, 462, 18, 461, 19, 460, 19, 461, 19, 461, 19, 461, 19, 461, 19, 461, 19, 461, 19, 461, 18, 463, 16, 465, 8, 1, 3, 470, 4, 31194, 12, 9, 6, 452, 29, 445, 36, 443, 38, 441, 39, 435, 45, 434, 46, 427, 53, 426, 54, 425, 55, 424, 55, 425, 54, 426, 53, 427, 51, 429, 25, 1, 24, 430, 22, 5, 22, 210], 'size': [480, 640]}
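The `counts` list can be read as alternating run lengths of background and object pixels. The sketch below decodes such a list into a 0/1 mask in plain Python, under the assumptions that runs start with background (0), that pixels are stored column by column, and that `size` is [height, width]. The `decode_uncompressed_rle` name and the tiny `toy` mask are made up for this example; in practice a dedicated library such as pycocotools would normally handle this.

```python
def decode_uncompressed_rle(rle):
    """Decode {'counts': [...], 'size': [height, width]} into rows of 0/1."""
    height, width = rle["size"]
    flat = []
    value = 0  # runs alternate, starting with background (0)
    for run in rle["counts"]:
        flat.extend([value] * run)
        value = 1 - value
    if len(flat) != height * width:
        raise ValueError("run lengths do not cover the whole image")
    # Pixels are stored column by column, so (row, col) sits at col * height + row.
    return [
        [flat[col * height + row] for col in range(width)]
        for row in range(height)
    ]


# Made-up 3x2 mask: 1 background pixel, 2 object pixels, 3 background pixels.
toy = {"counts": [1, 2, 3], "size": [3, 2]}
mask = decode_uncompressed_rle(toy)
for row in mask:
    print(row)

# Summing the mask gives the number of object pixels, i.e. the 'area' of a crowd region.
print(sum(map(sum, mask)))
```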
Other annotation formats for training images also exist: PASCAL VOC stores annotations as XML, TensorFlow Object Detection as .csv, and Darknet (YOLO) as .txt.
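The formats differ mainly in how a bounding box is written. As a hedged sketch, the functions below convert a COCO `[x, y, width, height]` box into corner coordinates as used by PASCAL VOC, and into the normalised centre format commonly used for YOLO label files. The function names, the class index 0, and the 640x480 image size are assumptions made for this example; the real size must come from the matching image record, and some VOC tools use 1-based pixel coordinates.

```python
def coco_to_voc(bbox):
    """[x, y, width, height] -> (xmin, ymin, xmax, ymax) in pixels."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


def coco_to_yolo(bbox, image_width, image_height):
    """[x, y, width, height] -> (x_center, y_center, width, height), scaled to 0..1."""
    x, y, w, h = bbox
    return (
        (x + w / 2) / image_width,
        (y + h / 2) / image_height,
        w / image_width,
        h / image_height,
    )


bbox = [265.67, 222.31, 26.48, 14.71]  # bbox of annotations[60] above
image_width, image_height = 640, 480   # assumed size, for illustration only
class_index = 0                        # hypothetical class index for the label line

print(coco_to_voc(bbox))
xc, yc, w, h = coco_to_yolo(bbox, image_width, image_height)
print(f"{class_index} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
```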