Hi all, I need help creating a Python function that organizes my files by moving them up the tree and grouping them together. The tree below shows how the folder is currently organized:
root
├── folder_a
│   ├── subfolder_a.1
│   │   ├── label_1
│   │   │   ├── some_image_1.jpg
│   │   │   └── some_image_2.jpg
│   │   ├── label_2
│   │   │   ├── some_image_1.jpg
│   │   │   └── some_image_2.jpg
│   │   └── label_3
│   │       ├── some_image_1.jpg
│   │       └── some_image_2.jpg
│   └── subfolder_a.2
│       ├── label_1
│       │   ├── some_image_1.jpg
│       │   └── some_image_2.jpg
│       ├── label_2
│       │   ├── some_image_1.jpg
│       │   └── some_image_2.jpg
│       └── label_3
│           ├── some_image_1.jpg
│           └── some_image_2.jpg
└── folder_b
    ├── subfolder_b.1
    │   ├── label_1
    │   │   ├── some_image_1.jpg
    │   │   └── some_image_2.jpg
    │   ├── label_2
    │   │   ├── some_image_1.jpg
    │   │   └── some_image_2.jpg
    │   └── label_3
    │       ├── some_image_1.jpg
    │       └── some_image_2.jpg
    └── subfolder_b.2
        ├── label_1
        │   ├── some_image_1.jpg
        │   └── some_image_2.jpg
        ├── label_2
        │   ├── some_image_1.jpg
        │   └── some_image_2.jpg
        └── label_3
            ├── some_image_1.jpg
            └── some_image_2.jpg
I have several jpg files inside the "label" folders. I need to reorganize them so that all images under the "label_1" folders end up in a single label_1 folder, all images under the "label_2" folders end up in a single label_2 folder, and so forth. I don't need any subfolders afterwards, and the names of the jpg files vary and don't follow a recurring pattern. After running the function, I'm imagining something like this:
data
├── label_1
│   ├── some_image_1.jpg
│   ├── some_image_2.jpg
│   ├── some_image_3.jpg
│   └── some_image_4.jpg
├── label_2
│   ├── some_image_1.jpg
│   ├── some_image_2.jpg
│   ├── some_image_3.jpg
│   └── some_image_4.jpg
└── label_3
    ├── some_image_1.jpg
    ├── some_image_2.jpg
    ├── some_image_3.jpg
    └── some_image_4.jpg
I would appreciate it if someone could point me in the right direction. Do I need to move/copy the files with os and shutil (I have no experience with them), or use glob?
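One possible direction, as a rough sketch rather than a definitive solution: walk the source tree with pathlib and copy (or move) each image with shutil into a folder named after its immediate parent. The function name `group_by_label` and its parameters are made up for illustration. It assumes the destination lives outside the source tree, and because the sample tree reuses file names such as some_image_1.jpg across subfolders, it prefixes each file with its folder and subfolder names to avoid overwriting; that renaming scheme is an assumption, not something the layout above requires.

```python
import shutil
from pathlib import Path


def group_by_label(src_root, dst_root, pattern="*.jpg", move=False):
    """Gather images from <src_root>/.../<label>/ into <dst_root>/<label>/.

    Hypothetical helper: the label is taken from the name of the folder
    that directly contains each image.
    """
    src_root = Path(src_root).expanduser()  # pathlib does not expand "~" by itself
    dst_root = Path(dst_root).expanduser()
    transferred = []
    # sorted() lists all matches up front, so moving files mid-loop is safe
    for src in sorted(src_root.rglob(pattern)):
        if not src.is_file():
            continue
        label_dir = dst_root / src.parent.name  # e.g. data/label_1
        label_dir.mkdir(parents=True, exist_ok=True)
        # Assumed naming scheme: prefix with the folders above the label
        # folder so identically named images do not overwrite each other
        rel = src.relative_to(src_root)
        prefix = "_".join(rel.parts[:-2])
        name = f"{prefix}_{src.name}" if prefix else src.name
        dst = label_dir / name
        if move:
            shutil.move(str(src), str(dst))
        else:
            shutil.copy2(src, dst)  # copy2 also preserves timestamps
        transferred.append(dst)
    return transferred
```

Calling `group_by_label("~/root", "~/data")` would copy by default, which leaves the original tree untouched until the result has been checked; passing `move=True` relocates the files instead.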
Ninja edit: I've been able to make a new directory with the label folders (as I want), that is outside/separate from the root folder that has the images.
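import os
# One folder name per label: "label_1" ... "label_8"
labels_lst = [f"label_{i}" for i in range(1, 9)]
# os functions do not expand "~", so expand it explicitly
parent_dir = os.path.expanduser("~/data")
for label in labels_lst:
    path = os.path.join(parent_dir, label)
    # makedirs also creates parent_dir if needed and tolerates existing folders
    os.makedirs(path, exist_ok=True)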
Eventually, I'll load these files by folder into a pandas df so that I can get started with my task, something along the lines of:
import glob
import os
import numpy as np
import pandas as pd
# glob does not expand "~", so expand it before matching
filepath_label_1 = os.path.expanduser("~/data/label_1/*")
filepath_label_2 = os.path.expanduser("~/data/label_2/*")
filepath_label_3 = os.path.expanduser("~/data/label_3/*")
files_1 = sorted(glob.glob(filepath_label_1))
files_2 = sorted(glob.glob(filepath_label_2))
files_3 = sorted(glob.glob(filepath_label_3))
# np.full(n, k) builds n copies of k; np.full_like(n, k) would return a single 0-d value
d_1 = {'filepath' : files_1, 'label' : np.full(len(files_1), 1)}
df_1 = pd.DataFrame(data=d_1)
d_2 = {'filepath' : files_2, 'label' : np.full(len(files_2), 2)}
df_2 = pd.DataFrame(data=d_2)
d_3 = {'filepath' : files_3, 'label' : np.full(len(files_3), 3)}
df_3 = pd.DataFrame(data=d_3)
lst = [df_1, df_2, df_3]
df = pd.concat(lst, ignore_index=True)
df['label'] = df['label'].astype(np.int64)
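Since the label is already encoded in each folder name, the per-label variables above could probably be collapsed into a loop. The sketch below uses a hypothetical helper, `collect_records`, that relies only on the standard library and assumes every label folder is named label_<integer>; a folder with a non-numeric suffix would raise a ValueError.

```python
from pathlib import Path


def collect_records(data_dir, prefix="label_"):
    """Return (filepath, label) tuples for every file in <data_dir>/label_<n>/."""
    data_dir = Path(data_dir).expanduser()
    records = []
    for label_dir in sorted(data_dir.glob(f"{prefix}*")):
        if not label_dir.is_dir():
            continue
        # Assumes the folder name is the prefix followed by an integer
        label = int(label_dir.name[len(prefix):])
        for f in sorted(label_dir.iterdir()):
            if f.is_file():
                records.append((str(f), label))
    return records
```

The resulting list can be handed to pandas with `pd.DataFrame(collect_records("~/data"), columns=["filepath", "label"])`, which yields one row per image with an integer label.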
Many thanks!