Data¶
For machine learning tasks, high-quality, uniformly distributed data is essential, as it indicates the need to identify the location of the target subspace within the high-dimensional input data encoding space
Preparation¶
Preparing our dataset by ingesting and splitting it.1
Ingestion¶
IO¶
pandas->.csvopencv,PIL,kornia.image->.jpg,.png
Data Loader¶
inherit VisionDataset and pass it to DataLoader2
from torchvision.datasets import VisionDataset
class TestDataset(VisionDataset):
pass
train_dataset = TestDataset(...)
from torch.utils.data import DataLoader
train_loader = DataLoader(
train_dataset,
batch_size=batch_size,
shuffle=True,
num_workers=4,
pin_memory=True,
)
finally,dataset is able to get into trainer.fit()3
import lightning as L
from your_module import MyLightningModule
model = MyLightningModule()
trainer = L.Trainer()
trainer.fit(model, train_dataloader, val_dataloader)
Splitting¶
define dataset type using lighting DataModule
Cite
we need to split our training dataset into train and val data splits.
- Use the
trainsplit to train the model.
Here the model will have access to both inputs (features) and outputs (labels) to optimize its internal weights.
- After each iteration (epoch) through the training split, we will use the
valsplit to determine the model's performance.
Here the model will not use the labels to optimize its weights but instead, we will use the validation performance to optimize training hyperparameters such as the learning rate, etc.
- Finally, we will use a separate holdout
testdataset to determine the model's performance after training.
This is our best measure of how the model may behave on new, unseen data that is from a similar distribution to our training dataset.1
Exploratory Data Analysis (EDA)¶
Note
Goal is to convince yourself that the data you have is sufficient for the task.4
Collect Data¶
Preprocessing¶
torchvision transform 5
Vectorization¶
-
https://pytorch.org/tutorials/beginner/basics/data_tutorial.html#loading-a-dataset ↩
-
https://lightning.ai/docs/pytorch/stable/common/trainer.html#basic-use ↩
-
https://madewithml.com/courses/mlops/exploratory-data-analysis/ ↩
-
https://pytorch.org/vision/main/auto_examples/transforms/plot_transforms_e2e.html#sphx-glr-auto-examples-transforms-plot-transforms-e2e-py ↩