Dear all,

I'm struggling because the teacher doesn't go into the math behind "batches and epochs".

Situation:

We have a dataset of 627 rows to train the model.

The teacher explains:

batch size = 32 (ok) → epochs = the number of times the model gets trained on the same dataset.

My question is:

Why would I load the same set of data into the model more than once? Loading the same data once or 1,000,000 times shouldn't have any impact on the result, even if we shuffle the rows that the batches get filled with (in the end it's 627/627).

After some brainstorming:

- Is it like a row on a lottery ticket? Meaning: 627 rows → the shuffle algorithm picks 32 rows at random → these rows get passed as a batch into the model → epochs++ (32/627), and we start over (0/627)? That would mean the same rows could get picked by the shuffle algorithm again (which somehow makes sense, given that the dataset is just the result of 1 event and not of 100,000 events).

But the teacher mentioned "loading the complete dataset → epochs++".
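To check my understanding of the teacher's version, here is a small sketch of what I think it means (my assumption: each epoch reshuffles the rows but still uses every row exactly once, split into batches without replacement; row indices stand in for the real data):

```python
import random

dataset = list(range(627))  # stand-in for the 627 rows
batch_size = 32
epochs = 3

for epoch in range(epochs):
    # reshuffle the order, but every row still appears exactly once per epoch
    random.shuffle(dataset)
    # split the shuffled dataset into batches of 32
    # (627 / 32 → 19 full batches of 32 plus one last batch of 19 rows)
    batches = [dataset[i:i + batch_size] for i in range(0, len(dataset), batch_size)]
    print(f"epoch {epoch}: {len(batches)} batches, "
          f"{sum(len(b) for b in batches)} rows total")
```

So in this reading, "epochs++" only happens after all 20 batches (the complete 627 rows) have been fed through, not after every single batch — which is the opposite of my lottery-ticket idea above.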

Can someone please explain this?

Thanks!