
TensorFlow Epochs vs Steps: Key Differences Explained

By Ondřej Dolanský on 12/08/2024

This article explains the difference between steps and epochs in TensorFlow, two key concepts for understanding how your model learns from data.



Introduction

Let's imagine we're teaching a child to tell the difference between cats and dogs using a picture book. This picture book is like our "dataset" in machine learning. Each time we show the child a few pictures from the book, say 10 pictures, that's like using a "batch size" of 10. When the child looks at these 10 pictures and learns a little bit more about cats and dogs, they've completed one "step" of learning. Now, if the child goes through the entire picture book once, that's one "epoch" of learning.

Step-by-Step Guide

Sticking with the picture-book analogy, each machine learning concept maps to a part of the lesson:

  • Dataset: Your picture book is your dataset. It has pictures of cats and dogs.
  • Batch Size: You show the child 10 pictures at a time. This is your batch size.
  • Step: Each time the child looks at a batch of 10 pictures and learns a little about cats and dogs, that's one step.
  • Epoch: One epoch is when the child has gone through the entire picture book once.
In TensorFlow's Keras API, these two settings map directly onto model.fit:

model.fit(
    training_data, 
    epochs=10,  # Go through the entire dataset 10 times
    batch_size=32  # Each step uses 32 samples
)

Relationship between steps and epochs:

  • Steps per epoch: The number of steps it takes to complete one epoch. If your picture book has 100 pictures and your batch size is 10, you'll have 10 steps per epoch.
steps_per_epoch = total_samples // batch_size
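As a quick check, the formula can be evaluated with the numbers from the picture-book example (100 pictures, batches of 10). The 103-sample case is a made-up number to show what happens when the dataset doesn't divide evenly into batches:

```python
import math

total_samples = 100
batch_size = 10

# Full batches per pass over the data: 100 // 10 = 10 steps per epoch
steps_per_epoch = total_samples // batch_size
print(steps_per_epoch)  # 10

# If the dataset size is not an exact multiple of the batch size,
# round up so the final, smaller batch still counts as a step
print(math.ceil(103 / batch_size))  # 11
```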

Why are these important?

  • Epochs: Control how many times the model sees the entire dataset. More epochs can lead to better learning but also overfitting.
  • Batch size and steps: Affect how often the model updates its understanding. Smaller batches mean more frequent updates but can be slower.

Code Example

This Python code simulates a simplified training loop. It represents the dataset as a list of dictionaries, each holding an image (an emoji stand-in) and a label ("cat" or "dog"), then iterates through that list in batches for a set number of epochs, printing which pictures each step would show. It demonstrates how data is divided into batches and repeated across epochs; the actual model update at each step is left as a comment.

# Imagine a list of pictures, where each picture is labeled as "cat" or "dog"
picture_book = [
    {"image": "🐱", "label": "cat"},
    {"image": "🐶", "label": "dog"},
    {"image": "🐱", "label": "cat"},
    # ... more pictures of cats and dogs ...
    {"image": "🐶", "label": "dog"}
]

# Total number of pictures in our book
total_pictures = len(picture_book) 

# Batch size: How many pictures we show the child at once
batch_size = 10  

# Epochs: How many times we go through the entire book
epochs = 3

# Calculate how many steps are in one epoch
# (round up so a final, smaller batch still counts as a step)
steps_per_epoch = (total_pictures + batch_size - 1) // batch_size

# Let's simulate the learning process
for epoch in range(epochs):
    print(f"Epoch {epoch + 1}/{epochs}")
    for step in range(steps_per_epoch):
        # Get a batch of pictures (the last batch may be smaller)
        start_index = step * batch_size
        end_index = min(start_index + batch_size, total_pictures)
        batch = picture_book[start_index:end_index]

        print(f"Step {step + 1}/{steps_per_epoch}: Showing pictures {start_index + 1} to {end_index}")
        # ... (Imagine the child looking at the pictures and learning) ...

        # In a real model, we'd update the model's understanding here based on the batch

print("Learning complete!") 

Explanation:

  1. Picture Book (Dataset): We represent the picture book as a list called picture_book. Each element is a dictionary with an "image" (simplified here as an emoji) and a "label" ("cat" or "dog").
  2. Parameters: We define total_pictures, batch_size, and epochs to control the learning process.
  3. Steps per Epoch: We calculate steps_per_epoch to know how many batches are needed to cover the entire dataset in one epoch.
  4. Learning Loop:
    • Outer Loop (Epochs): We iterate through the entire picture book multiple times (defined by epochs).
    • Inner Loop (Steps): Within each epoch, we iterate through the dataset in batches.
    • Showing Pictures: We simulate showing a batch of pictures to the child.
    • Model Update (Not Shown): In a real machine learning model, this is where we'd use an algorithm (like backpropagation) to adjust the model's internal parameters based on the batch of data it just saw.

Key Points:

  • Epochs and Overfitting: More epochs generally lead to better learning, but too many can cause the model to memorize the training data too well (overfitting) and perform poorly on new data.
  • Batch Size Trade-off: Smaller batch sizes (e.g., 10) mean more frequent, noisier updates, which can help the model escape poor solutions but makes training less stable. Larger batch sizes (e.g., 32 or more) give smoother gradient estimates and use hardware more efficiently, but produce fewer updates per epoch and can sometimes generalize slightly worse.
  • Steps per Epoch: This value helps you understand how many times the model's understanding is updated within a single pass through the entire dataset.
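To make the trade-off concrete, here is a small sketch (the dataset size of 1,000 is an assumed number, chosen only for illustration) counting how many weight updates one epoch produces at different batch sizes:

```python
import math

dataset_size = 1_000  # assumed size, for illustration only

for batch_size in (10, 32, 256):
    # One weight update per step; a partial final batch still counts as a step
    updates_per_epoch = math.ceil(dataset_size / batch_size)
    print(f"batch_size={batch_size:>3} -> {updates_per_epoch} updates per epoch")
```

Dropping the batch size from 256 to 10 multiplies the number of updates per epoch by 25, which is exactly why small batches "adjust more often" while large ones finish an epoch in fewer, smoother steps.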

Additional Notes

  • Real-world analogy: Think of teaching the child multiplication tables. One epoch is like going through the entire multiplication table once. Each time they practice a set of multiplications (like 2 x 2, 2 x 3, 2 x 4), that's a batch.
  • Learning rate: In addition to epochs and batch size, there's another important factor in learning: the "learning rate." Imagine it as the size of the steps the child takes while learning. Small steps might be slow, but big steps might make them miss important details.
  • Validation: Just like we'd occasionally ask the child to identify cats and dogs in new pictures to see how well they're learning, we also test machine learning models on unseen data. This is called "validation" and helps us make sure the model isn't just memorizing the picture book.
  • The goal: Our goal isn't to make the child (or the model) memorize every picture in the book. We want them to learn the general features of cats and dogs so they can recognize any cat or dog, even if they've never seen it before.
  • Iteration is key: Learning happens gradually. Each time the child (or model) sees a batch of pictures and gets feedback, they adjust their understanding a little bit. Over many epochs and steps, this leads to significant improvement.
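The learning-rate point can be seen in a toy one-dimensional sketch (the function and numbers here are made up for illustration): gradient descent on (w - 3)**2, whose gradient is 2 * (w - 3). A small rate walks steadily toward the minimum at w = 3; an overly large one overshoots it further on every step:

```python
def gradient_descent(learning_rate, steps=50):
    """Minimize (w - 3)**2 by gradient descent, starting from w = 0."""
    w = 0.0
    for _ in range(steps):
        gradient = 2 * (w - 3)   # derivative of (w - 3)**2
        w -= learning_rate * gradient
    return w

print(gradient_descent(0.1))  # close to 3: small steps converge
print(gradient_descent(1.1))  # far from 3: big steps overshoot and diverge
```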

Summary

This analogy explains machine learning concepts using a child learning to differentiate cats and dogs from a picture book.

| Concept | Analogy | Explanation |
| --- | --- | --- |
| Dataset | Picture book | The collection of cat and dog pictures used for learning. |
| Batch Size | 10 pictures at a time | The number of pictures shown to the child in one go. |
| Step | Looking at one batch and learning | One cycle of showing the child a batch and them learning from it. |
| Epoch | Going through the entire book once | One complete cycle of showing the child all the pictures in the book. |

Code Example:

model.fit(
    training_data, # The picture book
    epochs=10,  # Child goes through the book 10 times
    batch_size=32  # Child sees 32 pictures at a time
)

Relationship between Steps and Epochs:

  • Steps per epoch: Number of times the child goes through batches to finish the book. Calculated as total pictures / pictures per batch.

Importance:

  • Epochs: Control how much the model learns from the data. More epochs can improve learning but might lead to "memorizing" the pictures (overfitting).
  • Batch size and steps: Influence how often the model adjusts its understanding. Smaller batches mean more frequent adjustments but can be slower.

Conclusion

Just like teaching a child, training a machine learning model involves breaking down the learning process into smaller, manageable steps. We use a dataset (like a picture book) and control how the model learns using parameters like epochs (going through the entire dataset multiple times) and batch size (the number of examples processed at once). The relationship between steps and epochs helps us understand how many times the model updates its understanding within and across these learning cycles. By carefully tuning these parameters, we guide the model to learn effectively from the data and generalize well to new, unseen examples, much like a child learns to recognize any cat or dog after being exposed to enough examples.
