šŸ¶
Tensorflow

TensorFlow CUDA_VISIBLE_DEVICES in Jupyter Notebook

By Ondřej DolanskĆ½ on 12/15/2024

Learn how to easily set CUDA_VISIBLE_DEVICES within a Jupyter Notebook environment for optimized TensorFlow GPU usage.


Introduction

When working with TensorFlow, especially on systems with multiple GPUs, you might want to control which GPUs TensorFlow can access and utilize. This is crucial for tasks like dedicating specific GPUs for certain processes or troubleshooting. You can achieve this control by using the CUDA_VISIBLE_DEVICES environment variable.

Step-by-Step Guide

You can control which GPUs TensorFlow sees by setting the CUDA_VISIBLE_DEVICES environment variable. Here's how:

  1. Identify GPU IDs: Use nvidia-smi in your terminal to list available GPUs and their IDs.

  2. Set the environment variable:

    import os
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # Use GPU 0
    • Replace "0" with the desired GPU ID.
    • To use multiple GPUs, separate their IDs with commas: "0,1,2".
    • To disable GPU usage, set it to "-1".
  3. Place the code: This code needs to be executed before you import TensorFlow.

Important Notes:

  • Order matters: TensorFlow assigns logical IDs (0, 1, ...) to the visible GPUs based on the order they appear in CUDA_VISIBLE_DEVICES. If you set "1,0", then GPU 1 will be logical device 0 and GPU 0 will be logical device 1.
  • Jupyter Notebooks: If TensorFlow has already been imported in the session, setting the variable has no effect. Restart the kernel, then run the cell that sets the environment variable before importing TensorFlow.
  • Other libraries: This approach generally works for other deep learning libraries like PyTorch as well.
  • Horovod: For distributed training with Horovod, each worker process typically sets CUDA_VISIBLE_DEVICES based on its local rank so that every process on a machine uses a different GPU.
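The ordering rule above can be illustrated without any GPU hardware. The helper below is a hypothetical sketch, not a CUDA or TensorFlow API; it mimics how the CUDA runtime maps the comma-separated physical IDs onto logical device indices:

```python
import os

def logical_to_physical(visible_devices: str) -> dict:
    """Map logical GPU indices (as TensorFlow sees them) to physical
    GPU IDs, mirroring how CUDA_VISIBLE_DEVICES is interpreted."""
    ids = [d.strip() for d in visible_devices.split(",") if d.strip()]
    # "-1" hides all GPUs (any non-existent ID has the same effect)
    if ids == ["-1"]:
        return {}
    return {logical: physical for logical, physical in enumerate(ids)}

os.environ["CUDA_VISIBLE_DEVICES"] = "1,0"
print(logical_to_physical(os.environ["CUDA_VISIBLE_DEVICES"]))
# {0: '1', 1: '0'}  -> logical GPU 0 is physical GPU 1
```

With "1,0", code that asks for /GPU:0 runs on physical GPU 1, which is why the order you write the IDs in matters.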

Code Example

The Python code demonstrates how to control which GPUs TensorFlow can access. It uses the CUDA_VISIBLE_DEVICES environment variable to specify GPU IDs before importing TensorFlow. You can choose a single GPU, multiple GPUs, or disable GPU usage entirely. The code also includes a check to verify the visible GPUs within TensorFlow.

# Step 1: Identify GPU IDs (This is done in your terminal, not in the code)
# Run: nvidia-smi

# Step 2 & 3: Set the environment variable BEFORE importing TensorFlow
import os

# Use only GPU 0
os.environ["CUDA_VISIBLE_DEVICES"] = "0" 

# Use multiple GPUs (0 and 1)
# os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

# Disable GPU usage
# os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

# Now import TensorFlow and your other libraries
import tensorflow as tf

# Verify which GPUs TensorFlow sees
physical_devices = tf.config.list_physical_devices('GPU')
print("Num GPUs Available: ", len(physical_devices))

# Rest of your TensorFlow code...

Explanation:

  1. nvidia-smi (Terminal command): This command provides information about your NVIDIA GPUs, including their IDs. You'll use these IDs in the next step.
  2. os.environ["CUDA_VISIBLE_DEVICES"] = ...: This line sets the CUDA_VISIBLE_DEVICES environment variable.
    • "0": Makes only GPU 0 visible to TensorFlow.
    • "0,1": Makes GPUs 0 and 1 visible. TensorFlow will assign logical IDs 0 and 1 to them, respectively.
    • "-1": Disables all GPUs, forcing TensorFlow to use the CPU.
  3. Import TensorFlow after setting the variable: This ensures TensorFlow respects your GPU selection.
  4. tf.config.list_physical_devices('GPU'): This TensorFlow command lets you confirm which GPUs are visible and accessible within your code.

Remember:

  • Restart kernels in Jupyter Notebooks: if TensorFlow was already imported in the session, restart the kernel, then re-run the cell that sets the environment variable before importing TensorFlow again.
  • Horovod: For distributed training, you'll need to manage CUDA_VISIBLE_DEVICES for each worker process (typically derived from its local rank) to control GPU assignments across your cluster.

Additional Notes

  • Persistence: Setting CUDA_VISIBLE_DEVICES only affects the current process. Once you close your terminal or Jupyter Notebook, the setting reverts to its default.
  • System-wide settings: While possible, directly modifying system environment variables to manage GPU visibility is generally not recommended. It can lead to conflicts with other applications or users.
  • Containerization (Docker, Singularity): When using TensorFlow within containers, you'll typically set CUDA_VISIBLE_DEVICES as part of the container's environment configuration during its launch.
  • Cloud environments: Cloud providers often have their own mechanisms for specifying GPU resources for your instances. Consult their documentation for how to allocate and access GPUs.
  • Debugging: If you encounter issues with GPU visibility, double-check the following:
    • Correct GPU IDs: Ensure you're using the IDs shown by nvidia-smi.
    • Variable set before import: CUDA_VISIBLE_DEVICES must be set before you import TensorFlow.
    • Jupyter kernel restarts: After a restart, re-run the cell that sets the environment variable before the cell that imports TensorFlow.
  • Alternatives: TensorFlow provides more fine-grained control over GPU usage through its API:
    • tf.config.set_visible_devices: Allows you to dynamically change visible devices within your TensorFlow code.
    • Device placement: You can explicitly place operations on specific devices (CPUs or GPUs) using tf.device.
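A minimal sketch of these API alternatives is shown below. It is guarded so it degrades gracefully on machines without TensorFlow installed or without GPUs; the restriction to the first GPU is just an example:

```python
try:
    import tensorflow as tf
except ImportError:
    tf = None  # TensorFlow not installed; nothing to configure

if tf is not None:
    gpus = tf.config.list_physical_devices('GPU')
    if gpus:
        # Restrict TensorFlow to the first visible GPU. This must run
        # before any GPU has been initialized, or it raises RuntimeError.
        tf.config.set_visible_devices(gpus[:1], 'GPU')
        print("Logical GPUs:", len(tf.config.list_logical_devices('GPU')))

    # Explicit device placement for individual operations:
    with tf.device('/CPU:0'):
        x = tf.constant([1.0, 2.0]) + 1.0
```

Unlike CUDA_VISIBLE_DEVICES, tf.config.set_visible_devices can be called from inside Python after the import, as long as the devices have not been initialized yet.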

Remember that understanding your specific hardware configuration and software environment is essential for effectively managing GPU resources with TensorFlow.
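The per-process scope mentioned under Persistence can be verified directly: a child process inherits the variable, but a process launched with a clean environment never sees it (the value "0" here is illustrative):

```python
import os
import subprocess
import sys

# Set the variable for this process only.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Child processes inherit the current environment...
child = subprocess.run(
    [sys.executable, "-c",
     "import os; print(os.environ.get('CUDA_VISIBLE_DEVICES'))"],
    capture_output=True, text=True,
)
print(child.stdout.strip())  # 0

# ...but a process started with an empty environment does not.
clean = subprocess.run(
    [sys.executable, "-c",
     "import os; print(os.environ.get('CUDA_VISIBLE_DEVICES'))"],
    capture_output=True, text=True, env={},
)
print(clean.stdout.strip())  # None
```

This is why a setting made in one notebook has no effect on other notebooks or on terminals opened later.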

Summary

  • Environment variable: CUDA_VISIBLE_DEVICES
  • Purpose: Controls which GPUs TensorFlow can access and utilize.
  • Identifying GPU IDs: Run nvidia-smi in your terminal.
  • Setting the variable: Use os.environ["CUDA_VISIBLE_DEVICES"] = "GPU_ID" in your Python code before importing TensorFlow. "GPU_ID" can be a single GPU ID (e.g., "0"), multiple IDs separated by commas (e.g., "0,1,2"), or "-1" to disable GPU usage.
  • Logical GPU IDs: TensorFlow assigns logical IDs (0, 1, ...) to visible GPUs in the order they appear in CUDA_VISIBLE_DEVICES.
  • Jupyter Notebooks: Restart the kernel, then set the variable before importing TensorFlow.
  • Other libraries: The same method generally applies to other deep learning libraries like PyTorch.
  • Horovod: Distributed training requires setting CUDA_VISIBLE_DEVICES per worker process.

Conclusion

By using the CUDA_VISIBLE_DEVICES environment variable, you can effectively manage which GPUs TensorFlow utilizes. This control is essential for optimizing GPU resources, especially in multi-GPU systems. Remember to set this variable before importing TensorFlow and to verify your configuration using tf.config.list_physical_devices('GPU'). Whether you need to dedicate GPUs, troubleshoot issues, or experiment with different setups, mastering this technique will streamline your TensorFlow development process.
