Remote Debugging on Cluster with VS Code and DebugPy
Welcome to this tutorial on remote debugging! When working with machine learning models or computational tasks on a cluster, debugging can be challenging. This guide will show you how to use VS Code and DebugPy to debug Python code running on a remote cluster as if it were running locally on your machine.
Why Remote Debugging?
When developing on computing clusters, you often encounter scenarios where:
- Your code runs fine locally but fails on the cluster
- You need to debug with the cluster’s specific hardware (GPUs, large memory)
- Long-running processes make print-statement debugging impractical
- You want to inspect variables and step through code in real-time
Remote debugging solves these problems by letting you use VS Code’s powerful debugging tools while your code executes on the remote server.
Beyond the Cluster: The skills you’ll learn here extend far beyond academic computing clusters. These same techniques apply to debugging applications in cloud environments, including containerized applications running in Kubernetes pods, AWS EC2 instances, Google Cloud VMs, or Azure containers. As modern software increasingly runs in distributed and cloud-native architectures, mastering remote debugging becomes an essential skill for any developer working with production systems.
Prerequisites
Before we begin, ensure you have:
- Conda installed on your local machine (see our Python Environment Setup guide)
- VS Code installed on your local machine
- Python Extension for VS Code installed
- Passwordless (key-based) SSH access to your remote server (covered in our environment setup guide)
- Git repository cloned on your local machine
- Basic familiarity with VS Code and Python
Important: Before starting setup on the remote server, open a screen session to avoid connection drops during installation. Learn about persistent sessions with screen in our environment setup guide.
Part 1: Setting Up the Remote Server
Step 1: Start a Persistent Session (Recommended)
Long installations can break if your SSH connection closes. Start a persistent screen session to keep your work alive:
# SSH to your cluster
ssh cluster_node
# Start a screen session
screen -S setup_env
# Activate your conda environment inside the screen session
conda activate vulnewdata
# Proceed with setup steps below
Screen Commands:
- Detach: Ctrl+A, then D
- Reattach: screen -r setup_env
- List sessions: screen -ls
This ensures installations or debugging sessions stay alive even if you disconnect.
Step 2: Install DebugPy on the Cluster
DebugPy is Microsoft’s debug adapter for Python. It enables VS Code to attach to and debug Python processes remotely.
In your activated conda environment, run:
conda install anaconda::debugpy
Verify the installation:
python -m debugpy --version
You should see the version number (e.g., 1.8.0).
Step 3: Run DebugPy on the Remote Server
Before debugging your project code, let’s practice with a simple example. Create a test script or use the sample code in your uncertainty/debug directory.
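A practice script can be as small as the sketch below. The file name debug_practice.py and the running_mean function are hypothetical, chosen only to give the debugger a loop and a function call to step through.

```python
# debug_practice.py - hypothetical practice script for a first remote debugging session


def running_mean(values):
    """Return the running mean after each element of values."""
    means = []
    total = 0.0
    for count, value in enumerate(values, start=1):
        total += value  # a convenient line for a first breakpoint
        means.append(total / count)
    return means


def main():
    data = [2.0, 4.0, 6.0, 8.0]
    result = running_mean(data)  # use Step Into here to enter the function
    print("running means:", result)


if __name__ == "__main__":
    main()
```

With a breakpoint inside the loop, the Variables panel should show total, count, and means changing on each iteration.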
Understanding the DebugPy Command:
python -m debugpy --wait-for-client --listen 0.0.0.0:5678 your_script.py
Let’s break down the parameters:
- 0.0.0.0 - Listens on all network interfaces (allows external connections). With the SSH tunnel set up later in this guide, 127.0.0.1 is enough and keeps the port private to the node.
- 5678 - The port for the debug server (use another if this is occupied)
- --wait-for-client - Pauses code execution until VS Code attaches
- your_script.py - The Python file you want to debug
Create a Helper Script (Optional):
For convenience, create a runpy.sh script:
#!/bin/bash
# runpy.sh - Helper script to run Python files with DebugPy
# "$@" forwards the script name plus any arguments that follow it
python -m debugpy --wait-for-client --listen 0.0.0.0:5678 "$@"
Make it executable:
chmod +x runpy.sh
Run your Python file with debugging:
./runpy.sh train_model.py
Or directly:
python -m debugpy --wait-for-client --listen 0.0.0.0:5678 train_model.py
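As an alternative to the command-line form, DebugPy can also be started from inside the script itself with debugpy.listen and debugpy.wait_for_client. The following is a minimal sketch, not part of the original workflow: the REMOTE_DEBUG environment variable and the function name are assumptions made for illustration.

```python
import os


def start_debug_server_if_requested(host="0.0.0.0", port=5678, env_var="REMOTE_DEBUG"):
    """Start DebugPy and block until VS Code attaches, but only when env_var is "1".

    REMOTE_DEBUG is a hypothetical variable name used for this sketch.
    Returns True if the debug server was started, False otherwise.
    """
    if os.environ.get(env_var) != "1":
        return False

    # Imported lazily so the script still runs where debugpy is not installed.
    import debugpy

    debugpy.listen((host, port))  # same role as --listen host:port
    debugpy.wait_for_client()     # same role as --wait-for-client
    return True
```

Calling this function at the top of a script and launching it with REMOTE_DEBUG=1 set would behave much like the command above, while a normal run skips the debugger entirely.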
Step 4: Verify DebugPy is Listening
Check that the debug server is running and waiting for connections:
netstat -a -n | grep 5678
You should see output indicating port 5678 is in LISTEN state:
tcp 0 0 0.0.0.0:5678 0.0.0.0:* LISTEN
If you see this, DebugPy is successfully waiting for your VS Code debugger to attach!
Part 2: Setting Up Your Local Machine
Step 1: Configure VS Code for Remote Debugging
VS Code uses launch.json to configure debugging sessions. Let’s create one for remote debugging.
Create or Edit .vscode/launch.json:
In your project’s root directory, create .vscode/launch.json with the following configuration:
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python Debugger: Remote Attach without Mappings",
            "type": "debugpy",
            "request": "attach",
            "connect": {
                "host": "localhost",
                "port": 55678
            }
        },
        {
            "name": "Python Debugger: Remote Attach",
            "type": "debugpy",
            "request": "attach",
            "connect": {
                "host": "localhost",
                "port": 55678
            },
            "pathMappings": [
                {
                    "localRoot": "${workspaceFolder}",
                    "remoteRoot": "/home/vulnewdata"
                }
            ]
        }
    ]
}
Understanding the Configuration:
We have two debug configurations:
- Without Mappings: Use this if your local and remote directory structures are identical
- With Mappings: Use this when your local workspace is in a different location than the remote one
Path Mappings Explained:
- localRoot: Your local project directory (${workspaceFolder} refers to your VS Code workspace)
- remoteRoot: The absolute path to the project on the remote cluster
Important: Update remoteRoot to match your actual remote directory path!
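The idea behind path mappings can be pictured as a prefix swap: a file path reported by the remote process has its remoteRoot prefix replaced with localRoot so VS Code can open the matching local file. The sketch below only illustrates that idea and is not how DebugPy implements it; the function name and the local path /Users/me/projects/vulnewdata are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional


def remote_to_local(remote_path: str, remote_root: str, local_root: str) -> Optional[str]:
    """Translate a remote file path to its local counterpart, or None if unmapped."""
    try:
        relative = PurePosixPath(remote_path).relative_to(remote_root)
    except ValueError:
        # The file lives outside remoteRoot (e.g. an installed library).
        return None
    return str(PurePosixPath(local_root) / relative)


print(remote_to_local(
    "/home/vulnewdata/uncertainty/scripts/train_model.py",
    "/home/vulnewdata",
    "/Users/me/projects/vulnewdata",  # hypothetical local workspace
))
```

If remoteRoot is wrong, no prefix matches, which is one reason breakpoints can fail to bind.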
Step 2: Set Up Conda Environment in VS Code
To ensure VS Code uses the correct Python interpreter:
If Conda isn’t recognized by VS Code:
Initialize Conda in your shells (run in Anaconda Prompt on Windows):
conda init powershell
conda init cmd.exe
Select the Python Interpreter:
- Restart VS Code
- Open Command Palette: Ctrl+Shift+P (Windows/Linux) or Cmd+Shift+P (Mac)
- Type and select: “Python: Select Interpreter”
- Choose your conda environment:
~\miniconda3\envs\vulnewdata\python.exe (Windows)
~/miniconda3/envs/vulnewdata/bin/python (Linux/Mac)
Now you can activate your conda environment directly in VS Code’s integrated terminal!
Step 3: Sync Local and Remote Code
Critical: Your local code must match the remote code, or breakpoints won’t align correctly!
Before debugging, always sync your repository:
# Pull latest changes from remote repository
git pull origin main
# Or if you have local changes, commit and push them first
git add .
git commit -m "Your changes"
git push origin main
# Then pull on the cluster
ssh cluster_node "cd /path/to/project && git pull"
Best Practice: Use Git as the single source of truth. Both your local machine and cluster should pull from the same repository.
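One way to confirm that both sides are on the same commit is to compare the output of git rev-parse HEAD locally and on the cluster. The sketch below assumes git and ssh are on your PATH and that it runs from inside the local repository; the function names are hypothetical, and the remote directory should be given as an absolute path because it is shell-quoted.

```python
import shlex
import subprocess


def command_output(cmd):
    """Run a command and return its stdout with surrounding whitespace removed."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def commits_match(ssh_host, remote_dir):
    """Return True if the local and remote checkouts point at the same commit."""
    local_commit = command_output(["git", "rev-parse", "HEAD"])
    remote_cmd = f"cd {shlex.quote(remote_dir)} && git rev-parse HEAD"
    remote_commit = command_output(["ssh", ssh_host, remote_cmd])
    return local_commit == remote_commit
```

For example, commits_match("cluster_node", "/path/to/project") returns False when one side is behind. Matching commits do not rule out uncommitted edits, so git status on both sides remains useful.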
Step 4: Set Up SSH Port Forwarding
This is the crucial step that connects your local VS Code to the remote DebugPy server.
Create the SSH Tunnel:
Open a terminal on your local machine and run:
ssh -L 55678:localhost:5678 cluster_node
Understanding the Command:
- -L 55678:localhost:5678 - Forward local port 55678 to remote port 5678
- cluster_node - Your SSH alias or user@hostname
This creates a secure tunnel: VS Code connects to localhost:55678, which forwards to the cluster’s 5678 where DebugPy is listening.
Verify the Connection:
On a Linux local machine, use the ss command (socket statistics):
ss -tln | grep 55678
You should see:
LISTEN 0 128 127.0.0.1:55678 0.0.0.0:*
Alternative with netstat:
ss is Linux-only, so on Windows and macOS (or on older Linux systems without ss) use the traditional netstat command:
Windows PowerShell:
netstat -a -n | select-string "55678"
Linux/Mac:
netstat -a -n | grep 55678
Expected output (Windows format; Linux and macOS arrange the columns differently):
TCP 127.0.0.1:55678 0.0.0.0:0 LISTENING
Keep This Terminal Open! If you close it, the tunnel closes and debugging stops.
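The three sample outputs above use different column layouts, and a plain grep for 5678 also matches 55678. The sketch below shows one way to check for an exact listening port in pasted ss or netstat output. It assumes the address:port layouts shown in this guide and does not handle the dot-separated addresses that macOS netstat prints; the function name is hypothetical.

```python
def is_listening(output: str, port: int) -> bool:
    """Return True if a line of ss/netstat output shows `port` in a listening state."""
    for line in output.splitlines():
        fields = line.split()
        # "LISTEN" (ss, Linux netstat) and "LISTENING" (Windows netstat) both match.
        if not any(field.upper().startswith("LISTEN") for field in fields):
            continue
        # In the layouts shown in this guide, the local address is the first
        # field of the form address:port.
        addresses = [field for field in fields if ":" in field]
        if addresses and addresses[0].rpartition(":")[2] == str(port):
            return True
    return False


sample = "LISTEN 0 128 127.0.0.1:55678 0.0.0.0:*"
print(is_listening(sample, 55678))  # True
print(is_listening(sample, 5678))   # False: 55678 is a different port
```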
Part 3: Start Debugging!
Now that everything is set up, let’s start debugging.
The Debugging Workflow
On the Remote Cluster:
- Ensure your screen session is running (if you detached)
- Start your Python script with DebugPy:
python -m debugpy --wait-for-client --listen 0.0.0.0:5678 your_script.py
- You should see output indicating DebugPy is waiting:
Waiting for debugger attach...
If nothing is printed, confirm the server is listening with the netstat check from Part 1, Step 4.
On Your Local Machine:
- Ensure SSH port forwarding is active (the tunnel terminal is open)
- Open your project in VS Code
- Set breakpoints by clicking in the left margin next to line numbers
- Go to the Run and Debug tab (or press Ctrl+Shift+D)
- Select “Python Debugger: Remote Attach” from the dropdown
- Click the green play button ▶ or press F5
Success! VS Code should connect, and you’ll see:
- Your breakpoint indicators turn red (active)
- The debug toolbar appears
- Variables panel populates
- Your code pauses at breakpoints
Using the Debugger
Once connected, you can:
Navigate Through Code:
- Continue (F5): Resume execution until next breakpoint
- Step Over (F10): Execute current line, don’t enter functions
- Step Into (F11): Enter function calls to debug them
- Step Out (Shift+F11): Exit current function
Inspect Variables:
- View local and global variables in the Variables panel
- Hover over variables in the code to see their values
- Add variables to the Watch panel for continuous monitoring
Evaluate Expressions:
- Use the Debug Console to evaluate Python expressions in real-time
- Type any valid Python code to test hypotheses
Call Stack:
- See the execution path that led to the current point
- Click on different frames to inspect variables at each level
Example: Debugging a Training Script
Let’s walk through a practical example of debugging a machine learning training script.
Remote Cluster (in screen session):
# Navigate to your project
cd ~/vulnewdata
# Activate environment
conda activate vulnewdata
# Start debugging the training script
python -m debugpy --wait-for-client --listen 0.0.0.0:5678 \
uncertainty/scripts/train_model.py --epochs 10 --dataset data/train.csv
Local Machine:
- Open project in VS Code
- Set breakpoints in train_model.py:
  - Line where model is initialized
  - Beginning of training loop
  - Loss calculation
  - Validation step
- Start SSH tunnel (separate terminal):
ssh -L 55678:localhost:5678 cluster_node
- Attach debugger in VS Code:
  - Press F5 or click the debug play button
  - Select “Python Debugger: Remote Attach”
- Debug your code:
  - Inspect model parameters
  - Check tensor shapes
  - Verify loss calculations
  - Step through the training loop
Common Debugging Scenarios
Scenario 1: Unexpected NaN Loss
# Set breakpoint here
loss = criterion(outputs, labels)
In Debug Console, inspect:
>>> outputs
tensor([...]) # Check for NaN or inf values
>>> labels.shape
torch.Size([32, 10]) # Verify dimensions
>>> (outputs != outputs).any() # Check for NaN
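The outputs != outputs check works because NaN is the only floating-point value that is not equal to itself. The sketch below demonstrates this with plain Python floats as a stand-in for tensor values; on real tensors, torch.isnan(outputs).any() and torch.isfinite(outputs).all() express the same checks more readably. The helper name is hypothetical.

```python
import math


def non_finite_positions(values):
    """Return the indices of NaN or infinite entries in a sequence of floats."""
    return [index for index, value in enumerate(values) if not math.isfinite(value)]


nan = float("nan")
print(nan != nan)  # True: NaN never compares equal to itself
print(non_finite_positions([0.5, nan, 1.2, float("inf")]))  # [1, 3]
```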
Scenario 2: Dimension Mismatch
# Set breakpoint before the error
x = model(inputs)
# Inspect x.shape in Variables panel
y = classifier(x) # Dimension mismatch here
Check in Variables panel or Debug Console:
>>> x.shape
torch.Size([32, 512])
>>> classifier.in_features
256 # Ah, there's the problem!
Troubleshooting Common Issues
Issue 1: VS Code Can’t Connect to Debug Server
Symptoms:
- “Connection refused” or timeout error
- Debugger doesn’t attach
Solutions:
- Verify DebugPy is running on cluster:
netstat -a -n | grep 5678
- Check SSH tunnel is active:
# Local machine
netstat -a -n | grep 55678
- Ensure ports match:
  - Remote: 5678 (or your chosen port)
  - Local tunnel: ssh -L 55678:localhost:5678
  - VS Code launch.json: "port": 55678
- Firewall issues: With the SSH tunnel, debug traffic travels over the SSH connection, so the cluster firewall only needs to allow SSH. Port 5678 must be opened only if you connect to it directly without a tunnel.
Issue 2: Breakpoints Are Gray (Not Active)
Symptoms:
- Breakpoints appear gray/hollow
- Code doesn’t pause at breakpoints
Solutions:
- Check path mappings in launch.json:
"pathMappings": [
    {
        "localRoot": "${workspaceFolder}",
        "remoteRoot": "/home/username/vulnewdata" // Must match exactly!
    }
]
- Verify files are in sync:
git status # On both local and remote
- Use absolute paths: Try specifying exact paths instead of ${workspaceFolder}
Issue 3: SSH Connection Drops
Symptoms:
- Debugging suddenly stops
- “Connection lost” message
Solutions:
- Use screen on the cluster:
screen -S debug_session
python -m debugpy --wait-for-client --listen 0.0.0.0:5678 script.py
# Detach with Ctrl+A, D
- Configure SSH keepalive in ~/.ssh/config:
Host cluster_node
    ServerAliveInterval 60
    ServerAliveCountMax 10
- Use persistent SSH tunnel:
ssh -N -L 55678:localhost:5678 cluster_node
(-N means no remote command, just tunneling)
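The keepalive settings and the -N tunnel can also be combined into a single command, passing the ~/.ssh/config options inline with -o. The sketch below only builds and prints that command; the function name and defaults are assumptions that mirror the values used in this guide.

```python
import shlex


def tunnel_command(host, local_port=55678, remote_port=5678,
                   keepalive_seconds=60, keepalive_count=10):
    """Build an ssh command that forwards local_port to remote_port with keepalives."""
    return [
        "ssh", "-N",  # -N: no remote command, just tunneling
        "-o", f"ServerAliveInterval={keepalive_seconds}",
        "-o", f"ServerAliveCountMax={keepalive_count}",
        "-L", f"{local_port}:localhost:{remote_port}",
        host,
    ]


print(shlex.join(tunnel_command("cluster_node")))
```

The returned list can be handed to subprocess.Popen if you prefer to start the tunnel from a script instead of a terminal.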
Issue 4: Port Already in Use
Symptoms:
- “Address already in use” error
Solutions:
- Find and kill the process:
# On cluster
lsof -ti:5678 | xargs kill -9
# On local machine
lsof -ti:55678 | xargs kill -9 # Mac/Linux
netstat -ano | findstr :55678 # Windows (then use taskkill)
- Use different ports:
# On cluster
python -m debugpy --wait-for-client --listen 0.0.0.0:5679 script.py
# On local machine
ssh -L 55679:localhost:5679 cluster_node
# Update launch.json port to 55679
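Before picking a different port by hand, you can ask the operating system which ports are free by trying to bind to them. This is a minimal sketch under the assumption that a short scan range starting at 5678 is acceptable; the function names and the range are hypothetical, and a port reported as free can still be taken by another user before you start DebugPy.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def first_free_port(start=5678, end=5700, host="127.0.0.1"):
    """Return the first free port in [start, end], or None if all are taken."""
    for port in range(start, end + 1):
        if port_is_free(port, host):
            return port
    return None


print(first_free_port())
```

Run it on the cluster to choose the --listen port, and on your local machine (with a different range, such as 55678 upward) to choose the tunnel port.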
Issue 5: Wrong Python Interpreter
Symptoms:
- Import errors for packages you know are installed
- Different Python version than expected
Solutions:
- Verify remote Python:
which python
python --version
- Explicitly use conda python:
conda activate vulnewdata
which python # Should show conda env path
python -m debugpy ...
- Update VS Code interpreter:
  - Ctrl+Shift+P → “Python: Select Interpreter”
  - Choose the correct conda environment
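The same check can be made from inside the process being debugged, which removes any doubt about which interpreter actually ran the script. The sketch below is one way to do it; the function name is hypothetical, and CONDA_DEFAULT_ENV is the variable conda sets when an environment is activated.

```python
import os
import sys


def interpreter_report():
    """Collect the interpreter path, Python version, and active conda environment."""
    return {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV", "(not set)"),
    }


for key, value in interpreter_report().items():
    print(f"{key}: {value}")
```

Pasting the loop into the Debug Console while paused at a breakpoint shows the values for the remote process.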
Best Practices
Do’s ✅
- Always use screen sessions for remote debugging
- Keep code synchronized between local and remote via Git
- Use descriptive debug configurations in launch.json
- Set conditional breakpoints for specific scenarios
- Use logpoints instead of print statements (they don’t require code changes)
- Document your debug port assignments if working in a shared cluster
- Close debug sessions properly to free up resources
Don’ts ❌
- Don’t hardcode paths - use variables like ${workspaceFolder}
- Don’t debug in production environments - use development/staging clusters
- Don’t leave debug servers running - they consume resources
- Don’t commit launch.json with personal paths - use relative paths or .gitignore
- Don’t debug with print statements when you can use VS Code’s tools
- Don’t forget to pull latest code before debugging
- Don’t close the SSH tunnel terminal while debugging
Advanced Tips
Tip 1: Debug with Command-Line Arguments
python -m debugpy --wait-for-client --listen 0.0.0.0:5678 script.py \
--epochs 10 \
--batch-size 32 \
--learning-rate 0.001
Tip 2: Conditional Breakpoints
Right-click on a breakpoint in VS Code and set a condition:
epoch > 5 # Only break after epoch 5
loss > 1.0 # Only break if loss is high
batch_idx == 100 # Break at specific batch
Tip 3: Logpoints
Instead of adding print statements, right-click in the margin and choose “Add Logpoint”:
Loss: {loss.item()}, Accuracy: {acc}
This prints to the Debug Console without modifying code!
Tip 4: Multiple Debug Sessions
Debug multiple processes simultaneously:
# Terminal 1: Debug main training
ssh -L 55678:localhost:5678 cluster_node
# Terminal 2: Debug data preprocessing
ssh -L 55679:localhost:5679 cluster_node
Add separate configurations in launch.json for each.
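Writing several near-identical attach configurations by hand is error-prone, so one option is to generate them. The sketch below builds a launch.json with one configuration per tunnel; the configuration names and the helper function are hypothetical, and the remoteRoot value is the example path used earlier in this guide.

```python
import json


def attach_config(name, local_port, remote_root):
    """Build one remote-attach configuration for the given local tunnel port."""
    return {
        "name": name,
        "type": "debugpy",
        "request": "attach",
        "connect": {"host": "localhost", "port": local_port},
        "pathMappings": [
            {"localRoot": "${workspaceFolder}", "remoteRoot": remote_root}
        ],
    }


launch = {
    "version": "0.2.0",
    "configurations": [
        attach_config("Remote Attach: training (55678)", 55678, "/home/vulnewdata"),
        attach_config("Remote Attach: preprocessing (55679)", 55679, "/home/vulnewdata"),
    ],
}

# Print the result; copy it into .vscode/launch.json after reviewing it.
print(json.dumps(launch, indent=4))
```

Putting the port in each name makes it easier to pick the right entry from the Run and Debug dropdown.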
Tip 5: Debug Jupyter Notebooks
You can also debug Jupyter notebooks running on the cluster:
# On cluster
python -m debugpy --listen 0.0.0.0:5678 -m jupyter notebook --no-browser --port=8888
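Then connect with both Jupyter port forwarding and debug port forwarding! Note that notebook cells execute in a kernel process separate from the Jupyter server started here, so breakpoints in cell code may not be hit; treat this tip as a starting point rather than a guaranteed setup.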
Quick Reference
Essential Commands
Remote Cluster:
# Start debug server
python -m debugpy --wait-for-client --listen 0.0.0.0:5678 script.py
# Check if listening
netstat -a -n | grep 5678
# Kill debug process
pkill -f debugpy
Local Machine:
# SSH tunnel
ssh -L 55678:localhost:5678 cluster_node
# Check tunnel
netstat -a -n | grep 55678
# Persistent tunnel
ssh -N -L 55678:localhost:5678 cluster_node
VS Code:
- Open Debug panel: Ctrl+Shift+D
- Start debugging: F5
- Toggle breakpoint: F9
- Step over: F10
- Step into: F11
- Step out: Shift+F11
- Continue: F5
Additional Resources
Getting Help
If you encounter issues:
- Check this guide’s troubleshooting section
- Verify all prerequisites are met
- Test with a simple Python script first
- Ask in our lab Slack/Discord channel
- File an issue in the lab’s GitHub repository
Happy debugging! 🐛✨
Maintained by: CUNY MASS Lab
Last Updated: October 7, 2025
Questions? Contact the lab administrators or the authors of the blog post.