# Downloading Images with Python: Requests, Urllib3, and Wget Guide
**Quick answer:** Use Python's Requests library to download images by making a GET request and saving the binary content to a file. For multiple images, parse the HTML with BeautifulSoup, extract the image URLs, and download them in a loop.
## Downloading a Single Image with Requests
Requests is the most popular and beginner-friendly library for HTTP requests.
```python
import requests

# Image URL
url = 'https://books.toscrape.com/media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg'

# Download the image
response = requests.get(url)

# Save to file (binary mode)
with open('image.jpg', 'wb') as file:
    file.write(response.content)
```
Get the filename from the URL:

```python
def extract_filename(url):
    return url.split("/")[-1]

with open(extract_filename(url), 'wb') as file:
    file.write(response.content)
```
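The simple split above breaks when the URL carries a query string (e.g. `...jpg?v=2`) or ends in a slash. A more robust sketch using only the standard library (the `default` parameter is our addition):

```python
import os
from urllib.parse import urlparse

def extract_filename(url, default="image.jpg"):
    # Take the last path segment, ignoring any query string or fragment
    name = os.path.basename(urlparse(url).path)
    return name or default  # fall back when the path ends in "/"
```

Because it parses the URL first, `?v=2`-style cache busters never end up in the saved filename.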
## Error Handling
Always handle potential errors:
```python
import requests
from requests.exceptions import HTTPError, Timeout

url = 'https://example.com/image.jpg'

try:
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # Raise an exception for 4xx/5xx status codes
    with open('image.jpg', 'wb') as file:
        file.write(response.content)
except HTTPError as e:
    print(f"HTTP error: {e}")
except Timeout as e:
    print(f"Request timed out: {e}")
except IOError as e:
    print(f"File error: {e}")
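Transient failures (flaky connections, 429s, brief 5xx outages) are common when downloading many files. Requests can retry these automatically through urllib3's `Retry` class; the sketch below wires one into a `Session`, with arbitrary retry counts and backoff values you should tune for your use case:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session(retries=3, backoff=0.5):
    """Build a Session that retries requests on 429/5xx responses."""
    retry = Retry(
        total=retries,
        backoff_factor=backoff,  # exponential backoff between attempts
        status_forcelist=[429, 500, 502, 503, 504],
    )
    session = requests.Session()
    adapter = HTTPAdapter(max_retries=retry)
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    return session
```

Use `make_session().get(url, timeout=10)` in place of `requests.get(url)`; the try/except pattern above still applies for errors that survive the retries.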
## Using Proxies with Requests
Avoid IP bans when downloading many images:
```python
import requests

# Proxy configuration (from your provider)
proxies = {
    'http': 'http://username:password@proxy-host:port',
    'https': 'http://username:password@proxy-host:port'
}

url = 'https://books.toscrape.com/media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg'
response = requests.get(url, proxies=proxies)

with open('image.jpg', 'wb') as file:
    file.write(response.content)
```
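A single static proxy can still get rate-limited. Rotating through a pool of endpoints spreads requests across IP addresses; a minimal sketch using `itertools.cycle` (the proxy hosts below are placeholders for your provider's endpoints):

```python
from itertools import cycle

# Placeholder proxy endpoints -- substitute your provider's hosts
proxy_pool = cycle([
    'http://username:password@proxy-host-1:port',
    'http://username:password@proxy-host-2:port',
])

def next_proxies():
    """Return a Requests-style proxies dict, rotating through the pool."""
    proxy = next(proxy_pool)
    return {'http': proxy, 'https': proxy}
```

Then call `requests.get(url, proxies=next_proxies())` inside your download loop so each request goes out through the next proxy in the cycle.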
## Downloading with Urllib3
Urllib3 offers more control but requires more code:
```python
import urllib3

# Basic download (the top-level request() helper requires urllib3 v2+)
url = 'https://books.toscrape.com/media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg'
response = urllib3.request('GET', url)

def extract_filename(url):
    return url.split("/")[-1]

with open(extract_filename(url), 'wb') as file:
    file.write(response.data)
```
With proxies (authenticated):

```python
import urllib3

# Set up proxy authentication
headers = urllib3.make_headers(proxy_basic_auth='username:password')
http = urllib3.ProxyManager('http://proxy-host:port', proxy_headers=headers)

url = 'https://books.toscrape.com/media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg'
response = http.request('GET', url)

def extract_filename(url):
    return url.split("/")[-1]

with open(extract_filename(url), 'wb') as file:
    file.write(response.data)
```
## Downloading with Wget
Simplest option for quick downloads:
```python
import wget

url = 'https://books.toscrape.com/media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg'
wget.download(url)  # Saves with the filename from the URL
```
**Note:** The `wget` Python library doesn't support proxies or HTML parsing. For more complex tasks, use Requests.
## Downloading Multiple Images from a Website
Combine Requests with BeautifulSoup to scrape and download all images:
```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

# Fetch the page
page_url = "https://books.toscrape.com/"
response = requests.get(page_url)
soup = BeautifulSoup(response.text, "html.parser")

# Find all image tags
img_tags = soup.find_all("img")

# Download each image
for img in img_tags:
    img_url = img.get("src")
    full_url = urljoin(page_url, img_url)  # Handle relative URLs

    # Download the image
    img_response = requests.get(full_url)

    # Save with the filename from the URL
    filename = full_url.split("/")[-1]
    with open(filename, "wb") as file:
        file.write(img_response.content)

    print(f"Downloaded: {filename}")
```
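Real pages often include `<img>` tags with no `src` attribute, or inline `data:` URIs that cannot be fetched over HTTP; filtering those out before the download loop avoids crashes partway through. A small helper sketch (the function name is ours):

```python
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def collect_img_urls(html, base_url):
    """Extract absolute image URLs, skipping tags without src and data: URIs."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for img in soup.find_all("img"):
        src = img.get("src")
        if not src or src.startswith("data:"):
            continue  # nothing downloadable here
        urls.append(urljoin(base_url, src))
    return urls
```

The download loop then iterates over `collect_img_urls(response.text, page_url)` instead of raw `<img>` tags.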
## Library Comparison
| Feature | Requests | Urllib3 | Wget |
|---|---|---|---|
| Ease of use | Very easy | Moderate | Extremely easy |
| Proxy support | Built-in | Advanced options | None (Python lib) |
| Error handling | Excellent | Good | Basic |
| Multiple images | Easy with loops | Moderate | Needs other libs |
| Performance | Good | High | Basic |
| Best for | Most projects | Advanced users | Quick one-offs |
## Complete Example: Scraping All Images with Proxies
```python
import os
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
from requests.exceptions import HTTPError, Timeout

# Proxy setup
proxies = {
    'http': 'http://username:password@proxy-host:port',
    'https': 'http://username:password@proxy-host:port'
}

def download_image(img_url, folder="images"):
    """Download a single image with error handling."""
    try:
        response = requests.get(img_url, proxies=proxies, timeout=10)
        response.raise_for_status()
        os.makedirs(folder, exist_ok=True)  # Create the target folder if needed
        filename = img_url.split("/")[-1]
        with open(os.path.join(folder, filename), "wb") as file:
            file.write(response.content)
        return True
    except (HTTPError, Timeout, IOError) as e:
        print(f"Failed to download {img_url}: {e}")
        return False

# Main scraping
page_url = "https://books.toscrape.com/"
response = requests.get(page_url, proxies=proxies)
soup = BeautifulSoup(response.text, "html.parser")

# Download all images
for img in soup.find_all("img"):
    img_url = urljoin(page_url, img.get("src"))
    download_image(img_url)
```
## Key Takeaways
| Task | Recommended Library |
|---|---|
| Single image download | Requests or Wget |
| Multiple images with HTML parsing | Requests + BeautifulSoup |
| High-performance scraping | Urllib3 |
| Quick scripts | Wget |
| Avoiding blocks | Requests + rotating proxies |
| Production systems | Requests with error handling |
**Pro tip:** Always add delays between requests and use proxies when downloading many images to avoid being blocked.
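A simple way to pace requests is a short sleep between downloads, with a bit of random jitter so the traffic doesn't form a fixed pattern; the delay values below are arbitrary starting points:

```python
import random
import time

def polite_pause(base=1.0, jitter=0.5):
    """Sleep for base seconds plus random jitter, and return the delay used."""
    delay = base + random.uniform(0, jitter)
    time.sleep(delay)
    return delay  # returned so calling code can log it
```

Call `polite_pause()` at the end of each iteration of the download loop; combined with proxies and retries, this keeps a scraper well below most rate limits.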