Python - URL Processing
On the Internet, resources are identified by URLs (Uniform Resource Locators). The urllib package, which ships with Python's standard library, provides several utilities for handling URLs. It contains the following modules −
- urllib.parse parses a URL into its parts.
- urllib.request contains functions for opening and reading URLs.
- urllib.error defines the exceptions raised by urllib.request.
- urllib.robotparser parses robots.txt files.
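The first three modules are covered in the sections that follow. For urllib.robotparser, a minimal sketch is shown here; the robots.txt rules and the crawler name MyCrawler are made up for illustration, and the rules are passed to parse() directly so that nothing is downloaded.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, supplied as a list of lines.
robots_lines = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(robots_lines)  # parse the rules without fetching anything

allowed_public = parser.can_fetch("MyCrawler", "https://example.com/employees")
allowed_private = parser.can_fetch("MyCrawler", "https://example.com/private/data")
print(allowed_public)   # True
print(allowed_private)  # False
```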
The urllib.parse Module
This module serves as a standard interface for obtaining the various parts of a URL string. It contains the following functions −
urlparse(urlstring)
Parses a URL into six components and returns a 6-item named tuple (a ParseResult). The six indexed items are strings, empty when the component is absent. The result also exposes four additional read-only attributes that have no index and are None when absent −
Attribute | Index | Value |
---|---|---|
scheme | 0 | URL scheme specifier |
netloc | 1 | Network location part |
path | 2 | Hierarchical path |
params | 3 | Parameters for last path element |
query | 4 | Query component |
fragment | 5 | Fragment identifier |
username | | User name |
password | | Password |
hostname | | Host name (lower case) |
port | | Port number as integer, if present |
Example

    from urllib.parse import urlparse

    url = "https://example.com/employees/name/?salary>=25000"
    parsed_url = urlparse(url)
    print(type(parsed_url))
    print("Scheme:", parsed_url.scheme)
    print("netloc:", parsed_url.netloc)
    print("path:", parsed_url.path)
    print("params:", parsed_url.params)
    print("Query string:", parsed_url.query)
    print("Fragment:", parsed_url.fragment)

It will produce the following output −

    <class 'urllib.parse.ParseResult'>
    Scheme: https
    netloc: example.com
    path: /employees/name/
    params:
    Query string: salary>=25000
    Fragment:
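The attributes without an index (username, password, hostname and port) are derived from netloc. The following sketch shows them on a made-up URL that carries credentials, a port, a path parameter and a fragment; all values in it are illustrative.

```python
from urllib.parse import urlparse

# Hypothetical URL with credentials, a port, path parameters and a fragment.
url = "https://alice:secret@Example.COM:8443/reports;v=2?year=2024#summary"
parts = urlparse(url)

print(parts.netloc)    # alice:secret@Example.COM:8443
print(parts.username)  # alice
print(parts.password)  # secret
print(parts.hostname)  # example.com (always lower case)
print(parts.port)      # 8443 (an int, not a string)
print(parts.params)    # v=2
print(parts[0], parts[5])  # index access: https summary
```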
parse_qs(qs)
This function parses a query string given as a string argument. The data is returned as a dictionary whose keys are the unique query variable names and whose values are lists of values for each name.
To fetch the query parameters of a URL into a dictionary, pass the query attribute of the ParseResult object to parse_qs() as follows −
Example

    from urllib.parse import urlparse, parse_qs

    url = "https://example.com/employees?name=Anand&salary=25000"
    parsed_url = urlparse(url)
    dct = parse_qs(parsed_url.query)
    print("Query parameters:", dct)

It will produce the following output −

    Query parameters: {'name': ['Anand'], 'salary': ['25000']}
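The values are lists because a name may occur more than once in a query string. A small sketch with an invented query string shows this, along with the keep_blank_values option and the reverse operation, urlencode() with doseq=True.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical query string: "skill" appears twice and "note" is empty.
query = "skill=python&skill=sql&city=Pune&note="

params = parse_qs(query)
print(params)  # {'skill': ['python', 'sql'], 'city': ['Pune']}

# Blank values are dropped unless keep_blank_values is set.
with_blanks = parse_qs(query, keep_blank_values=True)
print(with_blanks["note"])  # ['']

# urlencode() with doseq=True turns the lists back into repeated keys.
rebuilt = urlencode(params, doseq=True)
print(rebuilt)  # skill=python&skill=sql&city=Pune
```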
urlsplit(urlstring)
This is similar to urlparse(), but it does not split the params from the URL, so it returns a 5-item named tuple (scheme, netloc, path, query, fragment). It should generally be used instead of urlparse() when the more recent URL syntax, which allows parameters to be applied to each segment of the path portion of the URL, is wanted.
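The difference is easiest to see on a URL whose last path segment carries a parameter. The URL below is invented for the comparison.

```python
from urllib.parse import urlparse, urlsplit

# Hypothetical URL whose last path segment carries a ";type=full" parameter.
url = "https://example.com/employees/name;type=full?salary=25000#top"

parsed = urlparse(url)
split = urlsplit(url)

print(len(parsed), len(split))          # 6 5
print(parsed.path, "|", parsed.params)  # /employees/name | type=full
print(split.path)                       # /employees/name;type=full
print(split.query, split.fragment)      # salary=25000 top
```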
urlunparse(parts)
This function is the opposite of urlparse(). It constructs a URL from a tuple as returned by urlparse() and returns an equivalent URL string. The parts argument can be any six-item iterable.
Example

    from urllib.parse import urlunparse

    lst = ['https', 'example.com', '/employees/name/', '', 'salary>=25000', '']
    new_url = urlunparse(lst)
    print("URL:", new_url)

It will produce the following output −

    URL: https://example.com/employees/name/?salary>=25000
urlunsplit(parts)
Combines the elements of a tuple, as returned by urlsplit(), into a complete URL string. The parts argument can be any five-item iterable.
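As a rough illustration, the sketch below splits a URL, recombines it, and then rebuilds it with a changed query. The URLs are placeholders, and _replace() is the standard named-tuple method rather than anything specific to urllib.

```python
from urllib.parse import urlsplit, urlunsplit

url = "https://example.com/employees/name?salary=25000#top"
parts = urlsplit(url)

# Splitting and recombining gives back the same string for this URL.
round_trip = urlunsplit(parts)
print(round_trip == url)  # True

# _replace() is the named-tuple method for swapping individual components.
updated = urlunsplit(parts._replace(query="salary=30000", fragment=""))
print(updated)  # https://example.com/employees/name?salary=30000

# Any five-item iterable works, for example a plain tuple.
built = urlunsplit(("https", "example.com", "/employees", "dept=sales", ""))
print(built)  # https://example.com/employees?dept=sales
```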
The urllib.request Module
This module defines functions and classes that help in opening URLs.
urlopen() function
This function opens the given URL, which can be either a string or a Request object. The optional timeout parameter specifies a timeout in seconds for blocking operations such as the connection attempt. It only works for HTTP, HTTPS and FTP connections.
This function always returns an object which can work as a context manager and has the properties url, headers, and status.
For HTTP and HTTPS URLs, this function returns a slightly modified http.client.HTTPResponse object.
Example
The following code uses the urlopen() function to read the binary data of an image file and writes it to a local file. You can then open the image on your computer with any image viewer.

    from urllib.request import urlopen

    url = "https://www.tutorialspoint.com/static/images/simply-easy-learning.jpg"
    # Both the response and the output file are closed automatically.
    with urlopen(url) as obj:
        data = obj.read()
    with open("img.jpg", "wb") as img:
        img.write(data)

The program prints nothing; it saves the downloaded image as img.jpg in the current directory.
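To try urlopen() without depending on an external site, the following sketch starts a throwaway HTTP server on the local machine and reads from it. HelloHandler and its response text are invented for the illustration, and the 5-second timeout is an arbitrary choice.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a short text body."""

    def do_GET(self):
        body = b"hello from a local test server"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress the server's request log


server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

# The response object is a context manager, so it is closed automatically.
with urlopen(url, timeout=5) as resp:
    status = resp.status
    content_type = resp.headers["Content-Type"]
    text = resp.read().decode("utf-8")

server.shutdown()
server.server_close()
print(status, content_type, text)
```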
The Request Object
The urllib.request module includes the Request class, an abstraction of a URL request. Its constructor requires one mandatory argument: a string containing a valid URL.
Syntax
urllib.request.Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False, method=None)
Parameters
- url − A string that is a valid URL.
- data − An object specifying additional data to send to the server, or None if no data is needed. Currently only HTTP requests use data. Supported types include bytes, file-like objects, and iterables of bytes-like objects.
- headers − A dictionary of headers and their associated values.
- origin_req_host − The request-host of the origin transaction.
- unverifiable − Indicates whether the request is unverifiable, as defined by RFC 2965. Default is False.
- method − A string that indicates the HTTP request method, such as GET, POST, PUT or DELETE. If it is omitted, the method is GET when data is None and POST otherwise.
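How the method is chosen can be checked without sending anything, because get_method() reports what a Request would use. A minimal sketch, with a placeholder URL and body:

```python
from urllib.request import Request

url = "https://example.com/employees"  # placeholder URL; no request is sent

plain = Request(url)
with_data = Request(url, data=b"name=Madhu")
explicit = Request(url, data=b"name=Madhu", method="PUT")

print(plain.get_method())      # GET
print(with_data.get_method())  # POST
print(explicit.get_method())   # PUT
print(explicit.full_url)       # https://example.com/employees
print(explicit.data)           # b'name=Madhu'
```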
Example

    from urllib.request import Request

    obj = Request("https://www.tutorialspoint.com/")

This Request object can now be passed as an argument to the urlopen() function.

    from urllib.request import Request, urlopen

    obj = Request("https://www.tutorialspoint.com/")
    resp = urlopen(obj)

For HTTP and HTTPS URLs, urlopen() returns an http.client.HTTPResponse object. Calling its read() method returns the body of the resource at the given URL as bytes.

    from urllib.request import Request, urlopen

    obj = Request("https://www.tutorialspoint.com/")
    resp = urlopen(obj)
    data = resp.read()
    print(data)
Sending Data
If you pass the data argument to the Request constructor, a POST request is sent to the server. The data must be a bytes object, so a string such as the one returned by urlencode() has to be encoded first.
Example

    from urllib.request import Request, urlopen
    from urllib.parse import urlencode

    values = {'name': 'Madhu', 'location': 'India', 'language': 'Hindi'}
    # urlencode() returns a str; encode it to bytes before passing it as data
    data = urlencode(values).encode('utf-8')
    obj = Request("https://example.com", data)
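To see what urlencode() actually produces, the sketch below encodes two invented form fields that contain a space and an ampersand, builds a Request from the bytes, and decodes the body again. Nothing is sent over the network.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request

# Hypothetical form fields; the space and the "&" need escaping.
values = {"name": "Madhu Rao", "team": "R&D"}

encoded = urlencode(values)
print(encoded)  # name=Madhu+Rao&team=R%26D

data = encoded.encode("utf-8")  # Request needs bytes, not str
req = Request("https://example.com", data=data)
print(req.get_method())  # POST

# Decoding the body again recovers the original fields.
decoded = parse_qs(data.decode("utf-8"))
print(decoded)  # {'name': ['Madhu Rao'], 'team': ['R&D']}
```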
Sending Headers
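The Request constructor also accepts a headers argument for adding header information to the request. It should be a dictionary.

    # user_agent is an example value; data is the encoded form data from the previous example
    user_agent = "Mozilla/5.0 (compatible; ExampleClient/1.0)"
    headers = {'User-Agent': user_agent}
    obj = Request("https://example.com", data, headers)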
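One detail worth knowing when reading headers back: in CPython's implementation, Request stores each header name with only its first letter in upper case. The sketch below illustrates this with made-up header values; no request is sent.

```python
from urllib.request import Request

# The User-Agent string is a made-up example value.
headers = {"User-Agent": "ExampleClient/1.0", "Accept": "application/json"}
req = Request("https://example.com", headers=headers)

# Header names are stored capitalized: only the first letter is upper case.
print(req.header_items())
print(req.get_header("User-agent"))  # ExampleClient/1.0
print(req.has_header("User-Agent"))  # False, the stored key is "User-agent"

# add_header() sets or replaces a header after construction.
req.add_header("Accept-Language", "en")
print(req.get_header("Accept-language"))  # en
```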
The urllib.error Module
The following exceptions are defined in the urllib.error module −
URLError
URLError is typically raised when there is no network connection (no route to the specified server) or when the specified server doesn't exist. In this case, the exception has a reason attribute that describes the cause.
Example

    from urllib.request import Request, urlopen
    import urllib.error as err

    obj = Request("http://www.nosuchserver.com")
    try:
        urlopen(obj)
    except err.URLError as e:
        print(e)

The output depends on how the request fails. HTTPError is a subclass of URLError, so this handler also catches HTTP errors; in the run recorded here the host answered and rejected the request −

    HTTP Error 403: Forbidden
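Since HTTPError derives from URLError, code that wants to treat the two cases differently has to catch HTTPError first. The sketch below uses a hypothetical helper, describe_failure(), and triggers a plain URLError with an unsupported URL scheme so that it runs without network access.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def describe_failure(url):
    """Hypothetical helper: return a short description of what went wrong."""
    try:
        with urlopen(url, timeout=5):
            return "ok"
    except HTTPError as e:   # must come first: HTTPError subclasses URLError
        return f"HTTP error {e.code}"
    except URLError as e:    # no HTTP response at all
        return f"URL error: {e.reason}"


print(issubclass(HTTPError, URLError))  # True

# An unsupported scheme raises URLError without touching the network.
result = describe_failure("nosuchscheme://example.com")
print(result)  # URL error: unknown url type: nosuchscheme
```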
HTTPError
Every HTTP response sent by the server carries a numeric status code. Some status codes indicate that the server is unable to fulfil the request. The default handlers deal with some of these responses for you, such as redirects. For those they can't handle, the urlopen() function raises an HTTPError. Typical examples are 404 (page not found), 403 (request forbidden), and 401 (authentication required).
Example

    from urllib.request import Request, urlopen
    import urllib.error as err

    obj = Request("http://www.python.org/fish.html")
    try:
        urlopen(obj)
    except err.HTTPError as e:
        print(e.code)

It will produce the following output −

    404
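Besides code, an HTTPError also carries the reason phrase and the body of the error page. The following sketch reproduces a 404 against a throwaway local server instead of a public site; NotFoundHandler is a hypothetical handler written only for this illustration.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.error import HTTPError
from urllib.request import urlopen


class NotFoundHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with 404."""

    def do_GET(self):
        self.send_error(404)

    def log_message(self, format, *args):
        pass  # suppress the server's request log


server = HTTPServer(("127.0.0.1", 0), NotFoundHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/fish.html"

code, reason, body = None, None, b""
try:
    urlopen(url, timeout=5)
except HTTPError as e:
    code = e.code      # numeric status code
    reason = e.reason  # reason phrase from the status line
    body = e.read()    # the error page sent by the server
finally:
    server.shutdown()
    server.server_close()

print(code, reason, len(body) > 0)
```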