- Python Basics
- Python - Home
- Python - Overview
- Python - History
- Python - Features
- Python vs C++
- Python - Hello World Program
- Python - Application Areas
- Python - Interpreter
- Python - Environment Setup
- Python - Virtual Environment
- Python - Basic Syntax
- Python - Variables
- Python - Data Types
- Python - Type Casting
- Python - Unicode System
- Python - Literals
- Python - Operators
- Python - Arithmetic Operators
- Python - Comparison Operators
- Python - Assignment Operators
- Python - Logical Operators
- Python - Bitwise Operators
- Python - Membership Operators
- Python - Identity Operators
- Python - Operator Precedence
- Python - Comments
- Python - User Input
- Python - Numbers
- Python - Booleans
- Python Control Statements
- Python - Control Flow
- Python - Decision Making
- Python - If Statement
- Python - If else
- Python - Nested If
- Python - Match-Case Statement
- Python - Loops
- Python - for Loops
- Python - for-else Loops
- Python - While Loops
- Python - break Statement
- Python - continue Statement
- Python - pass Statement
- Python - Nested Loops
- Python Functions & Modules
- Python - Functions
- Python - Default Arguments
- Python - Keyword Arguments
- Python - Keyword-Only Arguments
- Python - Positional Arguments
- Python - Positional-Only Arguments
- Python - Arbitrary Arguments
- Python - Variables Scope
- Python - Function Annotations
- Python - Modules
- Python - Built in Functions
- Python Strings
- Python - Strings
- Python - Slicing Strings
- Python - Modify Strings
- Python - String Concatenation
- Python - String Formatting
- Python - Escape Characters
- Python - String Methods
- Python - String Exercises
- Python Lists
- Python - Lists
- Python - Access List Items
- Python - Change List Items
- Python - Add List Items
- Python - Remove List Items
- Python - Loop Lists
- Python - List Comprehension
- Python - Sort Lists
- Python - Copy Lists
- Python - Join Lists
- Python - List Methods
- Python - List Exercises
- Python Tuples
- Python - Tuples
- Python - Access Tuple Items
- Python - Update Tuples
- Python - Unpack Tuples
- Python - Loop Tuples
- Python - Join Tuples
- Python - Tuple Methods
- Python - Tuple Exercises
- Python Sets
- Python - Sets
- Python - Access Set Items
- Python - Add Set Items
- Python - Remove Set Items
- Python - Loop Sets
- Python - Join Sets
- Python - Copy Sets
- Python - Set Operators
- Python - Set Methods
- Python - Set Exercises
- Python Dictionaries
- Python - Dictionaries
- Python - Access Dictionary Items
- Python - Change Dictionary Items
- Python - Add Dictionary Items
- Python - Remove Dictionary Items
- Python - Dictionary View Objects
- Python - Loop Dictionaries
- Python - Copy Dictionaries
- Python - Nested Dictionaries
- Python - Dictionary Methods
- Python - Dictionary Exercises
- Python Arrays
- Python - Arrays
- Python - Access Array Items
- Python - Add Array Items
- Python - Remove Array Items
- Python - Loop Arrays
- Python - Copy Arrays
- Python - Reverse Arrays
- Python - Sort Arrays
- Python - Join Arrays
- Python - Array Methods
- Python - Array Exercises
- Python File Handling
- Python - File Handling
- Python - Write to File
- Python - Read Files
- Python - Renaming and Deleting Files
- Python - Directories
- Python - File Methods
- Python - OS File/Directory Methods
- Python - OS Path Methods
- Object Oriented Programming
- Python - OOPs Concepts
- Python - Classes & Objects
- Python - Class Attributes
- Python - Class Methods
- Python - Static Methods
- Python - Constructors
- Python - Access Modifiers
- Python - Inheritance
- Python - Polymorphism
- Python - Method Overriding
- Python - Method Overloading
- Python - Dynamic Binding
- Python - Dynamic Typing
- Python - Abstraction
- Python - Encapsulation
- Python - Interfaces
- Python - Packages
- Python - Inner Classes
- Python - Anonymous Class and Objects
- Python - Singleton Class
- Python - Wrapper Classes
- Python - Enums
- Python - Reflection
- Python Errors & Exceptions
- Python - Syntax Errors
- Python - Exceptions
- Python - try-except Block
- Python - try-finally Block
- Python - Raising Exceptions
- Python - Exception Chaining
- Python - Nested try Block
- Python - User-defined Exception
- Python - Logging
- Python - Assertions
- Python - Built-in Exceptions
- Python Multithreading
- Python - Multithreading
- Python - Thread Life Cycle
- Python - Creating a Thread
- Python - Starting a Thread
- Python - Joining Threads
- Python - Naming Thread
- Python - Thread Scheduling
- Python - Thread Pools
- Python - Main Thread
- Python - Thread Priority
- Python - Daemon Threads
- Python - Synchronizing Threads
- Python Synchronization
- Python - Inter-thread Communication
- Python - Thread Deadlock
- Python - Interrupting a Thread
- Python Networking
- Python - Networking
- Python - Socket Programming
- Python - URL Processing
- Python - Generics
- Python Libraries
- NumPy Tutorial
- Pandas Tutorial
- SciPy Tutorial
- Matplotlib Tutorial
- Django Tutorial
- OpenCV Tutorial
- Python Miscellenous
- Python - Date & Time
- Python - Maths
- Python - Iterators
- Python - Generators
- Python - Closures
- Python - Decorators
- Python - Recursion
- Python - Reg Expressions
- Python - PIP
- Python - Database Access
- Python - Weak References
- Python - Serialization
- Python - Templating
- Python - Output Formatting
- Python - Performance Measurement
- Python - Data Compression
- Python - CGI Programming
- Python - XML Processing
- Python - GUI Programming
- Python - Command-Line Arguments
- Python - Docstrings
- Python - JSON
- Python - Sending Email
- Python - Further Extensions
- Python - Tools/Utilities
- Python - GUIs
- Python Useful Resources
- Python Compiler
- NumPy Compiler
- Matplotlib Compiler
- SciPy Compiler
- Python - Questions & Answers
- Python - Online Quiz
- Python - Programming Examples
- Python - Quick Guide
- Python - Useful Resources
- Python - Discussion
Python - XML Processing
XML is a portable, open-source language that allows programmers to develop applications that can be read by other applications, regardless of operating system and/or developmental language.
What is XML?
The Extensible Markup Language (XML) is a markup language much like HTML or SGML. This is recommended by the World Wide Web Consortium and available as an open standard.
XML is extremely useful for keeping track of small to medium amounts of data without requiring an SQL- based backbone.
XML Parser Architectures and APIs.
The Python standard library provides a minimal but useful set of interfaces to work with XML. All the submodules for XML processing are available in the xml package.
xml.etree.ElementTree − the ElementTree API, a simple and lightweight XML processor
xml.dom − the DOM API definition.
xml.dom.minidom − a minimal DOM implementation.
xml.dom.pulldom − support for building partial DOM trees.
xml.sax − SAX2 base classes and convenience functions.
xml.parsers.expat − the Expat parser binding.
The two most basic and broadly used APIs to XML data are the SAX and DOM interfaces.
Simple API for XML (SAX) − Here, you register callbacks for events of interest and then let the parser proceed through the document. This is useful when your documents are large or you have memory limitations, it parses the file as it reads it from the disk and the entire file is never stored in the memory.
Document Object Model (DOM) − This is a World Wide Web Consortium recommendation wherein the entire file is read into the memory and stored in a hierarchical (tree-based) form to represent all the features of an XML document.
SAX obviously cannot process information as fast as DOM, when working with large files. On the other hand, using DOM exclusively can really kill your resources, especially if used on many small files.
SAX is read-only, while DOM allows changes to the XML file. Since these two different APIs literally complement each other, there is no reason why you cannot use them both for large projects.
For all our XML code examples, let us use a simple XML file movies.xml as an input −
<collection shelf="New Arrivals"> <movie title="Enemy Behind"> <type>War, Thriller</type> <format>DVD</format> <year>2003</year> <rating>PG</rating> <stars>10</stars> <description>Talk about a US-Japan war</description> </movie> <movie title="Transformers"> <type>Anime, Science Fiction</type> <format>DVD</format> <year>1989</year> <rating>R</rating> <stars>8</stars> <description>A schientific fiction</description> </movie> <movie title="Trigun"> <type>Anime, Action</type> <format>DVD</format> <episodes>4</episodes> <rating>PG</rating> <stars>10</stars> <description>Vash the Stampede!</description> </movie> <movie title="Ishtar"> <type>Comedy</type> <format>VHS</format> <rating>PG</rating> <stars>2</stars> <description>Viewable boredom</description> </movie> </collection>
Parsing XML with SAX APIs
SAX is a standard interface for event-driven XML parsing. Parsing XML with SAX generally requires you to create your own ContentHandler by subclassing xml.sax.ContentHandler.
Your ContentHandler handles the particular tags and attributes of your flavor(s) of XML. A ContentHandler object provides methods to handle various parsing events. Its owning parser calls ContentHandler methods as it parses the XML file.
The methods startDocument and endDocument are called at the start and the end of the XML file. The method characters(text) is passed the character data of the XML file via the parameter text.
The ContentHandler is called at the start and end of each element. If the parser is not in namespace mode, the methods startElement(tag, attributes) andendElement(tag) are called; otherwise, the corresponding methodsstartElementNS and endElementNS are called. Here, tag is the element tag, and attributes is an Attributes object.
Here are other important methods to understand before proceeding −
The make_parser Method
The following method creates a new parser object and returns it. The parser object created will be of the first parser type, the system finds.
xml.sax.make_parser( [parser_list] )
Here is the detail of the parameters −
parser_list − The optional argument consisting of a list of parsers to use, which must all implement the make_parser method.
The parse Method
The following method creates a SAX parser and uses it to parse a document.
xml.sax.parse( xmlfile, contenthandler[, errorhandler])
Here are the details of the parameters −
xmlfile − This is the name of the XML file to read from.
contenthandler − This must be a ContentHandler object.
errorhandler − If specified, errorhandler must be a SAX ErrorHandler object.
The parseString Method
There is one more method to create a SAX parser and to parse the specifiedXML string.
xml.sax.parseString(xmlstring, contenthandler[, errorhandler])
Here are the details of the parameters −
xmlstring − This is the name of the XML string to read from.
contenthandler − This must be a ContentHandler object.
errorhandler − If specified, errorhandler must be a SAX ErrorHandler object.
Example
import xml.sax class MovieHandler( xml.sax.ContentHandler ): def __init__(self): self.CurrentData = "" self.type = "" self.format = "" self.year = "" self.rating = "" self.stars = "" self.description = "" # Call when an element starts def startElement(self, tag, attributes): self.CurrentData = tag if tag == "movie": print ("*****Movie*****") title = attributes["title"] print ("Title:", title) # Call when an elements ends def endElement(self, tag): if self.CurrentData == "type": print ("Type:", self.type) elif self.CurrentData == "format": print ("Format:", self.format) elif self.CurrentData == "year": print ("Year:", self.year) elif self.CurrentData == "rating": print ("Rating:", self.rating) elif self.CurrentData == "stars": print ("Stars:", self.stars) elif self.CurrentData == "description": print ("Description:", self.description) self.CurrentData = "" # Call when a character is read def characters(self, content): if self.CurrentData == "type": self.type = content elif self.CurrentData == "format": self.format = content elif self.CurrentData == "year": self.year = content elif self.CurrentData == "rating": self.rating = content elif self.CurrentData == "stars": self.stars = content elif self.CurrentData == "description": self.description = content if ( __name__ == "__main__"): # create an XMLReader parser = xml.sax.make_parser() # turn off namepsaces parser.setFeature(xml.sax.handler.feature_namespaces, 0) # override the default ContextHandler Handler = MovieHandler() parser.setContentHandler( Handler ) parser.parse("movies.xml")
This would produce the following result −
*****Movie***** Title: Enemy Behind Type: War, Thriller Format: DVD Year: 2003 Rating: PG Stars: 10 Description: Talk about a US-Japan war *****Movie***** Title: Transformers Type: Anime, Science Fiction Format: DVD Year: 1989 Rating: R Stars: 8 Description: A schientific fiction *****Movie***** Title: Trigun Type: Anime, Action Format: DVD Rating: PG Stars: 10 Description: Vash the Stampede! *****Movie***** Title: Ishtar Type: Comedy Format: VHS Rating: PG Stars: 2 Description: Viewable boredom
For a complete detail on SAX API documentation, please refer to standard Python SAX APIs.
Parsing XML with DOM APIs
The Document Object Model ("DOM") is a cross-language API from the World Wide Web Consortium (W3C) for accessing and modifying the XML documents.
The DOM is extremely useful for random-access applications. SAX only allows you a view of one bit of the document at a time. If you are looking at one SAX element, you have no access to another.
Here is the easiest way to load an XML document quickly and to create a minidom object using the xml.dom module. The minidom object provides a simple parser method that quickly creates a DOM tree from the XML file.
The sample phrase calls the parse( file [,parser] ) function of the minidom object to parse the XML file, designated by file into a DOM tree object.
from xml.dom.minidom import parse import xml.dom.minidom # Open XML document using minidom parser DOMTree = xml.dom.minidom.parse("movies.xml") collection = DOMTree.documentElement if collection.hasAttribute("shelf"): print ("Root element : %s" % collection.getAttribute("shelf")) # Get all the movies in the collection movies = collection.getElementsByTagName("movie") # Print detail of each movie. for movie in movies: print ("*****Movie*****") if movie.hasAttribute("title"): print ("Title: %s" % movie.getAttribute("title")) type = movie.getElementsByTagName('type')[0] print ("Type: %s" % type.childNodes[0].data) format = movie.getElementsByTagName('format')[0] print ("Format: %s" % format.childNodes[0].data) rating = movie.getElementsByTagName('rating')[0] print ("Rating: %s" % rating.childNodes[0].data) description = movie.getElementsByTagName('description')[0] print ("Description: %s" % description.childNodes[0].data)
This would produce the following output −
Root element : New Arrivals *****Movie***** Title: Enemy Behind Type: War, Thriller Format: DVD Rating: PG Description: Talk about a US-Japan war *****Movie***** Title: Transformers Type: Anime, Science Fiction Format: DVD Rating: R Description: A schientific fiction *****Movie***** Title: Trigun Type: Anime, Action Format: DVD Rating: PG Description: Vash the Stampede! *****Movie***** Title: Ishtar Type: Comedy Format: VHS Rating: PG Description: Viewable boredom
For a complete detail on DOM API documentation, please refer to standard Python DOM APIs.
ElementTree XML API
The xml package has an ElementTree module. This is a simple and lightweight XML processor API.
XML is a tree-like hierarchical data format. The 'ElementTree' in this module treats the whole XML document as a tree. The 'Element' class represents a single node in this tree. Reading and writing operations on XML files are done on the ElementTree level. Interactions with a single XML element and its sub-elements are done on the Element level.
Create an XML File
The tree is a hierarchical structure of elements starting with root followed by other elements. Each element is created by using the Element() function of this module.
import xml.etree.ElementTree as et e=et.Element('name')
Each element is characterized by a tag and attrib attribute which is a dict object. For tree's starting element, attrib is an empty dictionary.
>>> root=xml.Element('employees') >>> root.tag 'employees' >>> root.attrib {}
You may now set up one or more child elements to be added under the root element. Each child may have one or more sub elements. Add them using the SubElement() function and define its text attribute.
child=xml.Element("employee") nm = xml.SubElement(child, "name") nm.text = student.get('name') age = xml.SubElement(child, "salary") age.text = str(student.get('salary'))
Each child is added to root by append() function as −
root.append(child)
After adding required number of child elements, construct a tree object by elementTree() function −
tree = et.ElementTree(root)
The entire tree structure is written to a binary file by tree object's write() function −
f=open('employees.xml', "wb") tree.write(f)
Example
In this example, a tree is constructed out of a list of dictionary items. Each dictionary item holds key-value pairs describing a student data structure. The tree so constructed is written to 'myfile.xml'
import xml.etree.ElementTree as et employees=[{'name':'aaa','age':21,'sal':5000},{'name':xyz,'age':22,'sal':6000}] root = et.Element("employees") for employee in employees: child=xml.Element("employee") root.append(child) nm = xml.SubElement(child, "name") nm.text = student.get('name') age = xml.SubElement(child, "age") age.text = str(student.get('age')) sal=xml.SubElement(child, "sal") sal.text=str(student.get('sal')) tree = et.ElementTree(root) with open('employees.xml', "wb") as fh: tree.write(fh)
The 'myfile.xml' is stored in current working directory.
<employees><employee><name>aaa</name><age>21</age><sal>5000</sal></employee><employee><name>xyz</name><age>22</age><sal>60</sal></employee></employee>
Parse an XML File
Let us now read back the 'myfile.xml' created in above example. For this purpose, following functions in ElementTree module will be used −
ElementTree() − This function is overloaded to read the hierarchical structure of elements to a tree objects.
tree = et.ElementTree(file='students.xml')
getroot() − This function returns root element of the tree.
root = tree.getroot()
You can obtain the list of sub-elements one level below of an element.
children = list(root)
In the following example, elements and sub-elements of the 'myfile.xml' are parsed into a list of dictionary items.
Example
import xml.etree.ElementTree as et tree = et.ElementTree(file='employees.xml') root = tree.getroot() employees=[] children = list(root) for child in children: employee={} pairs = list(child) for pair in pairs: employee[pair.tag]=pair.text employees.append(employee) print (employees)
It will produce the following output −
[{'name': 'aaa', 'age': '21', 'sal': '5000'}, {'name': 'xyz', 'age':'22', 'sal': '6000'}]
Modify an XML file
We shall use iter() function of Element. It creates a tree iterator for given tag with the current element as the root. The iterator iterates over this element and all elements below it, in document (depth first) order.
Let us build iterator for all 'marks' subelements and increment text of each sal tag by 100.
import xml.etree.ElementTree as et tree = et.ElementTree(file='students.xml') root = tree.getroot() for x in root.iter('sal'): s=int (x.text) s=s+100 x.text=str(s) with open("employees.xml", "wb") as fh: tree.write(fh)
Our 'employees.xml' will now be modified accordingly. We can also use set() to update value of a certain key.
x.set(marks, str(mark))