Use safe XML libraries to avoid XML vulnerabilities
=====================
XML vulnerabilities are well known and well studied. The [defusedxml](https://pypi.python.org/pypi/defusedxml/) library's documentation provides a good synopsis of XML vulnerabilities, how they are exploited, and which Python libraries are vulnerable to which attacks.
Most XML vulnerabilities amount to denial-of-service attacks, but as [previous Black Hat presentations](https://media.blackhat.com/eu-13/briefings/Osipov/bh-eu-13-XML-data-osipov-slides.pdf) have shown, they can also lead to local file reading, intranet access, and sometimes remote code execution.
We don't rehash the details of each vulnerability class here; interested readers should consult the [defusedxml](https://pypi.python.org/pypi/defusedxml/) page and its references.
### Incorrect
Currently, the following Python XML libraries are vulnerable to some form of XML attack:
* [xml.sax](https://docs.python.org/2/library/xml.sax.html)
    - vulnerable to: billion laughs, quadratic blowup, external entity expansion, DTD retrieval
* [xml.etree.ElementTree](https://docs.python.org/2/library/xml.etree.elementtree.html)
    - vulnerable to: billion laughs, quadratic blowup
* [xml.dom.minidom](https://docs.python.org/2/library/xml.dom.minidom.html)
    - vulnerable to: billion laughs, quadratic blowup
* [xml.dom.pulldom](https://docs.python.org/2/library/xml.dom.pulldom.html)
    - vulnerable to: billion laughs, quadratic blowup, external entity expansion, DTD retrieval
* [xmlrpclib](https://docs.python.org/2/library/xmlrpclib.html)
    - vulnerable to: billion laughs, quadratic blowup, decompression bomb
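
To make the "billion laughs" entry concrete, here is a deliberately tiny sketch of nested entity expansion using only the standard library. The helper `nested_entity_doc` and its `levels` and `fanout` parameters are names made up for this illustration; a real attack uses the same shape with far larger values, which is why the parsers listed above should not be fed untrusted input.

```python
import xml.etree.ElementTree as ET


def nested_entity_doc(levels, fanout):
    """Build a document whose text expands to fanout**levels copies of 'lol'.

    Hypothetical helper for illustration only.
    """
    decls = ['<!ENTITY e0 "lol">']
    for i in range(1, levels + 1):
        # Each entity refers to the previous one `fanout` times.
        refs = ("&e%d;" % (i - 1)) * fanout
        decls.append('<!ENTITY e%d "%s">' % (i, refs))
    return "<!DOCTYPE r [%s]><r>&e%d;</r>" % ("".join(decls), levels)


# Kept small on purpose: 4**3 = 64 copies of "lol".
doc = nested_entity_doc(levels=3, fanout=4)
root = ET.fromstring(doc)
print("document: %d characters, expanded text: %d characters"
      % (len(doc), len(root.text)))
```

The expanded text grows exponentially with `levels` while the document itself grows only linearly, which is the amplification the attack relies on.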
[Python's XML library page](https://docs.python.org/2/library/xml.html#xml-vulnerabilities) recommends [defusedxml](https://pypi.python.org/pypi/defusedxml/) for code that parses untrusted XML data.
### Correct
#### xml.sax
Replace all xml.sax parsers with defusedxml parsers:
* ```xml.sax.parse()``` -> ```defusedxml.sax.parse()```
* ```xml.sax.parseString()``` -> ```defusedxml.sax.parseString()```
* ```xml.sax.make_parser()``` -> ```defusedxml.sax.make_parser()```
Instead of this:
```python
import xml.sax


class ExampleContentHandler(xml.sax.ContentHandler):
    """Print every SAX event as it arrives."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)

    def startElement(self, name, attrs):
        print('start: %s' % name)

    def endElement(self, name):
        print('end: %s' % name)

    def characters(self, content):
        print('chars: %s' % content)


def main():
    # xml.sax.parse() uses the stdlib parser, which is open to the attacks above.
    with open('input.xml') as source:
        xml.sax.parse(source, ExampleContentHandler())


if __name__ == "__main__":
    main()
```
Do this:
```python
import xml.sax
import defusedxml.sax


class ExampleContentHandler(xml.sax.ContentHandler):
    """Print every SAX event as it arrives."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)

    def startElement(self, name, attrs):
        print('start: %s' % name)

    def endElement(self, name):
        print('end: %s' % name)

    def characters(self, content):
        print('chars: %s' % content)


def main():
    # The handler is unchanged; only the parse call comes from defusedxml.
    with open('input.xml') as source:
        defusedxml.sax.parse(source, ExampleContentHandler())


if __name__ == "__main__":
    main()
```
#### xml.etree.ElementTree
Replace the following instances of xml.etree.ElementTree functions with the corresponding defusedxml functions:
* ```xml.etree.ElementTree.parse()``` -> ```defusedxml.ElementTree.parse()```
* ```xml.etree.ElementTree.iterparse()``` -> ```defusedxml.ElementTree.iterparse()```
* ```xml.etree.ElementTree.fromstring()``` -> ```defusedxml.ElementTree.fromstring()```
* ```xml.etree.ElementTree.XMLParser``` -> ```defusedxml.ElementTree.XMLParser```
Instead of this:
```python
import xml.etree.ElementTree as ET
tree = ET.parse("input.xml")
root = tree.getroot()
```
Do this:
```python
import defusedxml.ElementTree as ET
tree = ET.parse("input.xml")
root = tree.getroot()
```
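
After the switch, documents that declare entities are rejected with an exception instead of being expanded, so callers need to handle that case. The following is a minimal sketch, assuming that rejected input should simply be dropped: `parse_untrusted` and `ENTITY_DOC` are hypothetical names for this example, while `DefusedXmlException` is the base class of the exceptions defusedxml raises. The guarded import is also an illustration choice: if defusedxml is missing, the helper refuses to parse rather than falling back to the vulnerable standard library parser.

```python
try:
    from defusedxml import DefusedXmlException
    from defusedxml.ElementTree import fromstring as safe_fromstring
except ImportError:
    # Fail closed: never fall back to xml.etree.ElementTree here.
    DefusedXmlException = ValueError
    safe_fromstring = None

# A harmless document that declares an entity (hypothetical sample input).
ENTITY_DOC = '<!DOCTYPE r [<!ENTITY a "lol">]><r>&a;</r>'


def parse_untrusted(text):
    """Return the root element, or None if defusedxml rejects the input."""
    if safe_fromstring is None:
        raise RuntimeError("defusedxml is required to parse untrusted XML")
    try:
        return safe_fromstring(text)
    except DefusedXmlException as exc:
        # Entity declarations and external references end up here.
        print("rejected: %r" % (exc,))
        return None
```

With defusedxml installed, `parse_untrusted("<r>ok</r>")` returns the root element, while `parse_untrusted(ENTITY_DOC)` logs the rejection and returns `None`.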
#### xml.etree.cElementTree
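Replace the following instances of xml.etree.cElementTree functions with the corresponding defusedxml functions (this applies to Python 2 and older Python 3 releases; `xml.etree.cElementTree` was removed in Python 3.9):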
* ```xml.etree.cElementTree.parse()``` -> ```defusedxml.cElementTree.parse()```
* ```xml.etree.cElementTree.iterparse()``` -> ```defusedxml.cElementTree.iterparse()```
* ```xml.etree.cElementTree.fromstring()``` -> ```defusedxml.cElementTree.fromstring()```
* ```xml.etree.cElementTree.XMLParser``` -> ```defusedxml.cElementTree.XMLParser```
Instead of this:
```python
import xml.etree.cElementTree as ET
tree = ET.parse("input.xml")
root = tree.getroot()
```
Do this:
```python
import defusedxml.cElementTree as ET
tree = ET.parse("input.xml")
root = tree.getroot()
```
#### xml.dom.minidom
Replace the following instances of xml.dom.minidom functions with the corresponding defusedxml functions:
* ```xml.dom.minidom.parse()``` -> ```defusedxml.minidom.parse()```
* ```xml.dom.minidom.parseString()``` -> ```defusedxml.minidom.parseString()```
Instead of this:
```python
from xml.dom.minidom import parseString
parseString('<myxml>Some data<empty/> some more data</myxml>')
```
Do this:
```python
from defusedxml.minidom import parseString
parseString('<myxml>Some data<empty/> some more data</myxml>')
```
#### xml.dom.pulldom
Replace the following instances of xml.dom.pulldom functions with the corresponding defusedxml functions:
* ```xml.dom.pulldom.parse()``` -> ```defusedxml.pulldom.parse()```
* ```xml.dom.pulldom.parseString()``` -> ```defusedxml.pulldom.parseString()```
Instead of this:
```python
from xml.dom.pulldom import parseString
parseString('<myxml>Some data<empty/> some more data</myxml>')
```
Do this:
```python
from defusedxml.pulldom import parseString
parseString('<myxml>Some data<empty/> some more data</myxml>')
```
#### xmlrpclib
Taken directly from the defusedxml page:
"The function monkey_patch() enables the fixes, unmonkey_patch() removes the patch and puts the code in its former state."
Instead of this:
```python
from xmlrpclib import ServerProxy, Error  # Python 3: xmlrpc.client

# Responses are parsed by the unpatched, vulnerable XML parser.
server = ServerProxy("http://betty.userland.com")
print(server)
try:
    print(server.examples.getStateName(41))
except Error as v:
    print("ERROR %s" % v)
```
Do this:
```python
from xmlrpclib import ServerProxy, Error  # Python 3: xmlrpc.client
import defusedxml.xmlrpc

# Apply the patch once, before any XML-RPC request is made.
defusedxml.xmlrpc.monkey_patch()

server = ServerProxy("http://betty.userland.com")
print(server)
try:
    print(server.examples.getStateName(41))
except Error as v:
    print("ERROR %s" % v)
```
#### lxml.etree
Replace the following instances of lxml functions with the corresponding defusedxml functions:
* ```lxml.etree.parse()``` -> ```defusedxml.lxml.parse()```
* ```lxml.etree.fromstring()``` -> ```defusedxml.lxml.fromstring()```
* ```defusedxml.lxml``` also provides ```RestrictedElement```, ```getDefaultParser()```, and ```check_docinfo()```; these are helpers of its own and have no ```lxml.etree``` function of the same name to replace.
Instead of this:
```python
from lxml import etree
root = etree.parse('input.xml')
```
Do this:
```python
from defusedxml.lxml import parse
root = parse('input.xml')
```
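
The replacements in the sections above can be collected into a lookup table and used to audit a code base for imports that still need attention. This is a rough sketch rather than a complete scanner: `REPLACEMENTS` and `find_unsafe_imports` are names invented for this example, only `import` statements are inspected, and because it relies on the standard `ast` module it can only read source that the running interpreter can parse (Python 2-only files need to be audited under Python 2).

```python
import ast

# Vulnerable module -> suggested fix, taken from the sections above.
REPLACEMENTS = {
    "xml.sax": "defusedxml.sax",
    "xml.etree.ElementTree": "defusedxml.ElementTree",
    "xml.etree.cElementTree": "defusedxml.cElementTree",
    "xml.dom.minidom": "defusedxml.minidom",
    "xml.dom.pulldom": "defusedxml.pulldom",
    "xmlrpclib": "defusedxml.xmlrpc.monkey_patch()",
    "lxml.etree": "defusedxml.lxml",
}


def find_unsafe_imports(source):
    """Return sorted (line, module, suggestion) tuples for flagged imports."""
    findings = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            # "from xml.dom import minidom" spells the module in two parts.
            names = [node.module] + [
                "%s.%s" % (node.module, alias.name) for alias in node.names
            ]
        else:
            continue
        for name in names:
            if name in REPLACEMENTS:
                findings.add((node.lineno, name, REPLACEMENTS[name]))
    return sorted(findings)


sample = (
    "import xml.sax\n"
    "from xml.dom import minidom\n"
    "import json\n"
)
for line, module, fix in find_unsafe_imports(sample):
    print("line %d: %s -> %s" % (line, module, fix))
```

A flagged import is only a prompt for review; the actual change is still the function-level replacement described in the matching section.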
## References
* https://pypi.python.org/pypi/defusedxml/
* https://media.blackhat.com/eu-13/briefings/Osipov/bh-eu-13-XML-data-osipov-slides.pdf
* https://docs.python.org/2/library/xml.sax.html
* https://docs.python.org/2/library/xml.etree.elementtree.html
* https://docs.python.org/2/library/xml.dom.minidom.html
* https://docs.python.org/2/library/xml.dom.pulldom.html
* https://docs.python.org/2/library/xmlrpclib.html
* https://docs.python.org/2/library/xml.html#xml-vulnerabilities