XML Binding

XmlParser

The parser has three instance methods: from_string and from_bytes parse a document that is already in memory, and from_path lets the parser load the input document itself. All of them require the target class (Type) that the input data is bound to.
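The relationship between the three entry points can be sketched with the standard library alone. MiniParser below is a hypothetical stand-in, not the xsdata XmlParser: it returns plain ElementTree elements and leaves out the binding to a target class. It only illustrates one plausible design, where every entry point funnels into a single bytes-based code path.

```python
import io
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


class MiniParser:
    """Hypothetical stand-in for a parser with three entry points."""

    def from_path(self, path: Path) -> ET.Element:
        # Let the parser load the document itself.
        return self.from_bytes(path.read_bytes())

    def from_string(self, source: str) -> ET.Element:
        # Text already in memory is encoded and handed to from_bytes.
        return self.from_bytes(source.encode())

    def from_bytes(self, source: bytes) -> ET.Element:
        # Single code path: wrap the bytes in a binary stream and parse it.
        return ET.parse(io.BytesIO(source)).getroot()


doc = "<order><city>Old Town</city></order>"
parser = MiniParser()

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "order.xml"
    path.write_text(doc, encoding="utf-8")
    from_path = parser.from_path(path)

from_string = parser.from_string(doc)
from_bytes = parser.from_bytes(doc.encode())

# All three entry points see the same document.
cities = [root.findtext("city") for root in (from_path, from_string, from_bytes)]
```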

Parameters

config (ParserConfig)

Name                        Type  Description
--------------------------  ----  ------------------------------------------------------------
base_url                    str   Base URL used to resolve relative links (e.g. xinclude) when parsing from memory, default: None
process_xinclude            bool  Process xinclude statements, default: False
fail_on_unknown_properties  bool  Fail on unknown properties that can’t be mapped to any wildcard field, default: True
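To make the effect of fail_on_unknown_properties concrete, here is a minimal sketch of strict versus lenient binding. The bind function, the Address class and the ValueError are all hypothetical and chosen for illustration; the real parser uses its own binding logic and error type.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields


@dataclass
class Address:
    name: str = ""
    city: str = ""


def bind(source: str, clazz, fail_on_unknown: bool = True):
    """Hypothetical binder: map child elements to same-named dataclass fields."""
    known = {f.name for f in fields(clazz)}
    values = {}
    for child in ET.fromstring(source):
        if child.tag in known:
            values[child.tag] = child.text or ""
        elif fail_on_unknown:
            # Strict mode: an element without a matching field is an error.
            raise ValueError(f"Unknown property {clazz.__name__}:{child.tag}")
        # Lenient mode: the unknown element is silently skipped.
    return clazz(**values)


doc = (
    "<address><name>Robert Smith</name><city>Old Town</city>"
    "<planet>Earth</planet></address>"
)

lenient = bind(doc, Address, fail_on_unknown=False)

try:
    bind(doc, Address, fail_on_unknown=True)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)
```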

context (XmlContext)

The cache layer for the binding directives of models and their fields. You may share a context instance between parser/serializer instances to avoid compiling the cache more than once.

Hint

It’s recommended to use a static or global instance of your parser or serializer per document type.
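Why sharing a context pays off can be illustrated with a toy cache. BindingCache, Reader and Writer below are hypothetical names and are not part of xsdata; the sketch only assumes that per-class metadata is expensive to build and can be reused by every consumer that holds the same cache instance.

```python
from dataclasses import dataclass, fields


@dataclass
class Book:
    title: str = ""
    price: float = 0.0


class BindingCache:
    """Hypothetical cache of per-class field metadata."""

    def __init__(self):
        self._cache = {}
        self.builds = 0  # how many times metadata was actually computed

    def field_names(self, clazz):
        if clazz not in self._cache:
            self.builds += 1
            self._cache[clazz] = tuple(f.name for f in fields(clazz))
        return self._cache[clazz]


class Reader:
    def __init__(self, cache: BindingCache):
        self.cache = cache

    def read(self, data: dict, clazz):
        return clazz(**{n: data[n] for n in self.cache.field_names(clazz)})


class Writer:
    def __init__(self, cache: BindingCache):
        self.cache = cache

    def write(self, obj) -> dict:
        return {n: getattr(obj, n) for n in self.cache.field_names(type(obj))}


# One cache instance shared by both consumers.
shared = BindingCache()
reader = Reader(shared)
writer = Writer(shared)

book = reader.read({"title": "The First Book", "price": 44.95}, Book)
roundtrip = writer.write(book)
```

The metadata for Book is built once by the reader and then reused by the writer, which is the saving the shared context is meant to provide.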

handler (type of XmlHandler)

The XmlHandler type to use in order to read the xml source and push element events to the main parser.

Default: LxmlEventHandler

Handlers

LxmlEventHandler

It’s based on the lxml.etree.iterparse incremental parser and offers the best balance between features and performance. If the process_xinclude config option is enabled, the handler parses the whole tree first and then uses iterwalk to feed the main parser with element events.

LxmlSaxHandler

It’s based on the lxml target parser interface. It doesn’t support xinclude statements and is noticeably slower than the iterparse implementation.

XmlEventHandler

It’s based on the native python xml.etree.ElementTree.iterparse incremental parser. It doesn’t support xinclude statements or the newly allowed characters in XML 1.1. Despite its drawbacks, in some cases it’s slightly faster than the lxml iterparse implementation.
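The kind of event stream an iterparse-based handler works with can be seen directly with the standard library. This is only a sketch of the underlying mechanism, not the XmlEventHandler implementation, and the sample document is made up.

```python
import io
import xml.etree.ElementTree as ET

source = io.BytesIO(
    b"<books><book id='bk001'><title>The First Book</title></book></books>"
)

# iterparse yields (event, element) pairs while the document is being read,
# so a consumer can react to elements without waiting for the full tree.
events = []
for event, element in ET.iterparse(source, events=("start", "end")):
    events.append((event, element.tag))
```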

XmlSaxHandler

It’s based on the native python xml.sax.ContentHandler. It doesn’t support xinclude statements and is a lot slower than the iterparse implementation.
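For comparison, a SAX content handler receives the same information through callbacks instead of an iterator. EventCollector is a hypothetical class written for this sketch; it is not the XmlSaxHandler, and it only records what the standard library parser pushes to it.

```python
import xml.sax


class EventCollector(xml.sax.ContentHandler):
    """Hypothetical handler that records the callbacks it receives."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []
        self.attributes = {}

    def startElement(self, name, attrs):
        self.tags.append(("start", name))
        self.attributes[name] = dict(attrs.items())

    def characters(self, content):
        # Text may arrive in several chunks, so collect and join later.
        self.text.append(content)

    def endElement(self, name):
        self.tags.append(("end", name))


collector = EventCollector()
xml.sax.parseString(
    b"<books><book id='bk001'><title>The First Book</title></book></books>",
    collector,
)
```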

Hint

Why keep them all? The hard part was decoupling the parser from a specific implementation. The handlers themselves are quite simple and very easy to test.

It’s also recommended to give all of them a try, because you might get different results depending on your use case.

You can also extend one of them if you want to do any optimization like skipping irrelevant events earlier than the binding process when it’s instructed to skip unknown properties.
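One way such an optimization could look is a filter that drops whole unknown subtrees before they reach the binding step. The relevant_events function and the known_tags set are hypothetical; a real handler subclass would take the set of expected elements from the model metadata rather than from a hard-coded set.

```python
import io
import xml.etree.ElementTree as ET


def relevant_events(source, known_tags):
    """Yield only events for known elements and drop unknown subtrees."""
    skip_depth = 0  # greater than zero while inside an unknown subtree
    for event, element in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if skip_depth or element.tag not in known_tags:
                skip_depth += 1
                continue
        elif skip_depth:
            # Matching end event of a skipped element.
            skip_depth -= 1
            continue
        yield event, element.tag


source = io.BytesIO(
    b"<book><title>T</title><extra><note>n</note></extra><price>1</price></book>"
)
filtered = list(relevant_events(source, {"book", "title", "price"}))
```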

Example: from path

from tests import fixtures_dir  # pathlib.Path to the test fixtures (import location assumed)
from tests.fixtures.primer import PurchaseOrder, Usaddress
from xsdata.formats.dataclass.parsers import XmlParser
from xsdata.formats.dataclass.parsers.config import ParserConfig

config = ParserConfig(fail_on_unknown_properties=True)
parser = XmlParser(config=config)
order = parser.from_path(fixtures_dir.joinpath("primer/order.xml"), PurchaseOrder)

assert order.bill_to == Usaddress(
    name='Robert Smith',
    street='8 Oak Avenue',
    city='Old Town',
    state='PA',
    zip=95819.0
)

Example: from memory

With support for XML Inclusions

from tests.fixtures.books import Books

path = fixtures_dir.joinpath("books/books-xinclude.xml")
config = ParserConfig(process_xinclude=True, base_url=path.as_uri())
parser = XmlParser(config=config)
actual = parser.from_bytes(path.read_bytes(), Books)
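What xinclude processing does to a document can be shown with the standard library xml.etree.ElementInclude module. This is a sketch of the general mechanism, not of the lxml-based processing the parser uses; the documents dictionary and the memory_loader function are made up for the example and stand in for files resolved relative to a base URL.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory "file system" keyed by href.
documents = {
    "book.xml": "<book id='bk001'><title>The First Book</title></book>",
}


def memory_loader(href, parse, encoding=None):
    # ElementInclude calls the loader once for every xi:include element.
    if parse == "xml":
        return ET.fromstring(documents[href])
    return documents[href]


root = ET.fromstring(
    '<books xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="book.xml" parse="xml"/>'
    "</books>"
)

# Replaces each xi:include element in place with the loaded content.
ElementInclude.include(root, loader=memory_loader)

titles = [title.text for title in root.iter("title")]
```

When a document is parsed from memory there is no file location to resolve href values against, which is the gap the base_url option is described as filling.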

Example: alternative handler

from xsdata.formats.dataclass.parsers.handlers import XmlEventHandler

parser = XmlParser(handler=XmlEventHandler)
order = parser.from_path(fixtures_dir.joinpath("primer/order.xml"), PurchaseOrder)

XmlSerializer

The serializer can also be initialized with an XML context instance. If your use case needs to parse and serialize the same type of objects, you can share the same context instance between them to save memory and processing.

Hint

The serializer used to add a default namespace if the root object supported it and moved all the prefixes to the root node with a performance penalty. This behavior was removed in version 20.10 with the new xml writer interface for consistency between implementations.

You can still get the same output if you provide a prefix-URI namespace mapping; see the examples.
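How a prefix-URI mapping changes the rendered root element can be reproduced with the standard library XMLGenerator. The render_root helper is hypothetical and only mimics the idea of a namespace mapping; it is not how the serializer is implemented.

```python
import io
from xml.sax.saxutils import XMLGenerator


def render_root(prefix):
    """Hypothetical helper: write an empty books element with one mapping."""
    out = io.StringIO()
    gen = XMLGenerator(out, encoding="UTF-8")
    # Declare the mapping before the element that should carry it.
    gen.startPrefixMapping(prefix, "urn:books")
    gen.startElementNS(("urn:books", "books"), None, {})
    gen.endElementNS(("urn:books", "books"), None)
    gen.endPrefixMapping(prefix)
    return out.getvalue()


prefixed = render_root("bk")   # custom prefix
default = render_root(None)    # None maps the URI to the default namespace
```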

Parameters

encoding (str)

Text encoding, default: UTF-8

pretty_print (bool)

Enable pretty output, default: False

context (XmlContext)

The cache layer for the binding directives of models and their fields. You may share a context instance between parser/serializer instances to avoid compiling the cache more than once.

writer (type of XmlWriter)

The XmlWriter type to use for serialization.

Default: LxmlEventWriter

Writers

LxmlEventWriter

It’s based on the lxml ElementTreeContentHandler, which means your object tree is first converted to an lxml ElementTree and then to a string. Despite the extra step, it’s still pretty fast because it’s lxml, and it supports special characters and encodings a bit better than the native python writer.

XmlEventWriter

It’s based on the native python xml.sax.saxutils.XMLGenerator with support for indentation. The object tree is converted directly to a string without any intermediate steps, which makes it slightly faster than the lxml implementation and more memory efficient if you write directly to an output stream.

The pretty print output is identical to lxml’s except for some mixed content cases, because of the nature of a sax content handler.
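Writing straight to a stream with XMLGenerator looks roughly like this. The sketch drives the standard library class by hand with a made-up document and no indentation; the actual writer generates these calls from the object tree.

```python
import io
from xml.sax.saxutils import XMLGenerator

out = io.StringIO()
gen = XMLGenerator(out, encoding="UTF-8")

# Every call writes to the stream immediately; no tree is built in between.
gen.startDocument()
gen.startElement("book", {"id": "bk001"})
gen.startElement("title", {})
gen.characters("The First Book")
gen.endElement("title")
gen.endElement("book")
gen.endDocument()

xml_text = out.getvalue()
```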

Example: render

from tests.fixtures.books import Books, BookForm
from xsdata.formats.dataclass.serializers import XmlSerializer
from xsdata.formats.dataclass.serializers.writers import XmlEventWriter

books = Books(
    book=[
        BookForm(
            id="bk001",
            author="Hightower, Kim",
            title="The First Book",
            genre="Fiction",
            price=44.95,
            pub_date="2000-10-01",
            review="An amazing story of nothing.",
        )
    ]
)

serializer = XmlSerializer(pretty_print=True)
xml = serializer.render(books)
print(xml)

<?xml version='1.0' encoding='UTF-8'?>
<ns0:books xmlns:ns0="urn:books">
  <book id="bk001" lang="en">
    <author>Hightower, Kim</author>
    <title>The First Book</title>
    <genre>Fiction</genre>
    <price>44.95</price>
    <pub_date>2000-10-01</pub_date>
    <review>An amazing story of nothing.</review>
  </book>
</ns0:books>

Example: custom prefixes

xml = serializer.render(books, ns_map={"bk": "urn:books"})
print(xml)

<?xml version='1.0' encoding='UTF-8'?>
<bk:books xmlns:bk="urn:books">
  <book id="bk001" lang="en">

Example: default prefix

xml = serializer.render(books, ns_map={None: "urn:books"})
print(xml)

<?xml version='1.0' encoding='UTF-8'?>
<books xmlns="urn:books">
  <book xmlns="" id="bk001" lang="en">
    <author>Hightower, Kim</author>

Example: native writer

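serializer = XmlSerializer(
    pretty_print=True, encoding="US-ASCII", writer=XmlEventWriter
)
# The default namespace in the output implies the same ns_map as the previous example.
xml = serializer.render(books, ns_map={None: "urn:books"})
print(xml)

<?xml version="1.0" encoding="US-ASCII"?>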
<books xmlns="urn:books">
  <book xmlns="" id="bk001" lang="en">
    <author>Hightower, Kim</author>
    <title>The First Book</title>
    <genre>Fiction</genre>
    <price>44.95</price>
    <pub_date>2000-10-01</pub_date>
    <review>An amazing story of nothing.</review>
  </book>
</books>

Example: write to stream

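import sys
import tempfile

# Write to standard output with a default namespace.
serializer.write(sys.stdout, books, ns_map={None: "urn:books"})

# Write to a temporary file.
with tempfile.TemporaryFile() as fp:
    serializer.write(fp, books)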

Benchmarks

The benchmarks run with the test suite.

------------------------------------- benchmark 'Parse: 100 books': 4 tests -------------------------------------
Name (time in ms)                          Min                Max               Mean             Median
-----------------------------------------------------------------------------------------------------------------
test_parse_small[XmlEventHandler]      11.0167 (1.0)      11.9290 (1.0)      11.2879 (1.0)      11.2433 (1.0)
test_parse_small[LxmlEventHandler]     11.2080 (1.02)     12.5390 (1.05)     11.4432 (1.01)     11.3900 (1.01)
test_parse_small[LxmlSaxHandler]       12.6364 (1.15)     13.3323 (1.12)     12.8680 (1.14)     12.8464 (1.14)
test_parse_small[XmlSaxHandler]        15.3508 (1.39)     17.4243 (1.46)     15.6225 (1.38)     15.5706 (1.38)
-----------------------------------------------------------------------------------------------------------------

--------------------------------------- benchmark 'Parse: 1000 books': 4 tests ---------------------------------------
Name (time in ms)                            Min                 Max                Mean              Median
----------------------------------------------------------------------------------------------------------------------
test_parse_medium[XmlEventHandler]      109.2143 (1.0)      113.9947 (1.0)      110.3843 (1.0)      109.9962 (1.0)
test_parse_medium[LxmlEventHandler]     110.9572 (1.02)     118.3406 (1.04)     112.3722 (1.02)     111.6027 (1.01)
test_parse_medium[LxmlSaxHandler]       124.0605 (1.14)     141.5221 (1.24)     133.6450 (1.21)     136.4759 (1.24)
test_parse_medium[XmlSaxHandler]        153.4569 (1.41)     155.8310 (1.37)     155.0615 (1.40)     155.1828 (1.41)
----------------------------------------------------------------------------------------------------------------------

---------------------------------- benchmark 'Parse: 10000 books': 4 tests ----------------------------------
Name (time in s)                          Min               Max              Mean            Median
-------------------------------------------------------------------------------------------------------------
test_parse_large[XmlEventHandler]      1.0975 (1.0)      1.1230 (1.0)      1.1055 (1.0)      1.1034 (1.0)
test_parse_large[LxmlEventHandler]     1.1199 (1.02)     1.1934 (1.06)     1.1433 (1.03)     1.1370 (1.03)
test_parse_large[LxmlSaxHandler]       1.2568 (1.15)     1.2955 (1.15)     1.2741 (1.15)     1.2675 (1.15)
test_parse_large[XmlSaxHandler]        1.5144 (1.38)     1.5603 (1.39)     1.5321 (1.39)     1.5273 (1.38)
-------------------------------------------------------------------------------------------------------------

------------------------------------ benchmark 'Serialize: 100 books': 2 tests -------------------------------------
Name (time in ms)                             Min                Max               Mean             Median
--------------------------------------------------------------------------------------------------------------------
test_serialize_small[XmlEventWriter]      13.5096 (1.0)      15.2481 (1.0)      14.0662 (1.0)      13.9101 (1.0)
test_serialize_small[LxmlEventWriter]     14.0560 (1.04)     17.7745 (1.17)     14.6972 (1.04)     14.3864 (1.03)
--------------------------------------------------------------------------------------------------------------------

--------------------------------------- benchmark 'Serialize: 1000 books': 2 tests --------------------------------------
Name (time in ms)                               Min                 Max                Mean              Median
-------------------------------------------------------------------------------------------------------------------------
test_serialize_medium[XmlEventWriter]      123.7788 (1.0)      125.6158 (1.0)      124.4991 (1.0)      124.4314 (1.0)
test_serialize_medium[LxmlEventWriter]     125.8150 (1.02)     130.7346 (1.04)     128.4448 (1.03)     128.3278 (1.03)
-------------------------------------------------------------------------------------------------------------------------

--------------------------------- benchmark 'Serialize: 10000 books': 2 tests ----------------------------------
Name (time in s)                             Min               Max              Mean            Median
----------------------------------------------------------------------------------------------------------------
test_serialize_large[XmlEventWriter]      1.2096 (1.0)      1.2278 (1.0)      1.2224 (1.0)      1.2243 (1.0)
test_serialize_large[LxmlEventWriter]     1.2416 (1.03)     1.3073 (1.06)     1.2751 (1.04)     1.2836 (1.05)
----------------------------------------------------------------------------------------------------------------
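The values in parentheses are each result relative to the fastest entry in the same column. As a quick check, the ratios for the Min column of the 'Parse: 100 books' table can be recomputed from the figures above:

```python
# Min column (ms) of the 'Parse: 100 books' benchmark table.
minimums_ms = {
    "XmlEventHandler": 11.0167,
    "LxmlEventHandler": 11.2080,
    "LxmlSaxHandler": 12.6364,
    "XmlSaxHandler": 15.3508,
}

fastest = min(minimums_ms.values())

# Ratio of each handler to the fastest one, rounded to two decimals.
ratios = {name: round(value / fastest, 2) for name, value in minimums_ms.items()}
```

The results match the table: the lxml iterparse handler is about 2% slower than the native iterparse handler on this input, and the native SAX handler about 39% slower.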