Fatal1ty/mashumaro

Fast and well tested serialization library
In Python, you often need to dump and load objects based on the schema you have. It can be a dataclass model, a list of third-party generic classes or whatever. Mashumaro not only lets you save and load things in different ways, but it also does it super quick.
Key features
- 🚀 One of the fastest libraries
- ☝️ Mature and time-tested
- 👶 Easy to use out of the box
- ⚙️ Highly customizable
- 🎉 Built-in support for JSON, YAML, TOML, MessagePack
- 📦 Built-in support for almost all Python types including typing-extensions
- 📝 JSON Schema generation
- Table of contents
- Introduction
- Installation
- Changelog
- Supported data types
- Usage example
- How does it work?
- Benchmark
- Supported serialization formats
- Customization
- SerializableType interface
- SerializationStrategy
- Field options
- Config options
  - debug
  - code_generation_options
  - serialization_strategy
  - aliases
  - serialize_by_alias
  - allow_deserialization_not_by_alias
  - omit_none
  - omit_default
  - namedtuple_as_dict
  - allow_postponed_evaluation
  - dialect
  - orjson_options
  - discriminator
  - lazy_compilation
  - sort_keys
  - forbid_extra_keys
- Passing field values as is
- Extending existing types
- Field aliases
- Dialects
- Discriminator
- Code generation options
- Generic dataclasses
- GenericSerializableType interface
- Serialization hooks
- JSON Schema
This library provides two fundamentally different approaches to converting your data to and from various formats. Each of them is useful in different situations:
- Codecs
- Mixins
Codecs are represented by a set of decoder / encoder classes and decode / encode functions for each supported format. You can use them to convert data of any python built-in and third-party type to JSON, YAML, TOML, MessagePack or a basic form accepted by other serialization formats. For example, you can convert a list of datetime objects to a JSON array containing string-represented datetimes and vice versa.
Mixins are primarily for dataclass models. They are represented by mixin classes that add methods for converting to and from JSON, YAML, TOML, MessagePack or a basic form accepted by other serialization formats. If you have a root dataclass model, this is the easiest way to make it serializable. All you have to do is inherit a particular mixin class.
In addition to serialization functionality, this library also provides a JSON Schema builder that can be used in places where interoperability matters.
Use pip to install:
$ pip install mashumaro
The current version of `mashumaro` supports Python versions from 3.9 to 3.13.

It's not recommended to use any version of Python that has reached its end of life and is no longer receiving security updates or bug fixes from the Python development team. For convenience, the table below outlines the last version of `mashumaro` that can be installed on unmaintained versions of Python.
Python Version | Last Version of mashumaro | Python EOL |
---|---|---|
3.8 | 3.14 | 2024-10-07 |
3.7 | 3.9.1 | 2023-06-27 |
3.6 | 3.1.1 | 2021-12-23 |
This project follows the principles of Semantic Versioning. The changelog is available on the GitHub Releases page.
There is support for generic types from the standard `typing` module:
List
Tuple
NamedTuple
Set
FrozenSet
Deque
Dict
OrderedDict
DefaultDict
TypedDict
Mapping
MutableMapping
Counter
ChainMap
Sequence
for standard generic types on PEP 585 compatible Python (3.9+):
list
tuple
namedtuple
set
frozenset
collections.abc.Set
collections.abc.MutableSet
collections.deque
dict
collections.OrderedDict
collections.defaultdict
collections.abc.Mapping
collections.abc.MutableMapping
collections.Counter
collections.ChainMap
collections.abc.Sequence
collections.abc.MutableSequence
for special primitives from the `typing` module:
for standard interpreter types from the `types` module:
for enumerations based on classes from the standard `enum` module:
for common built-in types:
for built-in datetime oriented types (see more details):
for pathlike types:
for other less popular built-in types:
uuid.UUID
decimal.Decimal
fractions.Fraction
ipaddress.IPv4Address
ipaddress.IPv6Address
ipaddress.IPv4Network
ipaddress.IPv6Network
ipaddress.IPv4Interface
ipaddress.IPv6Interface
typing.Pattern
re.Pattern
for backported types from `typing-extensions`:
for arbitrary types:
Suppose we're developing a financial application and we operate with currencies and stocks:
```python
from dataclasses import dataclass
from enum import Enum

class Currency(Enum):
    USD = "USD"
    EUR = "EUR"

@dataclass
class CurrencyPosition:
    currency: Currency
    balance: float

@dataclass
class StockPosition:
    ticker: str
    name: str
    balance: int
```
Now we want a dataclass for a portfolio that will be serialized to and from JSON. We inherit `DataClassJSONMixin`, which adds this functionality:
```python
from mashumaro.mixins.json import DataClassJSONMixin

...

@dataclass
class Portfolio(DataClassJSONMixin):
    currencies: list[CurrencyPosition]
    stocks: list[StockPosition]
```
Let's create a portfolio instance and check the `from_json` and `to_json` methods:
```python
portfolio = Portfolio(
    currencies=[
        CurrencyPosition(Currency.USD, 238.67),
        CurrencyPosition(Currency.EUR, 361.84),
    ],
    stocks=[
        StockPosition("AAPL", "Apple", 10),
        StockPosition("AMZN", "Amazon", 10),
    ]
)
portfolio_json = portfolio.to_json()
assert Portfolio.from_json(portfolio_json) == portfolio
```
If we need to serialize something other than a root dataclass, we can use codecs. In the following example we create a JSON decoder and encoder for a list of currencies:
```python
from mashumaro.codecs.json import JSONDecoder, JSONEncoder

...

decoder = JSONDecoder(list[CurrencyPosition])
encoder = JSONEncoder(list[CurrencyPosition])

currencies = [
    CurrencyPosition(Currency.USD, 238.67),
    CurrencyPosition(Currency.EUR, 361.84),
]
currencies_json = encoder.encode(currencies)
assert decoder.decode(currencies_json) == currencies
```
This library works by taking the schema of the data and generating a specific decoder and encoder for exactly that schema, taking into account the specifics of the serialization format. This is much faster than inspecting data types on every decoding or encoding call at runtime.
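To illustrate the idea (this is a toy sketch of code generation in general, not mashumaro's actual internals), compare a reflective encoder that inspects the schema on every call with a function compiled once for a fixed schema:

```python
from datetime import date

# A hypothetical two-field schema: field name -> type
schema = {"when": date, "count": int}

# Reflective approach: walk the schema on every call
def encode_reflective(obj: dict) -> dict:
    out = {}
    for name, typ in schema.items():
        value = obj[name]
        out[name] = value.isoformat() if typ is date else value
    return out

# Code-generation approach: build a specialized function once,
# so per-call work is just attribute access and dict construction
def compile_encoder(schema: dict):
    lines = ["def encode(obj):", "    return {"]
    for name, typ in schema.items():
        expr = f"obj[{name!r}].isoformat()" if typ is date else f"obj[{name!r}]"
        lines.append(f"        {name!r}: {expr},")
    lines.append("    }")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["encode"]

encode_compiled = compile_encoder(schema)
obj = {"when": date(2024, 1, 1), "count": 3}
assert encode_reflective(obj) == encode_compiled(obj) == {"when": "2024-01-01", "count": 3}
```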
These specific decoders and encoders are generated by codecs and mixins:

- When using codecs, these methods are compiled during the creation of the decoder or encoder.
- When using serialization mixins, these methods are compiled during import time (or at runtime in some cases) and are set as attributes on your dataclasses. To minimize the import time, you can explicitly enable lazy compilation.
The benchmark was run in the following environment:

- macOS 15.1 Sequoia
- Apple M3 Max
- 36GB RAM
- Python 3.13.0
Benchmark using pyperf with the GitHub Issue model. Please note that the following charts use a logarithmic scale, as it is convenient for displaying very large ranges of values.
Note
Benchmark results may vary depending on the specific configuration and parameters used for serialization and deserialization. However, we have made an attempt to use the available options that can speed up and smooth out the differences in how libraries work.
To run the benchmark in your environment:

```shell
git clone git@github.com:Fatal1ty/mashumaro.git
cd mashumaro
python3 -m venv env && source env/bin/activate
pip install -e .
pip install -r requirements-dev.txt
./benchmark/run.sh
```
This library has built-in support for multiple popular formats:
There are preconfigured codecs and mixin classes. However, you're free to override some settings if necessary.
Important
As for codecs, you are offered a choice between convenience and efficiency. When you need to decode or encode typed data more than once, it's highly recommended to create and reuse a decoder or encoder specifically for that data type. For one-time use with default settings it may be convenient to use global functions that create a disposable decoder or encoder under the hood. Remember that you should not use these convenient global functions more than once for the same data type if performance is important to you.
Basic form denotes a python object consisting only of basic data types supported by most serialization formats. These types are: `str`, `int`, `float`, `bool`, `list`, `dict`.
This is also a starting point you can play with for a comprehensivetransformation of your data.
Efficient decoder and encoder can be used as follows:
```python
from mashumaro.codecs import BasicDecoder, BasicEncoder
# or from mashumaro.codecs.basic import BasicDecoder, BasicEncoder

decoder = BasicDecoder(<shape_type>, ...)
decoder.decode(...)

encoder = BasicEncoder(<shape_type>, ...)
encoder.encode(...)
```
Convenient functions are recommended to be used as follows:
```python
import mashumaro.codecs.basic as basic_codec

basic_codec.decode(..., <shape_type>)
basic_codec.encode(..., <shape_type>)
```
Mixin can be used as follows:
```python
from mashumaro import DataClassDictMixin
# or from mashumaro.mixins.dict import DataClassDictMixin

@dataclass
class MyModel(DataClassDictMixin):
    ...

MyModel.from_dict(...)
MyModel(...).to_dict()
```
Tip
You don't need to inherit `DataClassDictMixin` along with other serialization mixins because it's a base class for them.
JSON is a lightweight data-interchange format. You can choose between the standard library `json` for compatibility and the third-party dependency `orjson` for better performance.
Efficient decoder and encoder can be used as follows:
```python
from mashumaro.codecs.json import JSONDecoder, JSONEncoder

decoder = JSONDecoder(<shape_type>, ...)
decoder.decode(...)

encoder = JSONEncoder(<shape_type>, ...)
encoder.encode(...)
```
Convenient functions can be used as follows:
```python
from mashumaro.codecs.json import json_decode, json_encode

json_decode(..., <shape_type>)
json_encode(..., <shape_type>)
```
Convenient function aliases are recommended to be used as follows:
```python
import mashumaro.codecs.json as json_codec

json_codec.decode(..., <shape_type>)
json_codec.encode(..., <shape_type>)
```
Mixin can be used as follows:
```python
from mashumaro.mixins.json import DataClassJSONMixin

@dataclass
class MyModel(DataClassJSONMixin):
    ...

MyModel.from_json(...)
MyModel(...).to_json()
```
In order to use the `orjson` library, it must be installed manually or using an extra option for `mashumaro`:
pip install mashumaro[orjson]
The following data types will be handled by the `orjson` library by default:
Efficient decoder and encoder can be used as follows:
```python
from mashumaro.codecs.orjson import ORJSONDecoder, ORJSONEncoder

decoder = ORJSONDecoder(<shape_type>, ...)
decoder.decode(...)

encoder = ORJSONEncoder(<shape_type>, ...)
encoder.encode(...)
```
Convenient functions can be used as follows:
```python
from mashumaro.codecs.orjson import json_decode, json_encode

json_decode(..., <shape_type>)
json_encode(..., <shape_type>)
```
Convenient function aliases are recommended to be used as follows:
```python
import mashumaro.codecs.orjson as json_codec

json_codec.decode(..., <shape_type>)
json_codec.encode(..., <shape_type>)
```
Mixin can be used as follows:
```python
from mashumaro.mixins.orjson import DataClassORJSONMixin

@dataclass
class MyModel(DataClassORJSONMixin):
    ...

MyModel.from_json(...)
MyModel(...).to_json()
MyModel(...).to_jsonb()
```
YAML is a human-friendly data serialization language for all programming languages. In order to use this format, the `pyyaml` package must be installed. You can install it manually or using an extra option for `mashumaro`:
pip install mashumaro[yaml]
Efficient decoder and encoder can be used as follows:
```python
from mashumaro.codecs.yaml import YAMLDecoder, YAMLEncoder

decoder = YAMLDecoder(<shape_type>, ...)
decoder.decode(...)

encoder = YAMLEncoder(<shape_type>, ...)
encoder.encode(...)
```
Convenient functions can be used as follows:
```python
from mashumaro.codecs.yaml import yaml_decode, yaml_encode

yaml_decode(..., <shape_type>)
yaml_encode(..., <shape_type>)
```
Convenient function aliases are recommended to be used as follows:
```python
import mashumaro.codecs.yaml as yaml_codec

yaml_codec.decode(..., <shape_type>)
yaml_codec.encode(..., <shape_type>)
```
Mixin can be used as follows:
```python
from mashumaro.mixins.yaml import DataClassYAMLMixin

@dataclass
class MyModel(DataClassYAMLMixin):
    ...

MyModel.from_yaml(...)
MyModel(...).to_yaml()
```
TOML is a config file format for humans. In order to use this format, the `tomli` and `tomli-w` packages must be installed. In Python 3.11+, `tomli` is included as the `tomllib` standard library module and is used for this format. You can install the missing packages manually or using an extra option for `mashumaro`:
pip install mashumaro[toml]
The following data types will be handled by the `tomli` / `tomli-w` library by default:
Fields with the value `None` will be omitted on serialization because TOML doesn't support null values.
Efficient decoder and encoder can be used as follows:
```python
from mashumaro.codecs.toml import TOMLDecoder, TOMLEncoder

decoder = TOMLDecoder(<shape_type>, ...)
decoder.decode(...)

encoder = TOMLEncoder(<shape_type>, ...)
encoder.encode(...)
```
Convenient functions can be used as follows:
```python
from mashumaro.codecs.toml import toml_decode, toml_encode

toml_decode(..., <shape_type>)
toml_encode(..., <shape_type>)
```
Convenient function aliases are recommended to be used as follows:
```python
import mashumaro.codecs.toml as toml_codec

toml_codec.decode(..., <shape_type>)
toml_codec.encode(..., <shape_type>)
```
Mixin can be used as follows:
```python
from mashumaro.mixins.toml import DataClassTOMLMixin

@dataclass
class MyModel(DataClassTOMLMixin):
    ...

MyModel.from_toml(...)
MyModel(...).to_toml()
```
MessagePack is an efficient binary serialization format. In order to use this mixin, the `msgpack` package must be installed. You can install it manually or using an extra option for `mashumaro`:
pip install mashumaro[msgpack]
The following data types will be handled by the `msgpack` library by default:
Efficient decoder and encoder can be used as follows:
```python
from mashumaro.codecs.msgpack import MessagePackDecoder, MessagePackEncoder

decoder = MessagePackDecoder(<shape_type>, ...)
decoder.decode(...)

encoder = MessagePackEncoder(<shape_type>, ...)
encoder.encode(...)
```
Convenient functions can be used as follows:
```python
from mashumaro.codecs.msgpack import msgpack_decode, msgpack_encode

msgpack_decode(..., <shape_type>)
msgpack_encode(..., <shape_type>)
```
Convenient function aliases are recommended to be used as follows:
```python
import mashumaro.codecs.msgpack as msgpack_codec

msgpack_codec.decode(..., <shape_type>)
msgpack_codec.encode(..., <shape_type>)
```
Mixin can be used as follows:
```python
from mashumaro.mixins.msgpack import DataClassMessagePackMixin

@dataclass
class MyModel(DataClassMessagePackMixin):
    ...

MyModel.from_msgpack(...)
MyModel(...).to_msgpack()
```
Customization options of `mashumaro` are extensive and will most likely cover your needs. When it comes to non-standard data types and non-standard serialization support, you can do the following:
- Turn an existing regular or generic class into a serializable one by inheriting the `SerializableType` class
- Write different serialization strategies for an existing regular or generic type that is not under your control using the `SerializationStrategy` class
- Define serialization / deserialization methods:
  - for a specific dataclass field by using field options
  - for a specific data type used in the dataclass by using the `Config` class
- Alter input and output data with serialization / deserialization hooks
- Separate the serialization scheme from a dataclass in a reusable manner using dialects
- Choose from predefined serialization engines for specific data types, e.g. `datetime` and `NamedTuple`
If you have a custom class or hierarchy of classes whose instances you want to serialize with `mashumaro`, the first option is to implement the `SerializableType` interface.
Let's look at this not very practical example:
```python
from dataclasses import dataclass

from mashumaro import DataClassDictMixin
from mashumaro.types import SerializableType

class Airport(SerializableType):
    def __init__(self, code, city):
        self.code, self.city = code, city

    def _serialize(self):
        return [self.code, self.city]

    @classmethod
    def _deserialize(cls, value):
        return cls(*value)

    def __eq__(self, other):
        return (self.code, self.city) == (other.code, other.city)

@dataclass
class Flight(DataClassDictMixin):
    origin: Airport
    destination: Airport

JFK = Airport("JFK", "New York City")
LAX = Airport("LAX", "Los Angeles")

input_data = {
    "origin": ["JFK", "New York City"],
    "destination": ["LAX", "Los Angeles"]
}
my_flight = Flight.from_dict(input_data)
assert my_flight == Flight(JFK, LAX)
assert my_flight.to_dict() == input_data
```
You can see how `Airport` instances are seamlessly created from lists of two strings and serialized into them.
By default, the `_deserialize` method will get raw input data without any prior transformations. This should be enough in many cases, especially when you need to perform non-standard transformations yourself, but let's extend our example:
```python
class Itinerary(SerializableType):
    def __init__(self, flights):
        self.flights = flights

    def _serialize(self):
        return self.flights

    @classmethod
    def _deserialize(cls, flights):
        return cls(flights)

@dataclass
class TravelPlan(DataClassDictMixin):
    budget: float
    itinerary: Itinerary

input_data = {
    "budget": 10_000,
    "itinerary": [
        {
            "origin": ["JFK", "New York City"],
            "destination": ["LAX", "Los Angeles"]
        },
        {
            "origin": ["LAX", "Los Angeles"],
            "destination": ["SFO", "San Francisco"]
        }
    ]
}
```
If we pass the flight list as is into `Itinerary._deserialize`, our itinerary will have something we may not expect: `list[dict]` instead of `list[Flight]`. The solution is quite simple. Instead of calling `Flight._deserialize` yourself, just use annotations:
```python
class Itinerary(SerializableType, use_annotations=True):
    def __init__(self, flights):
        self.flights = flights

    def _serialize(self) -> list[Flight]:
        return self.flights

    @classmethod
    def _deserialize(cls, flights: list[Flight]):
        return cls(flights)

my_plan = TravelPlan.from_dict(input_data)
assert isinstance(my_plan.itinerary.flights[0], Flight)
assert isinstance(my_plan.itinerary.flights[1], Flight)
assert my_plan.to_dict() == input_data
```
Here we add annotations to the only argument of the `_deserialize` method and to the return value of the `_serialize` method as well. The latter is needed for correct serialization.
Important
Passing `use_annotations=True` explicitly when defining a class is important because using annotations implicitly might otherwise break compatibility with old code that wasn't aware of this feature. It will be enabled by default in a future major release.
The great thing to note about using annotations in `SerializableType` is that they work seamlessly with generic and variadic generic types. Let's see how this can be useful:
```python
from dataclasses import dataclass
from datetime import date
from typing import TypeVar

from mashumaro import DataClassDictMixin
from mashumaro.types import SerializableType

KT = TypeVar("KT")
VT = TypeVar("VT")

class DictWrapper(dict[KT, VT], SerializableType, use_annotations=True):
    def _serialize(self) -> dict[KT, VT]:
        return dict(self)

    @classmethod
    def _deserialize(cls, value: dict[KT, VT]) -> "DictWrapper[KT, VT]":
        return cls(value)

@dataclass
class DataClass(DataClassDictMixin):
    x: DictWrapper[date, str]
    y: DictWrapper[str, date]

input_data = {
    "x": {"2022-12-07": "2022-12-07"},
    "y": {"2022-12-07": "2022-12-07"}
}
obj = DataClass.from_dict(input_data)
assert obj == DataClass(
    x=DictWrapper({date(2022, 12, 7): "2022-12-07"}),
    y=DictWrapper({"2022-12-07": date(2022, 12, 7)})
)
assert obj.to_dict() == input_data
```
You can see that the formatted date is deserialized to a `date` object before being passed to `DictWrapper._deserialize` in a key or value according to the generic parameters.
If you have generic dataclass types, you can use `SerializableType` for them as well, but it's not necessary since they're supported out of the box.
If you want to add support for a custom third-party type that is not under your control, you can write serialization and deserialization logic inside a `SerializationStrategy` class, which will be reusable and thus well suited when that third-party type is widely used. `SerializationStrategy` is also good if you want to create strategies that are slightly different from each other, because you can add the strategy differentiator in the `__init__` method.
To demonstrate how `SerializationStrategy` works, let's write a simple strategy for datetime serialization in different formats. In this example we will use the same strategy class for two dataclass fields, but the string representing the date and time will be different.
```python
from dataclasses import dataclass, field
from datetime import datetime

from mashumaro import DataClassDictMixin, field_options
from mashumaro.types import SerializationStrategy

class FormattedDateTime(SerializationStrategy):
    def __init__(self, fmt):
        self.fmt = fmt

    def serialize(self, value: datetime) -> str:
        return value.strftime(self.fmt)

    def deserialize(self, value: str) -> datetime:
        return datetime.strptime(value, self.fmt)

@dataclass
class DateTimeFormats(DataClassDictMixin):
    short: datetime = field(
        metadata=field_options(
            serialization_strategy=FormattedDateTime("%d%m%Y%H%M%S")
        )
    )
    verbose: datetime = field(
        metadata=field_options(
            serialization_strategy=FormattedDateTime("%A %B %d, %Y, %H:%M:%S")
        )
    )

formats = DateTimeFormats(
    short=datetime(2019, 1, 1, 12),
    verbose=datetime(2019, 1, 1, 12),
)
dictionary = formats.to_dict()
# {'short': '01012019120000', 'verbose': 'Tuesday January 01, 2019, 12:00:00'}
assert DateTimeFormats.from_dict(dictionary) == formats
```
Similarly to `SerializableType`, `SerializationStrategy` can also take advantage of annotations:
```python
from dataclasses import dataclass
from datetime import datetime

from mashumaro import DataClassDictMixin
from mashumaro.types import SerializationStrategy

class TsSerializationStrategy(SerializationStrategy, use_annotations=True):
    def serialize(self, value: datetime) -> float:
        return value.timestamp()

    def deserialize(self, value: float) -> datetime:
        # value will be converted to float before being passed to this method
        return datetime.fromtimestamp(value)

@dataclass
class Example(DataClassDictMixin):
    dt: datetime

    class Config:
        serialization_strategy = {
            datetime: TsSerializationStrategy(),
        }

example = Example.from_dict({"dt": "1672531200"})
print(example)
# Example(dt=datetime.datetime(2023, 1, 1, 3, 0))
print(example.to_dict())
# {'dt': 1672531200.0}
```
Here the passed string value `"1672531200"` will be converted to `float` before being passed to the `deserialize` method, thanks to the `float` annotation.
Important
As well as for `SerializableType`, the value of `use_annotations` will be `True` by default in a future major release.
To create a generic version of a serialization strategy you need to follow these steps:

- Inherit the `Generic[...]` type with the number of parameters matching the number of parameters of the target generic type
- Write generic annotations for the `serialize` method's return type and for the `deserialize` method's argument type
- Use the origin type of the target generic type in the `serialization_strategy` config section (`typing.get_origin` might be helpful)
There is no need to add `use_annotations=True` here because it's enabled implicitly for generic serialization strategies.
For example, there is a third-party multidict package that has a generic `MultiDict` type. A generic serialization strategy for it might look like this:
```python
from dataclasses import dataclass
from datetime import date
from pprint import pprint
from typing import Generic, List, Tuple, TypeVar

from mashumaro import DataClassDictMixin
from mashumaro.types import SerializationStrategy
from multidict import MultiDict

T = TypeVar("T")

class MultiDictSerializationStrategy(SerializationStrategy, Generic[T]):
    def serialize(self, value: MultiDict[T]) -> List[Tuple[str, T]]:
        return [(k, v) for k, v in value.items()]

    def deserialize(self, value: List[Tuple[str, T]]) -> MultiDict[T]:
        return MultiDict(value)

@dataclass
class Example(DataClassDictMixin):
    floats: MultiDict[float]
    date_lists: MultiDict[List[date]]

    class Config:
        serialization_strategy = {
            MultiDict: MultiDictSerializationStrategy()
        }

example = Example(
    floats=MultiDict([("x", 1.1), ("x", 2.2)]),
    date_lists=MultiDict(
        [("x", [date(2023, 1, 1), date(2023, 1, 2)]),
         ("x", [date(2023, 2, 1), date(2023, 2, 2)])]
    ),
)
pprint(example.to_dict())
# {'date_lists': [['x', ['2023-01-01', '2023-01-02']],
#                 ['x', ['2023-02-01', '2023-02-02']]],
#  'floats': [['x', 1.1], ['x', 2.2]]}
assert Example.from_dict(example.to_dict()) == example
```
In some cases creating a new class just for one little thing could be excessive. Moreover, you may need to deal with third party classes that you are not allowed to change. You can use the `dataclasses.field` function to configure some serialization aspects through its `metadata` parameter. The next section describes all supported options to use in the `metadata` mapping.
If you don't want to remember the names of the options, you can use the `field_options` helper function:
```python
from dataclasses import dataclass, field

from mashumaro import field_options

@dataclass
class A:
    x: int = field(metadata=field_options(...))
```
This option allows you to change the serialization method. When using this option, the serialization behaviour depends on what type of value the option has. It could be either `Callable[[Any], Any]` or `str`.
A value of type `Callable[[Any], Any]` is a generic way to specify any callable object, like a function, a class method, a class instance method, an instance of a callable class or even a lambda function, to be called for serialization.
A value of type `str` sets a specific engine for serialization. Keep in mind that the possible engines depend on the data type that this option is used with. At the moment, the following serialization engines are available:
Applicable data types | Supported engines | Description |
---|---|---|
`NamedTuple`, `namedtuple` | `as_list`, `as_dict` | How to pack named tuples. By default the `as_list` engine is used, meaning your named tuple class instance will be packed into a list of its values. You can pack it into a dictionary using the `as_dict` engine. |
`Any` | `omit` | Skip the field during serialization |
Tip
You can pass a field value as is, without changes, on serialization using `pass_through`.
Example:
```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import NamedTuple

from mashumaro import DataClassDictMixin

class MyNamedTuple(NamedTuple):
    x: int
    y: float

@dataclass
class A(DataClassDictMixin):
    dt: datetime = field(
        metadata={
            "serialize": lambda v: v.strftime('%Y-%m-%d %H:%M:%S')
        }
    )
    t: MyNamedTuple = field(metadata={"serialize": "as_dict"})
```
This option allows you to change the deserialization method. When using this option, the deserialization behaviour depends on what type of value the option has. It could be either `Callable[[Any], Any]` or `str`.
A value of type `Callable[[Any], Any]` is a generic way to specify any callable object, like a function, a class method, a class instance method, an instance of a callable class or even a lambda function, to be called for deserialization.
A value of type `str` sets a specific engine for deserialization. Keep in mind that the possible engines depend on the data type that this option is used with. At the moment, the following deserialization engines are available:
Applicable data types | Supported engines | Description |
---|---|---|
`datetime`, `date`, `time` | `ciso8601`, `pendulum` | How to parse a datetime string. By default the native `fromisoformat` of the corresponding class will be used for `datetime`, `date` and `time` fields. It's the fastest way in most cases, but you can choose an alternative. |
`NamedTuple`, `namedtuple` | `as_list`, `as_dict` | How to unpack named tuples. By default the `as_list` engine is used, meaning your named tuple class instance will be created from a list of its values. You can unpack it from a dictionary using the `as_dict` engine. |
Tip
You can pass a field value as is, without changes, on deserialization using `pass_through`.
Example:
```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, NamedTuple

import ciso8601
import dateutil.parser

from mashumaro import DataClassDictMixin

class MyNamedTuple(NamedTuple):
    x: int
    y: float

@dataclass
class A(DataClassDictMixin):
    x: datetime = field(
        metadata={"deserialize": "pendulum"}
    )

@dataclass
class B(DataClassDictMixin):
    x: datetime = field(
        metadata={"deserialize": ciso8601.parse_datetime_as_naive}
    )

@dataclass
class C(DataClassDictMixin):
    dt: List[datetime] = field(
        metadata={
            "deserialize": lambda l: list(map(dateutil.parser.isoparse, l))
        }
    )

@dataclass
class D(DataClassDictMixin):
    x: MyNamedTuple = field(metadata={"deserialize": "as_dict"})
```
This option is useful when you want to change the serialization logic for a dataclass field depending on some defined parameters using a reusable serialization scheme. You can find an example in the `SerializationStrategy` chapter.
Tip
You can pass a field value as is, without changes, on serialization / deserialization using `pass_through`.
This option can be used to assign field aliases:
```python
from dataclasses import dataclass, field

from mashumaro import DataClassDictMixin, field_options

@dataclass
class DataClass(DataClassDictMixin):
    a: int = field(metadata=field_options(alias="FieldA"))
    b: int = field(metadata=field_options(alias="#invalid"))

x = DataClass.from_dict({"FieldA": 1, "#invalid": 2})
# DataClass(a=1, b=2)
```
If inheritance is not an empty word for you, you'll fall in love with the `Config` class. You can register `serialize` and `deserialize` methods, define code generation options and other things just in one place. Or in some classes in different ways if you need flexibility. Inheritance always comes first.
There is a base class `BaseConfig` that you can inherit for the sake of convenience, but it's not mandatory.
In the following example you can see how the `debug` flag is changed from class to class: `ModelA` will have debug mode enabled but `ModelB` will not.
```python
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig

class BaseModel(DataClassDictMixin):
    class Config(BaseConfig):
        debug = True

class ModelA(BaseModel):
    a: int

class ModelB(BaseModel):
    b: int

    class Config(BaseConfig):
        debug = False
```
The next section describes all supported options to use in the config.
If you enable the `debug` option, the generated code for your data class will be printed.
Some users may need functionality that wouldn't exist without extra cost, such as valuable cpu time to execute additional instructions. Since not everyone needs such instructions, they can be enabled by adding a constant to the list, so the fastest basic behavior of the library always remains the default. The following table provides a brief overview of all the available constants described below.
Constant | Description |
---|---|
`TO_DICT_ADD_OMIT_NONE_FLAG` | Adds an `omit_none` keyword-only argument to `to_*` methods. |
`TO_DICT_ADD_BY_ALIAS_FLAG` | Adds a `by_alias` keyword-only argument to `to_*` methods. |
`ADD_DIALECT_SUPPORT` | Adds a `dialect` keyword-only argument to `from_*` and `to_*` methods. |
`ADD_SERIALIZATION_CONTEXT` | Adds a `context` keyword-only argument to `to_*` methods. |
You can register custom `SerializationStrategy`, `serialize` and `deserialize` methods for specific types just in one place. It can be configured using a dictionary with types as keys. The value can be either a `SerializationStrategy` instance or a dictionary with `serialize` and `deserialize` values with the same meaning as in the field options.
```python
from dataclasses import dataclass
from datetime import datetime, date

from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig
from mashumaro.types import SerializationStrategy

class FormattedDateTime(SerializationStrategy):
    def __init__(self, fmt):
        self.fmt = fmt

    def serialize(self, value: datetime) -> str:
        return value.strftime(self.fmt)

    def deserialize(self, value: str) -> datetime:
        return datetime.strptime(value, self.fmt)

@dataclass
class DataClass(DataClassDictMixin):
    x: datetime
    y: date

    class Config(BaseConfig):
        serialization_strategy = {
            datetime: FormattedDateTime("%Y"),
            date: {
                # you can use specific str values for datetime here as well
                "deserialize": "pendulum",
                "serialize": date.isoformat,
            },
        }

instance = DataClass.from_dict({"x": "2021", "y": "2021"})
# DataClass(x=datetime.datetime(2021, 1, 1, 0, 0), y=Date(2021, 1, 1))
dictionary = instance.to_dict()
# {'x': '2021', 'y': '2021-01-01'}
```
Note that you can register different methods for multiple logical types which are based on the same type using `NewType` and `Annotated`; see Extending existing types for details.
It's also possible to define a generic (de)serialization method for a generic type by registering a method for its origin type. Although this technique is widely used when working with third-party generic types using generic strategies, it can also be applied in simple scenarios:
```python
from dataclasses import dataclass

from mashumaro import DataClassDictMixin

@dataclass
class C(DataClassDictMixin):
    ints: list[int]
    floats: list[float]

    class Config:
        serialization_strategy = {
            list: {
                # origin type for list[int] and list[float] is list
                "serialize": lambda x: list(map(str, x)),
            }
        }

assert C([1], [2.2]).to_dict() == {'ints': ['1'], 'floats': ['2.2']}
```
Sometimes it's better to write the field aliases in one place. You can mix aliases here with aliases in the field options, but the latter will always take precedence.
```python
from dataclasses import dataclass
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig

@dataclass
class DataClass(DataClassDictMixin):
    a: int
    b: int

    class Config(BaseConfig):
        aliases = {
            "a": "FieldA",
            "b": "FieldB",
        }

DataClass.from_dict({"FieldA": 1, "FieldB": 2})  # DataClass(a=1, b=2)
```
All the fields with aliases will be serialized by them by default when this option is enabled. You can mix this config option with the `by_alias` keyword argument.
```python
from dataclasses import dataclass, field
from mashumaro import DataClassDictMixin, field_options
from mashumaro.config import BaseConfig

@dataclass
class DataClass(DataClassDictMixin):
    field_a: int = field(metadata=field_options(alias="FieldA"))

    class Config(BaseConfig):
        serialize_by_alias = True

DataClass(field_a=1).to_dict()  # {'FieldA': 1}
```
When using aliases, the deserializer defaults to requiring the keys to match what is defined as the alias. If the flexibility to deserialize both aliased and unaliased keys is required, the config option `allow_deserialization_not_by_alias` can be set to enable the feature.
```python
from dataclasses import dataclass, field
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig

@dataclass
class AliasedDataClass(DataClassDictMixin):
    foo: int = field(metadata={"alias": "alias_foo"})
    bar: int = field(metadata={"alias": "alias_bar"})

    class Config(BaseConfig):
        allow_deserialization_not_by_alias = True

alias_dict = {"alias_foo": 1, "alias_bar": 2}
t1 = AliasedDataClass.from_dict(alias_dict)

no_alias_dict = {"foo": 1, "bar": 2}
# This would raise `mashumaro.exceptions.MissingField`
# if allow_deserialization_not_by_alias was False
t2 = AliasedDataClass.from_dict(no_alias_dict)
assert t1 == t2
```
All the fields with `None` values will be skipped during serialization by default when this option is enabled. You can mix this config option with the `omit_none` keyword argument.
```python
from dataclasses import dataclass
from typing import Optional
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig

@dataclass
class DataClass(DataClassDictMixin):
    x: Optional[int] = 42

    class Config(BaseConfig):
        omit_none = True

DataClass(x=None).to_dict()  # {}
```
When this option is enabled, all the fields that have values equal to the defaults or the `default_factory` results will be skipped during serialization.
```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig

@dataclass
class Foo:
    foo: str

@dataclass
class DataClass(DataClassDictMixin):
    a: int = 42
    b: Tuple[int, ...] = field(default=(1, 2, 3))
    c: List[Foo] = field(default_factory=lambda: [Foo("foo")])
    d: Optional[str] = None

    class Config(BaseConfig):
        omit_default = True

DataClass(a=42, b=(1, 2, 3), c=[Foo("foo")]).to_dict()  # {}
```
Dataclasses are a great way to declare and use data models. But it's not the only way. Python has a typed version of namedtuple called `NamedTuple` which looks similar to dataclasses:
```python
from typing import NamedTuple

class Point(NamedTuple):
    x: int
    y: int
```
The same with a dataclass will look like this:
```python
from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int
```
At first glance, you can use both options. But imagine that you need to create a bunch of instances of the `Point` class. Due to how dataclasses work, you will have more memory consumption compared to named tuples. In such a case it could be more appropriate to use named tuples.
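The memory difference can be observed with a small stdlib-only sketch (the exact numbers vary by Python version, so none are given here):

```python
import sys
from dataclasses import dataclass
from typing import NamedTuple

class PointNT(NamedTuple):
    x: int
    y: int

@dataclass
class PointDC:
    x: int
    y: int

nt = PointNT(1, 2)
dc = PointDC(1, 2)

# A NamedTuple instance stores its fields inline in a tuple, while a regular
# dataclass instance additionally carries a per-instance __dict__.
print(sys.getsizeof(nt))                                # the tuple itself
print(sys.getsizeof(dc) + sys.getsizeof(dc.__dict__))   # instance + its dict
```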
By default, all named tuples are packed into lists. But with the `namedtuple_as_dict` option you have a drop-in replacement for dataclasses:
```python
from dataclasses import dataclass
from typing import List, NamedTuple
from mashumaro import DataClassDictMixin

class Point(NamedTuple):
    x: int
    y: int

@dataclass
class DataClass(DataClassDictMixin):
    points: List[Point]

    class Config:
        namedtuple_as_dict = True

obj = DataClass.from_dict({"points": [{"x": 0, "y": 0}, {"x": 1, "y": 1}]})
print(obj.to_dict())  # {"points": [{"x": 0, "y": 0}, {"x": 1, "y": 1}]}
```
If you want to serialize only certain named tuple fields as dictionaries, you can use the corresponding serialization and deserialization engines.
PEP 563 solved the problem of forward references by postponing the evaluation of annotations, so you can write the following code:
```python
from __future__ import annotations
from dataclasses import dataclass
from mashumaro import DataClassDictMixin

@dataclass
class A(DataClassDictMixin):
    x: B

@dataclass
class B(DataClassDictMixin):
    y: int

obj = A.from_dict({'x': {'y': 1}})
```
You don't need to write anything special here; forward references work out of the box. If a field of a dataclass has a forward reference in the type annotations, building of the `from_*` and `to_*` methods of this dataclass will be postponed until they are called once. However, if for some reason you don't want the evaluation to be possibly postponed, you can disable it using the `allow_postponed_evaluation` option:
```python
from __future__ import annotations
from dataclasses import dataclass
from mashumaro import DataClassDictMixin

@dataclass
class A(DataClassDictMixin):
    x: B

    class Config:
        allow_postponed_evaluation = False

# UnresolvedTypeReferenceError: Class A has unresolved type reference B
# in some of its fields

@dataclass
class B(DataClassDictMixin):
    y: int
```
In this case you will get `UnresolvedTypeReferenceError` regardless of whether class B is declared below or not.
This option is described below in the Dialects section.
This option changes the default options for the `orjson.dumps` encoder, which is used in `DataClassORJSONMixin`. For example, you can tell orjson to handle non-`str` `dict` keys as the built-in `json.dumps` encoder does. See the orjson documentation to read more about these options.
```python
import orjson
from dataclasses import dataclass
from typing import Dict
from mashumaro.config import BaseConfig
from mashumaro.mixins.orjson import DataClassORJSONMixin

@dataclass
class MyClass(DataClassORJSONMixin):
    x: Dict[int, int]

    class Config(BaseConfig):
        orjson_options = orjson.OPT_NON_STR_KEYS

assert MyClass({1: 2}).to_json() == '{"1":2}'
```
This option is described in the Class level discriminator section.
By using this option, the compilation of the `from_*` and `to_*` methods will be deferred until they are called the first time. This will reduce the import time and, in certain instances, may enhance the speed of deserialization by leveraging the data that is accessible after the class has been created.
Caution
If you need to save a reference to a `from_*` or `to_*` method, you should do it after the method is compiled. To be safe, you can always use a lambda function:
```python
from_dict = lambda x: MyModel.from_dict(x)
to_dict = lambda x: x.to_dict()
```
When set, the keys of serialized dataclasses will be sorted in alphabetical order. Unlike the `sort_keys` option in the standard library's `json.dumps` function, this option acts at class creation time and has no effect on the performance of serialization.
```python
from dataclasses import dataclass
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig

@dataclass
class SortedDataClass(DataClassDictMixin):
    foo: int
    bar: int

    class Config(BaseConfig):
        sort_keys = True

t = SortedDataClass(1, 2)
assert t.to_dict() == {"bar": 2, "foo": 1}
```
When set, the deserialization of dataclasses will fail if the input dictionary contains keys that are not present in the dataclass.
```python
from dataclasses import dataclass
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig

@dataclass
class DataClass(DataClassDictMixin):
    a: int

    class Config(BaseConfig):
        forbid_extra_keys = True

DataClass.from_dict({"a": 1, "b": 2})  # ExtraKeysError: Extra keys: {'b'}
```
It plays well with the `aliases` and `allow_deserialization_not_by_alias` options.
In some cases it's needed to pass a field value as is, without any changes, during serialization / deserialization. There is a predefined `pass_through` object that can be used as a `serialization_strategy` or as `serialize` / `deserialize` options:
```python
from dataclasses import dataclass, field
from mashumaro import DataClassDictMixin, pass_through

class MyClass:
    def __init__(self, some_value):
        self.some_value = some_value

@dataclass
class A1(DataClassDictMixin):
    x: MyClass = field(
        metadata={
            "serialize": pass_through,
            "deserialize": pass_through,
        }
    )

@dataclass
class A2(DataClassDictMixin):
    x: MyClass = field(
        metadata={
            "serialization_strategy": pass_through,
        }
    )

@dataclass
class A3(DataClassDictMixin):
    x: MyClass

    class Config:
        serialization_strategy = {
            MyClass: pass_through,
        }

@dataclass
class A4(DataClassDictMixin):
    x: MyClass

    class Config:
        serialization_strategy = {
            MyClass: {
                "serialize": pass_through,
                "deserialize": pass_through,
            }
        }

my_class_instance = MyClass(42)

assert A1.from_dict({'x': my_class_instance}).x == my_class_instance
assert A2.from_dict({'x': my_class_instance}).x == my_class_instance
assert A3.from_dict({'x': my_class_instance}).x == my_class_instance
assert A4.from_dict({'x': my_class_instance}).x == my_class_instance

a1_dict = A1(my_class_instance).to_dict()
a2_dict = A2(my_class_instance).to_dict()
a3_dict = A3(my_class_instance).to_dict()
a4_dict = A4(my_class_instance).to_dict()

assert a1_dict == a2_dict == a3_dict == a4_dict == {"x": my_class_instance}
```
There are situations where you might want some values of the same type to be treated as their own type. You can create new logical types with `NewType`, `Annotated` or `TypeAliasType` and register serialization strategies for them:
```python
from dataclasses import dataclass
from typing import Mapping, NewType, Annotated
from mashumaro import DataClassDictMixin

SessionID = NewType("SessionID", str)
AccountID = Annotated[str, "AccountID"]
type DeviceID = str

@dataclass
class Context(DataClassDictMixin):
    account_sessions: Mapping[AccountID, SessionID]
    account_devices: list[DeviceID]

    class Config:
        serialization_strategy = {
            AccountID: {
                "deserialize": lambda x: ...,
                "serialize": lambda x: ...,
            },
            SessionID: {
                "deserialize": lambda x: ...,
                "serialize": lambda x: ...,
            },
            DeviceID: {
                "deserialize": lambda x: ...,
                "serialize": lambda x: ...,
            },
        }
```
Although using `NewType` is usually the most reliable way to avoid logical errors, you have to pay for it with notable overhead. If you are creating dataclass instances manually, then you know that type checkers will force you to enclose a value in your `NewType` callable, which leads to performance degradation:
```shell
$ python -m timeit -s "from typing import NewType; MyInt = NewType('MyInt', int)" "MyInt(42)"
10000000 loops, best of 5: 31.1 nsec per loop

$ python -m timeit -s "from typing import NewType; MyInt = NewType('MyInt', int)" "42"
50000000 loops, best of 5: 4.35 nsec per loop
```
However, when you create dataclass instances using the `from_*` method provided by one of the mixins or using one of the decoders, there will be no performance degradation, because the value won't be enclosed in the callable in the generated code. Therefore, if performance is more important to you than catching logical errors by type checkers, and you are actively creating or changing dataclasses manually, then you should take a closer look at using `Annotated`.
In some cases it's better to have different names for a field in your dataclass and in its serialized view. For example, a third-party legacy API you are working with might operate with camel case style, but you stick to snake case style in your code base. Or you want to load data with keys that are invalid identifiers in Python. Aliases can solve this problem.
There are multiple ways to assign an alias:
- Using `Alias(...)` annotation in a field type
- Using `alias` parameter in field metadata
- Using `aliases` parameter in a dataclass config
By default, aliases only affect deserialization, but this can be extended to serialization as well. If you want to serialize all the fields by aliases, you have the following options to do so:

- `serialize_by_alias` config option
- `serialize_by_alias` dialect option
- `by_alias` keyword argument in `to_*` methods
Here is an example with the `Alias` annotation in a field type:
```python
from dataclasses import dataclass
from typing import Annotated
from mashumaro import DataClassDictMixin
from mashumaro.types import Alias

@dataclass
class DataClass(DataClassDictMixin):
    foo_bar: Annotated[int, Alias("fooBar")]

obj = DataClass.from_dict({"fooBar": 42})  # DataClass(foo_bar=42)
obj.to_dict()  # {"foo_bar": 42}  # no aliases on serialization by default
```
The same with field metadata:
```python
from dataclasses import dataclass, field
from mashumaro import field_options

@dataclass
class DataClass:
    foo_bar: str = field(metadata=field_options(alias="fooBar"))
```
And with a dataclass config:
```python
from dataclasses import dataclass
from mashumaro.config import BaseConfig

@dataclass
class DataClass:
    foo_bar: str

    class Config(BaseConfig):
        aliases = {"foo_bar": "fooBar"}
```
Tip
If you want to deserialize all the fields by their names along with aliases, there is a config option for that.
Sometimes it's needed to have different serialization and deserialization methods depending on the data source where entities of the dataclass are stored, or on the API to which the entities are being sent or from which they are received. There is a special `Dialect` type that may contain all the differences from the default serialization and deserialization methods. You can create different dialects and use each of them for the same dataclass depending on the situation.
Suppose we have the following dataclass with a field of type `date`:
```python
@dataclass
class Entity(DataClassDictMixin):
    dt: date
```
By default, a field of `date` type serializes to a string in ISO 8601 format, so the serialized entity will look like `{'dt': '2021-12-31'}`. But what if we have, for example, two sensitive legacy Ethiopian and Japanese APIs that use two different formats for dates, `dd/mm/yyyy` and `yyyy年mm月dd日`? Instead of creating two similar dataclasses we can have one dataclass and two dialects:
```python
from dataclasses import dataclass
from datetime import date, datetime
from mashumaro import DataClassDictMixin
from mashumaro.config import ADD_DIALECT_SUPPORT
from mashumaro.dialect import Dialect
from mashumaro.types import SerializationStrategy

class DateTimeSerializationStrategy(SerializationStrategy):
    def __init__(self, fmt: str):
        self.fmt = fmt

    def serialize(self, value: date) -> str:
        return value.strftime(self.fmt)

    def deserialize(self, value: str) -> date:
        return datetime.strptime(value, self.fmt).date()

class EthiopianDialect(Dialect):
    serialization_strategy = {
        date: DateTimeSerializationStrategy("%d/%m/%Y")
    }

class JapaneseDialect(Dialect):
    serialization_strategy = {
        date: DateTimeSerializationStrategy("%Y年%m月%d日")
    }

@dataclass
class Entity(DataClassDictMixin):
    dt: date

    class Config:
        code_generation_options = [ADD_DIALECT_SUPPORT]

entity = Entity(date(2021, 12, 31))
entity.to_dict(dialect=EthiopianDialect)  # {'dt': '31/12/2021'}
entity.to_dict(dialect=JapaneseDialect)   # {'dt': '2021年12月31日'}
Entity.from_dict({'dt': '2021年12月31日'}, dialect=JapaneseDialect)
```
This dialect option has the same meaning as the similar config option but for the dialect scope. You can register custom `SerializationStrategy`, `serialize` and `deserialize` methods for specific types.
This dialect option has the same meaning as the similar config option but for the dialect scope.

This dialect option has the same meaning as the similar config option but for the dialect scope.

This dialect option has the same meaning as the similar config option but for the dialect scope.

This dialect option has the same meaning as the similar config option but for the dialect scope.
By default, all collection data types are serialized as a copy to prevent mutation of the original collection. As an example, if a dataclass contains a field of type `list[str]`, then it will be serialized as a copy of the original list, so you can safely mutate it afterwards. The downside is that copying is always slower than using a reference to the original collection. In some cases we know beforehand that mutation doesn't take place or is even desirable, so we can benefit from avoiding unnecessary copies by setting `no_copy_collections` to a sequence of origin collection data types. This is applicable only for collections containing elements that do not require conversion.
```python
from dataclasses import dataclass
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig
from mashumaro.dialect import Dialect

class NoCopyDialect(Dialect):
    no_copy_collections = (list, dict, set)

@dataclass
class DataClass(DataClassDictMixin):
    simple_list: list[str]
    simple_dict: dict[str, str]
    simple_set: set[str]

    class Config(BaseConfig):
        dialect = NoCopyDialect

obj = DataClass(["foo"], {"bar": "baz"}, {"foobar"})
data = obj.to_dict()

assert data["simple_list"] is obj.simple_list
assert data["simple_dict"] is obj.simple_dict
assert data["simple_set"] is obj.simple_set
```
This option is enabled for `list` and `dict` in the default dialects that belong to mixins and codecs for the following formats:
You can change the default serialization and deserialization methods not only in the `serialization_strategy` config option but also using the `dialect` config option. If you have multiple dataclasses without a common parent class, the default dialect can help you to reduce the number of code lines written:
```python
@dataclass
class Entity(DataClassDictMixin):
    dt: date

    class Config:
        dialect = JapaneseDialect

entity = Entity(date(2021, 12, 31))
entity.to_dict()  # {'dt': '2021年12月31日'}
assert Entity.from_dict({'dt': '2021年12月31日'}) == entity
```
Default dialect can also be set when using codecs:
```python
from mashumaro.codecs import BasicDecoder, BasicEncoder

@dataclass
class Entity:
    dt: date

decoder = BasicDecoder(Entity, default_dialect=JapaneseDialect)
encoder = BasicEncoder(Entity, default_dialect=JapaneseDialect)

entity = Entity(date(2021, 12, 31))
encoder.encode(entity)  # {'dt': '2021年12月31日'}
assert decoder.decode({'dt': '2021年12月31日'}) == entity
```
There is a special `Discriminator` class that allows you to customize how a union of dataclasses or their hierarchy will be deserialized. It has the following parameters that affect class selection rules:

- `field` — optional name of the input dictionary key (also known as tag) by which all the variants can be distinguished
- `include_subtypes` — allow to deserialize subclasses
- `include_supertypes` — allow to deserialize superclasses
- `variant_tagger_fn` — a custom function used to generate tag values associated with a variant
By default, each variant that you want to discriminate by tags should have a class-level attribute containing an associated tag value. This attribute should have a name defined by the `field` parameter. The tag value could be in the following forms:
- without annotations: `type = 42`
- annotated as ClassVar: `type: ClassVar[int] = 42`
- annotated as Final: `type: Final[int] = 42`
- annotated as Literal: `type: Literal[42] = 42`
- annotated as StrEnum: `type: ResponseType = ResponseType.OK`
Note
Keep in mind that by default only Final, Literal and StrEnum fields are processed during serialization.
However, it is possible to use a discriminator without the class-level attribute. You can provide a custom function that generates one or many variant tag values. This function should take a class as the only argument and return either a single value of a basic type like `str` or `int`, or a list of them to associate multiple tags with a variant. The common practice is to use a class name as a single tag value:
```python
variant_tagger_fn = lambda cls: cls.__name__
```
Next, we will look at different use cases, as well as their pros and cons.
Often you have a base dataclass and multiple subclasses that are easily distinguishable from each other by the value of a particular field. For example, there may be different events, messages or requests with a discriminator field "event_type", "message_type" or just "type". You could've listed all of them within a `Union` type, but it would be too verbose and impractical. Moreover, deserialization of the union would be slow, since we need to iterate over each variant in the list until we find the right one.
We can improve subclass deserialization using `Discriminator` as an annotation within `Annotated` type. We will use the `field` parameter and set `include_subtypes` to `True`.
Important
The discriminator field should be accessible from the `__dict__` attribute of a specific descendant, i.e. defined at the level of that descendant. A descendant class without a discriminator field will be ignored, but its descendants won't.
Suppose we have a hierarchy of client events distinguishable by a class attribute "type":
```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from mashumaro import DataClassDictMixin

@dataclass
class ClientEvent(DataClassDictMixin):
    pass

@dataclass
class ClientConnectedEvent(ClientEvent):
    type = "connected"
    client_ip: IPv4Address

@dataclass
class ClientDisconnectedEvent(ClientEvent):
    type = "disconnected"
    client_ip: IPv4Address
```
We use the base dataclass `ClientEvent` for a field of another dataclass:
```python
from typing import Annotated, List  # or from typing_extensions import Annotated
from mashumaro.types import Discriminator

@dataclass
class AggregatedEvents(DataClassDictMixin):
    list: List[
        Annotated[
            ClientEvent, Discriminator(field="type", include_subtypes=True)
        ]
    ]
```
Now we can deserialize events based on "type" value:
```python
events = AggregatedEvents.from_dict(
    {
        "list": [
            {"type": "connected", "client_ip": "10.0.0.42"},
            {"type": "disconnected", "client_ip": "10.0.0.42"},
        ]
    }
)
assert events == AggregatedEvents(
    list=[
        ClientConnectedEvent(client_ip=IPv4Address("10.0.0.42")),
        ClientDisconnectedEvent(client_ip=IPv4Address("10.0.0.42")),
    ]
)
```
In rare cases you have to deal with subclasses that don't have a common field name by which they can be distinguished. Since `Discriminator` can be initialized without the "field" parameter, you can use it with only `include_subtypes` enabled. The drawback is that we will have to go through all the subclasses until we find the suitable one. It's almost like using a `Union` type but with subclasses support.
Suppose we're making a brunch. We have some ingredients:
```python
@dataclass
class Ingredient(DataClassDictMixin):
    name: str

@dataclass
class Hummus(Ingredient):
    made_of: Literal["chickpeas", "beet", "artichoke"]
    grams: int

@dataclass
class Celery(Ingredient):
    pieces: int
```
Let's create a plate:
```python
@dataclass
class Plate(DataClassDictMixin):
    ingredients: List[
        Annotated[Ingredient, Discriminator(include_subtypes=True)]
    ]
```
And now we can put our ingredients on the plate:
```python
plate = Plate.from_dict(
    {
        "ingredients": [
            {
                "name": "hummus from the shop",
                "made_of": "chickpeas",
                "grams": 150,
            },
            {"name": "celery from my garden", "pieces": 5},
        ]
    }
)
assert plate == Plate(
    ingredients=[
        Hummus(name="hummus from the shop", made_of="chickpeas", grams=150),
        Celery(name="celery from my garden", pieces=5),
    ]
)
```
In some cases it's necessary to fall back to the base class if there is no suitable subclass. We can set `include_supertypes` to `True`:
```python
@dataclass
class Plate(DataClassDictMixin):
    ingredients: List[
        Annotated[
            Ingredient,
            Discriminator(include_subtypes=True, include_supertypes=True),
        ]
    ]

plate = Plate.from_dict(
    {
        "ingredients": [
            {
                "name": "hummus from the shop",
                "made_of": "chickpeas",
                "grams": 150,
            },
            {"name": "celery from my garden", "pieces": 5},
            {"name": "cumin"},  # <- new unknown ingredient
        ]
    }
)
assert plate == Plate(
    ingredients=[
        Hummus(name="hummus from the shop", made_of="chickpeas", grams=150),
        Celery(name="celery from my garden", pieces=5),
        Ingredient(name="cumin"),  # <- unknown ingredient added
    ]
)
```
It may often be more convenient to specify a `Discriminator` once at the class level and use that class without `Annotated` type for subclass deserialization. Depending on the `Discriminator` parameters, it can be used as a replacement for subclasses distinguishable by a field as well as for subclasses without a common field. The only difference is that you can't use `include_supertypes=True` because it would lead to a recursion error.
The reworked example will look like this:
```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import List
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig
from mashumaro.types import Discriminator

@dataclass
class ClientEvent(DataClassDictMixin):
    class Config(BaseConfig):
        discriminator = Discriminator(  # <- add discriminator
            field="type",
            include_subtypes=True,
        )

@dataclass
class ClientConnectedEvent(ClientEvent):
    type = "connected"
    client_ip: IPv4Address

@dataclass
class ClientDisconnectedEvent(ClientEvent):
    type = "disconnected"
    client_ip: IPv4Address

@dataclass
class AggregatedEvents(DataClassDictMixin):
    list: List[ClientEvent]  # <- use base class here
```
And now we can deserialize events based on "type" value as we did earlier:
```python
events = AggregatedEvents.from_dict(
    {
        "list": [
            {"type": "connected", "client_ip": "10.0.0.42"},
            {"type": "disconnected", "client_ip": "10.0.0.42"},
        ]
    }
)
assert events == AggregatedEvents(
    list=[
        ClientConnectedEvent(client_ip=IPv4Address("10.0.0.42")),
        ClientDisconnectedEvent(client_ip=IPv4Address("10.0.0.42")),
    ]
)
```
What's more interesting is that you can now deserialize subclasses simply by calling the superclass `from_*` method, which is very useful:
```python
disconnected_event = ClientEvent.from_dict(
    {"type": "disconnected", "client_ip": "10.0.0.42"}
)
assert disconnected_event == ClientDisconnectedEvent(IPv4Address("10.0.0.42"))
```
The same is applicable for subclasses without a common field:
```python
@dataclass
class Ingredient(DataClassDictMixin):
    name: str

    class Config:
        discriminator = Discriminator(include_subtypes=True)

...

celery = Ingredient.from_dict({"name": "celery from my garden", "pieces": 5})
assert celery == Celery(name="celery from my garden", pieces=5)
```
Deserialization of a union of types distinguishable by a particular field will be much faster using `Discriminator`, because there will be no traversal of all classes and an attempt to deserialize each of them. Usually this approach can be used when you have multiple classes without a common superclass or when you only need to deserialize some of the subclasses. In the following example we will use `include_supertypes=True` to deserialize two subclasses out of three:
```python
from dataclasses import dataclass
from typing import Annotated, Literal, Union
# or from typing_extensions import Annotated
from mashumaro import DataClassDictMixin
from mashumaro.types import Discriminator

@dataclass
class Event(DataClassDictMixin):
    pass

@dataclass
class Event1(Event):
    code: Literal[1] = 1
    ...

@dataclass
class Event2(Event):
    code: Literal[2] = 2
    ...

@dataclass
class Event3(Event):
    code: Literal[3] = 3
    ...

@dataclass
class Message(DataClassDictMixin):
    event: Annotated[
        Union[Event1, Event2],
        Discriminator(field="code", include_supertypes=True),
    ]

event1_msg = Message.from_dict({"event": {"code": 1, ...}})
event2_msg = Message.from_dict({"event": {"code": 2, ...}})
assert isinstance(event1_msg.event, Event1)
assert isinstance(event2_msg.event, Event2)

# raises InvalidFieldValue:
Message.from_dict({"event": {"code": 3, ...}})
```
Again, it's not necessary to have a common superclass. If you have a union of dataclasses without a field that they can be distinguished by, you can still use `Discriminator`, but deserialization will be almost the same as for a `Union` type without `Discriminator`, except that it could be possible to deserialize subclasses with `include_subtypes=True`.
Important
When both `include_subtypes` and `include_supertypes` are enabled, all subclasses will be attempted to be deserialized first, superclasses — at the end.
In the following example you can see how priority works: first we try to deserialize `ChickpeaHummus`, and if it fails, then we try `Hummus`:
```python
@dataclass
class Hummus(DataClassDictMixin):
    made_of: Literal["chickpeas", "artichoke"]
    grams: int

@dataclass
class ChickpeaHummus(Hummus):
    made_of: Literal["chickpeas"]

@dataclass
class Celery(DataClassDictMixin):
    pieces: int

@dataclass
class Plate(DataClassDictMixin):
    ingredients: List[
        Annotated[
            Union[Hummus, Celery],
            Discriminator(include_subtypes=True, include_supertypes=True),
        ]
    ]

plate = Plate.from_dict(
    {
        "ingredients": [
            {"made_of": "chickpeas", "grams": 100},
            {"made_of": "artichoke", "grams": 50},
            {"pieces": 4},
        ]
    }
)
assert plate == Plate(
    ingredients=[
        ChickpeaHummus(made_of='chickpeas', grams=100),  # <- subclass
        Hummus(made_of='artichoke', grams=50),  # <- superclass
        Celery(pieces=4),
    ]
)
```
Sometimes it is impractical to have a class-level attribute with a tag value, especially when you have a lot of classes. We can have a custom tagger function instead. This method is applicable for all scenarios of using the discriminator, but for demonstration purposes, let's focus only on one of them.
Suppose we want to use the middle part of `Client*Event` as a tag value:
```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig
from mashumaro.types import Discriminator

def client_event_tagger(cls):
    # not the best way of doing it, it's just a demo
    return cls.__name__[6:-5].lower()

@dataclass
class ClientEvent(DataClassDictMixin):
    class Config(BaseConfig):
        discriminator = Discriminator(
            field="type",
            include_subtypes=True,
            variant_tagger_fn=client_event_tagger,
        )

@dataclass
class ClientConnectedEvent(ClientEvent):
    client_ip: IPv4Address

@dataclass
class ClientDisconnectedEvent(ClientEvent):
    client_ip: IPv4Address
```
We can now deserialize subclasses as we did earlier without a variant tagger:
```python
disconnected_event = ClientEvent.from_dict(
    {"type": "disconnected", "client_ip": "10.0.0.42"}
)
assert disconnected_event == ClientDisconnectedEvent(IPv4Address("10.0.0.42"))
```
If we need to associate multiple tags with a single variant, we can return a list of tags:
```python
def client_event_tagger(cls):
    name = cls.__name__[6:-5]
    return [name.lower(), name.upper()]
```
If you want to have control over whether to skip `None` values on serialization, you can add the `omit_none` parameter to `to_*` methods using the `code_generation_options` list. The default value of the `omit_none` parameter depends on whether the `omit_none` config option or `omit_none` dialect option is enabled.
```python
from dataclasses import dataclass
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig, TO_DICT_ADD_OMIT_NONE_FLAG

@dataclass
class Inner(DataClassDictMixin):
    x: int = None
    # "x" won't be omitted since there is no TO_DICT_ADD_OMIT_NONE_FLAG here

@dataclass
class Model(DataClassDictMixin):
    x: Inner
    a: int = None
    b: str = None  # will be omitted

    class Config(BaseConfig):
        code_generation_options = [TO_DICT_ADD_OMIT_NONE_FLAG]

Model(x=Inner(), a=1).to_dict(omit_none=True)  # {'x': {'x': None}, 'a': 1}
```
If you want to have control over whether to serialize fields by their aliases, you can add the `by_alias` parameter to `to_*` methods using the `code_generation_options` list. The default value of the `by_alias` parameter depends on whether the `serialize_by_alias` config option is enabled.
```python
from dataclasses import dataclass, field
from mashumaro import DataClassDictMixin, field_options
from mashumaro.config import BaseConfig, TO_DICT_ADD_BY_ALIAS_FLAG

@dataclass
class DataClass(DataClassDictMixin):
    field_a: int = field(metadata=field_options(alias="FieldA"))

    class Config(BaseConfig):
        code_generation_options = [TO_DICT_ADD_BY_ALIAS_FLAG]

DataClass(field_a=1).to_dict()  # {'field_a': 1}
DataClass(field_a=1).to_dict(by_alias=True)  # {'FieldA': 1}
```
Support for dialects is disabled by default for performance reasons. You can enable it using the `ADD_DIALECT_SUPPORT` constant:
```python
from dataclasses import dataclass
from datetime import date
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig, ADD_DIALECT_SUPPORT

@dataclass
class Entity(DataClassDictMixin):
    dt: date

    class Config(BaseConfig):
        code_generation_options = [ADD_DIALECT_SUPPORT]
```
Sometimes it's needed to pass a "context" object to the serialization hooks that will take it into account. For example, you could want to have an option to remove sensitive data from the serialization result if you need to. You can add the `context` parameter to `to_*` methods that will be passed to the `__pre_serialize__` and `__post_serialize__` hooks. The type of this context as well as its mutability is up to you.
```python
from dataclasses import dataclass
from typing import Dict, Optional
from uuid import UUID
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig, ADD_SERIALIZATION_CONTEXT

class BaseModel(DataClassDictMixin):
    class Config(BaseConfig):
        code_generation_options = [ADD_SERIALIZATION_CONTEXT]

@dataclass
class Account(BaseModel):
    id: UUID
    username: str
    name: str

    def __pre_serialize__(self, context: Optional[Dict] = None):
        return self

    def __post_serialize__(self, d: Dict, context: Optional[Dict] = None):
        if context and context.get("remove_sensitive_data"):
            d["username"] = "***"
            d["name"] = "***"
        return d

@dataclass
class Session(BaseModel):
    id: UUID
    key: str
    account: Account

    def __pre_serialize__(self, context: Optional[Dict] = None):
        return self

    def __post_serialize__(self, d: Dict, context: Optional[Dict] = None):
        if context and context.get("remove_sensitive_data"):
            d["key"] = "***"
        return d

foo = Session(
    id=UUID('03321c9f-6a97-421e-9869-918ff2867a71'),
    key="VQ6Q9bX4c8s",
    account=Account(
        id=UUID('4ef2baa7-edef-4d6a-b496-71e6d72c58fb'),
        username="john_doe",
        name="John"
    )
)
assert foo.to_dict() == {
    'id': '03321c9f-6a97-421e-9869-918ff2867a71',
    'key': 'VQ6Q9bX4c8s',
    'account': {
        'id': '4ef2baa7-edef-4d6a-b496-71e6d72c58fb',
        'username': 'john_doe',
        'name': 'John'
    }
}
assert foo.to_dict(context={"remove_sensitive_data": True}) == {
    'id': '03321c9f-6a97-421e-9869-918ff2867a71',
    'key': '***',
    'account': {
        'id': '4ef2baa7-edef-4d6a-b496-71e6d72c58fb',
        'username': '***',
        'name': '***'
    }
}
```
Along with user-defined generic types implementing the `SerializableType` interface, generic and variadic generic dataclasses can also be used. There are two applicable scenarios for them.
If you have a generic dataclass and want to serialize and deserialize its instances depending on the concrete types, you can use inheritance for that:
```python
from dataclasses import dataclass
from datetime import date
from typing import Generic, Mapping, Tuple, TypeVar, TypeVarTuple

from mashumaro import DataClassDictMixin

KT = TypeVar("KT")
VT = TypeVar("VT", date, str)
Ts = TypeVarTuple("Ts")

@dataclass
class GenericDataClass(Generic[KT, VT, *Ts]):
    x: Mapping[KT, VT]
    y: Tuple[*Ts, KT]

@dataclass
class ConcreteDataClass(
    GenericDataClass[str, date, *Tuple[float, ...]],
    DataClassDictMixin,
):
    pass

ConcreteDataClass.from_dict({"x": {"a": "2021-01-01"}, "y": [1, 2, "a"]})
# ConcreteDataClass(x={'a': datetime.date(2021, 1, 1)}, y=(1.0, 2.0, 'a'))
```
You can override a `TypeVar` field with a concrete type or another `TypeVar`. Partial specification of concrete types is also allowed. If a generic dataclass is inherited without type overriding, the types of its fields remain untouched.
Another approach is to specify concrete types in the field type hints. This can help you have different versions of the same generic dataclass:
```python
from dataclasses import dataclass
from datetime import date
from typing import Generic, TypeVar

from mashumaro import DataClassDictMixin

T = TypeVar('T')

@dataclass
class GenericDataClass(Generic[T], DataClassDictMixin):
    x: T

@dataclass
class DataClass(DataClassDictMixin):
    date: GenericDataClass[date]
    str: GenericDataClass[str]

instance = DataClass(
    date=GenericDataClass(x=date(2021, 1, 1)),
    str=GenericDataClass(x='2021-01-01'),
)
dictionary = {'date': {'x': '2021-01-01'}, 'str': {'x': '2021-01-01'}}
assert DataClass.from_dict(dictionary) == instance
```
There is a generic alternative to `SerializableType` called `GenericSerializableType`. It lets you decide for yourself how to serialize and deserialize input data depending on the types provided:
```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, TypeVar

from mashumaro import DataClassDictMixin
from mashumaro.types import GenericSerializableType

KT = TypeVar("KT")
VT = TypeVar("VT")

class DictWrapper(Dict[KT, VT], GenericSerializableType):
    __packers__ = {date: lambda x: x.isoformat(), str: str}
    __unpackers__ = {date: date.fromisoformat, str: str}

    def _serialize(self, types) -> Dict[KT, VT]:
        k_type, v_type = types
        k_conv = self.__packers__[k_type]
        v_conv = self.__packers__[v_type]
        return {k_conv(k): v_conv(v) for k, v in self.items()}

    @classmethod
    def _deserialize(cls, value, types) -> "DictWrapper[KT, VT]":
        k_type, v_type = types
        k_conv = cls.__unpackers__[k_type]
        v_conv = cls.__unpackers__[v_type]
        return cls({k_conv(k): v_conv(v) for k, v in value.items()})

@dataclass
class DataClass(DataClassDictMixin):
    x: DictWrapper[date, str]
    y: DictWrapper[str, date]

input_data = {
    "x": {"2022-12-07": "2022-12-07"},
    "y": {"2022-12-07": "2022-12-07"},
}
obj = DataClass.from_dict(input_data)
assert obj == DataClass(
    x=DictWrapper({date(2022, 12, 7): "2022-12-07"}),
    y=DictWrapper({"2022-12-07": date(2022, 12, 7)}),
)
assert obj.to_dict() == input_data
```
As you can see, the code turns out to be massive compared to the alternative, but in rare cases such flexibility can be useful. Think twice about whether it's really worth using.
In some cases you need to prepare input / output data or perform extraordinary actions at different stages of the deserialization / serialization lifecycle. You can do this with different types of hooks.
To do something with a dictionary that will be passed to deserialization, you can use the `__pre_deserialize__` class method:
```python
from dataclasses import dataclass
from typing import Any, Dict

from mashumaro.mixins.json import DataClassJSONMixin

@dataclass
class DataClass(DataClassJSONMixin):
    abc: int

    @classmethod
    def __pre_deserialize__(cls, d: Dict[Any, Any]) -> Dict[Any, Any]:
        return {k.lower(): v for k, v in d.items()}

print(DataClass.from_dict({"ABC": 123}))    # DataClass(abc=123)
print(DataClass.from_json('{"ABC": 123}'))  # DataClass(abc=123)
```
To do something with a dataclass instance that was created as a result of deserialization, you can use the `__post_deserialize__` class method:
```python
from dataclasses import dataclass

from mashumaro.mixins.json import DataClassJSONMixin

@dataclass
class DataClass(DataClassJSONMixin):
    abc: int

    @classmethod
    def __post_deserialize__(cls, obj: 'DataClass') -> 'DataClass':
        obj.abc = 456
        return obj

print(DataClass.from_dict({"abc": 123}))    # DataClass(abc=456)
print(DataClass.from_json('{"abc": 123}'))  # DataClass(abc=456)
```
To do something before serialization, you can use the `__pre_serialize__` method:
```python
from dataclasses import dataclass
from typing import ClassVar

from mashumaro.mixins.json import DataClassJSONMixin

@dataclass
class DataClass(DataClassJSONMixin):
    abc: int
    counter: ClassVar[int] = 0

    def __pre_serialize__(self) -> 'DataClass':
        self.counter += 1
        return self

obj = DataClass(abc=123)
obj.to_dict()
obj.to_json()
print(obj.counter)  # 2
```
Note that you can add an additional `context` argument using the corresponding code generation option.
To do something with a dictionary that was created as a result of serialization, you can use the `__post_serialize__` method:
```python
from dataclasses import dataclass
from typing import Any, Dict

from mashumaro.mixins.json import DataClassJSONMixin

@dataclass
class DataClass(DataClassJSONMixin):
    user: str
    password: str

    def __post_serialize__(self, d: Dict[Any, Any]) -> Dict[Any, Any]:
        d.pop('password')
        return d

obj = DataClass(user="name", password="secret")
print(obj.to_dict())  # {"user": "name"}
print(obj.to_json())  # '{"user": "name"}'
```
Note that you can add an additional `context` argument using the corresponding code generation option.
You can build JSON Schema not only for dataclasses but also for any other supported data types. There is support for the following standards:

- JSON Schema Draft 2020-12
- OpenAPI Specification 3.1.1
For simple one-time cases, it's recommended to start with the configurable `build_json_schema` function. It returns a `JSONSchema` object that can be serialized to JSON or to a dict:
```python
from dataclasses import dataclass, field
from typing import List
from uuid import UUID

from mashumaro.jsonschema import build_json_schema

@dataclass
class User:
    id: UUID
    name: str = field(metadata={"description": "User name"})

print(build_json_schema(List[User]).to_json())
```
Click to show the result
```json
{
  "type": "array",
  "items": {
    "type": "object",
    "title": "User",
    "properties": {
      "id": {
        "type": "string",
        "format": "uuid"
      },
      "name": {
        "type": "string",
        "description": "User name"
      }
    },
    "additionalProperties": false,
    "required": [
      "id",
      "name"
    ]
  }
}
```
Additional validation keywords (see below) can be added using annotations:
```python
from typing import Annotated, List

from mashumaro.jsonschema import build_json_schema
from mashumaro.jsonschema.annotations import Maximum, MaxItems

print(
    build_json_schema(
        Annotated[
            List[Annotated[int, Maximum(42)]],
            MaxItems(4)
        ]
    ).to_json()
)
```
Click to show the result
```json
{
  "type": "array",
  "items": {
    "type": "integer",
    "maximum": 42
  },
  "maxItems": 4
}
```
The `$schema` keyword can be added by setting `with_dialect_uri` to `True`:
```python
print(build_json_schema(str, with_dialect_uri=True).to_json())
```
Click to show the result
```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "string"
}
```
By default, the Draft 2020-12 dialect is used, but you can change it to another one by setting the `dialect` parameter:
```python
from mashumaro.jsonschema import OPEN_API_3_1

print(
    build_json_schema(
        str, dialect=OPEN_API_3_1, with_dialect_uri=True
    ).to_json()
)
```
Click to show the result
```json
{
  "$schema": "https://spec.openapis.org/oas/3.1/dialect/base",
  "type": "string"
}
```
All dataclass JSON Schemas can be placed in the definitions section or not, depending on the `all_refs` parameter, whose default value comes from the dialect used (`False` for Draft 2020-12, `True` for OpenAPI Specification 3.1.1):
```python
print(build_json_schema(List[User], all_refs=True).to_json())
```
Click to show the result
```json
{
  "type": "array",
  "$defs": {
    "User": {
      "type": "object",
      "title": "User",
      "properties": {
        "id": {
          "type": "string",
          "format": "uuid"
        },
        "name": {
          "type": "string"
        }
      },
      "additionalProperties": false,
      "required": [
        "id",
        "name"
      ]
    }
  },
  "items": {
    "$ref": "#/$defs/User"
  }
}
```
The definitions section can be omitted from the final document by setting the `with_definitions` parameter to `False`:
```python
print(
    build_json_schema(
        List[User], dialect=OPEN_API_3_1, with_definitions=False
    ).to_json()
)
```
Click to show the result
```json
{
  "type": "array",
  "items": {
    "$ref": "#/components/schemas/User"
  }
}
```
The reference prefix can be changed by using the `ref_prefix` parameter:
```python
print(
    build_json_schema(
        List[User],
        all_refs=True,
        with_definitions=False,
        ref_prefix="#/components/responses",
    ).to_json()
)
```
Click to show the result
```json
{
  "type": "array",
  "items": {
    "$ref": "#/components/responses/User"
  }
}
```
The omitted definitions can be retrieved later from the `Context` object that you created and passed to the function, but it may be easier to use `JSONSchemaBuilder` for that. For example, you might find it handy to build an OpenAPI Specification step by step, passing your models to the builder and getting all the registered definitions later. This builder has reasonable defaults but can be customized if necessary.
```python
from dataclasses import dataclass
from typing import List
from uuid import UUID

from mashumaro.jsonschema import JSONSchemaBuilder, OPEN_API_3_1

builder = JSONSchemaBuilder(OPEN_API_3_1)

@dataclass
class User:
    id: UUID
    name: str

@dataclass
class Device:
    id: UUID
    model: str

print(builder.build(List[User]).to_json())
print(builder.build(List[Device]).to_json())
print(builder.get_definitions().to_json())
```
Click to show the result
```json
{
  "type": "array",
  "items": {
    "$ref": "#/components/schemas/User"
  }
}
```

```json
{
  "type": "array",
  "items": {
    "$ref": "#/components/schemas/Device"
  }
}
```

```json
{
  "User": {
    "type": "object",
    "title": "User",
    "properties": {
      "id": {
        "type": "string",
        "format": "uuid"
      },
      "name": {
        "type": "string"
      }
    },
    "additionalProperties": false,
    "required": [
      "id",
      "name"
    ]
  },
  "Device": {
    "type": "object",
    "title": "Device",
    "properties": {
      "id": {
        "type": "string",
        "format": "uuid"
      },
      "model": {
        "type": "string"
      }
    },
    "additionalProperties": false,
    "required": [
      "id",
      "model"
    ]
  }
}
```
Apart from the required keywords, which are added automatically for certain data types, you're free to use additional validation keywords. They're represented by the corresponding classes in `mashumaro.jsonschema.annotations`:
Number constraints:
String constraints:
Array constraints:
Object constraints:
If the built-in functionality doesn't meet your needs, you can customize JSON Schema generation or add support for additional types using plugins. The `mashumaro.jsonschema.plugins.BasePlugin` class provides a `get_schema` method that you can override to implement custom behavior.
The plugin system works by iterating through all registered plugins and calling their `get_schema` methods. If a plugin's `get_schema` method raises a `NotImplementedError` or returns `None`, it indicates that the plugin doesn't provide the required functionality for that particular case.
You can apply multiple plugins sequentially, allowing each to modify the schema in turn. This approach enables a step-by-step transformation of the schema, with each plugin contributing its specific modifications.
Plugins can be registered using the `plugins` argument in either the `build_json_schema` function or the `JSONSchemaBuilder` class.
The `mashumaro.jsonschema.plugins` module contains several built-in plugins. Currently, one of these plugins adds descriptions to JSON schemas using docstrings from dataclasses:
```python
from dataclasses import dataclass

from mashumaro.jsonschema import build_json_schema
from mashumaro.jsonschema.plugins import DocstringDescriptionPlugin

@dataclass
class MyClass:
    """My class"""

    x: int

schema = build_json_schema(MyClass, plugins=[DocstringDescriptionPlugin()])
print(schema.to_json())
```
Click to show the result
```json
{
  "type": "object",
  "title": "MyClass",
  "description": "My class",
  "properties": {
    "x": {
      "type": "integer"
    }
  },
  "additionalProperties": false,
  "required": [
    "x"
  ]
}
```
Creating your own custom plugin is straightforward. For instance, if you want to add support for Pydantic models, you could write a plugin similar to the following:
```python
from dataclasses import dataclass

from pydantic import BaseModel

from mashumaro.jsonschema import build_json_schema
from mashumaro.jsonschema.models import Context, JSONSchema
from mashumaro.jsonschema.plugins import BasePlugin
from mashumaro.jsonschema.schema import Instance

class PydanticSchemaPlugin(BasePlugin):
    def get_schema(
        self,
        instance: Instance,
        ctx: Context,
        schema: JSONSchema | None = None,
    ) -> JSONSchema | None:
        try:
            if issubclass(instance.type, BaseModel):
                pydantic_schema = instance.type.model_json_schema()
                return JSONSchema.from_dict(pydantic_schema)
        except TypeError:
            return None

class MyPydanticClass(BaseModel):
    x: int

@dataclass
class MyDataClass:
    y: MyPydanticClass

schema = build_json_schema(MyDataClass, plugins=[PydanticSchemaPlugin()])
print(schema.to_json())
```
Click to show the result
```json
{
  "type": "object",
  "title": "MyDataClass",
  "properties": {
    "y": {
      "type": "object",
      "title": "MyPydanticClass",
      "properties": {
        "x": {
          "type": "integer",
          "title": "X"
        }
      },
      "required": [
        "x"
      ]
    }
  },
  "additionalProperties": false,
  "required": [
    "y"
  ]
}
```
Using a `Config` class, it is possible to override some parts of the schema. Currently, you can do the following:
- override some field schemas using the "properties" key
- change `additionalProperties` using the "additionalProperties" key
```python
from dataclasses import dataclass

from mashumaro.jsonschema import build_json_schema

@dataclass
class FooBar:
    foo: str
    bar: int

    class Config:
        json_schema = {
            "properties": {
                "foo": {
                    "type": "string",
                    "description": "bar"
                }
            },
            "additionalProperties": True,
        }

print(build_json_schema(FooBar).to_json())
```
Click to show the result
```json
{
  "type": "object",
  "title": "FooBar",
  "properties": {
    "foo": {
      "type": "string",
      "description": "bar"
    },
    "bar": {
      "type": "integer"
    }
  },
  "additionalProperties": true,
  "required": [
    "foo",
    "bar"
  ]
}
```
You can also set the "additionalProperties" key to a specific schema by passing it a `JSONSchema` instance instead of a bool value.
Mashumaro provides different ways to override default serialization methods for dataclass fields or specific data types. In order for these overrides to be reflected in the schema, you need to make sure that the methods have return type annotations.
```python
from dataclasses import dataclass, field

from mashumaro.config import BaseConfig
from mashumaro.jsonschema import build_json_schema

def str_as_list(s: str) -> list[str]:
    return list(s)

def int_as_str(i: int) -> str:
    return str(i)

@dataclass
class FooBar:
    foo: str = field(metadata={"serialize": str_as_list})
    bar: int

    class Config(BaseConfig):
        serialization_strategy = {
            int: {"serialize": int_as_str}
        }

print(build_json_schema(FooBar).to_json())
```
Click to show the result
```json
{
  "type": "object",
  "title": "FooBar",
  "properties": {
    "foo": {
      "type": "array",
      "items": {
        "type": "string"
      }
    },
    "bar": {
      "type": "string"
    }
  },
  "additionalProperties": false,
  "required": [
    "foo",
    "bar"
  ]
}
```