Fast, correct Python JSON library supporting dataclasses, datetimes, and numpy
orjson is a fast, correct JSON library for Python. It benchmarks as the fastest Python library for JSON and is more correct than the standard json library or other third-party libraries. It serializes `dataclass`, `datetime`, `numpy`, and `UUID` instances natively.

`orjson.dumps()` is something like 10x as fast as `json`, serializes common types and subtypes, has a `default` parameter for the caller to specify how to serialize arbitrary types, and has a number of flags controlling output.

`orjson.loads()` is something like 2x as fast as `json`, and is strictly compliant with UTF-8 and RFC 8259 ("The JavaScript Object Notation (JSON) Data Interchange Format").

Reading from and writing to files, line-delimited JSON files, and so on is not provided by the library.
orjson supports CPython 3.9, 3.10, 3.11, 3.12, 3.13, and 3.14.
It distributes amd64/x86_64/x64, i686/x86, aarch64/arm64/armv8, arm7, ppc64le/POWER8, and s390x wheels for Linux, amd64 and aarch64 wheels for macOS, and amd64, i686, and aarch64 wheels for Windows.

Wheels published to PyPI for amd64 run on x86-64-v1 (2003) or later, but will at runtime use AVX-512 if available for a significant performance benefit; aarch64 wheels run on ARMv8-A (2011) or later.
orjson does not and will not support PyPy, embedded Python builds for Android/iOS, or PEP 554 subinterpreters.

orjson may support PEP 703 free-threading when it is stable.

Releases follow semantic versioning, and serializing a new object type without an opt-in flag is considered a breaking change.

orjson is licensed under both the Apache 2.0 and MIT licenses. The repository and issue tracker are at github.com/ijl/orjson, and patches may be submitted there. There is a CHANGELOG available in the repository.
To install a wheel from PyPI, install the `orjson` package.

In `requirements.in` or `requirements.txt` format, specify:

```txt
orjson >= 3.10,<4
```

In `pyproject.toml` format, specify:

```toml
orjson = "^3.10"
```

To build a wheel, see packaging.
This is an example of serializing, with options specified, and deserializing:
```python
>>> import orjson, datetime, numpy
>>> data = {
...     "type": "job",
...     "created_at": datetime.datetime(1970, 1, 1),
...     "status": "🆗",
...     "payload": numpy.array([[1, 2], [3, 4]]),
... }
>>> orjson.dumps(data, option=orjson.OPT_NAIVE_UTC | orjson.OPT_SERIALIZE_NUMPY)
b'{"type":"job","created_at":"1970-01-01T00:00:00+00:00","status":"\xf0\x9f\x86\x97","payload":[[1,2],[3,4]]}'
>>> orjson.loads(_)
{'type': 'job', 'created_at': '1970-01-01T00:00:00+00:00', 'status': '🆗', 'payload': [[1, 2], [3, 4]]}
```
orjson version 3 serializes more types than version 2. Subclasses of `str`, `int`, `dict`, and `list` are now serialized. This is faster and more similar to the standard library. It can be disabled with `orjson.OPT_PASSTHROUGH_SUBCLASS`. `dataclasses.dataclass` instances are now serialized by default and cannot be customized in a `default` function unless `option=orjson.OPT_PASSTHROUGH_DATACLASS` is specified. `uuid.UUID` instances are serialized by default. For any type that is now serialized, implementations in a `default` function and options enabling them can be removed but do not need to be. There was no change in deserialization.
To migrate from the standard library, the largest difference is that `orjson.dumps` returns `bytes` and `json.dumps` returns a `str`.

Users with `dict` objects using non-`str` keys should specify `option=orjson.OPT_NON_STR_KEYS`.

`sort_keys` is replaced by `option=orjson.OPT_SORT_KEYS`.

`indent` is replaced by `option=orjson.OPT_INDENT_2`; other levels of indentation are not supported.

`ensure_ascii` is probably not relevant today, and UTF-8 characters cannot be escaped to ASCII.
```python
def dumps(
    __obj: Any,
    default: Optional[Callable[[Any], Any]] = ...,
    option: Optional[int] = ...,
) -> bytes: ...
```
dumps() serializes Python objects to JSON.
It natively serializes `str`, `dict`, `list`, `tuple`, `int`, `float`, `bool`, `None`, `dataclasses.dataclass`, `typing.TypedDict`, `datetime.datetime`, `datetime.date`, `datetime.time`, `uuid.UUID`, `numpy.ndarray`, and `orjson.Fragment` instances. It supports arbitrary types through `default`. It serializes subclasses of `str`, `int`, `dict`, `list`, `dataclasses.dataclass`, and `enum.Enum`. It does not serialize subclasses of `tuple` to avoid serializing `namedtuple` objects as arrays. To avoid serializing subclasses, specify the option `orjson.OPT_PASSTHROUGH_SUBCLASS`.
The output is abytes object containing UTF-8.
The global interpreter lock (GIL) is held for the duration of the call.
It raises `JSONEncodeError` on an unsupported type. This exception message describes the invalid object with the error message `Type is not JSON serializable: ...`. To fix this, specify `default`.

It raises `JSONEncodeError` on a `str` that contains invalid UTF-8.

It raises `JSONEncodeError` on an integer that exceeds 64 bits by default or, with `OPT_STRICT_INTEGER`, 53 bits.

It raises `JSONEncodeError` if a `dict` has a key of a type other than `str`, unless `OPT_NON_STR_KEYS` is specified.

It raises `JSONEncodeError` if the output of `default` recurses to handling by `default` more than 254 levels deep.

It raises `JSONEncodeError` on circular references.

It raises `JSONEncodeError` if a `tzinfo` on a datetime object is unsupported.
`JSONEncodeError` is a subclass of `TypeError`. This is for compatibility with the standard library.

If the failure was caused by an exception in `default`, then `JSONEncodeError` chains the original exception as `__cause__`.
To serialize a subclass or arbitrary types, specify `default` as a callable that returns a supported type. `default` may be a function, lambda, or callable class instance. To specify that a type was not handled by `default`, raise an exception such as `TypeError`.
```python
>>> import orjson, decimal
>>> def default(obj):
...     if isinstance(obj, decimal.Decimal):
...         return str(obj)
...     raise TypeError
...
>>> orjson.dumps(decimal.Decimal("0.0842389659712649442845"))
JSONEncodeError: Type is not JSON serializable: decimal.Decimal
>>> orjson.dumps(decimal.Decimal("0.0842389659712649442845"), default=default)
b'"0.0842389659712649442845"'
>>> orjson.dumps({1, 2}, default=default)
orjson.JSONEncodeError: Type is not JSON serializable: set
```
The `default` callable may return an object that itself must be handled by `default`, up to 254 times, before an exception is raised.
It is important that `default` raise an exception if a type cannot be handled. Python otherwise implicitly returns `None`, which appears to the caller like a legitimate value and is serialized:
```python
>>> import orjson, json, decimal
>>> def default(obj):
...     if isinstance(obj, decimal.Decimal):
...         return str(obj)
...
>>> orjson.dumps({"set": {1, 2}}, default=default)
b'{"set":null}'
>>> json.dumps({"set": {1, 2}}, default=default)
'{"set":null}'
```
To modify how data is serialized, specify `option`. Each `option` is an integer constant in `orjson`. To specify multiple options, mask them together, e.g., `option=orjson.OPT_STRICT_INTEGER | orjson.OPT_NAIVE_UTC`.
`OPT_APPEND_NEWLINE`: append `\n` to the output. This is a convenience and optimization for the pattern of `dumps(...) + "\n"`. `bytes` objects are immutable and this pattern copies the original contents.
```python
>>> import orjson
>>> orjson.dumps([])
b"[]"
>>> orjson.dumps([], option=orjson.OPT_APPEND_NEWLINE)
b"[]\n"
```
`OPT_INDENT_2`: pretty-print output with an indent of two spaces. This is equivalent to `indent=2` in the standard library. Pretty printing is slower and the output larger. orjson is the fastest compared library at pretty printing and has much less of a slowdown to pretty print than the standard library does. This option is compatible with all other options.
```python
>>> import orjson
>>> orjson.dumps({"a": "b", "c": {"d": True}, "e": [1, 2]})
b'{"a":"b","c":{"d":true},"e":[1,2]}'
>>> orjson.dumps(
...     {"a": "b", "c": {"d": True}, "e": [1, 2]},
...     option=orjson.OPT_INDENT_2,
... )
b'{\n  "a": "b",\n  "c": {\n    "d": true\n  },\n  "e": [\n    1,\n    2\n  ]\n}'
```
If displayed, the indentation and linebreaks appear like this:
```json
{
  "a": "b",
  "c": {
    "d": true
  },
  "e": [
    1,
    2
  ]
}
```

This measures serializing the github.json fixture as compact (52KiB) or pretty (64KiB):
| Library | compact (ms) | pretty (ms) | vs. orjson |
|---|---|---|---|
| orjson | 0.01 | 0.02 | 1 |
| json | 0.13 | 0.54 | 34 |
This measures serializing the citm_catalog.json fixture, more of a worst case due to the amount of nesting and newlines, as compact (489KiB) or pretty (1.1MiB):
| Library | compact (ms) | pretty (ms) | vs. orjson |
|---|---|---|---|
| orjson | 0.25 | 0.45 | 1 |
| json | 3.01 | 24.42 | 54.4 |
This can be reproduced using the pyindent script.
`OPT_NAIVE_UTC`: serialize `datetime.datetime` objects without a `tzinfo` as UTC. This has no effect on `datetime.datetime` objects that have `tzinfo` set.
```python
>>> import orjson, datetime
>>> orjson.dumps(datetime.datetime(1970, 1, 1, 0, 0, 0))
b'"1970-01-01T00:00:00"'
>>> orjson.dumps(
...     datetime.datetime(1970, 1, 1, 0, 0, 0),
...     option=orjson.OPT_NAIVE_UTC,
... )
b'"1970-01-01T00:00:00+00:00"'
```
`OPT_NON_STR_KEYS`: serialize `dict` keys of a type other than `str`. This allows `dict` keys to be one of `str`, `int`, `float`, `bool`, `None`, `datetime.datetime`, `datetime.date`, `datetime.time`, `enum.Enum`, and `uuid.UUID`. For comparison, the standard library serializes `str`, `int`, `float`, `bool`, or `None` by default. orjson benchmarks as being faster at serializing non-`str` keys than other libraries. This option is slower for `str` keys than the default.
```python
>>> import orjson, datetime, uuid
>>> orjson.dumps(
...     {uuid.UUID("7202d115-7ff3-4c81-a7c1-2a1f067b1ece"): [1, 2, 3]},
...     option=orjson.OPT_NON_STR_KEYS,
... )
b'{"7202d115-7ff3-4c81-a7c1-2a1f067b1ece":[1,2,3]}'
>>> orjson.dumps(
...     {datetime.datetime(1970, 1, 1, 0, 0, 0): [1, 2, 3]},
...     option=orjson.OPT_NON_STR_KEYS | orjson.OPT_NAIVE_UTC,
... )
b'{"1970-01-01T00:00:00+00:00":[1,2,3]}'
```
These types are generally serialized how they would be as values, e.g., `datetime.datetime` is still an RFC 3339 string and respects options affecting it. The exception is that `int` serialization does not respect `OPT_STRICT_INTEGER`.

This option has the risk of creating duplicate keys. This is because non-`str` objects may serialize to the same `str` as an existing key, e.g., `{"1": true, 1: false}`. The last key to be inserted into the `dict` will be serialized last, and a JSON deserializer will presumably take the last occurrence of a key (in the above, `false`). The first value will be lost.
This option is compatible with `orjson.OPT_SORT_KEYS`. If sorting is used, note the sort is unstable and will be unpredictable for duplicate keys.
```python
>>> import orjson, datetime
>>> orjson.dumps(
...     {"other": 1, datetime.date(1970, 1, 5): 2, datetime.date(1970, 1, 3): 3},
...     option=orjson.OPT_NON_STR_KEYS | orjson.OPT_SORT_KEYS,
... )
b'{"1970-01-03":3,"1970-01-05":2,"other":1}'
```
This measures serializing 589KiB of JSON comprising a `list` of 100 `dict` in which each `dict` has both 365 randomly-sorted `int` keys representing epoch timestamps as well as one `str` key, and the value for each key is a single integer. In "str keys", the keys were converted to `str` before serialization, and orjson still specifies `option=orjson.OPT_NON_STR_KEYS` (which is always somewhat slower).
| Library | str keys (ms) | int keys (ms) | int keys sorted (ms) |
|---|---|---|---|
| orjson | 0.5 | 0.93 | 2.08 |
| json | 2.72 | 3.59 | |

json's "int keys sorted" cell is blank because it raises `TypeError` on attempting to sort before converting all keys to `str`. This can be reproduced using the pynonstr script.
`OPT_OMIT_MICROSECONDS`: do not serialize the `microsecond` field on `datetime.datetime` and `datetime.time` instances.
```python
>>> import orjson, datetime
>>> orjson.dumps(datetime.datetime(1970, 1, 1, 0, 0, 0, 1))
b'"1970-01-01T00:00:00.000001"'
>>> orjson.dumps(
...     datetime.datetime(1970, 1, 1, 0, 0, 0, 1),
...     option=orjson.OPT_OMIT_MICROSECONDS,
... )
b'"1970-01-01T00:00:00"'
```
`OPT_PASSTHROUGH_DATACLASS`: pass through `dataclasses.dataclass` instances to `default`. This allows customizing their output but is much slower.
```python
>>> import orjson, dataclasses
>>> @dataclasses.dataclass
... class User:
...     id: str
...     name: str
...     password: str
...
>>> def default(obj):
...     if isinstance(obj, User):
...         return {"id": obj.id, "name": obj.name}
...     raise TypeError
...
>>> orjson.dumps(User("3b1", "asd", "zxc"))
b'{"id":"3b1","name":"asd","password":"zxc"}'
>>> orjson.dumps(User("3b1", "asd", "zxc"), option=orjson.OPT_PASSTHROUGH_DATACLASS)
TypeError: Type is not JSON serializable: User
>>> orjson.dumps(
...     User("3b1", "asd", "zxc"),
...     option=orjson.OPT_PASSTHROUGH_DATACLASS,
...     default=default,
... )
b'{"id":"3b1","name":"asd"}'
```
`OPT_PASSTHROUGH_DATETIME`: pass through `datetime.datetime`, `datetime.date`, and `datetime.time` instances to `default`. This allows serializing datetimes to a custom format, e.g., HTTP dates:
```python
>>> import orjson, datetime
>>> def default(obj):
...     if isinstance(obj, datetime.datetime):
...         return obj.strftime("%a, %d %b %Y %H:%M:%S GMT")
...     raise TypeError
...
>>> orjson.dumps({"created_at": datetime.datetime(1970, 1, 1)})
b'{"created_at":"1970-01-01T00:00:00"}'
>>> orjson.dumps({"created_at": datetime.datetime(1970, 1, 1)}, option=orjson.OPT_PASSTHROUGH_DATETIME)
TypeError: Type is not JSON serializable: datetime.datetime
>>> orjson.dumps(
...     {"created_at": datetime.datetime(1970, 1, 1)},
...     option=orjson.OPT_PASSTHROUGH_DATETIME,
...     default=default,
... )
b'{"created_at":"Thu, 01 Jan 1970 00:00:00 GMT"}'
```
This does not affect datetimes in `dict` keys if using `OPT_NON_STR_KEYS`.
`OPT_PASSTHROUGH_SUBCLASS`: pass through subclasses of builtin types to `default`.
```python
>>> import orjson
>>> class Secret(str):
...     pass
...
>>> def default(obj):
...     if isinstance(obj, Secret):
...         return "******"
...     raise TypeError
...
>>> orjson.dumps(Secret("zxc"))
b'"zxc"'
>>> orjson.dumps(Secret("zxc"), option=orjson.OPT_PASSTHROUGH_SUBCLASS)
TypeError: Type is not JSON serializable: Secret
>>> orjson.dumps(Secret("zxc"), option=orjson.OPT_PASSTHROUGH_SUBCLASS, default=default)
b'"******"'
```
This does not affect serializing subclasses as `dict` keys if using `OPT_NON_STR_KEYS`.
`OPT_SERIALIZE_DATACLASS`: this is deprecated and has no effect in version 3. In version 2 this was required to serialize `dataclasses.dataclass` instances. For more, see dataclass.

`OPT_SERIALIZE_NUMPY`: serialize `numpy.ndarray` instances. For more, see numpy.

`OPT_SERIALIZE_UUID`: this is deprecated and has no effect in version 3. In version 2 this was required to serialize `uuid.UUID` instances. For more, see UUID.
`OPT_SORT_KEYS`: serialize `dict` keys in sorted order. The default is to serialize in an unspecified order. This is equivalent to `sort_keys=True` in the standard library.
This can be used to ensure the order is deterministic for hashing or tests.It has a substantial performance penalty and is not recommended in general.
```python
>>> import orjson
>>> orjson.dumps({"b": 1, "c": 2, "a": 3})
b'{"b":1,"c":2,"a":3}'
>>> orjson.dumps({"b": 1, "c": 2, "a": 3}, option=orjson.OPT_SORT_KEYS)
b'{"a":3,"b":1,"c":2}'
```
This measures serializing the twitter.json fixture unsorted and sorted:
| Library | unsorted (ms) | sorted (ms) | vs. orjson |
|---|---|---|---|
| orjson | 0.11 | 0.3 | 1 |
| json | 1.36 | 1.93 | 6.4 |
The benchmark can be reproduced using thepysort script.
The sorting is not collation/locale-aware:
```python
>>> import orjson
>>> orjson.dumps({"a": 1, "ä": 2, "A": 3}, option=orjson.OPT_SORT_KEYS)
b'{"A":3,"a":1,"\xc3\xa4":2}'
```
This is the same sorting behavior as the standard library.
dataclasses also serialize as maps, but this option has no effect on them.
`OPT_STRICT_INTEGER`: enforce a 53-bit limit on integers. The limit is otherwise 64 bits, the same as the Python standard library. For more, see int.
`OPT_UTC_Z`: serialize a UTC timezone on `datetime.datetime` instances as `Z` instead of `+00:00`.
```python
>>> import orjson, datetime, zoneinfo
>>> orjson.dumps(
...     datetime.datetime(1970, 1, 1, 0, 0, 0, tzinfo=zoneinfo.ZoneInfo("UTC")),
... )
b'"1970-01-01T00:00:00+00:00"'
>>> orjson.dumps(
...     datetime.datetime(1970, 1, 1, 0, 0, 0, tzinfo=zoneinfo.ZoneInfo("UTC")),
...     option=orjson.OPT_UTC_Z,
... )
b'"1970-01-01T00:00:00Z"'
```
`orjson.Fragment` includes already-serialized JSON in a document. This is an efficient way to include JSON blobs from a cache, JSONB field, or separately serialized object without first deserializing to Python objects via `loads()`.
```python
>>> import orjson
>>> orjson.dumps({"key": "zxc", "data": orjson.Fragment(b'{"a": "b", "c": 1}')})
b'{"key":"zxc","data":{"a": "b", "c": 1}}'
```
It does no reformatting: `orjson.OPT_INDENT_2` will not affect a compact blob, nor will a pretty-printed JSON blob be rewritten as compact.

The input must be `bytes` or `str` and given as a positional argument.

This raises `orjson.JSONEncodeError` if a `str` is given and the input is not valid UTF-8. It otherwise does no validation, and it is possible to write invalid JSON. This does not escape characters. The implementation is tested to not crash if given invalid strings or invalid JSON.
```python
def loads(__obj: Union[bytes, bytearray, memoryview, str]) -> Any: ...
```
`loads()` deserializes JSON to Python objects. It deserializes to `dict`, `list`, `int`, `float`, `str`, `bool`, and `None` objects.

`bytes`, `bytearray`, `memoryview`, and `str` input are accepted. If the input exists as a `memoryview`, `bytearray`, or `bytes` object, it is recommended to pass these directly rather than creating an unnecessary `str` object. That is, `orjson.loads(b"{}")` instead of `orjson.loads(b"{}".decode("utf-8"))`. This has lower memory usage and lower latency.
The input must be valid UTF-8.
orjson maintains a cache of map keys for the duration of the process. Thiscauses a net reduction in memory usage by avoiding duplicate strings. Thekeys must be at most 64 bytes to be cached and 2048 entries are stored.
The global interpreter lock (GIL) is held for the duration of the call.
It raises `JSONDecodeError` if given an invalid type or invalid JSON. This includes input containing `NaN`, `Infinity`, or `-Infinity`, which the standard library allows but is not valid JSON.

It raises `JSONDecodeError` if a combination of array or object recurses 1024 levels deep.

It raises `JSONDecodeError` if unable to allocate a buffer large enough to parse the document.

`JSONDecodeError` is a subclass of `json.JSONDecodeError` and `ValueError`. This is for compatibility with the standard library.
orjson serializes instances of `dataclasses.dataclass` natively. It serializes instances 40-50x as fast as other libraries and avoids a severe slowdown seen in other libraries compared to serializing `dict`.

It is supported to pass all variants of dataclasses, including dataclasses using `__slots__`, frozen dataclasses, those with optional or default attributes, and subclasses. There is a performance benefit to not using `__slots__`.
| Library | dict (ms) | dataclass (ms) | vs. orjson |
|---|---|---|---|
| orjson | 0.43 | 0.95 | 1 |
| json | 5.81 | 38.32 | 40 |
This measures serializing 555KiB of JSON: orjson natively, and other libraries using `default` to serialize the output of `dataclasses.asdict()`. This can be reproduced using the pydataclass script.

Dataclasses are serialized as maps, with every attribute serialized and in the order given on class definition:
```python
>>> import dataclasses, orjson, typing
>>> @dataclasses.dataclass
... class Member:
...     id: int
...     active: bool = dataclasses.field(default=False)
...
>>> @dataclasses.dataclass
... class Object:
...     id: int
...     name: str
...     members: typing.List[Member]
...
>>> orjson.dumps(Object(1, "a", [Member(1, True), Member(2)]))
b'{"id":1,"name":"a","members":[{"id":1,"active":true},{"id":2,"active":false}]}'
```
orjson serializes `datetime.datetime` objects to RFC 3339 format, e.g., "1970-01-01T00:00:00+00:00". This is a subset of ISO 8601 and is compatible with `isoformat()` in the standard library.
```python
>>> import orjson, datetime, zoneinfo
>>> orjson.dumps(
...     datetime.datetime(2018, 12, 1, 2, 3, 4, 9, tzinfo=zoneinfo.ZoneInfo("Australia/Adelaide"))
... )
b'"2018-12-01T02:03:04.000009+10:30"'
>>> orjson.dumps(
...     datetime.datetime(2100, 9, 1, 21, 55, 2).replace(tzinfo=zoneinfo.ZoneInfo("UTC"))
... )
b'"2100-09-01T21:55:02+00:00"'
>>> orjson.dumps(datetime.datetime(2100, 9, 1, 21, 55, 2))
b'"2100-09-01T21:55:02"'
```
`datetime.datetime` supports instances with a `tzinfo` that is `None`, `datetime.timezone.utc`, a timezone instance from the Python 3.9+ `zoneinfo` module, or a timezone instance from the third-party pendulum, pytz, or dateutil/arrow libraries.

It is fastest to use the standard library's `zoneinfo.ZoneInfo` for timezones.

`datetime.time` objects must not have a `tzinfo`.
```python
>>> import orjson, datetime
>>> orjson.dumps(datetime.time(12, 0, 15, 290))
b'"12:00:15.000290"'
```
datetime.date objects will always serialize.
```python
>>> import orjson, datetime
>>> orjson.dumps(datetime.date(1900, 1, 2))
b'"1900-01-02"'
```
Errors with `tzinfo` result in `JSONEncodeError` being raised.

To disable serialization of `datetime` objects, specify the option `orjson.OPT_PASSTHROUGH_DATETIME`.

To use the "Z" suffix instead of "+00:00" to indicate UTC ("Zulu") time, use the option `orjson.OPT_UTC_Z`.

To assume datetimes without a timezone are UTC, use the option `orjson.OPT_NAIVE_UTC`.
orjson serializes enums natively. Options apply to their values.
```python
>>> import enum, datetime, orjson
>>> class DatetimeEnum(enum.Enum):
...     EPOCH = datetime.datetime(1970, 1, 1, 0, 0, 0)
...
>>> orjson.dumps(DatetimeEnum.EPOCH)
b'"1970-01-01T00:00:00"'
>>> orjson.dumps(DatetimeEnum.EPOCH, option=orjson.OPT_NAIVE_UTC)
b'"1970-01-01T00:00:00+00:00"'
```
Enums with members that are not supported types can be serialized using `default`:
```python
>>> import enum, orjson
>>> class Custom:
...     def __init__(self, val):
...         self.val = val
...
>>> def default(obj):
...     if isinstance(obj, Custom):
...         return obj.val
...     raise TypeError
...
>>> class CustomEnum(enum.Enum):
...     ONE = Custom(1)
...
>>> orjson.dumps(CustomEnum.ONE, default=default)
b'1'
```
orjson serializes and deserializes double precision floats with no loss of precision and consistent rounding.

`orjson.dumps()` serializes NaN, Infinity, and -Infinity, which are not compliant JSON, as `null`:
```python
>>> import orjson, json
>>> orjson.dumps([float("NaN"), float("Infinity"), float("-Infinity")])
b'[null,null,null]'
>>> json.dumps([float("NaN"), float("Infinity"), float("-Infinity")])
'[NaN, Infinity, -Infinity]'
```
orjson serializes and deserializes 64-bit integers by default. The range supported is a signed 64-bit integer's minimum (-9223372036854775807) to an unsigned 64-bit integer's maximum (18446744073709551615). This is widely compatible, but there are implementations that only support 53 bits for integers, e.g., web browsers. For those implementations, `dumps()` can be configured to raise a `JSONEncodeError` on values exceeding the 53-bit range.
```python
>>> import orjson
>>> orjson.dumps(9007199254740992)
b'9007199254740992'
>>> orjson.dumps(9007199254740992, option=orjson.OPT_STRICT_INTEGER)
JSONEncodeError: Integer exceeds 53-bit range
>>> orjson.dumps(-9007199254740992, option=orjson.OPT_STRICT_INTEGER)
JSONEncodeError: Integer exceeds 53-bit range
```
orjson natively serializes `numpy.ndarray` and individual `numpy.float64`, `numpy.float32`, `numpy.float16` (`numpy.half`), `numpy.int64`, `numpy.int32`, `numpy.int16`, `numpy.int8`, `numpy.uint64`, `numpy.uint32`, `numpy.uint16`, `numpy.uint8`, `numpy.uintp`, `numpy.intp`, `numpy.datetime64`, and `numpy.bool` instances.
orjson is compatible with both numpy v1 and v2.
orjson is faster than all compared libraries at serializing numpy instances. Serializing numpy data requires specifying `option=orjson.OPT_SERIALIZE_NUMPY`.
```python
>>> import orjson, numpy
>>> orjson.dumps(
...     numpy.array([[1, 2, 3], [4, 5, 6]]),
...     option=orjson.OPT_SERIALIZE_NUMPY,
... )
b'[[1,2,3],[4,5,6]]'
```
The array must be a contiguous C array (C_CONTIGUOUS) and one of thesupported datatypes.
Note a difference between serializing `numpy.float32` using `ndarray.tolist()` or `orjson.dumps(..., option=orjson.OPT_SERIALIZE_NUMPY)`: `tolist()` converts to a `double` before serializing and orjson's native path does not. This can result in different rounding.

`numpy.datetime64` instances are serialized as RFC 3339 strings, and datetime options affect them.
```python
>>> import orjson, numpy
>>> orjson.dumps(
...     numpy.datetime64("2021-01-01T00:00:00.172"),
...     option=orjson.OPT_SERIALIZE_NUMPY,
... )
b'"2021-01-01T00:00:00.172000"'
>>> orjson.dumps(
...     numpy.datetime64("2021-01-01T00:00:00.172"),
...     option=(
...         orjson.OPT_SERIALIZE_NUMPY
...         | orjson.OPT_NAIVE_UTC
...         | orjson.OPT_OMIT_MICROSECONDS
...     ),
... )
b'"2021-01-01T00:00:00+00:00"'
```
If an array is not a contiguous C array, contains an unsupported datatype, or contains a `numpy.datetime64` using an unsupported representation (e.g., picoseconds), orjson falls through to `default`. In `default`, `obj.tolist()` can be specified.

If an array is not in the native endianness, e.g., an array of big-endian values on a little-endian system, `orjson.JSONEncodeError` is raised.

If an array is malformed, `orjson.JSONEncodeError` is raised.

This measures serializing 92MiB of JSON from a `numpy.ndarray` with dimensions of `(50000, 100)` and `numpy.float64` values:
| Library | Latency (ms) | RSS diff (MiB) | vs. orjson |
|---|---|---|---|
| orjson | 105 | 105 | 1 |
| json | 1,481 | 295 | 14.2 |
This measures serializing 100MiB of JSON from a `numpy.ndarray` with dimensions of `(100000, 100)` and `numpy.int32` values:
| Library | Latency (ms) | RSS diff (MiB) | vs. orjson |
|---|---|---|---|
| orjson | 68 | 119 | 1 |
| json | 684 | 501 | 10.1 |
This measures serializing 105MiB of JSON from a `numpy.ndarray` with dimensions of `(100000, 200)` and `numpy.bool` values:
| Library | Latency (ms) | RSS diff (MiB) | vs. orjson |
|---|---|---|---|
| orjson | 50 | 125 | 1 |
| json | 573 | 398 | 11.5 |
In these benchmarks, orjson serializes natively and `json` serializes `ndarray.tolist()` via `default`. The RSS column measures peak memory usage during serialization. This can be reproduced using the pynumpy script.

orjson does not have an installation or compilation dependency on numpy. The implementation is independent, reading `numpy.ndarray` using `PyArrayInterface`.

orjson is strict about UTF-8 conformance. This is stricter than the standard library's json module, which will serialize and deserialize UTF-16 surrogates, e.g., "\ud800", that are invalid UTF-8.

If `orjson.dumps()` is given a `str` that does not contain valid UTF-8, `orjson.JSONEncodeError` is raised. If `loads()` receives invalid UTF-8, `orjson.JSONDecodeError` is raised.
```python
>>> import orjson, json
>>> orjson.dumps('\ud800')
JSONEncodeError: str is not valid UTF-8: surrogates not allowed
>>> json.dumps('\ud800')
'"\\ud800"'
>>> orjson.loads('"\\ud800"')
JSONDecodeError: unexpected end of hex escape at line 1 column 8: line 1 column 1 (char 0)
>>> json.loads('"\\ud800"')
'\ud800'
```
To make a best effort at deserializing bad input, first decode `bytes` using the `replace` or `lossy` argument for `errors`:
```python
>>> import orjson
>>> orjson.loads(b'"\xed\xa0\x80"')
JSONDecodeError: str is not valid UTF-8: surrogates not allowed
>>> orjson.loads(b'"\xed\xa0\x80"'.decode("utf-8", "replace"))
'���'
```
orjson serializes `uuid.UUID` instances to RFC 4122 format, e.g., "f81d4fae-7dec-11d0-a765-00a0c91e6bf6".
```python
>>> import orjson, uuid
>>> orjson.dumps(uuid.uuid5(uuid.NAMESPACE_DNS, "python.org"))
b'"886313e1-3b8a-5372-9b90-0c9aee199e5d"'
```
The library has comprehensive tests. There are tests against fixtures in the JSONTestSuite and nativejson-benchmark repositories. It is tested to not crash against the Big List of Naughty Strings. It is tested to not leak memory. It is tested to not crash against and not accept invalid UTF-8. There are integration tests exercising the library's use in web servers (gunicorn using multiprocess/forked workers) and when multithreaded. It also uses some tests from the ultrajson library.

orjson is the most correct of the compared libraries. This table shows how each library handles a combined 342 JSON fixtures from the JSONTestSuite and nativejson-benchmark tests:
| Library | Invalid JSON documents not rejected | Valid JSON documents not deserialized |
|---|---|---|
| orjson | 0 | 0 |
| json | 17 | 0 |
This shows that all libraries deserialize valid JSON but only orjsoncorrectly rejects the given invalid JSON fixtures. Errors are largely due toaccepting invalid strings and numbers.
The table above can be reproduced using the pycorrectness script.

Serialization and deserialization performance of orjson is consistently better than the standard library's `json`. The tables below illustrate a few commonly used documents.
| Library | Median latency (milliseconds) | Operations per second | Relative (latency) |
|---|---|---|---|
| orjson | 0.1 | 8453 | 1 |
| json | 1.3 | 765 | 11.1 |
| Library | Median latency (milliseconds) | Operations per second | Relative (latency) |
|---|---|---|---|
| orjson | 0.5 | 1889 | 1 |
| json | 2.2 | 453 | 4.2 |
| Library | Median latency (milliseconds) | Operations per second | Relative (latency) |
|---|---|---|---|
| orjson | 0.01 | 103693 | 1 |
| json | 0.13 | 7648 | 13.6 |
| Library | Median latency (milliseconds) | Operations per second | Relative (latency) |
|---|---|---|---|
| orjson | 0.04 | 23264 | 1 |
| json | 0.1 | 10430 | 2.2 |
| Library | Median latency (milliseconds) | Operations per second | Relative (latency) |
|---|---|---|---|
| orjson | 0.3 | 3975 | 1 |
| json | 3 | 338 | 11.8 |
| Library | Median latency (milliseconds) | Operations per second | Relative (latency) |
|---|---|---|---|
| orjson | 1.3 | 781 | 1 |
| json | 4 | 250 | 3.1 |
| Library | Median latency (milliseconds) | Operations per second | Relative (latency) |
|---|---|---|---|
| orjson | 2.5 | 399 | 1 |
| json | 29.8 | 33 | 11.9 |
| Library | Median latency (milliseconds) | Operations per second | Relative (latency) |
|---|---|---|---|
| orjson | 3 | 333 | 1 |
| json | 18 | 55 | 6 |
The above was measured using Python 3.11.10 in a Fedora 42 container on an x86-64-v4 machine using the orjson-3.10.11-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl artifact on PyPI. The latency results can be reproduced using the pybench script.
Will it deserialize to dataclasses, UUIDs, decimals, etc., or support object_hook? No. This requires a schema specifying what types are expected and how to handle errors, etc. This is addressed by data validation libraries a level above this.

Will it serialize to str? No. `bytes` is the correct type for a serialized blob.

Will it support NDJSON or JSONL? No. orjsonl may be appropriate.

Will it support JSON5 or RJSON? No, it supports RFC 8259.
orjson is only shipped as a Python module. The project should depend on orjson in its own Python requirements and should obtain pointers to functions and objects using the normal `PyImport_*` APIs.
To package orjson requires at least Rust 1.85 and the maturin build tool. The recommended build command is:

```sh
maturin build --release --strip
```

It benefits from also having a C build environment to compile a faster deserialization backend. See this project's manylinux_2_28 builds for an example using clang and LTO.
The project's own CI tests against nightly-2025-08-10 and stable 1.82. It is prudent to pin the nightly version because that channel can introduce breaking changes. There is a significant performance benefit to using nightly.

orjson is tested on native hardware for amd64, aarch64, and i686 on Linux; arm7, ppc64le, and s390x are cross-compiled and may be tested via emulation. It is tested for aarch64 on macOS and cross-compiles for amd64. For Windows it is tested on amd64, i686, and aarch64.
There are no runtime dependencies other than libc.
The source distribution on PyPI contains all dependencies' source and can be built without network access. The file can be downloaded from https://files.pythonhosted.org/packages/source/o/orjson/orjson-${version}.tar.gz.

orjson's tests are included in the source distribution on PyPI. The tests require only pytest. There are optional packages such as pytz and numpy listed in test/requirements.txt and used in ~10% of tests. Not having these dependencies causes the tests needing them to be skipped. Tests can be run with `pytest -q test`.
orjson was written by ijl <ijl@mailbox.org>, copyright 2018 - 2025, availableto you under either the Apache 2 license or MIT license at your choice.