afeld/sodapy

Python client for the Socrata Open Data API

License

MIT license

sodapy

sodapy is a Python client for the Socrata Open Data API.

Installation

You can install with pip install sodapy.

This package uses semantic versioning.

Documentation

The official Socrata Open Data API docs provide thorough documentation of the available methods, as well as other client libraries. A quick list of eligible domains to use with this API is available via the Socrata Discovery API or Socrata's Open Data Network.

This library supports writing directly to datasets with the Socrata Open Data API. For write operations that use data transformations in the Socrata Data Management Experience (the user interface for creating datasets), use the Socrata Data Management API. For more details on when to use SODA vs the Data Management API, see the Data Management API documentation. A Python SDK for the Socrata Data Management API can be found at socrata-py.

Examples

There are some Jupyter notebooks in the examples directory with usage examples of sodapy in action.

Interface

client

Import the library and set up a connection to get started.

from sodapy import Socrata
client = Socrata("sandbox.demo.socrata.com", "FakeAppToken", username="fakeuser@somedomain.com", password="mypassword", timeout=10)

username and password are only required for creating or modifying data. An application token isn't strictly required (it can be None), but queries executed from a client without an application token will be subjected to strict throttling limits. You may want to increase the timeout (in seconds) when making large requests. To create a bare-bones client:

client = Socrata("sandbox.demo.socrata.com", None)
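To keep the application token and credentials out of source code, the constructor arguments can be assembled from environment variables. The following is only a sketch: the helper name socrata_args_from_env and the variable names SODAPY_APPTOKEN, SODAPY_USERNAME and SODAPY_PASSWORD are assumptions made for illustration, not something this README defines.

```python
import os


def socrata_args_from_env(domain, environ=None):
    """Build positional and keyword arguments for Socrata() from the environment.

    The variable names used here are hypothetical; pick whatever fits
    your deployment.
    """
    env = os.environ if environ is None else environ
    # A missing token becomes None, which the client accepts (with throttling).
    args = (domain, env.get("SODAPY_APPTOKEN"))
    kwargs = {}
    # Pass credentials only when both are set; they are needed for writes only.
    if env.get("SODAPY_USERNAME") and env.get("SODAPY_PASSWORD"):
        kwargs["username"] = env["SODAPY_USERNAME"]
        kwargs["password"] = env["SODAPY_PASSWORD"]
    return args, kwargs
```

The result can then be unpacked into the constructor, for example args, kwargs = socrata_args_from_env("sandbox.demo.socrata.com") followed by client = Socrata(*args, **kwargs).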

A client can also be created with a context manager to obviate the need for teardown:

with Socrata("sandbox.demo.socrata.com", None) as client:
    ...  # do some stuff

The client, by default, makes requests over HTTPS. To modify this behavior, or to make requests through a proxy, take a look here.

datasets(limit=0, offset=0)

Retrieve datasets associated with a particular domain. The optional limit and offset keyword args can be used to retrieve a subset of the datasets. By default, all datasets are returned.

>>> client.datasets()
[{"resource" : {"name" : "Approved Building Permits", "id" : "msk6-43c6", "parent_fxf" : null, "description" : "Data of approved building/construction permits",...}, {resource : {...}}, ...]
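On a domain with many datasets, limit and offset can be combined to walk the catalog one page at a time. A minimal sketch, assuming each entry has the "resource" / "name" layout shown in the output above; the helper iter_dataset_names is hypothetical and works with any object that exposes datasets(limit=..., offset=...).

```python
def iter_dataset_names(client, page_size=100):
    """Yield dataset names page by page using limit and offset."""
    offset = 0
    while True:
        page = client.datasets(limit=page_size, offset=offset)
        if not page:
            break
        for entry in page:
            # Each entry wraps its fields in a "resource" dictionary.
            yield entry["resource"]["name"]
        if len(page) < page_size:
            # A short page means there is nothing left to fetch.
            break
        offset += page_size
```

Because this is a generator, for name in iter_dataset_names(client): only requests further pages as the loop consumes them.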

get(dataset_identifier, content_type="json", **kwargs)

Retrieve data from the requested resources. Filter and query data by field name, id, or using SoQL keywords.

>>> client.get("nimj-3ivp", limit=2)
[{u'geolocation': {u'latitude': u'41.1085', u'needs_recoding': False, u'longitude': u'-117.6135'}, u'version': u'9', u'source': u'nn', u'region': u'Nevada', u'occurred_at': u'2012-09-14T22:38:01', u'number_of_stations': u'15', u'depth': u'7.60', u'magnitude': u'2.7', u'earthquake_id': u'00388610'}, {...}]

>>> client.get("nimj-3ivp", where="depth > 300", order="magnitude DESC", exclude_system_fields=False)
[{u'geolocation': {u'latitude': u'-15.563', u'needs_recoding': False, u'longitude': u'-175.6104'}, u'version': u'9', u':updated_at': 1348778988, u'number_of_stations': u'275', u'region': u'Tonga', u':created_meta': u'21484', u'occurred_at': u'2012-09-13T21:16:43', u':id': 132, u'source': u'us', u'depth': u'328.30', u'magnitude': u'4.8', u':meta': u'{\n}', u':updated_meta': u'21484', u'earthquake_id': u'c000cnb5', u':created_at': 1348778988}, {...}]

>>> client.get("nimj-3ivp/193", exclude_system_fields=False)
{u'geolocation': {u'latitude': u'21.6711', u'needs_recoding': False, u'longitude': u'142.9236'}, u'version': u'C', u':updated_at': 1348778988, u'number_of_stations': u'136', u'region': u'Mariana Islands region', u':created_meta': u'21484', u'occurred_at': u'2012-09-13T11:19:07', u':id': 193, u'source': u'us', u'depth': u'300.70', u'magnitude': u'4.4', u':meta': u'{\n}', u':updated_meta': u'21484', u':position': 193, u'earthquake_id': u'c000cmsq', u':created_at': 1348778988}

>>> client.get("nimj-3ivp", region="Kansas")
[{u'geolocation': {u'latitude': u'38.10', u'needs_recoding': False, u'longitude': u'-100.6135'}, u'version': u'9', u'source': u'nn', u'region': u'Kansas', u'occurred_at': u'2010-09-19T20:52:09', u'number_of_stations': u'15', u'depth': u'300.0', u'magnitude': u'1.9', u'earthquake_id': u'00189621'}, {...}]
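In the output above, numeric columns such as depth and magnitude arrive as strings. If you need numbers for sorting or arithmetic on the Python side, a small post-processing step can convert them. This is a sketch only: with_numeric_fields is a hypothetical helper, and the default field names are taken from the example dataset.

```python
def with_numeric_fields(rows, fields=("depth", "magnitude")):
    """Return copies of the rows with the named fields converted to float."""
    converted = []
    for row in rows:
        row = dict(row)  # shallow copy so the caller's data is untouched
        for field in fields:
            if field in row:
                row[field] = float(row[field])
        converted.append(row)
    return converted
```

For example, with_numeric_fields(client.get("nimj-3ivp", limit=2)) would leave every other key as returned by the API.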

get_all(dataset_identifier, content_type="json", **kwargs)

Read data from the requested resource, paginating over all results. Accepts the same arguments as get(). Returns a generator.

>>> client.get_all("nimj-3ivp")
<generator object Socrata.get_all at 0x7fa0dc8be7b0>

>>> for item in client.get_all("nimj-3ivp"):
...     print(item)
...
{'geolocation': {'latitude': '-15.563', 'needs_recoding': False, 'longitude': '-175.6104'}, 'version': '9', ':updated_at': 1348778988, 'number_of_stations': '275', 'region': 'Tonga', ':created_meta': '21484', 'occurred_at': '2012-09-13T21:16:43', ':id': 132, 'source': 'us', 'depth': '328.30', 'magnitude': '4.8', ':meta': '{\n}', ':updated_meta': '21484', 'earthquake_id': 'c000cnb5', ':created_at': 1348778988}
...

>>> import itertools
>>> items = client.get_all("nimj-3ivp")
>>> first_five = list(itertools.islice(items, 5))
>>> len(first_five)
5
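The same itertools.islice idea extends to processing the generator in fixed-size batches, which can be handy when each batch is written to a file or database. A minimal sketch; batched is a hypothetical helper defined here, not part of sodapy.

```python
import itertools


def batched(iterable, size):
    """Yield lists of up to `size` items from any iterable, including generators."""
    iterator = iter(iterable)
    while True:
        batch = list(itertools.islice(iterator, size))
        if not batch:
            return
        yield batch
```

A loop such as for batch in batched(client.get_all("nimj-3ivp"), 500): then sees at most 500 rows at a time, with the last batch possibly shorter.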

get_metadata(dataset_identifier, content_type="json")

Retrieve the metadata associated with a particular dataset.

>>> client.get_metadata("nimj-3ivp")
{"newBackend": false, "licenseId": "CC0_10", "publicationDate": 1436655117, "viewLastModified": 1451289003, "owner": {"roleName": "administrator", "rights": [], "displayName": "Brett", "id": "cdqe-xcn5", "screenName": "Brett"}, "query": {}, "id": "songs", "createdAt": 1398014181, "category": "Public Safety", "publicationAppendEnabled": true, "publicationStage": "published", "rowsUpdatedBy": "cdqe-xcn5", "publicationGroup": 1552205, "displayType": "table", "state": "normal", "attributionLink": "http://foo.bar.com", "tableId": 3523378, "columns": [], "metadata": {"rdfSubject": "0", "renderTypeConfig": {"visible": {"table": true}}, "availableDisplayTypes": ["table", "fatrow", "page"], "attachments": ... }}

update_metadata(dataset_identifier, update_fields, content_type="json")

Update the metadata for a particular dataset. update_fields should be a dictionary containing only the metadata keys that you wish to overwrite.

Note: Invalid payloads to this method could corrupt the dataset or visualization. See this comment for more information.

>>> client.update_metadata("nimj-3ivp", {"attributionLink": "https://anothertest.com"})
{"newBackend": false, "licenseId": "CC0_10", "publicationDate": 1436655117, "viewLastModified": 1451289003, "owner": {"roleName": "administrator", "rights": [], "displayName": "Brett", "id": "cdqe-xcn5", "screenName": "Brett"}, "query": {}, "id": "songs", "createdAt": 1398014181, "category": "Public Safety", "publicationAppendEnabled": true, "publicationStage": "published", "rowsUpdatedBy": "cdqe-xcn5", "publicationGroup": 1552205, "displayType": "table", "state": "normal", "attributionLink": "https://anothertest.com", "tableId": 3523378, "columns": [], "metadata": {"rdfSubject": "0", "renderTypeConfig": {"visible": {"table": true}}, "availableDisplayTypes": ["table", "fatrow", "page"], "attachments": ... }}
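Since update_fields should hold only the keys you intend to overwrite, one cautious approach is to compare the desired values against the current metadata and send just the differences. The sketch below assumes a top-level comparison is enough; changed_fields is a hypothetical helper and does not look inside nested dictionaries.

```python
def changed_fields(current, desired):
    """Return only the keys in `desired` whose values differ from `current`."""
    return {
        key: value
        for key, value in desired.items()
        if key not in current or current[key] != value
    }
```

A possible use is update_fields = changed_fields(client.get_metadata("nimj-3ivp"), {"attributionLink": "https://anothertest.com"}), calling update_metadata only when the result is non-empty.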

download_attachments(dataset_identifier, content_type="json", download_dir="~/sodapy_downloads")

Download all attachments associated with a dataset. Return a list of paths to the downloaded files.

>>> client.download_attachments("nimj-3ivp", download_dir="~/Desktop")
['/Users/xmunoz/Desktop/nimj-3ivp/FireIncident_Codes.PDF', '/Users/xmunoz/Desktop/nimj-3ivp/AccidentReport.jpg']

create(name, **kwargs)

Create a new dataset. Optionally, specify keyword args such as:

description: description of the dataset
columns: list of fields
category: dataset category (must exist in /admin/metadata)
tags: list of tag strings
row_identifier: field name of primary key
new_backend: whether to create the dataset in the new backend

Example usage:

>>> columns = [{"fieldName": "delegation", "name": "Delegation", "dataTypeName": "text"}, {"fieldName": "members", "name": "Members", "dataTypeName": "number"}]
>>> tags = ["politics", "geography"]
>>> client.create("Delegates", description="List of delegates", columns=columns, row_identifier="delegation", tags=tags, category="Transparency")
{u'id': u'2frc-hyvj', u'name': u'Foo Bar', u'description': u'test dataset', u'publicationStage': u'unpublished', u'columns': [ { u'name': u'Foo', u'dataTypeName': u'text', u'fieldName': u'foo', ... }, { u'name': u'Bar', u'dataTypeName': u'number', u'fieldName': u'bar', ... } ], u'metadata': { u'rowIdentifier': 230641051 }, ... }
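For datasets with many columns, writing each column dictionary by hand gets repetitive. A short sketch that builds the same structure from compact tuples; make_columns is a hypothetical helper, and the three keys are the ones used in the example above.

```python
def make_columns(spec):
    """Turn (fieldName, display name, dataTypeName) tuples into column dictionaries."""
    return [
        {"fieldName": field_name, "name": name, "dataTypeName": data_type}
        for field_name, name, data_type in spec
    ]


columns = make_columns([
    ("delegation", "Delegation", "text"),
    ("members", "Members", "number"),
])
```

The resulting list can be passed as the columns keyword argument to create().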

publish(dataset_identifier, content_type="json")

Publish a dataset after creating it, i.e. take it out of 'working copy' mode. Pass the dataset id returned from create to publish it.

>>> client.publish("2frc-hyvj")
{u'id': u'2frc-hyvj', u'name': u'Foo Bar', u'description': u'test dataset', u'publicationStage': u'unpublished', u'columns': [ { u'name': u'Foo', u'dataTypeName': u'text', u'fieldName': u'foo', ... }, { u'name': u'Bar', u'dataTypeName': u'number', u'fieldName': u'bar', ... } ], u'metadata': { u'rowIdentifier': 230641051 }, ... }

set_permission(dataset_identifier, permission="private", content_type="json")

Set the permissions of a dataset to public or private.

>>> client.set_permission("2frc-hyvj", "public")
<Response [200]>
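create, publish and set_permission are often called in sequence when a new dataset should be visible right away. The sketch below chains them, assuming the create() response carries the dataset id under the "id" key as in the example output; create_public_dataset is a hypothetical helper.

```python
def create_public_dataset(client, name, **create_kwargs):
    """Create a dataset, publish it, and make it public; return its id."""
    created = client.create(name, **create_kwargs)
    dataset_id = created["id"]  # id from the create() response
    client.publish(dataset_id)
    client.set_permission(dataset_id, "public")
    return dataset_id
```

This requires a client created with username and password, since all three calls modify data.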

upsert(dataset_identifier, payload, content_type="json")

Create a new row in an existing dataset.

>>> data = [{'Delegation': 'AJU', 'Name': 'Alaska', 'Key': 'AL', 'Entity': 'Juneau'}]
>>> client.upsert("eb9n-hr43", data)
{u'Errors': 0, u'Rows Deleted': 0, u'Rows Updated': 0, u'By SID': 0, u'Rows Created': 1, u'By RowIdentifier': 0}

Update/Delete rows in a dataset.

>>> data = [{'Delegation': 'sfa', ':id': 8, 'Name': 'bar', 'Key': 'doo', 'Entity': 'dsfsd'}, {':id': 7, ':deleted': True}]
>>> client.upsert("eb9n-hr43", data)
{u'Errors': 0, u'Rows Deleted': 1, u'Rows Updated': 1, u'By SID': 2, u'Rows Created': 0, u'By RowIdentifier': 0}

Upserts can also be performed with a CSV file.

>>> data = open("upsert_test.csv")
>>> client.upsert("eb9n-hr43", data)
{u'Errors': 0, u'Rows Deleted': 0, u'Rows Updated': 1, u'By SID': 1, u'Rows Created': 0, u'By RowIdentifier': 0}
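When a payload is large, it may help to send it in smaller pieces and add up the per-call summaries. A minimal sketch, assuming each upsert() call returns a dictionary of numeric counts like the ones shown above; upsert_in_chunks and the default chunk_size of 1000 are illustrative choices, not library defaults.

```python
def upsert_in_chunks(client, dataset_identifier, rows, chunk_size=1000):
    """Upsert rows in fixed-size chunks and return the summed result counts."""
    totals = {}
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        result = client.upsert(dataset_identifier, chunk)
        # Add this chunk's counts (e.g. "Rows Created") to the running totals.
        for key, count in result.items():
            totals[key] = totals.get(key, 0) + count
    return totals
```

Note that chunks are sent independently, so a failure partway through leaves the earlier chunks applied.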

replace(dataset_identifier, payload, content_type="json")

Similar in usage to upsert, but overwrites existing data.

>>> data = open("replace_test.csv")
>>> client.replace("eb9n-hr43", data)
{u'Errors': 0, u'Rows Deleted': 0, u'Rows Updated': 0, u'By SID': 0, u'Rows Created': 12, u'By RowIdentifier': 0}

create_non_data_file(params, file_obj)

Creates a new file-based dataset with the name provided in the files tuple. A valid file input would be:

files = (
    {'file': ("gtfs2", open('myfile.zip', 'rb'))}
)

with open(nondatafile_path, 'rb') as f:
    files = (
        {'file': ("nondatafile.zip", f)}
    )
    response = client.create_non_data_file(params, files)

replace_non_data_file(dataset_identifier, params, file_obj)

Same as create_non_data_file, but replaces a file that already exists in a file-based dataset.

Note: a table-based dataset cannot be replaced by a file-based dataset. Use create_non_data_file in order to replace.

with open(nondatafile_path, 'rb') as f:
    files = (
        {'file': ("nondatafile.zip", f)}
    )
    response = client.replace_non_data_file(DATASET_IDENTIFIER, {}, files)

delete(dataset_identifier, row_id=None, content_type="json")

Delete an individual row.

>>> client.delete("nimj-3ivp", row_id=2)
<Response [200]>

Delete the entire dataset.

>>> client.delete("nimj-3ivp")
<Response [200]>

close()

Close the session when you're finished.

client.close()

Contributing

See CONTRIBUTING.md.

History

This package was initially created and maintained by @xmunoz. On March 8, 2025, ownership was transferred to @afeld.
