zillow/zdatasets (Public)

Dataset SDK for consistent read/write [batch, online, streaming] data.

License

Apache-2.0 license

Folders and files

- .github/workflows
- binder
- docs
- zdatasets
- .flake8
- .gitignore
- .pre-commit-config.yaml
- CONTRIBUTING.md
- LICENSE
- MANIFEST.in
- README.md
- poetry.lock
- pyproject.toml
- setup.cfg

Welcome to zdatasets
==================================================

TODO

    import pandas as pd
    from metaflow import FlowSpec, step

    from zdatasets import Dataset, Mode
    from zdatasets.metaflow import DatasetParameter
    from zdatasets.plugins import BatchOptions


    # Can also invoke from CLI:
    #  > python zdatasets/tutorials/0_hello_dataset_flow.py run \
    #    --hello_dataset '{"name": "HelloDataset", "mode": "READ_WRITE", \
    #    "options": {"type": "BatchOptions", "partition_by": "region"}}'
    class HelloDatasetFlow(FlowSpec):
        hello_dataset = DatasetParameter(
            "hello_dataset",
            default=Dataset("HelloDataset", mode=Mode.READ_WRITE, options=BatchOptions(partition_by="region")),
        )

        @step
        def start(self):
            df = pd.DataFrame({"region": ["A", "A", "A", "B", "B", "B"], "zpid": [1, 2, 3, 4, 5, 6]})
            print("saving data_frame:\n", df.to_string(index=False))

            # Example of writing to a dataset
            self.hello_dataset.write(df)

            # save this as an output dataset
            self.output_dataset = self.hello_dataset

            self.next(self.end)

        @step
        def end(self):
            print(f"I have dataset\n{self.output_dataset=}")

            # output_dataset to_pandas(partitions=dict(region="A")) only
            df: pd.DataFrame = self.output_dataset.to_pandas(partitions=dict(region="A"))
            print('self.output_dataset.to_pandas(partitions=dict(region="A")):')
            print(df.to_string(index=False))


    if __name__ == "__main__":
        HelloDatasetFlow()
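The CLI comment in the flow above passes the dataset as a JSON string. As a rough illustration of how that payload lines up with the `Dataset(...)` default in the code, the sketch below parses the same JSON using only the standard library. It does not call zdatasets, and it makes no claim about how `DatasetParameter` parses the value internally; the variable names are illustrative assumptions.

```python
import json

# The JSON payload from the CLI comment, with the shell line
# continuations removed. This string is copied from the example;
# nothing here calls zdatasets itself.
payload = (
    '{"name": "HelloDataset", "mode": "READ_WRITE", '
    '"options": {"type": "BatchOptions", "partition_by": "region"}}'
)

spec = json.loads(payload)

# Each field mirrors an argument of the Dataset(...) default in the flow.
name = spec["name"]                             # Dataset("HelloDataset", ...)
mode = spec["mode"]                             # mode=Mode.READ_WRITE
options_type = spec["options"]["type"]          # options=BatchOptions(...)
partition_by = spec["options"]["partition_by"]  # partition_by="region"

print(name, mode, options_type, partition_by)
```

Checking the payload with `json.loads` before handing it to the flow is a cheap way to catch quoting mistakes, which are easy to make when a JSON string is split across shell continuation lines.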

Releases: 14

Latest release: datasets -> zdatasets package rename (Jun 26, 2023), plus 13 earlier releases
