tutorial.rst
============


This text briefly introduces you to the basic design decisions and accompanying types.

Design
------

The GitDB project models a standard git object database and implements it in pure Python. Data, classified as one of four object types, can be stored in the database and is subsequently referred to by its generated SHA1 key, which within Python is a 20-byte string.
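To illustrate what the 20-byte SHA1 key is, here is a minimal sketch using only the standard library. It follows git's object hashing scheme, where the header ``"<type> <size>\0"`` is prepended to the payload before hashing; the helper name ``git_object_sha1`` is purely illustrative::

    import hashlib

    def git_object_sha1(obj_type: str, data: bytes) -> bytes:
        """Compute a git-style SHA1 key for an object of the given type.

        git hashes the header "<type> <size>\\0" followed by the raw data.
        The binary digest is the 20-byte key that addresses the object.
        """
        header = b"%s %d\0" % (obj_type.encode("ascii"), len(data))
        return hashlib.sha1(header + data).digest()

    binsha = git_object_sha1("blob", b"my data")
    assert len(binsha) == 20          # binary form: 20 bytes
    assert len(binsha.hex()) == 40    # hex form: 40 characters

Note that the key depends on the object type as well as the data, so the same bytes stored as a blob and as a tag would receive different keys.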

GitDB implements read-write (RW) access to loose objects, as well as read-only (RO) access to packed objects. Compound Databases allow combining multiple object databases into one.

All data is read and written using streams, which, for the most part, prevents more than a chunk of the data from being kept in memory at once [1]_.
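The memory benefit of stream-based access can be sketched with plain file-like objects from the standard library; this is an illustrative sketch, not gitdb code, and ``copy_stream`` is a hypothetical helper::

    import io

    def copy_stream(src, dst, chunk_size=8192):
        """Copy from a readable to a writable stream, one chunk at a time.

        Peak memory use is bounded by chunk_size, not by the object size.
        """
        copied = 0
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            copied += len(chunk)
        return copied

    src = io.BytesIO(b"x" * 100_000)
    dst = io.BytesIO()
    assert copy_stream(src, dst, chunk_size=4096) == 100_000
    assert dst.getvalue() == src.getvalue()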

Streams
-------

To ensure the object database can handle objects of any size, a stream interface is used both to retrieve data and to fill data into the database.

Basic Stream Types
~~~~~~~~~~~~~~~~~~

There are two fundamentally different types of streams, ``IStream`` and ``OStream``. IStreams are mutable and are used to provide data streams to the database in order to create new objects.

OStreams are immutable and are used to read data from the database. The base of this type, ``OInfo``, contains only the type and size information of the queried object, but no stream, which makes it slightly faster to retrieve depending on the database.

OStreams are tuples, IStreams are lists. Both ``OInfo`` and ``OStream`` have the same member ordering, which allows quick conversion from one type to the other.
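The shared member ordering can be sketched with named tuples. The field names and ordering used here (``binsha``, ``type``, ``size``, ``stream``) are an assumption for illustration; the actual gitdb types carry more behavior::

    from collections import namedtuple

    # Illustrative stand-ins: an info record carries type and size only,
    # a stream record additionally carries the data stream.
    OInfo = namedtuple("OInfo", "binsha type size")
    OStream = namedtuple("OStream", "binsha type size stream")

    ostream = OStream(b"\0" * 20, "blob", 7, None)

    # Because the first three members line up, a stream record converts
    # to an info record by simply slicing the stream member off.
    oinfo = OInfo(*ostream[:3])
    assert oinfo.size == ostream.size
    assert oinfo[:3] == ostream[:3]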

Data Query and Data Addition
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Databases support query and/or addition of objects using simple interfaces. They are called ``ObjectDBR`` for read-only access, and ``ObjectDBW`` for write access to create new objects.

Both have two sets of methods: one allows interacting with single objects, the other handles a stream of objects simultaneously and asynchronously.

Acquiring information about an object from a database is easy if you have a SHA1 to refer to the object::

    ldb = LooseObjectDB(fixture_path("../../../.git/objects"))

    for sha1 in ldb.sha_iter():
        oinfo = ldb.info(sha1)
        ostream = ldb.stream(sha1)
        assert oinfo[:3] == ostream[:3]
        assert len(ostream.read()) == ostream.size
    # END for each sha in database

To store information, you prepare an ``IStream`` object with the required information. The provided stream will be read and converted into an object, and the respective 20-byte SHA1 identifier is stored in the IStream object::

    data = "my data"
    istream = IStream("blob", len(data), StringIO(data))

    # the object does not yet have a sha
    assert istream.binsha is None
    ldb.store(istream)

    # now the sha is set
    assert len(istream.binsha) == 20
    assert ldb.has_object(istream.binsha)

Asynchronous Operation
~~~~~~~~~~~~~~~~~~~~~~

For each read or write method that handles a single object, an ``_async`` version exists which reads the items to be processed from a channel, and writes the operation's results into an output channel that is read by the caller or by other async methods, to support chaining.
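The channel pattern can be sketched with the standard library's ``queue.Queue`` standing in for the channels of the async package; this is an illustrative sketch of the chaining idea, not the actual async API::

    import queue
    import threading

    def worker(in_chan, out_chan, func):
        """Read items from in_chan, apply func, write results to out_chan."""
        while True:
            item = in_chan.get()
            if item is None:          # sentinel: end of stream
                out_chan.put(None)
                break
            out_chan.put(func(item))

    in_chan, mid_chan, out_chan = queue.Queue(), queue.Queue(), queue.Queue()

    # Chain two stages: square the items, then negate the results.
    threading.Thread(target=worker, args=(in_chan, mid_chan, lambda x: x * x)).start()
    threading.Thread(target=worker, args=(mid_chan, out_chan, lambda x: -x)).start()

    for i in range(3):
        in_chan.put(i)
    in_chan.put(None)

    results = []
    while (item := out_chan.get()) is not None:
        results.append(item)
    assert results == [0, -1, -4]

Each stage only sees a channel on either side, which is what makes the chaining composable: the output channel of one stage becomes the input channel of the next.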

Using asynchronous operations is easy, but chaining multiple operations together to form a complex one requires reading the docs of the ``async`` package. At the current time, due to the GIL, GitDB can only achieve true concurrency during zlib compression and decompression of big objects, and only if the respective C modules of ``async`` were compiled.

Asynchronous operations are scheduled by a ``ThreadPool`` which resides in the ``gitdb.util`` module::

    from gitdb.util import pool

    # set the pool to use two threads
    pool.set_size(2)

    # synchronize the mode of operation
    pool.set_size(0)

Use async methods with readers, which supply the items to be processed. The result is provided through readers as well::

    from async import IteratorReader

    # Create a reader from an iterator
    reader = IteratorReader(ldb.sha_iter())

    # get reader for object streams
    info_reader = ldb.stream_async(reader)

    # read one
    info = info_reader.read(1)[0]

    # read all the rest until depletion
    ostreams = info_reader.read()

Databases
---------

A database implements different interfaces, one of which will always be the ``ObjectDBR`` interface to support reading object information and streams.

The Loose Object Database as well as the Packed Object Database are File Databases, hence they operate on a directory which contains the files they can read.

File databases implementing the ``ObjectDBW`` interface can also be forced to write their output into a specified stream, using the ``set_ostream`` method. This effectively allows you to redirect their output anywhere you like.
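The redirection pattern can be sketched with a toy writer. The ``set_ostream`` name follows the interface described above, but the class itself is purely illustrative, not gitdb's implementation::

    import io

    class ToyObjectWriter:
        """Illustrative writer whose output can be redirected to any stream."""

        def __init__(self):
            self._ostream = None

        def set_ostream(self, stream):
            """Redirect all subsequent output into the given stream."""
            self._ostream = stream
            return stream

        def store(self, data: bytes):
            # With an output stream set, data goes there instead of
            # into files on disk.
            self._ostream.write(data)

    writer = ToyObjectWriter()
    sink = io.BytesIO()
    writer.set_ostream(sink)
    writer.store(b"object payload")
    assert sink.getvalue() == b"object payload"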

Compound Databases do not implement their own access type, but instead combine multiple database implementations into one. Examples of this database type are the Reference Database, which reads object locations from a file, and the GitDB, which combines loose, packed and referenced objects into one database interface.
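Delegation in a compound database can be sketched as follows. The method names mirror ``has_object`` and ``info`` from the reading interface described earlier, while the classes themselves are illustrative stand-ins::

    class ToyDB:
        """Illustrative single database: a dict from sha to (type, size)."""

        def __init__(self, objects):
            self._objects = objects

        def has_object(self, sha):
            return sha in self._objects

        def info(self, sha):
            return self._objects[sha]

    class ToyCompoundDB:
        """Combines several databases behind one reading interface."""

        def __init__(self, *dbs):
            self._dbs = dbs

        def has_object(self, sha):
            return any(db.has_object(sha) for db in self._dbs)

        def info(self, sha):
            # Delegate to the first child database that has the object.
            for db in self._dbs:
                if db.has_object(sha):
                    return db.info(sha)
            raise KeyError(sha)

    loose = ToyDB({b"a" * 20: ("blob", 7)})
    packed = ToyDB({b"b" * 20: ("tree", 120)})
    compound = ToyCompoundDB(loose, packed)
    assert compound.has_object(b"b" * 20)
    assert compound.info(b"a" * 20) == ("blob", 7)

The caller only ever talks to the compound interface, so where an object actually lives (loose, packed, or referenced) stays an implementation detail.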

For more information about the individual database types, please see the :ref:`API Reference <api-label>` and the unittests for the respective types.


.. [1] When reading streams from packs, all deltas are currently applied and the result written into a memory map before the first byte is returned. Future versions of the delta-apply algorithm might improve on this.
