In contrast to most other database systems, Berkeley DB provides relatively simple data access services.
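Records in Berkeley DB are (key,value) pairs. Berkeley DB supports only a few logical operations on records: inserting a record into a table, deleting a record from a table, finding a record in a table by looking up its key, and updating a record that has already been found.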
Notice that Berkeley DB never operates on the value part of a record. Values are simply payload, to be stored with keys and reliably delivered back to the application on demand.
Both keys and values can be arbitrary byte strings, either fixed-length or variable-length. As a result, programmers can put native programming language data structures into the database without converting them to a foreign record format first. Storage and retrieval are very simple, but the application needs to know what the structure of a key and a value is in advance. It cannot ask Berkeley DB, because Berkeley DB doesn't know.
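As a rough illustration of what "the application needs to know the structure" means in practice, the sketch below packs a native record into bytes and unpacks it again. It is not Berkeley DB code: a plain Python dict stands in for the database, and the record layout (`KEY_FORMAT`, `VALUE_FORMAT`) and the helpers `put_person` and `get_person` are hypothetical names chosen for this example.

```python
import struct

# Hypothetical record layout, known only to the application:
# the key is a 4-byte unsigned id; the value is a 32-bit age
# followed by a 16-byte, NUL-padded name.
KEY_FORMAT = "<I"
VALUE_FORMAT = "<i16s"

# A plain dict stands in for the key/value store. Like the store
# described above, it only ever sees opaque byte strings.
store = {}

def put_person(person_id, age, name):
    """Encode the record with the application's layout and store it."""
    key = struct.pack(KEY_FORMAT, person_id)
    value = struct.pack(VALUE_FORMAT, age, name.encode("utf-8"))
    store[key] = value

def get_person(person_id):
    """Fetch the bytes by key and decode them with the same layout."""
    key = struct.pack(KEY_FORMAT, person_id)
    age, raw_name = struct.unpack(VALUE_FORMAT, store[key])
    return age, raw_name.rstrip(b"\x00").decode("utf-8")

put_person(7, 42, "Ada")
```

If the reader and the writer disagree on the format strings, the bytes still come back intact but decode to garbage; nothing in the store can detect that.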
This is an important feature of Berkeley DB, and one worth considering more carefully. On the one hand, Berkeley DB cannot provide the programmer with any information on the contents or structure of the values that it stores. The application must understand the keys and values that it uses. On the other hand, there is literally no limit to the data types that can be stored in a Berkeley DB database. The application never needs to convert its own program data into the data types that Berkeley DB supports. Berkeley DB is able to operate on any data type the application uses, no matter how complex.
Because both keys and values can be up to four gigabytes in length, a single record can store images, audio streams, or other large data values. Large values are not treated specially in Berkeley DB. They are simply broken into page-sized chunks, and reassembled on demand when the application needs them. Unlike some other database systems, Berkeley DB offers no special support for binary large objects (BLOBs).
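The chunk-and-reassemble idea can be sketched in a few lines. This shows only the general technique, not Berkeley DB's actual on-disk page format; the 4096-byte `PAGE_SIZE` and the function names are assumptions made for the example.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only

def split_into_pages(value, page_size=PAGE_SIZE):
    """Break a byte string into consecutive page-sized chunks."""
    return [value[i:i + page_size] for i in range(0, len(value), page_size)]

def reassemble(pages):
    """Concatenate the chunks back into the original byte string."""
    return b"".join(pages)

blob = bytes(range(256)) * 40      # a 10,240-byte stand-in for a large value
pages = split_into_pages(blob)     # the last chunk is shorter than a page
```

The application never sees the chunks: it hands over one value and gets one value back.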
Berkeley DB is not a relational database.
First, Berkeley DB does not support SQL queries. All access to data is through the Berkeley DB API. Developers must learn a new set of interfaces in order to work with Berkeley DB. Although the interfaces are fairly simple, they are non-standard.
SQL support is a double-edged sword. One big advantage of relational databases is that they allow users to write simple declarative queries in a high-level language. The database system knows everything about the data and can carry out the command. This means that it's simple to search for data in new ways, and to ask new questions of the database. No programming is required.
On the other hand, if a programmer can predict in advance how an application will access data, then writing a low-level program to get and store records can be faster. It eliminates the overhead of query parsing, optimization, and execution. The programmer must understand the data representation, and must write the code to do the work, but once that's done, the application can be very fast.
Second, Berkeley DB has no notion of schema in the way that relational systems do. Schema is the structure of records in tables, and the relationships among the tables in the database. For example, in a relational system the programmer can create a record from a fixed menu of data types. Because the record types are declared to the system, the relational engine can reach inside records and examine individual values in them. In addition, programmers can use SQL to declare relationships among tables, and to create indices on tables. Relational engines usually maintain these relationships and indices automatically.
In Berkeley DB, the key and value in a record are opaque to Berkeley DB. They may have a rich internal structure, but the library is unaware of it. As a result, Berkeley DB cannot decompose the value part of a record into its constituent parts, and cannot use those parts to find values of interest. Only the application, which knows the data structure, can do that.
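To make the consequence concrete: a search on a field inside the value has to be done by the application, which decodes each value itself. The sketch below assumes a hypothetical value layout (`VALUE_FORMAT`, an age followed by a name) and again uses a dict as a stand-in for the store; none of these names come from the Berkeley DB API.

```python
import struct

VALUE_FORMAT = "<i16s"  # hypothetical layout: 32-bit age, 16-byte name

# Stand-in store: keys and values are opaque bytes as far as it knows.
store = {
    b"k1": struct.pack(VALUE_FORMAT, 42, b"Ada"),
    b"k2": struct.pack(VALUE_FORMAT, 17, b"Bob"),
    b"k3": struct.pack(VALUE_FORMAT, 65, b"Cy"),
}

def keys_where_age_at_least(min_age):
    """Scan every record and decode it in the application to test a field."""
    matches = []
    for key, value in store.items():
        age, _name = struct.unpack(VALUE_FORMAT, value)
        if age >= min_age:
            matches.append(key)
    return sorted(matches)

adults = keys_where_age_at_least(18)
```

A relational engine could evaluate the same predicate itself because the column types are declared to it; here the knowledge lives entirely in `VALUE_FORMAT`.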
Berkeley DB does allow programmers to create indices on tables, and to use those indices to speed up searches. However, the programmer has no way to tell the library how different tables and indices are related. The application needs to make sure that they all stay consistent. In the case of indices in particular, if the application puts a new record into a table, it must also put a new record in the index for it. It's generally simple to write a single function to make the required updates, but it is work that relational systems do automatically.
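The "single function" approach might look roughly like the following. This is a minimal sketch under stated assumptions: two dicts stand in for a primary table and an index table, records are kept as Python tuples rather than byte strings for readability, and `put_record`, `delete_record`, and `find_by_city` are hypothetical helpers, not library calls.

```python
primary = {}   # record id -> (name, city); stand-in for the primary table
by_city = {}   # city -> set of record ids; stand-in for the index table

def _unindex(record_id, city):
    """Remove one id from the index, dropping empty index entries."""
    ids = by_city[city]
    ids.discard(record_id)
    if not ids:
        del by_city[city]

def put_record(record_id, name, city):
    """Insert or update a record, keeping the index consistent with it."""
    old = primary.get(record_id)
    if old is not None:
        _unindex(record_id, old[1])   # drop the stale index entry first
    primary[record_id] = (name, city)
    by_city.setdefault(city, set()).add(record_id)

def delete_record(record_id):
    """Delete a record and its index entry together."""
    _name, city = primary.pop(record_id)
    _unindex(record_id, city)

def find_by_city(city):
    """Use the index instead of scanning the primary table."""
    return sorted(by_city.get(city, ()))

put_record(1, "Ada", "London")
put_record(2, "Bob", "Paris")
put_record(3, "Cy", "London")
put_record(2, "Bob", "London")   # update: Bob moves, so the index must follow
delete_record(1)
```

Routing every write through these functions is what keeps the two tables in step. In a real application the paired updates would normally also be wrapped in one transaction, so that a failure cannot leave the index and the table disagreeing.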
Berkeley DB is not a relational system. Relational database systems are semantically rich and offer high-level database access. Compared to such systems, Berkeley DB is a high-performance, transactional library for record storage. It's possible to build a relational system on top of Berkeley DB. In fact, the popular MySQL relational system uses Berkeley DB for transaction-protected table management, and takes care of all the SQL parsing and execution. It uses Berkeley DB for the storage level, and provides the semantics and access tools.
Object-oriented databases are designed for very tight integration with object-oriented programming languages. Berkeley DB is written entirely in the C programming language. It includes language bindings for C++, Java, and other languages, but the library has no information about the objects created in any object-oriented application. Berkeley DB never makes method calls on any application object. It has no idea what methods are defined on user objects, and cannot see the public or private members of any instance. The key and value parts of all records are opaque to Berkeley DB.
Berkeley DB cannot automatically page in objects as they are accessed, as some object-oriented databases do. The object-oriented application programmer must decide what records are required, and must fetch them by making method calls on Berkeley DB objects.
Berkeley DB does not support network-style navigation among records, as network databases do. Records in a Berkeley DB table may move around over time, as new records are added to the table and old ones are deleted. Berkeley DB is able to do fast searches for records based on keys, but there is no way to create a persistent physical pointer to a record. Applications can only refer to records by key, not by address.
Berkeley DB is not a standalone database server. It is a library, and runs in the address space of the application that uses it. If more than one application links in Berkeley DB, then all can use the same database at the same time; the library handles coordination among the applications, and guarantees that they do not interfere with one another.
Recent releases of Berkeley DB allow programmers to compile the library as a standalone process, and to use RPC stubs to connect to it and to carry out operations. However, there are some important limitations to this feature. The RPC stubs provide exactly the same API that the library itself does. There is no higher-level access provided by the standalone process. Tuning the standalone process is difficult, since Berkeley DB does no threading in the library (applications can be threaded, but the library never creates a thread on its own).
It is possible to build a server application that uses Berkeley DB for data management. For example, many commercial and open source Lightweight Directory Access Protocol (LDAP) servers use Berkeley DB for record storage. LDAP clients connect to these servers over the network. Individual servers make calls through the Berkeley DB API to find records and return them to clients. On its own, however, Berkeley DB is not a server.