Why Should I Use This?

The goal of pyPreservica is to allow you to make use of the Preservica APIs for reading and writing objects within a Preservica repository without having to manage the underlying REST HTTPS requests and XML parsing. The library provides a level of abstraction which reflects the underlying data model, such as structural and information objects.

The pyPreservica library allows Preservica users to build applications which interact with the repository, such as metadata synchronisation with 3rd party systems or high-throughput ingest pipelines.

Note

Access to the Preservica APIs for the cloud hosted system depends on which Preservica Edition has been licensed. See https://preservica.com/digital-archive-software/products-editions for details.

SDK Features

Entity API Features

Fetch and Update Entity Objects (Folders, Assets, Content Objects)
Add, Delete and Update External Identifiers
Add, Delete and Update Descriptive Metadata Fragments
Change Security tags on Folders and Assets
Create new Folder Entities
Move Assets and Folders within the repository
Delete Assets and Folders
Fetch Folders and Assets belonging to parent Folders
Retrieve Representations, Generations & Bitstreams from Assets
Download digital files and thumbnails
Fetch lists of changed entities over the last n days
Request information on completed integrity checks
Add or remove asset and folder icons
Replace existing content objects within an Asset
Export OPEX Package
Fetch audit trail events on Entities and across the repository
Create Relationships between Assets

Content API Features

Fetch a list of indexed Solr Fields
Search based on a single query term
Filtered searches on indexed fields

Upload API Features

Create single Content Object Packages with multiple Representations
Create multiple Content Object Packages with multiple Representations
Upload packages to Preservica
Spreadsheet Metadata
Ingest Web Video
Ingest Twitter Feeds

Admin API Features

Schema Management (XML Templates, XSD Schemas & XSLT Transforms)
User Management (create and remove user accounts)
Security Tags (add and remove security tags)

Retention Management API Features

Create new retention policies
Delete retention policies
Update retention policies
Assign retention policies to entities

Workflow API Features

Get Workflow Contexts
Get Workflow Instance
Start Workflow Instances

Webhook API Features

Subscribe to Webhook endpoints
Unsubscribe
List Subscriptions

Authority Records API Features

Get an Authority table by its reference
List all Authority tables
Return all records from an Authority table
Add records to an Authority table
Delete records from an Authority table

Metadata Groups and Forms Features

Create new Groups
Delete Groups
Add new metadata fields to an existing Group
Fetch existing Groups

Background

The key to working with the pyPreservica library is that the services follow the Preservica core data model closely.

The Preservica data model represents a hierarchy of entities, starting with the structural objects which are used to represent aggregations of digital assets. Structural objects define the organisation of the data. In a library context they may be referred to as collections, in an archival context they may be Fonds, Sub-Fonds, Series etc., and in a records management context they could simply be a hierarchy of folders or directories.

These structural objects may contain other structural objects in the same way as a computer filesystem may contain folders within folders.

Within the structural objects come the information objects. These objects, which are sometimes referred to as the digital assets, are what PREMIS defines as an Intellectual Entity. Information objects are considered a single intellectual unit for purposes of management and description: for example, a book, document, map, photograph or database.

Representations are used to define how the information objects are composed in terms of technology and structure. For example, a book may be represented as a single multiple page PDF, a single eBook file or a set of single page image files.

Representations are usually associated with a use case such as access or long-term preservation. All Information objects have at least one representation defined by default. Multiple representations can be either created outside of Preservica through a process such as digitisation or within Preservica through preservation processes such as normalisation.

Content Objects represent the components of the asset. Simple assets such as digital images may only contain a single content object, whereas more complex assets such as books or 3D models may contain multiple content objects. In most cases content objects will map directly to digital files or bitstreams.

Generations represent changes to content objects over time. As formats become obsolete, new generations may need to be created to keep the information accessible.

Bitstreams represent the actual computer files as ingested into Preservica, i.e. the TIFF photograph or the PDF document.
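
The hierarchy described above can be pictured as a set of nested containers. The following is a minimal sketch only: the class names and fields are hypothetical stand-ins chosen for illustration and are not the classes that pyPreservica itself provides.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical illustration of the data model; NOT the pyPreservica classes.

@dataclass
class Bitstream:
    filename: str  # the actual file, e.g. a TIFF or PDF

@dataclass
class Generation:
    bitstreams: List[Bitstream] = field(default_factory=list)

@dataclass
class ContentObject:
    title: str
    generations: List[Generation] = field(default_factory=list)

@dataclass
class Representation:
    name: str  # e.g. "Preservation" or "Access"
    content_objects: List[ContentObject] = field(default_factory=list)

@dataclass
class Asset:  # information object
    title: str
    representations: List[Representation] = field(default_factory=list)

@dataclass
class Folder:  # structural object; may contain folders and assets
    title: str
    folders: List["Folder"] = field(default_factory=list)
    assets: List[Asset] = field(default_factory=list)

def bitstream_names(folder):
    """Yield every bitstream filename found anywhere beneath a folder."""
    for asset in folder.assets:
        for representation in asset.representations:
            for content_object in representation.content_objects:
                for generation in content_object.generations:
                    for bitstream in generation.bitstreams:
                        yield bitstream.filename
    for child in folder.folders:
        yield from bitstream_names(child)

# A photograph held as a single TIFF inside Fonds -> Series.
image = ContentObject("Image", [Generation([Bitstream("photo.tiff")])])
photo = Asset("Photograph", [Representation("Preservation", [image])])
series = Folder("Series", assets=[photo])
fonds = Folder("Fonds", folders=[series])

names = list(bitstream_names(fonds))
print(names)
```

Walking from a folder down to its files always passes through the same levels (asset, representation, content object, generation, bitstream), which is the path the library's services follow as well.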

Installation

pyPreservica is available from the Python Package Index (PyPI)

https://pypi.org/project/pyPreservica/

pyPreservica is built and tested against Python 3.10 onwards. Older versions of Python may not work.

To install pyPreservica, run the following command in your terminal of choice:

$ pip install pyPreservica

or you can install in a virtual Python environment using:

$ pipenv install pyPreservica

pyPreservica is under active development. To upgrade to the latest version, use:

$ pip install --upgrade pyPreservica
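
After installing or upgrading, it can be useful to confirm from Python which version is present and that the interpreter is recent enough. This is a small sketch using only the standard library; the helper names python_is_supported and installed_version are invented for this example.

```python
import sys
from importlib import metadata

def python_is_supported(version=sys.version_info):
    """Return True for Python 3.10 or later (the versions the library is tested against)."""
    return tuple(version[:2]) >= (3, 10)

def installed_version(package="pyPreservica"):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

print("Python supported:", python_is_supported())
print("pyPreservica version:", installed_version())
```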

Get the Source Code

pyPreservica is developed on GitHub, where the code is always available.

You can clone the public repository

$ git clone https://github.com/carj/pyPreservica.git

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/carj/pyPreservica

Support

pyPreservica is a 3rd party open source client. There is no official support provided.

Important

There is no support for pyPreservica from Preservica Ltd

For announcements about new versions and discussion of pyPreservica, please subscribe to the Google Groups forum https://groups.google.com/g/pypreservica

Bug reports can be raised directly on either GitHub or the Google Groups forum.

General questions and queries about using pyPreservica should be posted on the Google Groups forum above.

Examples

Using the python console, create the entity API client object and request an Asset (Information Object) by its unique reference and display some of its attributes.

All entities within the Preservica system have one unique reference which can be used to retrieve them.

The reference used to fetch entities (Assets, Folders) is the Preservica internal unique identifier. This is a universally unique identifier (UUID).

You can find the reference when viewing the object metadata within Explorer. Later on we will look at how we can fetch entities using other 3rd party external identifiers which may be more meaningful such as ISBNs, DOIs etc.
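
Because references are UUIDs, a script can cheaply check that a string has the right shape before sending it to the server. The sketch below uses the standard uuid module; the function name is_entity_reference is hypothetical, and the check assumes references are written in the canonical hyphenated form shown in this document.

```python
import uuid

def is_entity_reference(value):
    """Return True if value is a UUID in canonical hyphenated form."""
    try:
        # uuid.UUID also accepts forms without hyphens, so compare against
        # the canonical rendering to reject those.
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False

print(is_entity_reference("dc949259-2c1d-4658-8eee-c17b27a8823d"))  # True
print(is_entity_reference("LC-USZ62-20901"))                        # False
```

This only validates the format; whether an entity with that reference exists can only be answered by the server.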

To create the client object you will need valid credentials to connect to the Preservica server. See the following section on available authentication options.

>>> from pyPreservica import *
>>> client = EntityAPI()
>>> client
pyPreservica version: 0.8.5  (Preservica 6.2 Compatible)
Connected to: us.preservica.com Version: 6.2.0 as test@test.com
>>> asset = client.asset("dc949259-2c1d-4658-8eee-c17b27a8823d")
>>> asset.reference
'dc949259-2c1d-4658-8eee-c17b27a8823d'
>>> asset.title
'LC-USZ62-20901'
>>> asset.parent
'ae108c8f-b058-4228-b099-6049175d2f0c'
>>> asset.security_tag
'open'
>>> asset.entity_type
<EntityType.ASSET: 'IO'>

If your credentials are valid, pyPreservica returns a client object which is the connection to the server. Printing the client returns information about the connection, such as the server and the user name. This can be useful to check that you are connected to the correct system.

All entities have a parent reference attribute. For Assets this always points to the parent Folder. For Content Objects the parent points to the Asset, and for Folders it points to the parent Folder if it exists. Folders at the root level of the repository do not have a parent and the attribute returns the special Python value None.
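
Since every parent chain ends in None at the root, the parent attribute can be followed upwards to reconstruct where an entity sits in the hierarchy. The sketch below shows the idea without a server connection: ancestors and parent_of are hypothetical names, and a plain dictionary stands in for the lookup that a real script would perform by fetching each entity and reading its parent attribute.

```python
from typing import Callable, List, Optional

def ancestors(reference: str,
              parent_of: Callable[[str], Optional[str]]) -> List[str]:
    """Return the parent references from the immediate parent up to the root."""
    chain = []
    seen = {reference}
    parent = parent_of(reference)
    while parent is not None:
        if parent in seen:
            # Defensive guard so a bad lookup cannot loop forever.
            raise ValueError("cycle detected at " + parent)
        chain.append(parent)
        seen.add(parent)
        parent = parent_of(parent)
    return chain

# Stand-in data: an asset inside folder-b, which is inside root folder-a.
parents = {"asset-1": "folder-b", "folder-b": "folder-a", "folder-a": None}

path = ancestors("asset-1", parents.get)
print(path)  # ['folder-b', 'folder-a']
```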

This example shows how pyPreservica can be used to upload and ingest a local file, picture.tiff, into Preservica using the UploadAPI class. The tiff file will be ingested as a new Asset object inside the existing Preservica folder given by the folder UUID. The simple_asset_package function creates the package, in this case an XIPv6 formatted package, and the upload_zip_package method uploads it directly to the Preservica server using the S3 protocol.

>>> from pyPreservica import *

>>> client = UploadAPI()
>>> folder = "dc949259-2c1d-4658-8eee-c17b27a8823d"
>>> zip_p = simple_asset_package(preservation_file="picture.tiff", parent_folder=folder)
>>> client.upload_zip_package(zip_p)

Authentication

pyPreservica provides 4 different methods for authentication. The library requires the username and password of a Preservica user and an optional Tenant identifier along with the server hostname.

Tip

The Tenant parameter is now optional when connecting to a Preservica 6.3 system.

1 Method Arguments

Include the user credentials as arguments to the EntityAPI Class

from pyPreservica import *

client = EntityAPI(username="test@test.com", password="123444",
                   tenant="PREVIEW", server="preview.preservica.com")

If you don’t want to include your Preservica credentials within your Python script, because you are sharing scripts or using a version control system, then one of the following two methods should be used.

2 Environment Variable

Export the credentials as environment variables as part of the session

$ export PRESERVICA_USERNAME="test@test.com"
$ export PRESERVICA_PASSWORD="123444"
$ export PRESERVICA_TENANT="PREVIEW"
$ export PRESERVICA_SERVER="preview.preservica.com"

$ python3

from pyPreservica import *

client = EntityAPI()
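
When relying on environment variables, a short pre-flight check can give a clearer error than a failed login. This is a sketch under the assumption that the username, password and server variables shown above are required while the tenant is optional; the helper name missing_variables is invented for this example.

```python
import os

# Variable names are those shown above; PRESERVICA_TENANT is treated as optional.
REQUIRED = ("PRESERVICA_USERNAME", "PRESERVICA_PASSWORD", "PRESERVICA_SERVER")

def missing_variables(env=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED if not env.get(name)]

# Demonstration with a stand-in mapping instead of the real environment.
example_env = {
    "PRESERVICA_USERNAME": "test@test.com",
    "PRESERVICA_SERVER": "preview.preservica.com",
}
missing = missing_variables(example_env)
print(missing)  # ['PRESERVICA_PASSWORD']
```

In a real script, calling missing_variables() with no argument inspects the current process environment.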

3 Properties File

Create a properties file called “credentials.properties” with the following property names and save to the working directory

[credentials]
username=test@test.com
password=123444
tenant=PREVIEW
server=preview.preservica.com

from pyPreservica import *

client = EntityAPI()
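
The properties file shown above has the shape of an INI file, so its contents can be inspected with the standard configparser module, for example to confirm which server a script will connect to. This is a sketch that assumes the file follows standard INI rules; read_credentials is a hypothetical helper and says nothing about how pyPreservica itself loads the file.

```python
import configparser
import os
import tempfile

def read_credentials(path):
    """Return the [credentials] section of a properties file as a dict."""
    # interpolation=None so that a '%' in a password is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(path):
        raise FileNotFoundError(path)
    return dict(parser["credentials"])

# Demonstration using a throwaway file with the example values from above.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "credentials.properties")
    with open(path, "w") as handle:
        handle.write("[credentials]\n"
                     "username=test@test.com\n"
                     "password=123444\n"
                     "tenant=PREVIEW\n"
                     "server=preview.preservica.com\n")
    creds = read_credentials(path)

print(creds["username"], creds["server"])
```

Note that configparser lower-cases option names by default, so a mixed-case property would appear under its lower-case name in the returned dictionary.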

You can create a new credentials.properties file automatically using the save_config() method

from pyPreservica import *

client = EntityAPI(username="test@test.com", password="123444",
                   tenant="PREVIEW", server="preview.preservica.com")
client.save_config()

4 Shared Secrets

Warning

The shared secret authentication method is deprecated and may be removed in future releases. It should not be used for new projects. See https://developers.preservica.com/blog/developer-blog-api-updates-in-preservica-8-4

pyPreservica supports authentication using shared secrets rather than a login account username and password. This allows trusted external applications such as pyPreservica to acquire a Preservica API authentication token without having to use a set of login credentials.

This option is useful if you want to provide limited API access to a 3rd party without providing login access to Preservica.

To use the shared secret authentication you need to add a secure secret key to your Preservica system.

The username, password, tenant and server attributes are used as normal, except that the password field now holds the shared secret and not the user's password.

from pyPreservica import *

client = EntityAPI(username="test@test.com", password="shared-secret", tenant="PREVIEW",
                   server="preview.preservica.com", use_shared_secret=True)

If you are using a credentials.properties file then

from pyPreservica import *

client = EntityAPI(use_shared_secret=True)

2 Factor Authentication

pyPreservica now supports the new 2-Factor authentication for APIs introduced with Preservica 6.8

The Preservica system should first be set up for 2-Factor authentication and the one time password key used to seed the 2FA (HMAC-Based One-Time Password Algorithm) should be retained and used with the API.

The one time password or seed key is available to view and should be saved when setting up the 2FA for a user. You can find the two factor seed key from the user 2FA setup page under the “Reveal Key” button at the bottom of the page.

Keep this key secret along with your account password as it will be required when authenticating the API calls.

To call pyPreservica once the 2-Factor authentication process has been set up, you need the username and password as normal along with the additional two factor key.

You can pass the additional two factor key as an argument to the constructor for the API classes or use environment variables or the credentials file.

from pyPreservica import *

client = EntityAPI(username="test@test.com", password="my-login-password", tenant="PREVIEW",
                   server="preview.preservica.com", two_fa_secret_key="AJC5DEGUVM6UQ1TT")

The environment variable for holding the 2 factor seed key is called PRESERVICA_2FA_TOKEN and the credential file property name is twoFactorToken.

$ export PRESERVICA_2FA_TOKEN=AJC5DEGUVM6UQ1TT

For example, in the credentials file:

[credentials]
username=test@test.com
password=123444
tenant=PREVIEW
server=preview.preservica.com
twoFactorToken=AJC5DEGUVM6UQ1TT

Tip

Preservica uses time based One Time Passwords (OTP). This means the time on your local machine must match the time on the server.
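
To see why the clocks must agree, it helps to look at how a time based one time password is derived. The sketch below implements the generic algorithm from RFC 6238 using only the standard library. It assumes the common defaults (HMAC-SHA1, 30 second time steps, 6 digits, base32 encoded key); whether Preservica uses exactly these settings is an assumption, and the secret shown is the published RFC test key, not a Preservica key.

```python
import base64
import hashlib
import hmac
import struct
import time

def totp(base32_secret, at_time=None, step=30, digits=6):
    """Compute an RFC 6238 one time password for the given Unix time."""
    key = base64.b32decode(base32_secret, casefold=True)
    now = time.time() if at_time is None else at_time
    counter = int(now // step)  # number of time steps since the epoch
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# Base32 form of the ASCII test key "12345678901234567890" from the RFCs.
RFC_SECRET = "GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ"

print(totp(RFC_SECRET, at_time=59))  # 287082
print(totp(RFC_SECRET, at_time=89))  # 359152, one time step later
```

The code depends only on the key and the current 30 second window, so a machine whose clock has drifted into a different window computes a different code from the server and authentication fails.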

SSL Certificates

pyPreservica will by default connect to servers which use the https:// protocol and will always validate certificates when connected via https.

For Enterprise on Premise customers on secure networks, you can change the default protocol to use http:// via the constructor.

client = EntityAPI(protocol="http")

pyPreservica uses the Certifi project to provide SSL certificate validation.

Self-signed certificates used by on-premise deployments are not part of the Certifi certification authority (CA) bundle and therefore need to be set explicitly.

The CA bundle is a file that contains root and intermediate certificates. The end-entity certificate along with a CA bundle constitutes the certificate chain.

For on-premise deployments the trusted CAs can be specified through the REQUESTS_CA_BUNDLE environment variable, e.g.

$ export REQUESTS_CA_BUNDLE=/usr/local/share/ca-certificates/my-server.cert
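
A mistyped bundle path is easier to diagnose before any connection is attempted. The following sketch checks that the variable, if set, points at an existing file; configured_ca_bundle is a hypothetical helper, and it does not verify that the file actually contains valid certificates.

```python
import os
import tempfile

def configured_ca_bundle(env=os.environ):
    """Return the REQUESTS_CA_BUNDLE path, or None if the variable is unset.

    Raises FileNotFoundError if the variable is set but the file is missing.
    """
    path = env.get("REQUESTS_CA_BUNDLE")
    if not path:
        return None
    if not os.path.isfile(path):
        raise FileNotFoundError("REQUESTS_CA_BUNDLE points to a missing file: " + path)
    return path

# Demonstration with a temporary file standing in for a real certificate bundle.
with tempfile.NamedTemporaryFile(suffix=".pem") as bundle:
    found = configured_ca_bundle({"REQUESTS_CA_BUNDLE": bundle.name})
    matches = found == bundle.name

print(matches)
```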

Application Logging

You can add logging to your pyPreservica scripts by including the following:

import logging
from pyPreservica import *

logging.basicConfig(level=logging.DEBUG)

client = EntityAPI()

This will log all messages of level DEBUG or higher to the console (standard error, which is the default stream used by logging.basicConfig).

When logging to files, the main thing to be wary of is that log files need to be rotated regularly. The application needs to detect the log file being renamed and handle that situation. While Python provides its own file rotation handler, it is best to leave log rotation to dedicated tools such as logrotate. The WatchedFileHandler will keep track of the log file and reopen it if it is rotated, making it work well with logrotate without requiring any specific signals.

Here’s a sample implementation.

import logging
import logging.handlers

from pyPreservica import *

handler = logging.handlers.WatchedFileHandler("pyPreservica.log")
formatter = logging.Formatter(logging.BASIC_FORMAT)
handler.setFormatter(formatter)
root = logging.getLogger()
root.setLevel(logging.DEBUG)
root.addHandler(handler)

client = EntityAPI()
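
The reopening behaviour of WatchedFileHandler can be observed without logrotate by renaming the log file by hand, which is essentially what a rotation does. This is a standalone sketch for POSIX systems (renaming an open file generally fails on Windows); the logger name and temporary paths are arbitrary choices for the demonstration.

```python
import logging
import logging.handlers
import os
import tempfile

log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "pyPreservica.log")

handler = logging.handlers.WatchedFileHandler(log_path)
logger = logging.getLogger("rotation-demo")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output out of the root logger
logger.addHandler(handler)

logger.info("before rotation")
os.rename(log_path, log_path + ".1")  # simulate what logrotate does
logger.info("after rotation")         # handler notices and reopens log_path
handler.close()

with open(log_path) as f:
    current = f.read()
with open(log_path + ".1") as f:
    rotated = f.read()

print("current:", current.strip())
print("rotated:", rotated.strip())
```

The first message ends up in the renamed file and the second in a freshly created file at the original path, with no signal or restart needed.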

pyPreservica now provides low level event hooks into the underlying API requests to the server. To use this functionality, create a callback function with the following signature: call_back(r, *args, **kwargs).

The first argument is the request object which can be queried.

This API allows clients to, for example, audit all the API endpoints which are called.

def print_url(r, *args, **kwargs):
    print(r.url)


client = EntityAPI(request_hook=print_url)

for f in client.descendants():
    pass

https://us.preservica.com/api/accesstoken/login
https://us.preservica.com/api/entity/versiondetails/version
https://us.preservica.com/api/user/details
https://us.preservica.com/api/entity/root/children?start=0&max=100
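
Building on the same callback signature, a hook can aggregate calls rather than print them. The sketch below counts requests per endpoint path, ignoring the query string. The names count_endpoint and endpoint_counts are hypothetical, and small stand-in objects carrying only a url attribute are used so the example runs without a server; with a real client the function would be passed as request_hook in the same way as print_url above.

```python
from collections import Counter
from types import SimpleNamespace
from urllib.parse import urlsplit

endpoint_counts = Counter()

def count_endpoint(r, *args, **kwargs):
    """Hook callback: tally each call by URL path, ignoring the query string."""
    endpoint_counts[urlsplit(r.url).path] += 1

# Stand-in objects with a url attribute, mimicking what the hook receives.
for url in (
    "https://us.preservica.com/api/user/details",
    "https://us.preservica.com/api/entity/root/children?start=0&max=100",
    "https://us.preservica.com/api/entity/root/children?start=100&max=100",
):
    count_endpoint(SimpleNamespace(url=url))

for path, count in endpoint_counts.most_common():
    print(count, path)
```

Grouping by path makes paged requests to the same endpoint show up as a single entry with a count, which is usually what an audit needs.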