Content API

pyPreservica now contains interfaces to the content API, which supports searching the repository.

https://us.preservica.com/api/content/documentation.html

The content API is a read-only interface that returns JSON documents rather than XML. It duplicates some of the entity API, but it also provides search capabilities.

The content API client is created using:

from pyPreservica import *

client = ContentAPI()

object-details

Get the details for an Asset or Folder as a Python dictionary object containing CMIS attributes.

client = ContentAPI()

client.object_details("IO", "uuid")
client.object_details("SO", "uuid")

e.g.

from pyPreservica import *

client = ContentAPI()

details = client.object_details("IO", "de1c32a3-bd9f-4843-a5f1-46df080f83d2")
print(details['name'])

or, passing an EntityType value instead of the string:

from pyPreservica import *

client = ContentAPI()

details = client.object_details(EntityType.ASSET, "de1c32a3-bd9f-4843-a5f1-46df080f83d2")
print(details['name'])

indexed-fields

Get a list of all the indexed metadata fields within the Preservica search engine. This includes the default xip.* fields and any custom indexes which have been created through custom index files.

client = ContentAPI()

client.indexed_fields()
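As an illustration of telling the default fields apart from custom ones, the sketch below partitions a list of index names by the `xip.` prefix. It assumes the entries are plain strings; the helper name `split_index_fields` and the sample custom index names are hypothetical and not part of pyPreservica.

```python
def split_index_fields(field_names):
    """Partition index names into default (xip.*) and custom groups."""
    default_fields = sorted(n for n in field_names if n.startswith("xip."))
    custom_fields = sorted(n for n in field_names if not n.startswith("xip."))
    return default_fields, custom_fields


# Illustrative input only; a real list would come from the search engine.
sample = ["xip.title", "oai_dc.creator", "xip.description", "mods.genre"]
default_fields, custom_fields = split_index_fields(sample)
print(default_fields)
print(custom_fields)
```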

Search Progress

Searching across a large Preservica repository is very quick, but returning very large datasets to the client can be slow. To avoid putting undue load on the server, pyPreservica requests a single page of results per server request.
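The page-at-a-time pattern can be sketched in general terms as follows. This is not pyPreservica's implementation: `iter_pages`, `fake_fetch` and the page size are hypothetical names and values used only to show the idea, with an in-memory list standing in for the server.

```python
def iter_pages(fetch_page, page_size=100):
    """Yield hits one at a time, making one request per page."""
    start = 0
    while True:
        hits, total = fetch_page(start, page_size)
        if not hits:
            break
        yield from hits
        start += len(hits)
        if start >= total:
            break


# Stand-in for a server: 250 fake hits served in slices.
FAKE_HITS = [f"ref-{i}" for i in range(250)]
requests_made = []


def fake_fetch(start, size):
    """Return one slice of hits plus the total hit count."""
    requests_made.append(start)
    return FAKE_HITS[start:start + size], len(FAKE_HITS)


results = list(iter_pages(fake_fetch, page_size=100))
print(len(results), "hits in", len(requests_made), "requests")
```

Because the hits are produced by a generator, the caller can start processing the first page before later pages have been requested.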

If you are using the `simple_search_csv` or `search_index_filter_csv` functions, which write directly to a CSV file, it can be difficult to monitor the progress of report generation.

To allow monitoring of search result downloads, you can add a callback to the search client. The callback is called for every page of search results returned to the client. The value passed to the callback is a string of the form `current:total`, giving the number of results processed so far and the total number of search hits for the query.

pyPreservica provides a default callback:

import sys
import threading


class ReportProgressCallBack:
    def __init__(self):
        self.current = 0
        self.total = 0
        self._lock = threading.Lock()

    def __call__(self, value):
        # value is a "current:total" string, passed once per page of results
        with self._lock:
            values = value.split(":")
            self.total = int(values[1])
            self.current = int(values[0])
            percentage = (self.current / self.total) * 100
            # Rewrite the same console line with the updated progress
            sys.stdout.write("\r%s / %s  (%.2f%%)" % (self.current, self.total, percentage))
            sys.stdout.flush()

To use the default callback in your scripts include the following line

client.search_callback(client.ReportProgressCallBack())
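If the default output is too chatty, for example when running from a scheduled job, a custom callable might be used instead. The sketch below assumes the callback receives the same `current:total` string that the default callback parses, and that it is called from a single thread. The class name `QuietProgressCallBack` and its `step` parameter are hypothetical.

```python
class QuietProgressCallBack:
    """Print progress only when another `step` percent has been reached."""

    def __init__(self, step=10):
        self.step = step
        self.current = 0
        self.total = 0
        self.reported = []           # result counts at which a line was printed
        self._next_report = step     # next percentage threshold to report

    def __call__(self, value):
        current, total = value.split(":")
        self.current, self.total = int(current), int(total)
        if self.total == 0:
            return  # empty result set, nothing to report
        percentage = self.current * 100 / self.total
        if percentage >= self._next_report:
            print(f"{self.current} / {self.total} ({percentage:.0f}%)")
            self.reported.append(self.current)
            # Move to the first threshold above the current percentage
            self._next_report = (int(percentage) // self.step + 1) * self.step


# Simulate the values a paged search might pass to the callback.
callback = QuietProgressCallBack(step=25)
for value in ["10:100", "30:100", "40:100", "60:100", "100:100"]:
    callback(value)
```

Such an object would be registered in the same way as the default one, by passing an instance to `client.search_callback(...)`.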

Reporting Examples

Create a spreadsheet containing all Assets within the repository

Generate a CSV report on all assets within the system. The spreadsheet columns include asset title, description, security tag, etc.

from pyPreservica import *

client = ContentAPI()


if __name__ == '__main__':
    metadata_fields = {
        "xip.reference": "*", "xip.title": "",  "xip.description": "", "xip.document_type": "IO",  "xip.parent_ref": "",
        "xip.security_descriptor": "*",
        "xip.identifier": "", "xip.bitstream_names_r_Preservation": ""}

    client.search_callback(client.ReportProgressCallBack())

    client.search_index_filter_csv("%", "assets.csv", metadata_fields)

Create a spreadsheet containing all Assets and Folders within the repository

from pyPreservica import *

client = ContentAPI()

if __name__ == '__main__':
    metadata_fields = {
        "xip.reference": "*", "xip.title": "",  "xip.description": "", "xip.document_type": "*",  "xip.parent_ref": "",
        "xip.security_descriptor": "*",
        "xip.identifier": "", "xip.bitstream_names_r_Preservation": ""}

    client.search_callback(client.ReportProgressCallBack())

    client.search_index_filter_csv("%", "all_objects.csv", metadata_fields)

Create a spreadsheet containing all Assets and Folders underneath a specific folder

import sys

from pyPreservica import *

content = ContentAPI()
entity = EntityAPI()

# The reference of the folder to search is passed as the first command-line argument
folder = entity.folder(sys.argv[1])

print(f"Searching inside folder {folder.title}")

if __name__ == '__main__':
    metadata_fields = {
        "xip.reference": "*", "xip.title": "", "xip.description": "", "xip.document_type": "*", "xip.parent_hierarchy": f"{folder.reference}",
        "xip.security_descriptor": "*",
        "xip.identifier": "", "xip.bitstream_names_r_Preservation": ""}


    content.search_callback(content.ReportProgressCallBack())

    content.search_index_filter_csv("%", "assets.csv", metadata_fields)
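Once a report has been written, it can be summarised with the standard library. The sketch below counts rows per security tag. It assumes the CSV has a header row whose column names match the index field names, which should be checked against a real report; the helper `count_by_column` and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def count_by_column(csv_file, column):
    """Count how many rows share each value in the given column."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


# In-memory stand-in for a generated report; the values are invented.
sample_report = io.StringIO(
    "xip.reference,xip.title,xip.security_descriptor\n"
    "ref-1,Annual report,open\n"
    "ref-2,Board minutes,closed\n"
    "ref-3,Newsletter,open\n"
)
counts = count_by_column(sample_report, "xip.security_descriptor")
print(counts)
```

For a file on disk, the same function could be given a file opened with `open("assets.csv", newline="")`, as the `csv` module documentation recommends.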

User Security Tags

You can get a list of available security tags for the current user by calling:

client = ContentAPI()

client.user_security_tags()