# External Storage - Python SDK

> Offload large payloads to external storage using the claim check pattern in the Python SDK.

> **ℹ️ Info:**
> Release, stability, and dependency info
>
> External Storage is in [Public Preview](/evaluate/development-production-features/release-stages#public-preview). APIs
> and configuration may change before General Availability. Join the
> [#large-payloads Slack channel](https://temporalio.slack.com/archives/C09VA2DE15Y) to provide feedback or ask for help.

The Temporal Service enforces a 2 MB per-payload limit by default. This limit is configurable on self-hosted
deployments. When your Workflows or Activities handle data larger than the limit, you can offload payloads to external
storage, such as Amazon S3, and pass a small reference token through the Event History instead. This page shows you how
to set up External Storage with Amazon S3 and how to implement a custom storage driver.

For a conceptual overview of External Storage and its use cases, see [External Storage](/external-storage).
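
As a rough illustration of the claim check pattern itself, the following sketch uses only the Python standard library and does not call the Temporal SDK. The names `check_in`, `check_out`, `external_store`, and `THRESHOLD_BYTES` are hypothetical, and the in-memory dictionary stands in for a real store such as an S3 bucket.

```python
import uuid

# Hypothetical threshold for this sketch, set to 256 KiB.
THRESHOLD_BYTES = 256 * 1024

# Stand-in for an external store such as an S3 bucket.
external_store: dict[str, bytes] = {}


def check_in(data: bytes) -> dict:
    """Keep small data inline; store large data externally and return a reference."""
    if len(data) <= THRESHOLD_BYTES:
        return {"inline": data}
    key = str(uuid.uuid4())
    external_store[key] = data
    # Only this small reference would travel through the Event History.
    return {"claim": key}


def check_out(envelope: dict) -> bytes:
    """Resolve an envelope back to the original bytes."""
    if "inline" in envelope:
        return envelope["inline"]
    return external_store[envelope["claim"]]


small = check_in(b"x" * 10)
large = check_in(b"x" * (300 * 1024))
```

In this sketch, `small` stays inline while `large` is replaced by a short key, which is the same trade the SDK makes on your behalf once External Storage is configured.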

## Store and retrieve large payloads with Amazon S3

The Python SDK includes an S3 storage driver. Complete the prerequisites, then follow the procedure to set it up.

### Prerequisites

- An Amazon S3 bucket that you have read and write access to. Refer to
  [lifecycle management](/external-storage#lifecycle) to ensure that your payloads remain available for the entire
  lifetime of the Workflow. For multi-region durability, see
  [Durable External Storage](/external-storage#durable-external-storage).
- The `aioboto3` extra for the Temporal Python SDK, installed with `python -m pip install "temporalio[aioboto3]"`.

### Procedure

1. Create an S3 client using `aioboto3` and pass it to the `S3StorageDriver`. The driver uses your standard AWS
   credentials from the environment (environment variables, IAM role, or AWS config file):

      <!--SNIPSTART python-s3-driver-create-->
[features/snippets/external_storage/s3_setup/s3_driver_create.py](https://github.com/temporalio/features/blob/main/features/snippets/external_storage/s3_setup/s3_driver_create.py)
```py
session = aioboto3.Session(profile_name=AWS_PROFILE, region_name=AWS_REGION)
async with session.client("s3") as s3_client:
    driver = S3StorageDriver(
        client=new_aioboto3_client(s3_client),
        bucket="my-temporal-payloads",
    )
```
   <!--SNIPEND-->

2. Configure the driver on your `DataConverter` and pass the converter to your Client and Worker:

      <!--SNIPSTART python-s3-external-storage-setup-->
[features/snippets/external_storage/s3_setup/s3_external_storage_setup.py](https://github.com/temporalio/features/blob/main/features/snippets/external_storage/s3_setup/s3_external_storage_setup.py)
```py
data_converter = dataclasses.replace(
    DataConverter.default,
    external_storage=ExternalStorage(drivers=[driver]),
)

client_config = ClientConfig.load_client_connect_config()

client = await Client.connect(**client_config, data_converter=data_converter)

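# The Worker uses the data converter configured on the Client.
worker = Worker(
    client,
    task_queue="my-task-queue",
    workflows=[],  # Register your Workflow classes here
    activities=[],  # Register your Activity functions here
)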
```
   <!--SNIPEND-->

By default, payloads larger than 256 KiB are offloaded to external storage. You can adjust this with the
`payload_size_threshold` parameter, even setting it to 0 to externalize all payloads regardless of size. Refer to
[Configure payload size threshold](#configure-payload-size-threshold) for more information.

All Workflows and Activities running on the Worker use the storage driver automatically, without changes to your
business logic. The driver uploads and downloads payloads concurrently and validates payload integrity on retrieval.

The S3 driver includes diagnostic metadata, such as the AWS region, in error messages to help troubleshoot storage
failures. For a complete working example that includes a Worker, Codec Server, and S3 driver, see the
[External Storage sample](https://github.com/temporalio/samples-python/tree/main/external_storage).

## Implement a custom storage driver

If you need a storage backend that the built-in drivers don't support, you can implement your own storage driver.
Refer to [Choose a storage system](/external-storage#choose-storage) for guidance on selecting a backing store and [Lifecycle management](/external-storage#lifecycle) for retention requirements.

The following example shows a custom driver that uses local disk as the backing store. This example is for local
development and testing only. In production, use a durable storage system that is accessible to all Workers.
For example, see the [Redis storage driver sample](https://github.com/temporalio/samples-python/tree/main/external_storage_redis).

<!--SNIPSTART python-custom-storage-driver-->
[features/snippets/external_storage/custom_driver/custom_storage_driver.py](https://github.com/temporalio/features/blob/main/features/snippets/external_storage/custom_driver/custom_storage_driver.py)
```py
class LocalDiskStorageDriver(StorageDriver):
    def __init__(self, store_dir: str = "/tmp/temporal-payload-store") -> None:
        self._store_dir = store_dir

    def name(self) -> str:
        return "local-disk"

    async def store(
        self,
        context: StorageDriverStoreContext,
        payloads: Sequence[Payload],
    ) -> list[StorageDriverClaim]:
        os.makedirs(self._store_dir, exist_ok=True)

        prefix = self._store_dir
        target = context.target
        if isinstance(target, StorageDriverWorkflowInfo) and target.id:
            prefix = os.path.join(self._store_dir, target.namespace, target.id)
            os.makedirs(prefix, exist_ok=True)

        claims = []
        for payload in payloads:
            key = f"{uuid.uuid4()}.bin"
            file_path = os.path.join(prefix, key)
            with open(file_path, "wb") as f:
                f.write(payload.SerializeToString())
            claims.append(StorageDriverClaim(claim_data={"path": file_path}))
        return claims

    async def retrieve(
        self,
        context: StorageDriverRetrieveContext,
        claims: Sequence[StorageDriverClaim],
    ) -> list[Payload]:
        payloads = []
        for claim in claims:
            file_path = claim.claim_data["path"]
            with open(file_path, "rb") as f:
                raw = f.read()
            payload = Payload()
            payload.ParseFromString(raw)
            payloads.append(payload)
        return payloads

```
<!--SNIPEND-->

The following sections walk through the key parts of the driver implementation.

### 1. Extend the StorageDriver class

A custom driver extends the `StorageDriver` abstract class and implements three methods:

- `name()` returns a unique string that identifies the driver. The SDK stores this name in the claim check reference so
  it can route retrieval requests to the correct driver. Changing the name after payloads have been stored breaks
  retrieval.
- `store()` receives a list of payloads and returns one `StorageDriverClaim` per payload. A claim is a set of string
  key-value pairs that the driver uses to locate the payload later.
- `retrieve()` receives the claims that `store()` produced and returns the original payloads.
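
To see why a stable driver name matters, consider the following simplified sketch. It does not use the SDK: `FakeDriver`, `resolve`, and the reference layout (`driver` and `key` fields) are hypothetical stand-ins chosen for illustration, and the SDK's actual reference format may differ.

```python
class FakeDriver:
    """Stand-in for a storage driver; not the SDK's StorageDriver class."""

    def __init__(self, name: str, blobs: dict[str, bytes]) -> None:
        self._name = name
        self._blobs = blobs  # shared dict plays the role of the backing store

    def name(self) -> str:
        return self._name

    def store(self, data: bytes) -> dict[str, str]:
        key = f"blob-{len(self._blobs)}"
        self._blobs[key] = data
        # Hypothetical reference layout: the driver name travels with the claim.
        return {"driver": self._name, "key": key}

    def retrieve(self, key: str) -> bytes:
        return self._blobs[key]


def resolve(reference: dict[str, str], drivers: list[FakeDriver]) -> bytes:
    """Route a reference to the registered driver whose name matches."""
    by_name = {d.name(): d for d in drivers}
    driver = by_name.get(reference["driver"])
    if driver is None:
        raise LookupError(f"no registered driver named {reference['driver']!r}")
    return driver.retrieve(reference["key"])


backing_store: dict[str, bytes] = {}
original = FakeDriver("local-disk", backing_store)
reference = original.store(b"payload bytes")

# Same backing store, different name: the data is still there, but routing fails.
renamed = FakeDriver("local-disk-v2", backing_store)
```

Calling `resolve(reference, [original])` returns the stored bytes, while `resolve(reference, [renamed])` raises `LookupError` even though the data still exists in the backing store.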

### 2. Store payloads

In `store()`, convert each Payload protobuf message to bytes with `payload.SerializeToString()` and write the bytes to
your storage system. The application data has already been serialized by the
[Payload Converter](/develop/python/data-handling/data-conversion) and
[Payload Codec](/develop/python/data-handling/data-encryption) before it reaches the driver. See the
[data conversion pipeline](/external-storage#data-pipeline) for more details.

Return a `StorageDriverClaim` for each payload with enough information to retrieve it later. `context.target` provides
identity information (namespace, Workflow ID, or Activity ID), depending on the operation. Consider structuring
your storage keys to include this information so that you can identify which Workflow owns each payload. Within that
scope, content-addressable keys (such as a SHA-256 hash of the payload bytes) can help deduplicate identical payloads.
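
One possible way to combine both ideas is sketched below using only the standard library. The function name `build_storage_key` and the `namespace/workflow_id/digest.bin` layout are assumptions made for this example, not a format the SDK requires.

```python
import hashlib


def build_storage_key(namespace: str, workflow_id: str, payload_bytes: bytes) -> str:
    """Build a key scoped to a Workflow and addressed by the payload's SHA-256 digest."""
    digest = hashlib.sha256(payload_bytes).hexdigest()
    return f"{namespace}/{workflow_id}/{digest}.bin"


# Identical bytes within the same Workflow map to the same key.
key_a = build_storage_key("default", "order-123", b"same bytes")
key_b = build_storage_key("default", "order-123", b"same bytes")

# The same bytes in a different Workflow get a separate key.
key_c = build_storage_key("default", "order-456", b"same bytes")
```

Because the key depends only on the scope and the content, writing the same payload twice within one Workflow can reuse the existing object. Workflow IDs are arbitrary strings, so a real driver may need to escape characters that are not safe in its storage keys.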

### 3. Retrieve payloads

In `retrieve()`, download the bytes using the claim data, then reconstruct the Payload protobuf message with
`payload.ParseFromString(data)`. The Payload Converter handles deserializing the application data after the driver
returns the payload.
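
The example driver performs blocking file I/O inside its `async` methods and handles payloads one at a time. If that becomes a concern, one option is to move the I/O to worker threads and run the operations concurrently. The following sketch uses only the standard library and plain bytes rather than Payload messages. The helper names (`store_all`, `retrieve_all`) are hypothetical, and `asyncio.to_thread` requires Python 3.9 or later.

```python
import asyncio
import os
import tempfile
import uuid


def _write_file(path: str, data: bytes) -> None:
    with open(path, "wb") as f:
        f.write(data)


def _read_file(path: str) -> bytes:
    with open(path, "rb") as f:
        return f.read()


async def store_all(store_dir: str, blobs: list[bytes]) -> list[str]:
    """Write every blob in a worker thread and return the paths in input order."""
    os.makedirs(store_dir, exist_ok=True)
    paths = [os.path.join(store_dir, f"{uuid.uuid4()}.bin") for _ in blobs]
    await asyncio.gather(
        *(asyncio.to_thread(_write_file, path, blob) for path, blob in zip(paths, blobs))
    )
    return paths


async def retrieve_all(paths: list[str]) -> list[bytes]:
    """Read every path in a worker thread; gather preserves the input order."""
    return list(await asyncio.gather(*(asyncio.to_thread(_read_file, p) for p in paths)))


async def main() -> list[bytes]:
    with tempfile.TemporaryDirectory() as tmp:
        paths = await store_all(tmp, [b"first", b"second"])
        return await retrieve_all(paths)


result = asyncio.run(main())
```

`asyncio.gather` returns results in the order the awaitables were passed, so the retrieved bytes line up with the stored blobs.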

### 4. Configure the Data Converter

Pass an `ExternalStorage` instance to your `DataConverter` and use the converter when creating your Client and Worker.
You can also package your driver as a [plugin](/develop/plugins-guide) for easier reuse across services:

<!--SNIPSTART python-custom-driver-data-converter-->
[features/snippets/external_storage/custom_driver/custom_driver_data_converter.py](https://github.com/temporalio/features/blob/main/features/snippets/external_storage/custom_driver/custom_driver_data_converter.py)
```py
data_converter = dataclasses.replace(
    DataConverter.default,
    external_storage=ExternalStorage(
        drivers=[LocalDiskStorageDriver()],
    ),
)
```
<!--SNIPEND-->

## Configure payload size threshold

The payload size threshold determines which payloads are offloaded. By default, payloads larger than 256 KiB are
offloaded to external storage. Set the `payload_size_threshold` parameter on `ExternalStorage` to change this value, or
set it to 0 to externalize all payloads regardless of size. The following example externalizes every payload:

<!--SNIPSTART python-external-storage-threshold-->
[features/snippets/external_storage/threshold/threshold_config.py](https://github.com/temporalio/features/blob/main/features/snippets/external_storage/threshold/threshold_config.py)
```py
data_converter = dataclasses.replace(
    DataConverter.default,
    external_storage=ExternalStorage(
        drivers=[driver],
        payload_size_threshold=0,
    ),
)
```
<!--SNIPEND-->

## Use multiple storage drivers

When you register multiple drivers, you must provide a `driver_selector` function that chooses which driver stores each
payload. Any driver in the list that is not selected for storing is still available for retrieval, which is useful when
migrating between storage backends. Return `None` from the selector to keep a specific payload inline in Event History.

Multiple drivers are useful in scenarios such as:

- Driver migration. Your Worker needs to retrieve payloads created by clients that use a different driver than the one
  you prefer. Register both drivers and use the selector to always pick your preferred driver for new payloads. The old
  driver remains available for retrieving existing claims.
- Multi-cloud storage. Route payloads to different storage backends based on your cloud environment. For example, use S3
  for Workers running on AWS and GCS for Workers running on Google Cloud. The selector chooses the appropriate driver
  based on the runtime environment.

The following example registers two drivers but always selects `preferred_driver` for new payloads. The `legacy_driver`
is only registered so the Worker can retrieve payloads that were previously stored with it:

<!--SNIPSTART python-external-storage-multiple-drivers-->
[features/snippets/external_storage/multiple_drivers/multiple_drivers.py](https://github.com/temporalio/features/blob/main/features/snippets/external_storage/multiple_drivers/multiple_drivers.py)
```py
preferred_driver = S3StorageDriver(client=new_aioboto3_client(s3_client), bucket="my-bucket")
legacy_driver = LegacyStorageDriver()

ExternalStorage(
    drivers=[preferred_driver, legacy_driver],
    driver_selector=lambda context, payload: preferred_driver,
)
```
<!--SNIPEND-->
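
For the multi-cloud scenario, a selector can close over a value read from the runtime environment. The sketch below is illustrative only: `FakeDriver` is a placeholder for real driver instances, `CLOUD_PROVIDER` is a hypothetical environment variable, and the two-argument `(context, payload)` signature mirrors the lambda in the preceding example.

```python
import os
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class FakeDriver:
    """Stand-in for a real storage driver; only carries a name."""

    driver_name: str


s3_driver = FakeDriver("s3")
gcs_driver = FakeDriver("gcs")


def make_selector(cloud: str) -> Callable[[object, object], Optional[FakeDriver]]:
    """Return a selector that always picks the driver for the given cloud."""
    chosen = gcs_driver if cloud == "gcp" else s3_driver

    def select(context: object, payload: object) -> Optional[FakeDriver]:
        # Returning None here instead would keep the payload inline.
        return chosen

    return select


# CLOUD_PROVIDER is a hypothetical variable; this sketch defaults to AWS.
selector = make_selector(os.environ.get("CLOUD_PROVIDER", "aws"))
```

With real drivers, you would register both in `drivers` and pass the selector as `driver_selector`, so that each Worker stores new payloads in its local cloud while still being able to retrieve payloads stored by the other driver.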

## Multi-region durability

To make your S3-backed External Storage tolerant of regional failures, configure the AWS side with
[Cross-Region Replication (CRR)](https://docs.aws.amazon.com/AmazonS3/latest/userguide/replication.html) and an
[S3 Multi-Region Access Point (MRAP)](https://aws.amazon.com/s3/features/multi-region-access-points/), then point the
driver at the MRAP ARN instead of a bucket name. See
[Durable External Storage](/external-storage#durable-external-storage) for the full pattern and trade-offs.

In code, the only change is the value you pass as `bucket`:

```py
session = aioboto3.Session(profile_name=AWS_PROFILE, region_name=AWS_REGION)
async with session.client("s3") as s3_client:
    driver = S3StorageDriver(
        client=new_aioboto3_client(s3_client),
        bucket="arn:aws:s3::123456789012:accesspoint/mfzwi23gnjvgw.mrap",
    )
```

`aioboto3` (via `botocore`) automatically uses [SigV4A](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_sigv-create-signed-request.html)
signing when the bucket value is an MRAP ARN. Make sure your `botocore` version is recent enough to support SigV4A.
