# Serverless Workers

> **Pre-release**
> To request access during Pre-release, create a [support ticket](/cloud/support#support-ticket) or contact your account team. APIs are experimental and may be subject to backwards-incompatible changes. [Sign up for updates](https://temporal.io/pages/serverless-workers-updates) to be notified when Serverless Workers reach Public Preview.

This page covers the following:

- [What is a Serverless Worker?](#serverless-worker)
- [How Serverless invocation works](#how-invocation-works)
- [Autoscaling](#autoscaling)
- [Scaling with long-lived Workers](#scaling-with-long-lived-workers)
- [Worker lifecycle](#worker-lifecycle)
- [Failure handling](#failure-handling)
- [Constraints](#constraints)
- [Compute providers](#compute-providers)

## What is a Serverless Worker? 

A Serverless Worker is a Temporal Worker that runs on serverless compute instead of as a long-lived process. There is no
always-on infrastructure to provision or scale. Temporal invokes the Worker when Tasks arrive on a Task Queue, and the
Worker shuts down when the work is done.

A Serverless Worker uses the same Temporal SDKs as a traditional long-lived Worker. It registers Workflows and
Activities the same way. The difference is in the lifecycle: instead of the Worker starting and polling continuously,
Temporal invokes the Serverless Worker on demand. The Worker then starts, processes available Tasks, and shuts down.

Serverless Workers require [Worker Versioning](/worker-versioning). Each Serverless Worker must be associated with a
[Worker Deployment Version](/worker-versioning#deployment-versions) that has a compute provider configured.

To deploy a Serverless Worker, see
[Deploy a Serverless Worker](/production-deployment/worker-deployments/serverless-workers).

## How Serverless invocation works 

With long-lived Workers, you start the Worker process, which connects to Temporal and polls a Task Queue for work.
Temporal does not need to know anything about the Worker's infrastructure.

With Serverless Workers, Temporal starts the Worker.

### Worker Controller Instance 

The Worker Controller Instance (WCI) is a system Workflow that scales Serverless Workers based on Task Queue conditions.
One WCI Workflow runs per Worker Deployment Version that has a compute provider configured. The WCI runs in the same
Namespace as your Worker Deployment.

The WCI responds to two triggers: [sync match failures](#sync-match-failure) and
[Task Queue backlog](#task-queue-backlog). When either trigger fires, the WCI produces a scaling action, such as
invoking the configured compute provider (for example, calling the AWS Lambda `Invoke` API) to start new Workers.
For details on how scaling works, see [Autoscaling](#autoscaling).

You can list WCI Workflows in your Namespace:

```bash
temporal workflow list \
  --namespace <NAMESPACE> \
  --query 'TemporalNamespaceDivision = "TemporalWorkerControllerInstance"'
```

WCI Workflow IDs follow the pattern `temporal-sys-worker-controller-instance:<DEPLOYMENT_NAME>:<BUILD_ID>`. You can
inspect a WCI Workflow's history to see its recent Activity results:

```bash
temporal workflow show \
  --namespace <NAMESPACE> \
  --workflow-id 'temporal-sys-worker-controller-instance:<DEPLOYMENT_NAME>:<BUILD_ID>'
```
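
If you script these lookups, the ID pattern above can be assembled with a small helper. This is a minimal sketch: `wci_workflow_id` is a hypothetical function name, not part of any Temporal SDK, and it only formats the documented pattern.

```python
# Prefix taken from the Workflow ID pattern documented above.
WCI_ID_PREFIX = "temporal-sys-worker-controller-instance"


def wci_workflow_id(deployment_name: str, build_id: str) -> str:
    """Return the WCI Workflow ID for one Worker Deployment Version.

    Hypothetical helper: it only formats the documented pattern and does
    not call Temporal.
    """
    if not deployment_name or not build_id:
        raise ValueError("deployment_name and build_id must be non-empty")
    return f"{WCI_ID_PREFIX}:{deployment_name}:{build_id}"


# "orders" and "build-42" are placeholder values.
example_id = wci_workflow_id("orders", "build-42")
print(example_id)
```

The resulting string can be passed to `--workflow-id` in the command above.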

The following diagram illustrates the invocation flow of a Serverless Worker.

![Serverless invocation flow](/diagrams/serverless-worker-flow.svg)

The invocation flow works as follows:

1. A Task is submitted (for example, `StartWorkflow` or `ScheduleActivity`).
2. The [Matching Service](/temporal-service/temporal-server#matching-service) attempts to route the Task directly to an
   available Worker (a sync match).
3. If a Worker is available, the Task is routed to that Worker.
4. If no Worker is available (sync match fails), the Matching Service pushes a signal to the WCI, and the WCI invokes
   the configured compute provider.
5. The Serverless Worker starts, creates a Temporal Client, and begins polling the Task Queue.
6. The Worker processes available Tasks until it exits (see [Worker lifecycle](#worker-lifecycle)).
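
Steps 2 through 4 can be pictured with a toy model. The sketch below is an illustration only, not the Matching Service implementation: `MatchingSketch`, `submit`, and `notify_wci` are hypothetical names, and it assumes that a Task that fails to sync match waits in the backlog until a Worker polls for it.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class MatchingSketch:
    """Toy model of sync matching; all names here are hypothetical."""

    idle_workers: List[str] = field(default_factory=list)
    backlog: List[str] = field(default_factory=list)

    def submit(self, task: str, notify_wci: Callable[[str], None]) -> str:
        # Steps 2-3: hand the Task to a waiting Worker if one exists.
        if self.idle_workers:
            worker = self.idle_workers.pop(0)
            return f"sync-matched {task} to {worker}"
        # Step 4: no Worker is waiting, so keep the Task and tell the WCI.
        self.backlog.append(task)
        notify_wci(task)
        return f"queued {task}; WCI notified"


wci_notifications: List[str] = []
matching = MatchingSketch(idle_workers=["worker-1"])
first = matching.submit("task-a", wci_notifications.append)
second = matching.submit("task-b", wci_notifications.append)
print(first)
print(second)
```

In this model the first Task is matched to the idle Worker, and the second one triggers a WCI notification because no Worker is left to take it.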

Each invocation is independent. The Worker creates a fresh client connection on every invocation. There is no connection
reuse or shared state across invocations.

## Autoscaling 

The [WCI](#worker-controller-instance) automatically scales Serverless Workers based on Task Queue signals. When Tasks
arrive and no Worker is available, the WCI invokes new Workers. When the Tasks are done, Workers exit and scale to zero.

The WCI uses two signals to decide when to invoke new Workers:

### Sync match failure 

When a Task is submitted, the [Matching Service](/temporal-service/temporal-server#matching-service) attempts to route
it directly to an available Worker. If no Worker is available, the sync match fails, and the Matching Service pushes a
signal to the WCI. The WCI then invokes a new Worker. This is the primary scaling path. Because the Matching Service
pushes match failures to the WCI as they happen, rather than waiting for the WCI to poll on a timer, scaling latency
stays low.

### Task Queue backlog 

The WCI also monitors Task Queue metadata to detect pending Tasks that do not have enough Workers to process them. When
it finds such a backlog, the WCI invokes additional Workers.
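
This page does not describe the exact policy the WCI uses to size a scale-up, so the following is only a sketch of one plausible backlog rule. The function name `workers_to_invoke` and the `tasks_per_worker` parameter are assumptions made for illustration.

```python
import math


def workers_to_invoke(
    backlog_tasks: int, active_workers: int, tasks_per_worker: int = 1
) -> int:
    """Hypothetical rule: invoke enough Workers to cover the backlog.

    This is NOT the WCI's actual algorithm. It only illustrates the idea of
    comparing pending Tasks against the Workers available to process them.
    """
    if backlog_tasks < 0 or active_workers < 0:
        raise ValueError("counts must be non-negative")
    if tasks_per_worker < 1:
        raise ValueError("tasks_per_worker must be at least 1")
    needed = math.ceil(backlog_tasks / tasks_per_worker)
    # Never return a negative number: surplus Workers simply exit when idle.
    return max(0, needed - active_workers)


print(workers_to_invoke(backlog_tasks=10, active_workers=3, tasks_per_worker=2))
```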

## Scaling with long-lived Workers 

Serverless Workers can share a Task Queue with long-lived Workers. Because Temporal invokes Serverless Workers only when
a [sync match fails](#sync-match-failure) or a [backlog builds up](#task-queue-backlog), they pick up only the Tasks
that no long-lived Worker was available to handle. In practice, the Serverless Workers act as spillover capacity for
the long-lived fleet.

> **⚠️ Caution:**
>
> If you configure Serverless and long-lived Workers on the same Task Queue, do not enable dynamic scaling on the
> long-lived Workers. The two groups cannot coordinate their scaling behavior. If both scale dynamically, the long-lived
> Workers may scale up to handle the same Tasks that Temporal is simultaneously invoking Serverless Workers for, leading
> to unnecessary invocations and unpredictable scaling.
>

## Worker lifecycle 

A single Serverless Worker invocation has three phases: init, work, and shutdown.

![Serverless Worker lifecycle](/diagrams/serverless-worker-lifecycle.svg)

During the **init** phase, the Worker initializes and establishes a client connection to Temporal.

During the **work** phase, the Worker polls the Task Queue and processes Tasks.

During the **shutdown** phase, the Worker stops polling, waits for in-flight Tasks to finish, and runs any shutdown
hooks (for example, flushing OpenTelemetry data). Shutdown begins before the invocation deadline so the Worker can
exit cleanly before the compute provider forcibly terminates the execution environment.

### Tuning for long-running Activities

If your Worker handles long-running Activities, set these three values together:

- **Worker stop timeout > longest Activity runtime.** Gives in-flight Activities enough time to finish after polling
  stops.
- **Shutdown deadline buffer > Worker stop timeout + shutdown hook time.** Ensures the drain and any shutdown hooks
  complete before the compute provider terminates the environment.
- **Invocation deadline > longest Activity runtime + shutdown deadline buffer.** Set on the compute provider to give
  each invocation enough total runtime.

> **💡 Tip:**
>
>   If your longest-running Activity runs longer than half the maximum invocation deadline, the constraints above may be
>   difficult or impossible to meet together. In this case, use
>   [Activity Heartbeats](/encyclopedia/detecting-activity-failures#activity-heartbeat) to record the state of the
>   Activity execution so that the next retry can pick up where it left off.
>
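
The "half the maximum" rule of thumb follows from chaining the three inequalities: the invocation deadline must exceed twice the longest Activity runtime plus the shutdown hook time. A minimal sketch of that arithmetic, using a hypothetical helper name and the 15-minute AWS Lambda limit listed under [Constraints](#constraints):

```python
def activity_runtime_bound_s(
    max_invocation_deadline_s: float, shutdown_hook_s: float = 0.0
) -> float:
    """Exclusive upper bound on the longest Activity runtime, in seconds.

    Derived from the three constraints above:
      stop timeout        > longest Activity runtime
      deadline buffer     > stop timeout + hook time
      invocation deadline > longest Activity runtime + deadline buffer
    which together imply: deadline > 2 * longest runtime + hook time.
    """
    return (max_invocation_deadline_s - shutdown_hook_s) / 2


# 900 s is the 15-minute Lambda maximum; the 3 s hook time is a placeholder.
bound = activity_runtime_bound_s(900, shutdown_hook_s=3)
print(bound)  # 448.5
```

Activities that need longer than this bound cannot satisfy all three settings at once, which is where heartbeating and resuming on retry come in.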

For example, if your longest Activity runtime is 5 minutes and your shutdown hooks take 3 seconds to run, set the
Worker stop timeout to more than 5 minutes and the shutdown deadline buffer to more than 303 seconds (5 minutes + 3
seconds). Then set your invocation deadline to more than 603 seconds, or 10 minutes and 3 seconds (5 minutes + 303
seconds).
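
The three rules can be checked mechanically before you deploy. The sketch below is a standalone validator, not an SDK feature: the class and field names are hypothetical, and all values are in seconds.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ShutdownTuning:
    """Hypothetical container for the settings discussed above (seconds)."""

    longest_activity_s: float
    shutdown_hook_s: float
    worker_stop_timeout_s: float
    shutdown_deadline_buffer_s: float
    invocation_deadline_s: float

    def violations(self) -> List[str]:
        """Return one message per rule that the settings break."""
        problems = []
        if not self.worker_stop_timeout_s > self.longest_activity_s:
            problems.append("Worker stop timeout must exceed longest Activity runtime")
        if not self.shutdown_deadline_buffer_s > (
            self.worker_stop_timeout_s + self.shutdown_hook_s
        ):
            problems.append("Shutdown deadline buffer must exceed stop timeout + hook time")
        if not self.invocation_deadline_s > (
            self.longest_activity_s + self.shutdown_deadline_buffer_s
        ):
            problems.append("Invocation deadline must exceed Activity runtime + buffer")
        return problems


# 5-minute Activities and 3-second hooks, as in the example above.
good = ShutdownTuning(300, 3, 301, 305, 606)
bad = ShutdownTuning(300, 3, 301, 305, 603)
print(good.violations())
print(bad.violations())
```

The second configuration fails only the last rule: its invocation deadline does not leave room for both a full Activity run and the shutdown deadline buffer.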

The Worker stop timeout controls how long the Worker waits for in-flight Tasks to finish after it stops polling. The
shutdown deadline buffer controls how much time before the invocation deadline the Worker stops polling for Tasks.

Raising only the shutdown deadline buffer makes the Worker stop polling earlier, but does not give in-flight Tasks any
more time to complete.

Raising only the Worker stop timeout does not make the Worker stop polling earlier, which means the compute provider
might terminate the Worker before the full stop timeout completes. In-flight Activities then do not get the full stop
timeout to finish, and the shutdown hooks may not run.
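
To see why the two settings must move together, it can help to lay out the worst-case timeline of one invocation. The following sketch assumes a simplified model in which in-flight Tasks use the entire stop timeout and the provider terminates the environment exactly at the invocation deadline. The function name and the dictionary keys are hypothetical.

```python
from typing import Dict, Union


def shutdown_timeline(
    invocation_deadline_s: float,
    shutdown_deadline_buffer_s: float,
    worker_stop_timeout_s: float,
    shutdown_hook_s: float,
) -> Dict[str, Union[float, bool]]:
    """Worst-case shutdown timeline, in seconds since the invocation started."""
    polling_stops_at = invocation_deadline_s - shutdown_deadline_buffer_s
    # In-flight Tasks cannot drain for longer than the time that remains.
    effective_drain = min(worker_stop_timeout_s, shutdown_deadline_buffer_s)
    hooks_start_at = polling_stops_at + effective_drain
    return {
        "polling_stops_at_s": polling_stops_at,
        "effective_drain_s": effective_drain,
        "hooks_fit": hooks_start_at + shutdown_hook_s <= invocation_deadline_s,
    }


# Only the stop timeout raised: the drain is cut short and hooks do not fit.
only_timeout = shutdown_timeline(900, 60, 300, 3)
# Only the buffer raised: polling stops early, but Tasks still get just 30 s.
only_buffer = shutdown_timeline(900, 400, 30, 3)
print(only_timeout)
print(only_buffer)
```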

## Failure handling 

Serverless Workers rely on Temporal's standard retry and timeout semantics to recover from failures. The following
sections describe common failure scenarios and how they are handled.

### Worker crash 

If a Worker invocation crashes (for example, because it runs out of memory or hits an unhandled exception), recovery
follows standard Temporal retry semantics:

- The Activity's Start-To-Close Timeout (or Heartbeat Timeout, if one is set) expires after its configured duration.
- Temporal retries the Activity on a different Worker invocation.
- No manual intervention is required.

### Provider concurrency limit 

If the compute provider's concurrency limit is reached (for example, AWS Lambda account concurrency):

- Further invocations from the WCI fail.
- Tasks remain in the Task Queue backlog. No data loss occurs.
- Processing slows until concurrency frees up.

### Resource exhaustion across Activity slots 

By default, a single Worker invocation may have multiple Activity slots and run several Activities at once. A crash or
resource exhaustion in one Activity (for example, out-of-memory from a memory-intensive operation) can affect other
Activities running in the same invocation.

To isolate Activities from each other:

- Split Workflow and Activity Workers into separate compute functions.
- Set Activity slots to 1 per invocation.

With a single-slot configuration, each Activity gets a dedicated execution environment.

## Constraints 

| Constraint        | Detail                                                                                                                                                             |
| ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Activity duration | Must complete within the compute provider's invocation limit (minus shutdown deadline buffer). For AWS Lambda, the maximum is 15 minutes.                          |
| Workflow duration | No limit. Workflows of any duration work, regardless of the invocation timeout. A Workflow runs across as many invocations as needed.                              |
| Worker code       | Same Temporal SDK Worker code, using the serverless Worker package for your SDK.                                                                                   |
| Versioning        | [Worker Versioning](/worker-versioning) is required. Each Workflow must have an `AutoUpgrade` or `Pinned` behavior, set per-Workflow or as a Worker-level default. |

## Compute providers 

A compute provider is the configuration that tells Temporal how to invoke a Serverless Worker. The compute provider is
set on a [Worker Deployment Version](/worker-versioning#deployment-versions) and specifies the provider type, the
invocation target, and the credentials Temporal needs to trigger the invocation.

For example, an AWS Lambda compute provider includes the Lambda function ARN and the IAM role that Temporal assumes to
invoke the function.
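
As a mental model, a compute provider bundles three pieces of information. The sketch below is not Temporal's configuration schema: the class, field names, `"aws-lambda"` identifier, and ARN values are placeholders chosen for illustration. The only real-world detail it relies on is the general AWS ARN layout `arn:partition:service:region:account-id:resource`.

```python
from dataclasses import dataclass


def arn_service(arn: str) -> str:
    """Return the service field of an AWS ARN, such as "lambda" or "iam"."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return parts[2]


@dataclass(frozen=True)
class ComputeProviderSketch:
    """Hypothetical shape of a compute provider; not the real schema."""

    provider_type: str      # which kind of compute to invoke
    invocation_target: str  # for AWS Lambda: the function ARN
    credentials: str        # for AWS Lambda: ARN of the IAM role to assume

    def validate(self) -> None:
        if self.provider_type != "aws-lambda":
            raise ValueError(f"unsupported provider: {self.provider_type}")
        if arn_service(self.invocation_target) != "lambda":
            raise ValueError("invocation_target must be a Lambda function ARN")
        if arn_service(self.credentials) != "iam":
            raise ValueError("credentials must be an IAM role ARN")


provider = ComputeProviderSketch(
    provider_type="aws-lambda",
    invocation_target="arn:aws:lambda:us-east-1:123456789012:function:my-worker",
    credentials="arn:aws:iam::123456789012:role/temporal-invoke-role",
)
provider.validate()  # raises ValueError if any field is malformed
```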

Compute providers are only needed for Serverless Workers. Traditional long-lived Workers do not require a compute
provider because the Worker process manages its own lifecycle.

### Supported providers

| Provider   | Description                                                                   |
| ---------- | ----------------------------------------------------------------------------- |
| AWS Lambda | Temporal assumes an IAM role in your AWS account to invoke a Lambda function. |
