## Autoscaling apps

Flyte apps support autoscaling, allowing them to scale up and down based on traffic. This helps optimize costs by scaling down when there's no traffic and scaling up when needed.

### Scaling configuration

The `scaling` parameter uses a `[[Scaling]]` object to configure autoscaling behavior:

```python
scaling=flyte.app.Scaling(
    replicas=(min_replicas, max_replicas),
    scaledown_after=idle_ttl_seconds,
)
```

#### Parameters

- **`replicas`**: A tuple `(min_replicas, max_replicas)` specifying the minimum and maximum number of replicas.
- **`scaledown_after`**: Time in seconds to wait before scaling down when idle (idle TTL).

### Basic scaling example

Here's a simple example with scaling from 0 to 1 replica:

```
# /// script
# requires-python = "==3.13"
# dependencies = [
#    "flyte>=2.0.0b52",
# ]
# ///

import flyte
import flyte.app

# {{docs-fragment basic-scaling}}
# Basic example: scale from 0 to 1 replica
app_env = flyte.app.AppEnvironment(
    name="autoscaling-app",
    scaling=flyte.app.Scaling(
        replicas=(0, 1),  # Scale from 0 to 1 replica
        scaledown_after=300,  # Scale down after 5 minutes of inactivity
    ),
    # ...
)
# {{/docs-fragment basic-scaling}}

# {{docs-fragment always-on}}
# Always-on app
app_env2 = flyte.app.AppEnvironment(
    name="always-on-api",
    scaling=flyte.app.Scaling(
        replicas=(1, 1),  # Always keep 1 replica running
        # scaledown_after is ignored when min_replicas > 0
    ),
    # ...
)
# {{/docs-fragment always-on}}

# {{docs-fragment scale-to-zero}}
# Scale-to-zero app
app_env3 = flyte.app.AppEnvironment(
    name="scale-to-zero-app",
    scaling=flyte.app.Scaling(
        replicas=(0, 1),  # Can scale down to 0
        scaledown_after=600,  # Scale down after 10 minutes of inactivity
    ),
    # ...
)
# {{/docs-fragment scale-to-zero}}

# {{docs-fragment high-availability}}
# High-availability app
app_env4 = flyte.app.AppEnvironment(
    name="ha-api",
    scaling=flyte.app.Scaling(
        replicas=(2, 5),  # Keep at least 2, scale up to 5
        scaledown_after=300,  # Scale down after 5 minutes
    ),
    # ...
)
# {{/docs-fragment high-availability}}

# {{docs-fragment burstable}}
# Burstable app
app_env5 = flyte.app.AppEnvironment(
    name="bursty-app",
    scaling=flyte.app.Scaling(
        replicas=(1, 10),  # Start with 1, scale up to 10 under load
        scaledown_after=180,  # Scale down quickly after 3 minutes
    ),
    # ...
)
# {{/docs-fragment burstable}}
```

*Source: https://github.com/unionai/unionai-examples/blob/main/v2/user-guide/configure-apps/autoscaling-examples.py*

This configuration:

- Starts with 0 replicas (no running instances)
- Scales up to 1 replica when there's traffic
- Scales back down to 0 after 5 minutes (300 seconds) of no traffic

### Scaling patterns

#### Always-on app

For apps that need to always be running:

```
# /// script
# requires-python = "==3.13"
# dependencies = [
#    "flyte>=2.0.0b52",
# ]
# ///

import flyte
import flyte.app

# {{docs-fragment basic-scaling}}
# Basic example: scale from 0 to 1 replica
app_env = flyte.app.AppEnvironment(
    name="autoscaling-app",
    scaling=flyte.app.Scaling(
        replicas=(0, 1),  # Scale from 0 to 1 replica
        scaledown_after=300,  # Scale down after 5 minutes of inactivity
    ),
    # ...
)
# {{/docs-fragment basic-scaling}}

# {{docs-fragment always-on}}
# Always-on app
app_env2 = flyte.app.AppEnvironment(
    name="always-on-api",
    scaling=flyte.app.Scaling(
        replicas=(1, 1),  # Always keep 1 replica running
        # scaledown_after is ignored when min_replicas > 0
    ),
    # ...
)
# {{/docs-fragment always-on}}

# {{docs-fragment scale-to-zero}}
# Scale-to-zero app
app_env3 = flyte.app.AppEnvironment(
    name="scale-to-zero-app",
    scaling=flyte.app.Scaling(
        replicas=(0, 1),  # Can scale down to 0
        scaledown_after=600,  # Scale down after 10 minutes of inactivity
    ),
    # ...
)
# {{/docs-fragment scale-to-zero}}

# {{docs-fragment high-availability}}
# High-availability app
app_env4 = flyte.app.AppEnvironment(
    name="ha-api",
    scaling=flyte.app.Scaling(
        replicas=(2, 5),  # Keep at least 2, scale up to 5
        scaledown_after=300,  # Scale down after 5 minutes
    ),
    # ...
)
# {{/docs-fragment high-availability}}

# {{docs-fragment burstable}}
# Burstable app
app_env5 = flyte.app.AppEnvironment(
    name="bursty-app",
    scaling=flyte.app.Scaling(
        replicas=(1, 10),  # Start with 1, scale up to 10 under load
        scaledown_after=180,  # Scale down quickly after 3 minutes
    ),
    # ...
)
# {{/docs-fragment burstable}}
```

*Source: https://github.com/unionai/unionai-examples/blob/main/v2/user-guide/configure-apps/autoscaling-examples.py*

#### Scale-to-zero app

For apps that can scale to zero when idle:

```
# /// script
# requires-python = "==3.13"
# dependencies = [
#    "flyte>=2.0.0b52",
# ]
# ///

import flyte
import flyte.app

# {{docs-fragment basic-scaling}}
# Basic example: scale from 0 to 1 replica
app_env = flyte.app.AppEnvironment(
    name="autoscaling-app",
    scaling=flyte.app.Scaling(
        replicas=(0, 1),  # Scale from 0 to 1 replica
        scaledown_after=300,  # Scale down after 5 minutes of inactivity
    ),
    # ...
)
# {{/docs-fragment basic-scaling}}

# {{docs-fragment always-on}}
# Always-on app
app_env2 = flyte.app.AppEnvironment(
    name="always-on-api",
    scaling=flyte.app.Scaling(
        replicas=(1, 1),  # Always keep 1 replica running
        # scaledown_after is ignored when min_replicas > 0
    ),
    # ...
)
# {{/docs-fragment always-on}}

# {{docs-fragment scale-to-zero}}
# Scale-to-zero app
app_env3 = flyte.app.AppEnvironment(
    name="scale-to-zero-app",
    scaling=flyte.app.Scaling(
        replicas=(0, 1),  # Can scale down to 0
        scaledown_after=600,  # Scale down after 10 minutes of inactivity
    ),
    # ...
)
# {{/docs-fragment scale-to-zero}}

# {{docs-fragment high-availability}}
# High-availability app
app_env4 = flyte.app.AppEnvironment(
    name="ha-api",
    scaling=flyte.app.Scaling(
        replicas=(2, 5),  # Keep at least 2, scale up to 5
        scaledown_after=300,  # Scale down after 5 minutes
    ),
    # ...
)
# {{/docs-fragment high-availability}}

# {{docs-fragment burstable}}
# Burstable app
app_env5 = flyte.app.AppEnvironment(
    name="bursty-app",
    scaling=flyte.app.Scaling(
        replicas=(1, 10),  # Start with 1, scale up to 10 under load
        scaledown_after=180,  # Scale down quickly after 3 minutes
    ),
    # ...
)
# {{/docs-fragment burstable}}
```

*Source: https://github.com/unionai/unionai-examples/blob/main/v2/user-guide/configure-apps/autoscaling-examples.py*

#### High-availability app

For apps that need multiple replicas for availability:

```
# /// script
# requires-python = "==3.13"
# dependencies = [
#    "flyte>=2.0.0b52",
# ]
# ///

import flyte
import flyte.app

# {{docs-fragment basic-scaling}}
# Basic example: scale from 0 to 1 replica
app_env = flyte.app.AppEnvironment(
    name="autoscaling-app",
    scaling=flyte.app.Scaling(
        replicas=(0, 1),  # Scale from 0 to 1 replica
        scaledown_after=300,  # Scale down after 5 minutes of inactivity
    ),
    # ...
)
# {{/docs-fragment basic-scaling}}

# {{docs-fragment always-on}}
# Always-on app
app_env2 = flyte.app.AppEnvironment(
    name="always-on-api",
    scaling=flyte.app.Scaling(
        replicas=(1, 1),  # Always keep 1 replica running
        # scaledown_after is ignored when min_replicas > 0
    ),
    # ...
)
# {{/docs-fragment always-on}}

# {{docs-fragment scale-to-zero}}
# Scale-to-zero app
app_env3 = flyte.app.AppEnvironment(
    name="scale-to-zero-app",
    scaling=flyte.app.Scaling(
        replicas=(0, 1),  # Can scale down to 0
        scaledown_after=600,  # Scale down after 10 minutes of inactivity
    ),
    # ...
)
# {{/docs-fragment scale-to-zero}}

# {{docs-fragment high-availability}}
# High-availability app
app_env4 = flyte.app.AppEnvironment(
    name="ha-api",
    scaling=flyte.app.Scaling(
        replicas=(2, 5),  # Keep at least 2, scale up to 5
        scaledown_after=300,  # Scale down after 5 minutes
    ),
    # ...
)
# {{/docs-fragment high-availability}}

# {{docs-fragment burstable}}
# Burstable app
app_env5 = flyte.app.AppEnvironment(
    name="bursty-app",
    scaling=flyte.app.Scaling(
        replicas=(1, 10),  # Start with 1, scale up to 10 under load
        scaledown_after=180,  # Scale down quickly after 3 minutes
    ),
    # ...
)
# {{/docs-fragment burstable}}
```

*Source: https://github.com/unionai/unionai-examples/blob/main/v2/user-guide/configure-apps/autoscaling-examples.py*

#### Burstable app

For apps with variable load:

```
# /// script
# requires-python = "==3.13"
# dependencies = [
#    "flyte>=2.0.0b52",
# ]
# ///

import flyte
import flyte.app

# {{docs-fragment basic-scaling}}
# Basic example: scale from 0 to 1 replica
app_env = flyte.app.AppEnvironment(
    name="autoscaling-app",
    scaling=flyte.app.Scaling(
        replicas=(0, 1),  # Scale from 0 to 1 replica
        scaledown_after=300,  # Scale down after 5 minutes of inactivity
    ),
    # ...
)
# {{/docs-fragment basic-scaling}}

# {{docs-fragment always-on}}
# Always-on app
app_env2 = flyte.app.AppEnvironment(
    name="always-on-api",
    scaling=flyte.app.Scaling(
        replicas=(1, 1),  # Always keep 1 replica running
        # scaledown_after is ignored when min_replicas > 0
    ),
    # ...
)
# {{/docs-fragment always-on}}

# {{docs-fragment scale-to-zero}}
# Scale-to-zero app
app_env3 = flyte.app.AppEnvironment(
    name="scale-to-zero-app",
    scaling=flyte.app.Scaling(
        replicas=(0, 1),  # Can scale down to 0
        scaledown_after=600,  # Scale down after 10 minutes of inactivity
    ),
    # ...
)
# {{/docs-fragment scale-to-zero}}

# {{docs-fragment high-availability}}
# High-availability app
app_env4 = flyte.app.AppEnvironment(
    name="ha-api",
    scaling=flyte.app.Scaling(
        replicas=(2, 5),  # Keep at least 2, scale up to 5
        scaledown_after=300,  # Scale down after 5 minutes
    ),
    # ...
)
# {{/docs-fragment high-availability}}

# {{docs-fragment burstable}}
# Burstable app
app_env5 = flyte.app.AppEnvironment(
    name="bursty-app",
    scaling=flyte.app.Scaling(
        replicas=(1, 10),  # Start with 1, scale up to 10 under load
        scaledown_after=180,  # Scale down quickly after 3 minutes
    ),
    # ...
)
# {{/docs-fragment burstable}}
```

*Source: https://github.com/unionai/unionai-examples/blob/main/v2/user-guide/configure-apps/autoscaling-examples.py*

### Idle TTL (Time To Live)

The `scaledown_after` parameter (idle TTL) determines how long an app instance can be idle before it's scaled down. 

#### Considerations

- **Too short**: May cause frequent scale up/down cycles, leading to cold starts.
- **Too long**: Keeps resources running unnecessarily, increasing costs.
- **Optimal**: Balance between cost and user experience.

#### Common idle TTL values

- **Development/Testing**: 60-180 seconds (1-3 minutes) - quick scale down for cost savings.
- **Production APIs**: 300-600 seconds (5-10 minutes) - balance cost and responsiveness.
- **Batch processing**: 900-1800 seconds (15-30 minutes) - longer to handle bursts.
- **Always-on**: Set `min_replicas > 0` - never scale down.

### Autoscaling best practices

1. **Start conservative**: Begin with longer idle TTL values and adjust based on usage.
2. **Monitor cold starts**: Track how long it takes for your app to become ready after scaling up.
3. **Consider costs**: Balance idle TTL between cost savings and user experience.
4. **Use appropriate min replicas**: Set `min_replicas > 0` for critical apps that need to be always available.
5. **Test scaling behavior**: Verify your app handles scale up/down correctly (for example, state management and connections).

### Autoscaling limitations

- Scaling is based on traffic/request patterns, not CPU/memory utilization.
- Cold starts may occur when scaling from zero.
- Stateful apps need careful design to handle scaling (use external state stores).
- Maximum replicas are limited by your cluster capacity.

### Autoscaling troubleshooting

**App scales down too quickly:**

- Increase `scaledown_after` value.
- Set `min_replicas > 0` if the app needs to stay warm.

**App doesn't scale up fast enough:**

- Ensure your cluster has capacity.
- Check if there are resource constraints.

**Cold starts are too slow:**

- Pre-warm with `min_replicas = 1`.
- Optimize app startup time.
- Consider using faster storage for model loading.

---
**Source**: https://github.com/unionai/unionai-docs/blob/main/content/user-guide/configure-apps/auto-scaling-apps.md
**HTML**: https://www.union.ai/docs/v2/union/user-guide/configure-apps/auto-scaling-apps/
