# Scale your runs

> **📝 Note**
>
> An LLM-optimized bundle of this entire section is available at [`section.md`](https://www.union.ai/docs/v2/union/user-guide/run-scaling/section.md).
> This single file contains all pages in this section, optimized for AI coding agent context.

This guide helps you understand and optimize the performance of your Flyte workflows. Whether you're building latency-sensitive applications or high-throughput data pipelines, these docs will help you make the right architectural choices.

## Understanding Flyte execution

Before optimizing performance, it's important to understand how Flyte executes your workflows:

- **[Data flow](https://www.union.ai/docs/v2/union/user-guide/run-scaling/data-flow/page.md)**: Learn how data moves between tasks, including inline vs. reference data types, caching mechanisms, and storage configuration.
- **[Life of a run](https://www.union.ai/docs/v2/union/user-guide/run-scaling/life-of-a-run/page.md)**: Understand what happens when you invoke `flyte.run()`, from code analysis and image building to task execution and state management.

## Performance optimization

Once you understand the fundamentals, dive into performance tuning:

- **[Scale your workflows](https://www.union.ai/docs/v2/union/user-guide/run-scaling/scale-your-workflows/page.md)**: A comprehensive guide to optimizing workflow performance, covering latency vs. throughput, task overhead analysis, batching strategies, reusable containers, and more.

## Key concepts for scaling

When scaling your workflows, keep these principles in mind:

1. **Task overhead matters**: The overhead of creating a task (uploading data, enqueuing, creating containers) should be much smaller than the task runtime.
2. **Batch for throughput**: For large-scale data processing, batch multiple items into single tasks to reduce overhead.
3. **Reusable containers**: Eliminate per-task container startup overhead and enable concurrent execution by running multiple task invocations in a single long-lived container.
4. **Traces for lightweight ops**: Use traces instead of tasks for lightweight operations that need checkpointing.
5. **Limit fanout**: Keep the total number of actions per run below 50k (target 10k-20k for best performance).
6. **Choose the right data types**: Use reference types (files, directories, DataFrames) for large data and inline types for small data.

For detailed guidance on each of these topics, see [Scale your workflows](https://www.union.ai/docs/v2/union/user-guide/run-scaling/scale-your-workflows/page.md).
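To see why principles 1 and 2 matter, it helps to work through the arithmetic: if per-task overhead is a few seconds and each item takes only milliseconds to process, running one task per item spends nearly all wall-clock time on overhead, while batching amortizes that cost across many items. The sketch below is a framework-free illustration of this trade-off; the `chunked` and `efficiency` helpers and the timing numbers are illustrative assumptions, not part of the Flyte API.

```python
from typing import Iterator, TypeVar

T = TypeVar("T")

def chunked(items: list[T], batch_size: int) -> Iterator[list[T]]:
    """Split a flat list of work items into batches, one batch per task."""
    for i in range(0, len(items), batch_size):
        yield items[i : i + batch_size]

def efficiency(n_items: int, batch_size: int,
               per_item_s: float = 0.05, overhead_s: float = 5.0) -> float:
    """Fraction of total time spent on real work rather than task overhead.

    Assumes each task pays a fixed overhead (data upload, enqueuing,
    container creation) of `overhead_s` seconds -- illustrative numbers only.
    """
    n_tasks = -(-n_items // batch_size)  # ceiling division
    work = n_items * per_item_s
    return work / (work + n_tasks * overhead_s)

# One task per item: overhead dominates the run.
print(f"{efficiency(100_000, 1):.1%}")      # ~1% useful work
# 1,000 items per task: the same overhead is amortized.
print(f"{efficiency(100_000, 1_000):.1%}")  # ~91% useful work
```

The batch size is a tuning knob: larger batches amortize overhead further but reduce parallelism and make retries coarser, which is why the guide's rule of thumb is to keep per-task overhead much smaller than per-task runtime rather than to maximize batch size outright.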

## Subpages

- [Data flow](https://www.union.ai/docs/v2/union/user-guide/run-scaling/data-flow/page.md)
  - Overview
  - Data types and transport
  - Passed by reference
  - Passed by value (inline I/O)
  - Task execution and data flow
  - Input download
  - Output upload
  - Task-to-task data flow
  - Caching and data hashing
  - Cache key computation
  - Inline data caching
  - Reference data hashing
  - Cache control
  - Traces and data flow
  - Object stores and latency considerations
  - Configuring data storage
  - Organization and project level
  - Per-run configuration
- [Life of a run](https://www.union.ai/docs/v2/union/user-guide/run-scaling/life-of-a-run/page.md)
  - Overview
  - Phase 1: Code analysis and preparation
  - Phase 2: Image building
  - Phase 3: Code bundling
  - Default: `copy_style="loaded_modules"`
  - Alternative: `copy_style="none"`
  - Phase 4: Upload code bundle
  - Phase 5: Run creation and queuing
  - Phase 6: Task execution in data plane
  - Container startup
  - Invoking downstream tasks
  - Execution flow diagram
  - Action identifiers and crash recovery
  - Downstream task execution
  - Reusable containers
  - Reusable container execution flow
  - State replication and visualization
  - Queue Service to Run Service
  - UI limitations
  - Optimization opportunities
- [Scale your workflows](https://www.union.ai/docs/v2/union/user-guide/run-scaling/scale-your-workflows/page.md)
  - Understanding performance dimensions
  - Latency
  - Throughput
  - Task execution overhead
  - The overhead principle
  - System architecture and data flow
  - Optimization strategies
  - 1. Use reusable containers for concurrency
  - 2. Batch workloads to reduce overhead
  - 3. Use traces for lightweight operations
  - 4. Limit fanout for system stability
  - 5. Optimize data transfer
  - 6. Leverage caching
  - 7. Parallelize with `flyte.map`
  - Performance tuning workflow
  - Real-world example: PyIceberg batch processing
  - Example: Optimizing a data pipeline
  - Before optimization
  - After optimization
  - When to contact the Union team
- [Batch inference](https://www.union.ai/docs/v2/union/user-guide/run-scaling/batch-inference/page.md)
  - Why GPU utilization drops
  - Serving vs in-process batch inference
  - Solution: `DynamicBatcher`
  - Basic usage
  - Cost estimation
  - `TokenBatcher` for LLM inference
  - Combining with reusable containers
  - Example: batch LLM inference with vLLM
  - Monitoring utilization

---
**Source**: https://github.com/unionai/unionai-docs/blob/main/content/user-guide/run-scaling/_index.md
**HTML**: https://www.union.ai/docs/v2/union/user-guide/run-scaling/
