Code packaging for remote execution
When you run Flyte tasks remotely, your code must be available in the execution environment. The Flyte SDK provides two main approaches for packaging your code:
- Code bundling - Bundle code dynamically at runtime
- Container-based deployment - Embed code directly in container images
Quick comparison
| Aspect | Code bundling | Container-based |
|---|---|---|
| Speed | Fast (no image rebuild) | Slower (requires image build) |
| Best for | Rapid development, iteration | Production, immutable deployments |
| Code changes | Immediate effect | Requires image rebuild |
| Setup | Automatic by default | Manual configuration needed |
| Reproducibility | Excellent (hash-based versioning) | Excellent (immutable images) |
| Rollback | Requires version control | Tag-based, straightforward |
Code bundling
Default approach - Automatically bundles and uploads your code to remote storage at runtime.
How it works
When you run flyte run or call flyte.run(), Flyte automatically:
- Scans loaded modules from your codebase
- Creates a tarball (gzipped, without timestamps for consistent hashing)
- Uploads to blob storage (S3, GCS, Azure Blob)
- Deduplicates based on content hashes
- Downloads in containers at runtime
This process happens transparently - every container downloads and extracts the code bundle before execution.
Code bundling is optimized for speed:
- Bundles are created without timestamps for consistent hashing
- Identical code produces identical hashes, enabling deduplication
- Only modified code triggers new uploads
- Containers cache downloaded bundles
Reproducibility: Flyte automatically versions code bundles based on content hash. The same code always produces the same hash, guaranteeing reproducibility without manual versioning. However, version control is still recommended for rollback capabilities.
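To make the link between timestamp-free tarballs and stable hashes concrete, here is a minimal sketch using only the Python standard library. It is not Flyte's implementation; the helper names `build_bundle` and `bundle_digest` are hypothetical, and SHA-256 is an assumed hash choice.

```python
import gzip
import hashlib
import io
import tarfile


def build_bundle(files: dict[str, bytes]) -> bytes:
    """Pack files into a gzipped tarball that carries no timestamps."""
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w") as tar:
        for name in sorted(files):  # fixed order keeps the archive stable
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            info.mtime = 0  # no modification time in the tar header
            tar.addfile(info, io.BytesIO(files[name]))
    out = io.BytesIO()
    # mtime=0 keeps the current time out of the gzip header as well
    with gzip.GzipFile(fileobj=out, mode="wb", mtime=0) as gz:
        gz.write(raw.getvalue())
    return out.getvalue()


def bundle_digest(files: dict[str, bytes]) -> str:
    """Content hash of the bundle (SHA-256 is an assumption for this sketch)."""
    return hashlib.sha256(build_bundle(files)).hexdigest()


code = {"app.py": b"print('hi')\n", "my_module.py": b"X = 1\n"}
same = bundle_digest(code) == bundle_digest(dict(code))
changed = bundle_digest({**code, "my_module.py": b"X = 2\n"}) != bundle_digest(code)
print(same, changed)  # True True
```

Because the digest depends only on file names and contents, an unchanged codebase maps to an already-uploaded bundle, which is what makes deduplication possible.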
Automatic code bundling
Default behavior - Bundles all loaded modules automatically.
What gets bundled
Flyte includes modules that are:
- ✅ Loaded when environment is parsed (imported at module level)
- ✅ Part of your codebase (not system packages)
- ✅ Within your project directory
- ❌ NOT lazily loaded (imported inside functions)
- ❌ NOT system-installed packages (e.g., from site-packages)
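The selection rules above can be approximated locally to preview which files would qualify. This is a rough sketch under stated assumptions, not Flyte's actual scanner: `loaded_project_modules` is a hypothetical helper that inspects `sys.modules` and filters by path.

```python
import sys
import sysconfig
from pathlib import Path


def loaded_project_modules(root_dir: Path) -> list[Path]:
    """Source files of already-imported modules that live under root_dir."""
    root = root_dir.resolve()
    paths = sysconfig.get_paths()
    # Directories that hold installed packages and the standard library
    excluded = [
        Path(paths[key]).resolve()
        for key in ("purelib", "platlib", "stdlib")
        if key in paths
    ]
    found = []
    for module in list(sys.modules.values()):
        file = getattr(module, "__file__", None)
        if not file:
            continue  # built-in or namespace module without a source file
        path = Path(file).resolve()
        if any(path.is_relative_to(ex) for ex in excluded):
            continue  # system-installed, not part of the codebase
        if path.is_relative_to(root):
            found.append(path)
    return sorted(found)
```

Calling `loaded_project_modules(Path.cwd())` after importing your task module lists the candidates. A module imported only inside a function body will be missing from the result until that function has run, which mirrors the lazy-loading caveat.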
Example: Basic automatic bundling
# app.py
import flyte
from my_module import helper  # ✅ Bundled automatically
env = flyte.TaskEnvironment(
    name="default",
    image=flyte.Image.from_debian_base().with_pip_packages("pandas", "numpy")
)
@env.task
def process_data(x: int) -> int:
    # This import won't be bundled (lazy load)
    from another_module import util  # ❌ Not bundled automatically
    return helper.transform(x)
if __name__ == "__main__":
    flyte.init_from_config()
    run = flyte.run(process_data, x=42)
    print(run.url)
When you run this:
flyte run app.py process_data --x 42
Flyte automatically:
- Bundles app.py and my_module.py
- Preserves the directory structure
- Uploads to blob storage
- Makes it available in the remote container
Project structure example
my_project/
├── app.py                 # Main entry point
├── tasks/
│   ├── __init__.py
│   ├── data_tasks.py      # Flyte tasks
│   └── ml_tasks.py
└── utils/
    ├── __init__.py
    ├── preprocessing.py   # Business logic
    └── models.py
# app.py
import flyte
from tasks.data_tasks import load_data  # ✅ Bundled
from tasks.ml_tasks import train_model  # ✅ Bundled
# utils modules imported in tasks are also bundled
@flyte.task
def pipeline(dataset: str) -> float:
    data = load_data(dataset)
    accuracy = train_model(data)
    return accuracy
if __name__ == "__main__":
    flyte.init_from_config()
    run = flyte.run(pipeline, dataset="train.csv")
All modules are bundled with their directory structure preserved.
Manual code bundling
Control exactly what gets bundled by configuring the copy style.
Copy styles
Three options are available:
- "auto" (default) - Bundle loaded modules only
- "all" - Bundle everything in the working directory
- "none" - Skip bundling entirely (requires code in container)
Using copy_style="all"
Bundle all files under your project directory:
import flyte
flyte.init_from_config()
# Bundle everything in current directory
run = flyte.with_runcontext(copy_style="all").run(
    my_task,
    input_data="sample.csv"
)
Or via CLI:
flyte run --copy-style=all app.py my_task --input-data sample.csv
Use when:
- You have data files or configuration that tasks need
- You use dynamic imports or lazy loading
- You want to ensure all project files are available
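To see why "all" picks up data files and lazily imported modules that "auto" misses, the three copy styles can be modelled as a file-selection function. This is a conceptual sketch only: `select_files` is a hypothetical helper, and the real bundler may apply further rules such as ignore files.

```python
from pathlib import Path


def select_files(copy_style: str, root_dir: Path, loaded: list[Path]) -> list[Path]:
    """Files a bundle would contain for a given copy style (illustrative model)."""
    if copy_style == "none":
        return []  # nothing is uploaded; the image must already hold the code
    if copy_style == "auto":
        # only modules that were imported and that live under the root
        return sorted(p for p in loaded if p.is_relative_to(root_dir))
    if copy_style == "all":
        # every file under the root, including data and config files
        return sorted(p for p in root_dir.rglob("*") if p.is_file())
    raise ValueError(f"unknown copy style: {copy_style!r}")
```

With a project holding `app.py` and `config.yaml`, where only `app.py` was imported, "auto" yields just `app.py` while "all" yields both files.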
Using copy_style="none"
Skip code bundling (see Container-based Deployment):
run = flyte.with_runcontext(copy_style="none").run(my_task, x=10)
Controlling the root directory
The root_dir parameter controls which directory serves as the bundling root.
Why root directory matters
- Determines what gets bundled - All code paths are relative to root_dir
- Preserves import structure - Python imports must match the bundle structure
- Affects path resolution - Files and modules are located relative to root_dir
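The relationship between the root directory and import names can be checked with a few lines of path arithmetic. The sketch below is illustrative; `import_name` is a hypothetical helper, and the paths are borrowed from the shared-utilities layout used later in this guide.

```python
from pathlib import PurePosixPath
from typing import Optional


def import_name(file: str, root_dir: str) -> Optional[str]:
    """Dotted module name for file when root_dir is the import root, else None."""
    try:
        relative = PurePosixPath(file).relative_to(PurePosixPath(root_dir))
    except ValueError:
        return None  # file lies outside root_dir, so it cannot be resolved from it
    parts = list(relative.with_suffix("").parts)
    if parts and parts[-1] == "__init__":
        parts.pop()  # a package is imported by its directory name
    return ".".join(parts)


print(import_name("/workspace/shared/common.py", "/workspace"))          # shared.common
print(import_name("/workspace/shared/common.py", "/workspace/project"))  # None
```

The second call returns None because the file sits outside the chosen root, which is the situation that typically surfaces as ModuleNotFoundError at run time.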
Setting root directory
Via CLI
flyte run --root-dir /path/to/project app.py my_task
Programmatically
import pathlib
import flyte
flyte.init_from_config(
    root_dir=pathlib.Path(__file__).parent
)
Root directory use cases
Use case 1: Multi-module project
project/
├── src/
│   ├── workflows/
│   │   └── pipeline.py
│   └── utils/
│       └── helpers.py
└── config.yaml
# src/workflows/pipeline.py
import pathlib
import flyte
from utils.helpers import process  # Relative import from project root
# Set root to project root (not src/)
flyte.init_from_config(
    root_dir=pathlib.Path(__file__).parent.parent.parent
)
@flyte.task
def my_task():
    return process()
Root set to project/ so imports like from utils.helpers work correctly.
Use case 2: Shared utilities
workspace/
├── shared/
│   └── common.py
└── project/
    └── app.py
# project/app.py
import flyte
import pathlib
from shared.common import shared_function  # Import from parent directory
# Set root to workspace/ to include shared/
flyte.init_from_config(
    root_dir=pathlib.Path(__file__).parent.parent
)
Use case 3: Monorepo
monorepo/
├── libs/
│   ├── data/
│   └── models/
└── services/
    └── ml_service/
        └── workflows.py
# services/ml_service/workflows.py
import flyte
import pathlib
from libs.data import loader  # Import from monorepo root
from libs.models import predictor
# Set root to monorepo/ to include libs/
flyte.init_from_config(
    root_dir=pathlib.Path(__file__).parent.parent.parent
)
Root directory best practices
- Set root_dir at project initialization before importing any task modules
- Use absolute paths with pathlib.Path(__file__).parent navigation
- Match your import structure - if imports are relative to project root, set root_dir to project root
- Keep consistent - use the same root_dir for both flyte run and flyte.init()
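Counting `.parent` hops by hand breaks when a file moves. One alternative, sketched here as an assumption rather than a Flyte feature, is to walk upward until a marker file is found. The helper `find_project_root` and the default `pyproject.toml` marker are hypothetical choices.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker: str = "pyproject.toml") -> Optional[Path]:
    """Walk upward from start and return the nearest directory containing marker."""
    current = start.resolve()
    if current.is_file():
        current = current.parent
    for candidate in (current, *current.parents):
        if (candidate / marker).exists():
            return candidate
    return None  # no marker found anywhere above start
```

The returned path could then be passed as `root_dir` to `flyte.init_from_config()`, so that every entry point in the project resolves the same root regardless of how deeply it is nested.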
Code bundling examples
Example: Standard Python package
my_package/
├── pyproject.toml
├── src/
│   └── my_package/
│       ├── __init__.py
│       ├── main.py
│       ├── data/
│       │   ├── loader.py
│       │   └── processor.py
│       └── models/
│           └── analyzer.py
# src/my_package/main.py
import flyte
import pathlib
from my_package.data.loader import fetch_data
from my_package.data.processor import clean_data
from my_package.models.analyzer import analyze
env = flyte.TaskEnvironment(
    name="pipeline",
    image=flyte.Image.from_debian_base().with_uv_project(
        pyproject_file=pathlib.Path(__file__).parent.parent.parent / "pyproject.toml"
    )
)
@env.task
async def fetch_task(url: str) -> dict:
    return await fetch_data(url)
@env.task
def process_task(raw_data: dict) -> list[dict]:
    return clean_data(raw_data)
@env.task
def analyze_task(data: list[dict]) -> str:
    return analyze(data)
if __name__ == "__main__":
    import flyte.git
    # Set root to project root for proper imports
    flyte.init_from_config(
        flyte.git.config_from_root(),
        root_dir=pathlib.Path(__file__).parent.parent.parent
    )
    # All modules bundled automatically
    run = flyte.run(analyze_task, data=[{"value": 1}, {"value": 2}])
    print(f"Run URL: {run.url}")
Run with:
cd my_package
flyte run src/my_package/main.py analyze_task --data '[{"value": 1}]'
Example: Dynamic environment based on domain
# environment_picker.py
import flyte
def create_env():
    """Create different environments based on domain."""
    if flyte.current_domain() == "development":
        return flyte.TaskEnvironment(
            name="dev",
            image=flyte.Image.from_debian_base(),
            env_vars={"ENV": "dev", "DEBUG": "true"}
        )
    elif flyte.current_domain() == "staging":
        return flyte.TaskEnvironment(
            name="staging",
            image=flyte.Image.from_debian_base(),
            env_vars={"ENV": "staging", "DEBUG": "false"}
        )
    else:  # production
        return flyte.TaskEnvironment(
            name="prod",
            image=flyte.Image.from_debian_base(),
            env_vars={"ENV": "production", "DEBUG": "false"},
            resources=flyte.Resources(cpu="2", memory="4Gi")
        )
env = create_env()
@env.task
async def process(n: int) -> int:
    import os
    print(f"Running in {os.getenv('ENV')} environment")
    return n * 2
if __name__ == "__main__":
    flyte.init_from_config()
    run = flyte.run(process, n=5)
    print(run.url)
Why this works:
- flyte.current_domain() is set correctly when Flyte re-instantiates modules remotely
- Environment configuration is deterministic and reproducible
- Code is automatically bundled with domain-specific settings
flyte.current_domain() only works after flyte.init() is called:
- ✅ Works with flyte run and flyte deploy (auto-initialize)
- ✅ Works in if __name__ == "__main__" after explicit flyte.init()
- ❌ Does NOT work at module level without initialization
When to use code bundling
✅ Use code bundling when:
- Rapid development and iteration
- Frequently changing code
- Multiple developers testing changes
- Jupyter notebook workflows
- Quick prototyping and experimentation
❌ Consider container-based instead when:
- Need easy rollback to previous versions (container tags are simpler than finding git commits)
- Working with air-gapped environments (no blob storage access)
- Code changes require coordinated dependency updates
Container-based deployment
Advanced approach - Embed code directly in container images for immutable deployments.
How it works
Instead of bundling code at runtime:
- Build container image with code copied inside
- Disable code bundling with copy_style="none"
- Container has everything needed at runtime
Trade-off: Every code change requires a new image build (slower), but provides complete reproducibility.
Configuration
Three key steps:
1. Set copy_style="none"
Disable runtime code bundling:
flyte.with_runcontext(copy_style="none").run(my_task, n=10)
Or via CLI:
flyte run --copy-style=none app.py my_task --n 10
2. Copy Code into Image
Use Image.with_source_file() or Image.with_source_folder():
import pathlib
import flyte
env = flyte.TaskEnvironment(
    name="embedded",
    image=flyte.Image.from_debian_base().with_source_folder(
        src=pathlib.Path(__file__).parent,
        copy_contents_only=True
    )
)
3. Set Correct root_dir
Match your image copy configuration:
flyte.init_from_config(
    root_dir=pathlib.Path(__file__).parent
)
Image source copying methods
with_source_file() - Copy individual files
Copy a single file into the container:
image = flyte.Image.from_debian_base().with_source_file(
    src=pathlib.Path(__file__),
    dst="/app/main.py"
)
Use for:
- Single-file workflows
- Copying configuration files
- Adding scripts to existing images
with_source_folder() - Copy directories
Copy entire directories into the container:
image = flyte.Image.from_debian_base().with_source_folder(
    src=pathlib.Path(__file__).parent,
    dst="/app",
    copy_contents_only=False  # Copy folder itself
)
Parameters:
- src: Source directory path
- dst: Destination path in container (optional, defaults to workdir)
- copy_contents_only: If True, copies folder contents; if False, copies folder itself
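The effect of copy_contents_only on the container layout can be previewed with plain path arithmetic. The sketch below does not call Flyte; `container_paths` is a hypothetical helper, and the `/app` default is an assumption taken from the examples in this guide.

```python
from pathlib import PurePosixPath


def container_paths(src: str, files: list[str], dst: str = "/app",
                    copy_contents_only: bool = True) -> list[str]:
    """Where each file under src would land inside the container."""
    base = PurePosixPath(dst)
    if not copy_contents_only:
        # the folder itself is copied, so its name becomes a path component
        base = base / PurePosixPath(src).name
    return [str(base / name) for name in files]


files = ["app.py", "utils.py"]
print(container_paths("workspace/my_project", files, copy_contents_only=True))
# ['/app/app.py', '/app/utils.py']
print(container_paths("workspace/my_project", files, copy_contents_only=False))
# ['/app/my_project/app.py', '/app/my_project/utils.py']
```

The extra `my_project` component in the second result is why root_dir has to move up one level when the folder itself is copied.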
copy_contents_only=True (Recommended)
Copies only the contents of the source folder:
# Project structure:
# my_project/
# ├── app.py
# └── utils.py
image = flyte.Image.from_debian_base().with_source_folder(
    src=pathlib.Path(__file__).parent,
    copy_contents_only=True
)
# Container will have:
# /app/app.py
# /app/utils.py
# Set root_dir to match:
flyte.init_from_config(root_dir=pathlib.Path(__file__).parent)
copy_contents_only=False
Copies the folder itself with its name:
# Project structure:
# workspace/
# └── my_project/
#     ├── app.py
#     └── utils.py
image = flyte.Image.from_debian_base().with_source_folder(
    src=pathlib.Path(__file__).parent,  # Points to my_project/
    copy_contents_only=False
)
# Container will have:
# /app/my_project/app.py
# /app/my_project/utils.py
# Set root_dir to parent to match:
flyte.init_from_config(root_dir=pathlib.Path(__file__).parent.parent)
Complete container-based example
# full_build.py
import pathlib
import flyte
from dep import helper  # Local module
# Configure environment with source copying
env = flyte.TaskEnvironment(
    name="full_build",
    image=flyte.Image.from_debian_base()
    .with_pip_packages("numpy", "pandas")
    .with_source_folder(
        src=pathlib.Path(__file__).parent,
        copy_contents_only=True
    )
)
@env.task
def square(x: int) -> int:
    return x ** helper.get_exponent()
@env.task
def main(n: int) -> list[int]:
    return list(flyte.map(square, range(n)))
if __name__ == "__main__":
    import flyte.git
    # Initialize with matching root_dir
    flyte.init_from_config(
        flyte.git.config_from_root(),
        root_dir=pathlib.Path(__file__).parent
    )
    # Run with copy_style="none" and explicit version
    run = flyte.with_runcontext(
        copy_style="none",
        version="v1.0.0"  # Explicit version for image tagging
    ).run(main, n=10)
    print(f"Run URL: {run.url}")
    run.wait()
Project structure:
project/
├── full_build.py
├── dep.py              # Local dependency
└── .flyte/
    └── config.yaml
Run with:
python full_build.py
This will:
- Build a container image with full_build.py and dep.py embedded
- Tag it as v1.0.0
- Push to registry
- Execute remotely without code bundling
Using externally built images
When containers are built outside of Flyte (e.g., in CI/CD), use Image.from_ref_name():
Step 1: Build your image externally
# Dockerfile
FROM python:3.11-slim
WORKDIR /app
# Copy your code
COPY src/ /app/
# Install dependencies
RUN pip install flyte pandas numpy
# Ensure flyte executable is available
RUN flyte --help
# Build in CI/CD
docker build -t myregistry.com/my-app:v1.2.3 .
docker push myregistry.com/my-app:v1.2.3
Step 2: Reference image by name
# app.py
import flyte
env = flyte.TaskEnvironment(
    name="external",
    image=flyte.Image.from_ref_name("my-app-image")  # Reference name
)
@env.task
def process(x: int) -> int:
    return x * 2
if __name__ == "__main__":
    flyte.init_from_config()
    # Pass actual image URI at deploy/run time
    run = flyte.with_runcontext(
        copy_style="none",
        images={"my-app-image": "myregistry.com/my-app:v1.2.3"}
    ).run(process, x=10)
Or via CLI:
flyte run \
    --copy-style=none \
    --image my-app-image=myregistry.com/my-app:v1.2.3 \
    app.py process --x 10
For deployment:
flyte deploy \
    --image my-app-image=myregistry.com/my-app:v1.2.3 \
    app.py
Why use reference names?
- Decouples code from image URIs - Change images without modifying code
- Supports multiple environments - Different images for dev/staging/prod
- Integrates with CI/CD - Build images in pipelines, reference in code
- Enables image reuse - Multiple tasks can reference the same image
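The decoupling described above amounts to a late lookup from reference name to image URI. The sketch below models that lookup in plain Python; it is not Flyte's resolver, and both `parse_image_flags` and `resolve_image` are hypothetical helpers that mirror the NAME=URI shape of the --image flag.

```python
def parse_image_flags(flags: list[str]) -> dict[str, str]:
    """Turn ['NAME=URI', ...] pairs into a reference-name mapping."""
    images = {}
    for flag in flags:
        name, sep, uri = flag.partition("=")  # split on the first '=' only
        if not sep or not name or not uri:
            raise ValueError(f"expected NAME=URI, got {flag!r}")
        images[name] = uri
    return images


def resolve_image(ref_name: str, images: dict[str, str]) -> str:
    """Look up the concrete URI for a reference name, failing loudly if absent."""
    try:
        return images[ref_name]
    except KeyError:
        raise LookupError(f"no image URI supplied for {ref_name!r}") from None


images = parse_image_flags(["my-app-image=myregistry.com/my-app:v1.2.3"])
print(resolve_image("my-app-image", images))  # myregistry.com/my-app:v1.2.3
```

Since task code only ever mentions the reference name, swapping the mapping is enough to point the same code at a different image.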
Example: Multi-environment deployment
import flyte
import os
# Code references image by name
env = flyte.TaskEnvironment(
    name="api",
    image=flyte.Image.from_ref_name("api-service")
)
@env.task
def api_call(endpoint: str) -> dict:
    # Implementation
    return {"status": "success"}
if __name__ == "__main__":
    flyte.init_from_config()
    # Determine image based on environment
    environment = os.getenv("ENV", "dev")
    image_uri = {
        "dev": "myregistry.com/api-service:dev",
        "staging": "myregistry.com/api-service:staging",
        "prod": "myregistry.com/api-service:v1.2.3"
    }[environment]
    run = flyte.with_runcontext(
        copy_style="none",
        images={"api-service": image_uri}
    ).run(api_call, endpoint="/health")
Container-based best practices
- Always set explicit versions when using copy_style="none":
  flyte.with_runcontext(copy_style="none", version="v1.0.0")
- Match root_dir to copy_contents_only:
  - copy_contents_only=True → root_dir=Path(__file__).parent
  - copy_contents_only=False → root_dir=Path(__file__).parent.parent
- Ensure flyte executable is in container - Add to PATH or install flyte package
- Use .dockerignore to exclude unnecessary files:
  # .dockerignore
  __pycache__/
  *.pyc
  .git/
  .venv/
  *.egg-info/
- Test containers locally before deploying:
  docker run -it myimage:latest /bin/bash
  python -c "import mymodule"  # Verify imports work
When to use container-based deployment
✅ Use container-based when:
- Deploying to production
- Need immutable, reproducible environments
- Working with complex system dependencies
- Deploying to air-gapped or restricted environments
- CI/CD pipelines with automated builds
- Code changes are infrequent
❌ Don’t use container-based when:
- Rapid development and frequent code changes
- Quick prototyping
- Interactive development (Jupyter notebooks)
- Learning and experimentation
Choosing the right approach
Decision tree
Are you iterating quickly on code?
├─ Yes → Use Code Bundling (Default)
│        (Development, prototyping, notebooks)
│        Both approaches are fully reproducible via hash/tag
└─ No → Do you need easy version rollback?
        ├─ Yes → Use Container-based
        │        (Production, CI/CD, straightforward tag-based rollback)
        └─ No → Either works
                (Code bundling is simpler, container-based for air-gapped)
Hybrid approach
You can use different approaches for different tasks:
import flyte
import pathlib
# Fast iteration for development tasks
dev_env = flyte.TaskEnvironment(
    name="dev",
    image=flyte.Image.from_debian_base().with_pip_packages("pandas")
    # Code bundling (default)
)
# Immutable containers for production tasks
prod_env = flyte.TaskEnvironment(
    name="prod",
    image=flyte.Image.from_debian_base()
    .with_pip_packages("pandas")
    .with_source_folder(pathlib.Path(__file__).parent, copy_contents_only=True)
    # Requires copy_style="none"
)
@dev_env.task
def experimental_task(x: int) -> int:
    # Rapid development with code bundling
    return x * 2
@prod_env.task
def stable_task(x: int) -> int:
    # Production with embedded code
    return x ** 2
if __name__ == "__main__":
    flyte.init_from_config(root_dir=pathlib.Path(__file__).parent)
    # Use code bundling for dev task
    dev_run = flyte.run(experimental_task, x=5)
    # Use container-based for prod task
    prod_run = flyte.with_runcontext(
        copy_style="none",
        version="v1.0.0"
    ).run(stable_task, x=5)
Troubleshooting
Import errors
Problem: ModuleNotFoundError when task executes remotely
Solutions:
- Check loaded modules - Ensure modules are imported at module level:
  # ✅ Good - bundled automatically
  from mymodule import helper
  @flyte.task
  def my_task():
      return helper.process()
  # ❌ Bad - not bundled (lazy load)
  @flyte.task
  def my_task():
      from mymodule import helper
      return helper.process()
- Verify root_dir matches your import structure:
  # If imports are: from mypackage.utils import foo
  # Then root_dir should be parent of mypackage/
  flyte.init_from_config(root_dir=pathlib.Path(__file__).parent.parent)
- Use copy_style="all" to bundle everything:
  flyte run --copy-style=all app.py my_task
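Lazy imports are easy to overlook in a large file. As a rough local check, the standard-library `ast` module can list imports that sit inside function bodies. This sketch is independent of Flyte, and `find_lazy_imports` is a hypothetical helper name.

```python
import ast


def find_lazy_imports(source: str) -> list[tuple[int, str]]:
    """Return (line, module) pairs for imports nested inside function bodies."""
    hits = set()  # a set avoids duplicates from nested functions
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for inner in ast.walk(node):
                if isinstance(inner, ast.Import):
                    hits.update((inner.lineno, alias.name) for alias in inner.names)
                elif isinstance(inner, ast.ImportFrom):
                    hits.add((inner.lineno, inner.module or "."))
    return sorted(hits)


SAMPLE = '''\
from mymodule import helper

def my_task():
    from another_module import util
    return helper.process()
'''
print(find_lazy_imports(SAMPLE))  # [(4, 'another_module')]
```

Any module reported here is a candidate for either a module-level import or copy_style="all".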
Code changes not reflected
Problem: Remote execution uses old code despite local changes
This is rare with code bundling - Flyte automatically versions based on content hash, so code changes should be detected automatically. This issue typically occurs with caching problems or when using copy_style="none".
Solutions:
- Use explicit version bump (mainly for container-based deployments):
  run = flyte.with_runcontext(version="v2").run(my_task)
- Check if copy_style="none" is set - this requires image rebuild:
  # If using copy_style="none", rebuild image
  run = flyte.with_runcontext(
      copy_style="none",
      version="v2"  # Bump version to force rebuild
  ).run(my_task)
Files missing in container
Problem: Task can’t find data files or configs
Solutions:
- Use copy_style="all" to bundle all files:
  flyte run --copy-style=all app.py my_task
- Copy files explicitly in image:
  image = flyte.Image.from_debian_base().with_source_file(
      src=pathlib.Path("config.yaml"),
      dst="/app/config.yaml"
  )
- Store data in remote storage instead of bundling:
  @flyte.task
  def my_task():
      # Read from S3/GCS instead of local files
      import flyte.io
      data = flyte.io.File("s3://bucket/data.csv").open().read()
Container build failures
Problem: Image build fails with copy_style="none"
Solutions:
- Check root_dir matches copy_contents_only:
  # copy_contents_only=True
  image = Image.from_debian_base().with_source_folder(
      src=Path(__file__).parent,
      copy_contents_only=True
  )
  flyte.init(root_dir=Path(__file__).parent)  # Match!
- Ensure flyte executable available:
  image = Image.from_debian_base()  # Has flyte pre-installed
- Check file permissions in source directory:
  chmod -R +r project/
Version conflicts
Problem: Multiple versions of same image causing confusion
Solutions:
- Use explicit versions:
  run = flyte.with_runcontext(
      copy_style="none",
      version="v1.2.3"  # Explicit, not auto-generated
  ).run(my_task)
- Clean old images:
  docker image prune -a
- Use semantic versioning for clarity:
  version = "v1.0.0"  # Major.Minor.Patch
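If versions follow the vMajor.Minor.Patch shape shown above, bumping them can be scripted rather than edited by hand. The sketch below is one possible approach; `bump` is a hypothetical helper and not part of Flyte.

```python
import re

_SEMVER = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def bump(version: str, part: str = "patch") -> str:
    """Increment one component of a 'vMAJOR.MINOR.PATCH' string."""
    match = _SEMVER.match(version)
    if match is None:
        raise ValueError(f"not a vMAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(group) for group in match.groups())
    if part == "major":
        return f"v{major + 1}.0.0"  # lower components reset to zero
    if part == "minor":
        return f"v{major}.{minor + 1}.0"
    if part == "patch":
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


print(bump("v1.0.0"))           # v1.0.1
print(bump("v1.2.3", "minor"))  # v1.3.0
```

The resulting string could be passed as the version argument of `flyte.with_runcontext()`, as in the examples above.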
Further reading
- Image API Reference - Complete Image class documentation
- TaskEnvironment - Environment configuration options
- Configuration Guide - Setting up Flyte config files