Code packaging for remote execution

When you run Flyte tasks remotely, your code needs to be available in the execution environment. The Flyte SDK provides two main approaches for packaging your code:

Code bundling - Bundle code dynamically at runtime
Container-based deployment - Embed code directly in container images

Quick comparison

Aspect	Code bundling	Container-based
Speed	Fast (no image rebuild)	Slower (requires image build)
Best for	Rapid development, iteration	Production, immutable deployments
Code changes	Immediate effect	Requires image rebuild
Setup	Automatic by default	Manual configuration needed
Reproducibility	Excellent (hash-based versioning)	Excellent (immutable images)
Rollback	Requires version control	Tag-based, straightforward

Code bundling

Default approach - Automatically bundles and uploads your code to remote storage at runtime.

How it works

When you run flyte run or call flyte.run(), Flyte automatically:

Scans loaded modules from your codebase
Creates a tarball (gzipped, without timestamps for consistent hashing)
Uploads to blob storage (S3, GCS, Azure Blob)
Deduplicates based on content hashes
Downloads in containers at runtime

This process happens transparently - every container downloads and extracts the code bundle before execution.
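Deduplication works like content-addressed storage. The bundle's hash doubles as its storage key, so re-submitting unchanged code finds an existing object and skips the upload. The sketch below is minimal and uses a plain dict as a stand-in for blob storage. upload_if_missing and the bundles/<hash>.tar.gz key layout are hypothetical and are not Flyte's actual upload code:

```python
import hashlib


def upload_if_missing(store: dict[str, bytes], bundle: bytes) -> tuple[str, bool]:
    """Store `bundle` under a key derived from its SHA-256 digest.

    `store` stands in for a blob store such as S3 or GCS (hypothetical layout).
    Returns the key and whether an upload actually happened.
    """
    digest = hashlib.sha256(bundle).hexdigest()
    key = f"bundles/{digest}.tar.gz"
    if key in store:
        return key, False  # identical content already uploaded: skip it
    store[key] = bundle
    return key, True


store: dict[str, bytes] = {}
key1, uploaded1 = upload_if_missing(store, b"same code")
key2, uploaded2 = upload_if_missing(store, b"same code")
# Second call reuses the existing object instead of uploading again
print(key1 == key2, uploaded1, uploaded2)
```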

Code bundling is optimized for speed:

Bundles are created without timestamps for consistent hashing
Identical code produces identical hashes, enabling deduplication
Only modified code triggers new uploads
Containers cache downloaded bundles
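Timestamps matter because tar headers and the gzip header both record modification times by default. As a result, two archives of byte-identical files can still hash differently. The sketch below builds a reproducible bundle with the standard library only and assumes the file list is already known. Its normalizations (sorted entries, zeroed mtime/uid/gid, fixed mode) are common choices for reproducible archives. They are not necessarily the exact ones Flyte applies, and build_bundle is a hypothetical name:

```python
import gzip
import hashlib
import io
import pathlib
import tarfile


def build_bundle(root: pathlib.Path, files: list[pathlib.Path]) -> tuple[bytes, str]:
    """Create a reproducible .tar.gz of `files` (relative to `root`); return (bytes, sha256)."""
    raw = io.BytesIO()
    # mtime=0 keeps the gzip header free of a timestamp
    with gzip.GzipFile(fileobj=raw, mode="wb", mtime=0) as gz:
        with tarfile.open(fileobj=gz, mode="w", format=tarfile.GNU_FORMAT) as tar:
            # Sort so the archive order does not depend on the caller or filesystem
            for path in sorted(files):
                data = path.read_bytes()
                info = tarfile.TarInfo(name=path.relative_to(root).as_posix())
                info.size = len(data)
                info.mtime = 0              # strip modification time
                info.mode = 0o644           # normalize permissions
                info.uid = info.gid = 0     # strip ownership
                info.uname = info.gname = ""
                tar.addfile(info, io.BytesIO(data))
    payload = raw.getvalue()
    return payload, hashlib.sha256(payload).hexdigest()
```

Calling build_bundle again on the same files gives the same digest, even after touching them or passing them in a different order. Editing any file gives a new digest.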

Reproducibility: Flyte automatically versions code bundles based on content hash. The same code always produces the same hash, guaranteeing reproducibility without manual versioning. However, version control is still recommended for rollback capabilities.

Automatic code bundling

Default behavior - Bundles all loaded modules automatically.

What gets bundled

Flyte includes modules that are:

✅ Loaded when environment is parsed (imported at module level)
✅ Part of your codebase (not system packages)
✅ Within your project directory
❌ NOT lazily loaded (imported inside functions)
❌ NOT system-installed packages (e.g., from site-packages)
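These rules can be approximated with the standard library. Walk sys.modules, keep modules whose source file sits under the project root, and drop anything that lives in site-packages. A module imported inside a task body is not in sys.modules until that function runs, which is why lazy imports are missed. This is an illustrative sketch, not Flyte's scanner, and loaded_project_files is a hypothetical name:

```python
import pathlib
import sys
import sysconfig


def loaded_project_files(root: pathlib.Path) -> list[pathlib.Path]:
    """Return source files of already-imported modules that live under `root`."""
    root = root.resolve()
    paths = sysconfig.get_paths()
    # Install locations for third-party packages (site-packages). Excluding them
    # matters when a virtualenv lives inside the project directory.
    site_dirs = {pathlib.Path(paths[key]).resolve() for key in ("purelib", "platlib")}
    found: set[pathlib.Path] = set()
    for module in list(sys.modules.values()):
        file = getattr(module, "__file__", None)
        if not file:
            continue  # built-in modules and namespace packages have no file
        path = pathlib.Path(file).resolve()
        if any(path.is_relative_to(d) for d in site_dirs):
            continue
        if path.is_relative_to(root):
            found.add(path)
    return sorted(found)
```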

Example: Basic automatic bundling

# app.py
import flyte
from my_module import helper  # ✅ Bundled automatically

env = flyte.TaskEnvironment(
    name="default",
    image=flyte.Image.from_debian_base().with_pip_packages("pandas", "numpy")
)

@env.task
def process_data(x: int) -> int:
    # This import won't be bundled (lazy load)
    from another_module import util  # ❌ Not bundled automatically
    return helper.transform(x)

if __name__ == "__main__":
    flyte.init_from_config()
    run = flyte.run(process_data, x=42)
    print(run.url)

When you run this:

flyte run app.py process_data --x 42

Flyte automatically:

Bundles app.py and my_module.py
Preserves the directory structure
Uploads to blob storage
Makes it available in the remote container

Project structure example

my_project/
├── app.py              # Main entry point
├── tasks/
│   ├── __init__.py
│   ├── data_tasks.py   # Flyte tasks
│   └── ml_tasks.py
└── utils/
    ├── __init__.py
    ├── preprocessing.py # Business logic
    └── models.py

# app.py
import flyte
from tasks.data_tasks import load_data    # ✅ Bundled
from tasks.ml_tasks import train_model    # ✅ Bundled
# utils modules imported at module level by these task modules are also bundled

@flyte.task
def pipeline(dataset: str) -> float:
    data = load_data(dataset)
    accuracy = train_model(data)
    return accuracy

if __name__ == "__main__":
    flyte.init_from_config()
    run = flyte.run(pipeline, dataset="train.csv")

All modules are bundled with their directory structure preserved.

Manual code bundling

Control exactly what gets bundled by configuring the copy style.

Copy styles

Three options available:

"auto" (default) - Bundle loaded modules only
"all" - Bundle everything in the working directory
"none" - Skip bundling entirely (requires code in container)
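You can think of the three styles as file-selection rules applied under the bundling root. The helper below is hypothetical and assumes the list of loaded module files is computed elsewhere, for example by scanning sys.modules. Flyte's real "all" mode may apply ignore rules that this sketch does not model:

```python
import pathlib
from typing import Literal

CopyStyle = Literal["auto", "all", "none"]


def select_files(style: CopyStyle, root: pathlib.Path,
                 loaded: list[pathlib.Path]) -> list[pathlib.Path]:
    """Pick the files a bundle would contain for a given copy style (illustrative)."""
    if style == "none":
        return []  # nothing is uploaded; code must already be in the image
    if style == "all":
        # every regular file under the root, including data and config files
        return sorted(p for p in root.rglob("*") if p.is_file())
    if style == "auto":
        # only imported modules that live under the root
        return sorted(p for p in loaded if p.is_relative_to(root))
    raise ValueError(f"unknown copy style: {style!r}")
```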

Using `copy_style="all"`

Bundle all files under your project directory:

import flyte
from app import my_task  # Assumes my_task is defined in app.py, as in the CLI example below

flyte.init_from_config()

# Bundle everything in current directory
run = flyte.with_runcontext(copy_style="all").run(
    my_task,
    input_data="sample.csv"
)

Or via CLI:

flyte run --copy-style=all app.py my_task --input-data sample.csv

Use when:

You have data files or configuration that tasks need
You use dynamic imports or lazy loading
You want to ensure all project files are available

Using `copy_style="none"`

Skip code bundling entirely (see Container-based deployment below):

run = flyte.with_runcontext(copy_style="none").run(my_task, x=10)

Controlling the root directory

The root_dir parameter controls which directory serves as the bundling root.

Why root directory matters

Determines what gets bundled - All code paths are relative to root_dir
Preserves import structure - Python imports must match the bundle structure
Affects path resolution - Files and modules are located relative to root_dir
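The import-structure point comes down to how a file path below root_dir turns into a dotted module name. The sketch below shows that mapping with a hypothetical helper, module_name. It ignores details such as namespace packages:

```python
from pathlib import PurePath


def module_name(root_dir: PurePath, source: PurePath) -> str:
    """Dotted import name that `source` gets when `root_dir` is on sys.path."""
    parts = list(source.relative_to(root_dir).with_suffix("").parts)
    if parts[-1] == "__init__":
        parts.pop()  # a package's __init__.py is imported as the package itself
    return ".".join(parts)


helpers = PurePath("/project/src/utils/helpers.py")
print(module_name(PurePath("/project"), helpers))      # src.utils.helpers
print(module_name(PurePath("/project/src"), helpers))  # utils.helpers
```

So the same file is importable as utils.helpers only when root_dir points at src/.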

Setting root directory

Via CLI

flyte run --root-dir /path/to/project app.py my_task

Programmatically

import pathlib
import flyte

flyte.init_from_config(
    root_dir=pathlib.Path(__file__).parent
)

Root directory use cases

Use case 1: Multi-module project

project/
├── src/
│   ├── workflows/
│   │   └── pipeline.py
│   └── utils/
│       └── helpers.py
└── config.yaml

# src/workflows/pipeline.py
import pathlib
import flyte
from utils.helpers import process  # Import path is relative to src/

# Set root to src/ (two levels up from this file) so the bundle layout matches utils.helpers
flyte.init_from_config(
    root_dir=pathlib.Path(__file__).parent.parent
)

@flyte.task
def my_task():
    return process()

Root set to src/ so imports like from utils.helpers work correctly. With root_dir set to project/, this file would have to import from src.utils.helpers instead.

Use case 2: Shared utilities

workspace/
├── shared/
│   └── common.py
└── project/
    └── app.py

# project/app.py
import flyte
import pathlib
from shared.common import shared_function  # Import from parent directory

# Set root to workspace/ to include shared/
flyte.init_from_config(
    root_dir=pathlib.Path(__file__).parent.parent
)

Use case 3: Monorepo

monorepo/
├── libs/
│   ├── data/
│   └── models/
└── services/
    └── ml_service/
        └── workflows.py

# services/ml_service/workflows.py
import flyte
import pathlib
from libs.data import loader  # Import from monorepo root
from libs.models import predictor

# Set root to monorepo/ to include libs/
flyte.init_from_config(
    root_dir=pathlib.Path(__file__).parent.parent.parent
)

Root directory best practices

Set root_dir when you initialize Flyte, before running or deploying any tasks
Use absolute paths with pathlib.Path(__file__).parent navigation
Match your import structure - if imports are relative to project root, set root_dir to project root
Keep consistent - use the same root_dir for both flyte run --root-dir and flyte.init_from_config(root_dir=...)
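Counting .parent hops breaks as soon as a file moves to a different depth. One alternative is to walk upward from __file__ until a marker file such as pyproject.toml or .git is found, and use that directory (or a subdirectory of it) as root_dir. This is a general convention, not a Flyte feature. find_root and the default markers are assumptions:

```python
import pathlib


def find_root(start: pathlib.Path,
              markers: tuple[str, ...] = ("pyproject.toml", ".git")) -> pathlib.Path:
    """Return the nearest directory at or above `start` that contains a marker."""
    start = start.resolve()
    # Begin at the directory itself, or at the file's parent if given a file
    candidates = [start, *start.parents] if start.is_dir() else list(start.parents)
    for directory in candidates:
        if any((directory / marker).exists() for marker in markers):
            return directory
    raise FileNotFoundError(f"none of {markers} found above {start}")
```

You would then pass something like find_root(pathlib.Path(__file__)) to flyte.init_from_config(root_dir=...). For a src/ layout, append / "src" so that root_dir still matches the import structure.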

Code bundling examples

Example: Standard Python package

my_package/
├── pyproject.toml
├── src/
│   └── my_package/
│       ├── __init__.py
│       ├── main.py
│       ├── data/
│       │   ├── loader.py
│       │   └── processor.py
│       └── models/
│           └── analyzer.py

# src/my_package/main.py
import flyte
import pathlib
from my_package.data.loader import fetch_data
from my_package.data.processor import clean_data
from my_package.models.analyzer import analyze

env = flyte.TaskEnvironment(
    name="pipeline",
    image=flyte.Image.from_debian_base().with_uv_project(
        pyproject_file=pathlib.Path(__file__).parent.parent.parent / "pyproject.toml"
    )
)

@env.task
async def fetch_task(url: str) -> dict:
    return await fetch_data(url)

@env.task
def process_task(raw_data: dict) -> list[dict]:
    return clean_data(raw_data)

@env.task
def analyze_task(data: list[dict]) -> str:
    return analyze(data)

if __name__ == "__main__":
    import flyte.git

    # Set root to src/ so imports like my_package.data.loader resolve
    flyte.init_from_config(
        flyte.git.config_from_root(),
        root_dir=pathlib.Path(__file__).parent.parent
    )

    # All modules bundled automatically
    run = flyte.run(analyze_task, data=[{"value": 1}, {"value": 2}])
    print(f"Run URL: {run.url}")

Run with:

cd my_package
flyte run --root-dir src src/my_package/main.py analyze_task --data '[{"value": 1}]'

Example: Dynamic environment based on domain

# environment_picker.py
import flyte

def create_env():
    """Create different environments based on domain."""
    if flyte.current_domain() == "development":
        return flyte.TaskEnvironment(
            name="dev",
            image=flyte.Image.from_debian_base(),
            env_vars={"ENV": "dev", "DEBUG": "true"}
        )
    elif flyte.current_domain() == "staging":
        return flyte.TaskEnvironment(
            name="staging",
            image=flyte.Image.from_debian_base(),
            env_vars={"ENV": "staging", "DEBUG": "false"}
        )
    else:  # production
        return flyte.TaskEnvironment(
            name="prod",
            image=flyte.Image.from_debian_base(),
            env_vars={"ENV": "production", "DEBUG": "false"},
            resources=flyte.Resources(cpu="2", memory="4Gi")
        )

env = create_env()

@env.task
async def process(n: int) -> int:
    import os
    print(f"Running in {os.getenv('ENV')} environment")
    return n * 2

if __name__ == "__main__":
    flyte.init_from_config()
    run = flyte.run(process, n=5)
    print(run.url)

Why this works:

flyte.current_domain() is set correctly when Flyte re-instantiates modules remotely
Environment configuration is deterministic and reproducible
Code automatically bundled with domain-specific settings

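flyte.current_domain() only works after Flyte has been initialized:

✅ Works with flyte run and flyte deploy (auto-initialize)
✅ Works in if __name__ == "__main__" after an explicit flyte.init() or flyte.init_from_config()
❌ Does NOT work at module level without initialization

Because create_env() runs at import time, launch this example with flyte run (for example, flyte run environment_picker.py process --n 5) rather than python environment_picker.py. With plain python, the module-level call runs before the flyte.init_from_config() in the __main__ block.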
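The if/elif chain in create_env() can also be written as a lookup table. A table keeps every domain's settings in one place and makes the fallback explicit. The sketch below uses plain data only, and DomainSettings and settings_for are hypothetical names. Building the actual flyte.TaskEnvironment from the result is left as a comment because it needs an initialized Flyte client:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DomainSettings:
    name: str
    env_vars: dict[str, str]
    cpu: Optional[str] = None     # None: do not request specific resources
    memory: Optional[str] = None


SETTINGS = {
    "development": DomainSettings("dev", {"ENV": "dev", "DEBUG": "true"}),
    "staging": DomainSettings("staging", {"ENV": "staging", "DEBUG": "false"}),
    "production": DomainSettings(
        "prod", {"ENV": "production", "DEBUG": "false"}, cpu="2", memory="4Gi"
    ),
}


def settings_for(domain: str) -> DomainSettings:
    # Unrecognized domains fall back to production, like the else branch above
    return SETTINGS.get(domain, SETTINGS["production"])


# s = settings_for(flyte.current_domain())
# env = flyte.TaskEnvironment(name=s.name, image=flyte.Image.from_debian_base(),
#                             env_vars=s.env_vars, ...)  # add resources only if s.cpu is set
```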

When to use code bundling

✅ Use code bundling when:

Rapid development and iteration
Frequently changing code
Multiple developers testing changes
Jupyter notebook workflows
Quick prototyping and experimentation

❌ Consider container-based instead when:

Need easy rollback to previous versions (container tags are simpler than finding git commits)
Working with air-gapped environments (no blob storage access)
Code changes require coordinated dependency updates

Container-based deployment

Advanced approach - Embed code directly in container images for immutable deployments.

How it works

Instead of bundling code at runtime:

Build container image with code copied inside
Disable code bundling with copy_style="none"
Container has everything needed at runtime

Trade-off: Every code change requires a new image build (slower), but provides complete reproducibility.

Configuration

Three key steps:

1. Set `copy_style="none"`

Disable runtime code bundling:

flyte.with_runcontext(copy_style="none").run(my_task, n=10)

Or via CLI:

flyte run --copy-style=none app.py my_task --n 10

2. Copy code into the image

Use Image.with_source_file() or Image.with_source_folder():

import pathlib
import flyte

env = flyte.TaskEnvironment(
    name="embedded",
    image=flyte.Image.from_debian_base().with_source_folder(
        src=pathlib.Path(__file__).parent,
        copy_contents_only=True
    )
)

3. Set the correct `root_dir`

Match your image copy configuration:

flyte.init_from_config(
    root_dir=pathlib.Path(__file__).parent
)

Image source copying methods

`with_source_file()` - Copy individual files

Copy a single file into the container:

image = flyte.Image.from_debian_base().with_source_file(
    src=pathlib.Path(__file__),
    dst="/app/main.py"
)

Use for:

Single-file workflows
Copying configuration files
Adding scripts to existing images

`with_source_folder()` - Copy directories

Copy entire directories into the container:

image = flyte.Image.from_debian_base().with_source_folder(
    src=pathlib.Path(__file__).parent,
    dst="/app",
    copy_contents_only=False  # Copy folder itself
)

Parameters:

src: Source directory path
dst: Destination path in container (optional, defaults to workdir)
copy_contents_only: If True, copies folder contents; if False, copies folder itself
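You can work out the effect of copy_contents_only on container paths, and the root_dir that matches each setting, with plain path arithmetic. This sketch assumes dst is /app, which the examples in this section show as the workdir. container_path and matching_root_dir are hypothetical helpers, not part of flyte.Image:

```python
from pathlib import PurePosixPath


def container_path(src: PurePosixPath, file: PurePosixPath,
                   copy_contents_only: bool, dst: str = "/app") -> PurePosixPath:
    """Where `file` (inside the local folder `src`) ends up in the image."""
    rel = file.relative_to(src)
    if copy_contents_only:
        return PurePosixPath(dst) / rel          # e.g. /app/app.py
    return PurePosixPath(dst) / src.name / rel   # e.g. /app/my_project/app.py


def matching_root_dir(src: PurePosixPath, copy_contents_only: bool) -> PurePosixPath:
    """Local root_dir whose relative paths line up with the paths under dst."""
    return src if copy_contents_only else src.parent


src = PurePosixPath("/home/me/workspace/my_project")
print(container_path(src, src / "app.py", copy_contents_only=True))   # /app/app.py
print(container_path(src, src / "app.py", copy_contents_only=False))  # /app/my_project/app.py
```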

`copy_contents_only=True` (Recommended)

Copies only the contents of the source folder:

# Project structure:
# my_project/
#   ├── app.py
#   └── utils.py

image = flyte.Image.from_debian_base().with_source_folder(
    src=pathlib.Path(__file__).parent,
    copy_contents_only=True
)

# Container will have:
# /app/app.py
# /app/utils.py

# Set root_dir to match:
flyte.init_from_config(root_dir=pathlib.Path(__file__).parent)

`copy_contents_only=False`

Copies the folder itself with its name:

# Project structure:
# workspace/
#   └── my_project/
#       ├── app.py
#       └── utils.py

image = flyte.Image.from_debian_base().with_source_folder(
    src=pathlib.Path(__file__).parent,  # Points to my_project/
    copy_contents_only=False
)

# Container will have:
# /app/my_project/app.py
# /app/my_project/utils.py

# Set root_dir to parent to match:
flyte.init_from_config(root_dir=pathlib.Path(__file__).parent.parent)

Complete container-based example

# full_build.py
import pathlib
import flyte
from dep import helper  # Local module

# Configure environment with source copying
env = flyte.TaskEnvironment(
    name="full_build",
    image=flyte.Image.from_debian_base()
        .with_pip_packages("numpy", "pandas")
        .with_source_folder(
            src=pathlib.Path(__file__).parent,
            copy_contents_only=True
        )
)

@env.task
def square(x: int) -> int:
    return x ** helper.get_exponent()

@env.task
def main(n: int) -> list[int]:
    return list(flyte.map(square, range(n)))

if __name__ == "__main__":
    import flyte.git

    # Initialize with matching root_dir
    flyte.init_from_config(
        flyte.git.config_from_root(),
        root_dir=pathlib.Path(__file__).parent
    )

    # Run with copy_style="none" and explicit version
    run = flyte.with_runcontext(
        copy_style="none",
        version="v1.0.0"  # Explicit version for image tagging
    ).run(main, n=10)

    print(f"Run URL: {run.url}")
    run.wait()

Project structure:

project/
├── full_build.py
├── dep.py          # Local dependency
└── .flyte/
    └── config.yaml

Run with:

python full_build.py

This will:

Build a container image with full_build.py and dep.py embedded
Tag it as v1.0.0
Push to registry
Execute remotely without code bundling

Using externally built images

When containers are built outside of Flyte (e.g., in CI/CD), use Image.from_ref_name():

Step 1: Build your image externally

# Dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install dependencies first so this layer stays cached across code changes
RUN pip install flyte pandas numpy

# Copy your code
COPY src/ /app/

# Ensure flyte executable is available
RUN flyte --help

# Build in CI/CD
docker build -t myregistry.com/my-app:v1.2.3 .
docker push myregistry.com/my-app:v1.2.3

Step 2: Reference image by name

# app.py
import flyte

env = flyte.TaskEnvironment(
    name="external",
    image=flyte.Image.from_ref_name("my-app-image")  # Reference name
)

@env.task
def process(x: int) -> int:
    return x * 2

if __name__ == "__main__":
    flyte.init_from_config()

    # Pass actual image URI at deploy/run time
    run = flyte.with_runcontext(
        copy_style="none",
        images={"my-app-image": "myregistry.com/my-app:v1.2.3"}
    ).run(process, x=10)

Or via CLI:

flyte run \
  --copy-style=none \
  --image my-app-image=myregistry.com/my-app:v1.2.3 \
  app.py process --x 10

For deployment:

flyte deploy \
  --image my-app-image=myregistry.com/my-app:v1.2.3 \
  app.py

Why use reference names?

Decouples code from image URIs - Change images without modifying code
Supports multiple environments - Different images for dev/staging/prod
Integrates with CI/CD - Build images in pipelines, reference in code
Enables image reuse - Multiple tasks can reference the same image
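Because the code only knows the reference name, something has to turn name=URI pairs (the form the --image flag takes above) into a mapping before the run starts. The parser below is minimal and hypothetical. It splits on the first = and rejects duplicates, and the real CLI's validation rules may differ. The result has the same shape as the images= argument in the with_runcontext example:

```python
def parse_image_overrides(pairs: list[str]) -> dict[str, str]:
    """Turn ["ref=registry/image:tag", ...] into {"ref": "registry/image:tag"}."""
    images: dict[str, str] = {}
    for pair in pairs:
        name, sep, uri = pair.partition("=")
        if not sep or not name or not uri:
            raise ValueError(f"expected NAME=URI, got {pair!r}")
        if name in images:
            raise ValueError(f"image {name!r} given more than once")
        images[name] = uri
    return images


print(parse_image_overrides(["my-app-image=myregistry.com/my-app:v1.2.3"]))
```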

Example: Multi-environment deployment

import flyte
import os

# Code references image by name
env = flyte.TaskEnvironment(
    name="api",
    image=flyte.Image.from_ref_name("api-service")
)

@env.task
def api_call(endpoint: str) -> dict:
    # Implementation
    return {"status": "success"}

if __name__ == "__main__":
    flyte.init_from_config()

    # Determine image based on environment
    environment = os.getenv("ENV", "dev")
    image_uri = {
        "dev": "myregistry.com/api-service:dev",
        "staging": "myregistry.com/api-service:staging",
        "prod": "myregistry.com/api-service:v1.2.3"
    }[environment]

    run = flyte.with_runcontext(
        copy_style="none",
        images={"api-service": image_uri}
    ).run(api_call, endpoint="/health")

Container-based best practices

Always set explicit versions when using copy_style="none":

flyte.with_runcontext(copy_style="none", version="v1.0.0")

Match root_dir to copy_contents_only:
- copy_contents_only=True → root_dir=Path(__file__).parent
- copy_contents_only=False → root_dir=Path(__file__).parent.parent
Ensure flyte executable is in container - Add to PATH or install flyte package

Use .dockerignore to exclude unnecessary files:

# .dockerignore
__pycache__/
*.pyc
.git/
.venv/
*.egg-info/

Test containers locally before deploying:

docker run --rm myimage:latest python -c "import mymodule"  # Verify imports work
docker run -it --rm myimage:latest /bin/bash                 # Or explore interactively

When to use container-based deployment

✅ Use container-based when:

Deploying to production
Need immutable, reproducible environments
Working with complex system dependencies
Deploying to air-gapped or restricted environments
CI/CD pipelines with automated builds
Code changes are infrequent

❌ Don’t use container-based when:

Rapid development and frequent code changes
Quick prototyping
Interactive development (Jupyter notebooks)
Learning and experimentation

Choosing the right approach

Decision tree

Are you iterating quickly on code?
├─ Yes → Use Code Bundling (Default)
│         (Development, prototyping, notebooks)
│         Both approaches are fully reproducible via hash/tag
└─ No  → Do you need easy version rollback?
          ├─ Yes → Use Container-based
          │         (Production, CI/CD, straightforward tag-based rollback)
          └─ No  → Either works
                    (Code bundling is simpler, container-based for air-gapped)
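Here is the same decision tree written as a function, which makes the branch order explicit. The function and parameter names are illustrative, and the air_gapped flag encodes the note on the last branch:

```python
def choose_packaging(iterating_quickly: bool, need_easy_rollback: bool,
                     air_gapped: bool = False) -> str:
    """Return "code bundling" or "container-based", following the tree above."""
    if iterating_quickly:
        return "code bundling"
    if need_easy_rollback:
        return "container-based"
    # Either works here; bundling is simpler unless blob storage is unreachable
    return "container-based" if air_gapped else "code bundling"
```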

Hybrid approach

You can use different approaches for different tasks:

import flyte
import pathlib

# Fast iteration for development tasks
dev_env = flyte.TaskEnvironment(
    name="dev",
    image=flyte.Image.from_debian_base().with_pip_packages("pandas")
    # Code bundling (default)
)

# Immutable containers for production tasks
prod_env = flyte.TaskEnvironment(
    name="prod",
    image=flyte.Image.from_debian_base()
        .with_pip_packages("pandas")
        .with_source_folder(pathlib.Path(__file__).parent, copy_contents_only=True)
    # Requires copy_style="none"
)

@dev_env.task
def experimental_task(x: int) -> int:
    # Rapid development with code bundling
    return x * 2

@prod_env.task
def stable_task(x: int) -> int:
    # Production with embedded code
    return x ** 2

if __name__ == "__main__":
    flyte.init_from_config(root_dir=pathlib.Path(__file__).parent)

    # Use code bundling for dev task
    dev_run = flyte.run(experimental_task, x=5)

    # Use container-based for prod task
    prod_run = flyte.with_runcontext(
        copy_style="none",
        version="v1.0.0"
    ).run(stable_task, x=5)

Troubleshooting

Import errors

Problem: ModuleNotFoundError when task executes remotely

Solutions:

Check loaded modules - Ensure modules are imported at module level:

# ✅ Good - bundled automatically
import flyte
from mymodule import helper

@flyte.task
def my_task():
    return helper.process()

# ❌ Bad - not bundled (lazy load)
@flyte.task
def my_task():
    from mymodule import helper
    return helper.process()

Verify root_dir matches your import structure:

# If imports are: from mypackage.utils import foo
# Then root_dir should be parent of mypackage/
flyte.init_from_config(root_dir=pathlib.Path(__file__).parent.parent)

Use copy_style="all" to bundle everything:

flyte run --copy-style=all app.py my_task
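To spot lazy imports before a remote run fails, you can scan a file for import statements nested inside function bodies, which the "auto" style does not pick up. The sketch below uses only the standard library's ast module. find_lazy_imports is a hypothetical helper, and it only reports the imports; it does not check whether those modules belong to your project:

```python
import ast


def find_lazy_imports(source: str) -> list[tuple[int, str]]:
    """Return (line number, module) for imports that appear inside functions."""
    results: set[tuple[int, str]] = set()
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for node in ast.walk(func):
            if isinstance(node, ast.Import):
                results.update((node.lineno, alias.name) for alias in node.names)
            elif isinstance(node, ast.ImportFrom):
                # Relative imports such as "from . import x" have module None
                results.add((node.lineno, "." * node.level + (node.module or "")))
    # A set, because nested functions are visited more than once
    return sorted(results)


sample = '''
from mymodule import helper

def my_task():
    from another_module import util
    return util.run()
'''
print(find_lazy_imports(sample))  # [(5, 'another_module')]
```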

Code changes not reflected

Problem: Remote execution uses old code despite local changes

This is rare with code bundling - Flyte automatically versions based on content hash, so code changes should be detected automatically. This issue typically occurs with caching problems or when using copy_style="none".

Solutions:

Use explicit version bump (mainly for container-based deployments):
```
run = flyte.with_runcontext(version="v2").run(my_task)
```

Check if copy_style="none" is set - this requires image rebuild:

# If using copy_style="none", rebuild image
run = flyte.with_runcontext(
    copy_style="none",
    version="v2"  # Bump version to force rebuild
).run(my_task)

Files missing in container

Problem: Task can’t find data files or configs

Solutions:

Use copy_style="all" to bundle all files:

flyte run --copy-style=all app.py my_task

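Copy files explicitly into the image:

import pathlib
import flyte

image = flyte.Image.from_debian_base().with_source_file(
    src=pathlib.Path(__file__).parent / "config.yaml",  # Resolve next to this file, not the working directory
    dst="/app/config.yaml"
)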

Store data in remote storage instead of bundling:

@flyte.task
def my_task():
    # Read from S3/GCS instead of local files
    import flyte.io
    data = flyte.io.File("s3://bucket/data.csv").open().read()

Container build failures

Problem: Image build fails with copy_style="none"

Solutions:

Check root_dir matches copy_contents_only:

from pathlib import Path
import flyte
from flyte import Image

# copy_contents_only=True
image = Image.from_debian_base().with_source_folder(
    src=Path(__file__).parent,
    copy_contents_only=True
)
flyte.init_from_config(root_dir=Path(__file__).parent)  # Match!

Ensure flyte executable available:

image = Image.from_debian_base()  # Has flyte pre-installed

Check file permissions in source directory:
```
chmod -R +r project/
```

Version conflicts

Problem: Multiple versions of same image causing confusion

Solutions:

Use explicit versions:

run = flyte.with_runcontext(
    copy_style="none",
    version="v1.2.3"  # Explicit, not auto-generated
).run(my_task)

Clean old images:
```
docker image prune -a
```
Use semantic versioning for clarity:
```
version = "v1.0.0"  # Major.Minor.Patch
```
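If you adopt Major.Minor.Patch tags, a small helper can validate and bump them so version strings stay consistent across runs and images. This sketch assumes a leading "v" and no pre-release or build suffixes, even though full Semantic Versioning allows both. bump is a hypothetical name:

```python
import re

_VERSION = re.compile(r"v(\d+)\.(\d+)\.(\d+)")


def bump(version: str, part: str = "patch") -> str:
    """Increment one component of a vMAJOR.MINOR.PATCH string."""
    match = _VERSION.fullmatch(version)
    if match is None:
        raise ValueError(f"not a vMAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(g) for g in match.groups())
    if part == "major":
        return f"v{major + 1}.0.0"
    if part == "minor":
        return f"v{major}.{minor + 1}.0"
    if part == "patch":
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


print(bump("v1.2.3"))           # v1.2.4
print(bump("v1.2.3", "minor"))  # v1.3.0
```

The resulting string is what you would pass as version= to flyte.with_runcontext().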