# Self-managed deployment
> This bundle contains all pages in the Self-managed deployment section.
> Source: https://www.union.ai/docs/v2/union/deployment/selfmanaged/

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged ===

# Self-managed deployment

> **📝 Note**
>
> An LLM-optimized bundle of this entire section is available at [`section.md`](section.md).
> This single file contains all pages in this section, optimized for AI coding agent context.

In a self-managed deployment, you operate the data plane on your own Kubernetes infrastructure.
Union.ai runs the control plane, but you manage the cluster, upgrades, and operational aspects of the data plane yourself.
Union.ai has no access to your cluster, providing the highest level of data isolation.

## Getting started

1. Review the [architecture](./architecture/_index) to understand the control plane, data plane operators, and security model.
2. Check the **Self-managed deployment > Cluster recommendations** for Kubernetes version, networking, and IP planning requirements.
3. Set up your data plane on your cloud provider:
   - **Self-managed deployment > Data plane setup on generic Kubernetes** (on-premise or any S3-compatible environment)
   - [AWS](./selfmanaged-aws/_index)
   - **Self-managed deployment > Data plane setup on GKE (GCP)**
   - **Self-managed deployment > Data plane setup on Azure**
   - **Self-managed deployment > Data plane setup on OCI**

## Configuration

After initial setup, configure platform features on your cluster:

- **Self-managed deployment > Advanced Configurations > Authentication**
- **Self-managed deployment > Advanced Configurations > Image Builder**
- **Self-managed deployment > Advanced Configurations > Multiple Clusters**
- **Self-managed deployment > Advanced Configurations > Configuring Service and Worker Node Pools**
- **Self-managed deployment > Advanced Configurations > Monitoring**
- **Self-managed deployment > Advanced Configurations > Persistent logs**
- **Self-managed deployment > Advanced Configurations > Data retention policies**
- **Self-managed deployment > Advanced Configurations > Namespace mapping**
- **Self-managed deployment > Advanced Configurations > Secrets**

## Reference

- [Helm chart reference](./helm-chart-reference/_index) for available chart values
- **Self-managed deployment > Architecture > Kubernetes Access Controls** for RBAC configuration details

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/architecture ===

# Architecture


This section covers the architecture of the Union.ai data plane.
It provides an overview of the components and their interactions within the system.
Understanding the architecture is crucial for effectively deploying and managing your Union.ai cluster.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/architecture/overview ===

# Overview

The Union.ai architecture consists of two components, referred to as planes — the control plane and the data plane.

![](../../../_static/images/deployment/architecture.svg)

## Control plane

The control plane:
  * Runs within the Union.ai AWS account.
  * Provides the user interface through which users can access authentication, authorization, observation, and management functions.
  * Is responsible for placing executions onto data plane clusters and performing other cluster control and management functions.

## Data plane

Union.ai operates one control plane for each supported region, which supports all data planes within that region. You can choose the region in which to locate your data plane. Currently, Union.ai supports the `us-west`, `us-east`, `eu-west`, and `eu-central` regions, and more are being added.

### Data plane nodes

Worker nodes are responsible for executing your workloads. You have full control over the configuration of your worker nodes. When worker nodes are not in use, they automatically scale down to the configured minimum.

## Union.ai operator

The Union.ai hybrid architecture lets you maintain ultimate ownership and control of your data and compute infrastructure while enabling Union.ai to handle the details of managing that infrastructure.

Management of the data plane is mediated by a dedicated operator (the Union.ai operator) resident on that plane.
This operator is designed to perform its functions with only the very minimum set of required permissions.
It allows the control plane to spin up and down clusters and provides Union.ai's support engineers with access to system-level logs and the ability to apply changes as per customer requests.
It _does not_ provide direct access to secrets or data.

In addition, communication is always initiated by the Union.ai operator in the data plane toward the Union.ai control plane, not the other way around.
This further enhances the security of your data plane.

Union.ai is SOC-2 Type 2 certified. A copy of the audit report is available upon request.

## Registry data

Registry data consists of:

* Names of workflows, tasks, launch plans, and artifacts
* Input and output types for workflows and tasks
* Execution status, start time, end time, and duration of workflows and tasks
* Version information for workflows, tasks, launch plans, and artifacts
* Artifact definitions

This type of data is stored in the control plane and is used to manage the execution of your workflows.
This does not include any workflow or task code, nor any data that is processed by your workflows or tasks.

## Execution data

Execution data consists of:

* Event data
* Workflow inputs
* Workflow outputs
* Data passed between tasks (task inputs and outputs)

This data is divided into two categories: *raw data* and *literal data*.

### Raw data

Raw data consists of:

* Files and directories
* Dataframes
* Models
* Python-pickled types

These are passed by reference between tasks and are always stored in an object store in your data plane.
This type of data is read (and may be temporarily cached) by the control plane as needed, but is never stored there.

### Literal data

Literal data consists of:

* Primitive execution inputs (int, string, etc.)
* JSON-serializable dataclasses

These are passed by value, not by reference, and may be stored in the Union.ai control plane.

## Data privacy

If you are concerned with maintaining strict data privacy, be sure not to pass private information in literal form between tasks.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/architecture/kubernetes-rbac ===

# Kubernetes Access Controls

## Roles

See the [dataplane helm charts](https://github.com/unionai/helm-charts/tree/main/charts/dataplane) for detailed information about Roles and ClusterRoles.

### Role Permissions Summary

##### `proxy-system-secret`
- Scoped to `union` namespace
- Permissions on secrets: get, list, create, update, delete
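
Based on the permissions summary above, the `proxy-system-secret` Role corresponds to a manifest of roughly this shape (reconstructed for illustration; see the Helm charts linked above for the authoritative definition):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: proxy-system-secret
  namespace: union
rules:
  - apiGroups: [""]          # core API group
    resources: ["secrets"]
    verbs: ["get", "list", "create", "update", "delete"]
```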

##### `operator-system`
- Scoped to `union` namespace
- Permissions on secrets and deployments: get, list, watch, create, update

##### `union-operator-admission` (for webhook)
- Scoped to `union` namespace
- Permissions on secrets: get, create

### ClusterRole Permissions Summary

#### Metrics and Monitoring Roles

##### `release-name-kube-state-metrics`

- **Purpose**: Collects metrics from Kubernetes resources
- **Access Pattern**: Read-only (`list`, `watch`) to numerous resources across multiple API groups
- **Scope**: Comprehensive - covers core resources, workloads, networking, storage, and authentication

##### `prometheus-operator`
- **Access**: Full control (`*`) over Prometheus monitoring resources
- **Key Permissions**:
  - Complete access to monitoring.coreos.com API group resources
  - Full access to statefulsets, configmaps, secrets
  - Pod management (list, delete)
  - Service/endpoint management
  - Read-only for nodes, namespaces, ingresses

##### `union-operator-prometheus`
- **Access**: Read-only access to metrics sources
- **Resources**: nodes, services, endpoints, pods, endpointslices, ingresses
- **Special**: Access to `/metrics` and `/metrics/cadvisor` endpoints

#### Resource Management Roles

##### `clustersync-resource`
- **Access**: Full control (`*`) over core and RBAC resources
- **Resources**:
  - Core: configmaps, namespaces, pods, resourcequotas, secrets, services, serviceaccounts
  - RBAC: roles, rolebindings, clusterrolebindings
- **API Groups**: `""` (core) and `rbac.authorization.k8s.io`

##### `proxy-system`
- **Access**: Read-only (`get`, `list`, `watch`)
- **Resources**: events, flyteworkflows, pods/log, pods, rayjobs, resourcequotas

#### Workflow Management Roles

##### `operator-system`
- **Access**: Full control over Flyte workflows, CRUD for core resources
- **Resources**:
  - Full access to flyteworkflows
  - Management of pods, configmaps, resourcequotas, podtemplates, nodes
  - Access to `/metrics` endpoint

##### `flytepropeller-webhook-role`
- **Access**: Get, create, update, patch
- **Resources**: mutatingwebhookconfigurations, secrets, pods, replicasets/finalizers

##### `flytepropeller-role`
- **Access**: Varied per resource type
- **Key Permissions**:
  - Read-only for pods
  - Event management
  - CRD management
  - Full control over flyteworkflows including finalizers

## Service Access

### `operator/operator-proxy`
Service that provides access to both cluster resources and cloud provider APIs, particularly focused on compute resource management.

#### Kubernetes Resources

##### Core Resources
- Pods: Access via informers to monitor and manage pod lifecycle
- Nodes: Access to retrieve node information
- ResourceQuotas: Read access
- ConfigMaps: Access for configuration management
- Secrets: Access for credentials storage
- Namespaces: Referenced in container/pod identification contexts

##### Custom Resources
- FlyteWorkflows: Management of v1alpha1.FlyteWorkflow resources
- Kueue Resources (optional): Access to ResourceFlavor, ClusterQueue, and other queue resources
- Karpenter NodePools (optional): For AWS-based compute resource management

##### Cloud Provider Resources
- Object Storage: Read/write operations to cloud storage buckets

##### Authentication and Configuration
- OAuth: Uses app ID for authentication with Union cloud services
- Service Account Roles: Configured via UserRoleKey and UserRole
- Cluster Information: Access to cluster metadata and metrics

### `FlytePropeller/PropellerWebhook`
Kubernetes operator that executes Flyte graphs natively on Kubernetes.

#### Kubernetes Resources
- Manages pod creation for executions
- Secret injection

#### Custom Resources
- FlyteWorkflows: Management of v1alpha1.FlyteWorkflow resources

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/cluster-recommendations ===

# Cluster recommendations

Union.ai is capable of running on any Kubernetes cluster.
This includes managed Kubernetes services such as Google Kubernetes Engine (GKE), Azure Kubernetes Service (AKS), and Amazon Elastic Kubernetes Service (EKS), as well as self-managed Kubernetes clusters.

While many configurations are supported, we have some recommendations to ensure the best performance and reliability of your Union deployment.

## Kubernetes Versions

We recommend running Kubernetes versions that are [actively supported by the Kubernetes community](https://kubernetes.io/releases/).  This
typically means running one of the most recent three minor versions.  For example, if the most recent version is 1.32, we recommend
running 1.32, 1.31, or 1.30.

## Networking Requirements

Many Container Network Interface (CNI) plugins require planning for IP address allocation capacity.
For example, [Amazon's VPC CNI](https://docs.aws.amazon.com/eks/latest/userguide/managing-vpc-cni.html) and [GKE's Dataplane v2](https://cloud.google.com/kubernetes-engine/docs/concepts/dataplane-v2)
allocate IP addresses to Kubernetes Pods out of one or more of your VPC's subnets.
If you are using one of these CNI plugins, you should ensure that your VPC's subnets have enough available IP addresses to support the number of concurrent tasks you expect to run.

We recommend using at least a `/16` CIDR range (65,536 addresses). You may optionally subdivide this range into smaller subnets to
support multiple availability zones or other network segmentation requirements.

In short, you should aim to have at least 1 IP address available for each task you expect to run concurrently.
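
The sizing rule above can be sanity-checked with quick shell arithmetic (the peak-task figure is a made-up example):

```bash
# One pod IP per concurrent task (assuming a CNI that allocates VPC-routable pod IPs).
prefix=16          # CIDR prefix of the range reserved for pod IPs
peak_tasks=40000   # example: expected peak number of concurrent tasks

addresses=$(( 2 ** (32 - prefix) ))
echo "addresses in /${prefix}: ${addresses}"

if (( addresses > peak_tasks )); then
  echo "capacity OK"
else
  echo "need a larger range"
fi
```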

## Performance Recommendations

### Node Pools

We recommend, though do not require, using separate node pools for the Union services and the Union worker pods. This allows you to
guard against resource contention between Union services and other tasks running in your cluster. You can find additional information
in the [Configuring Node Pools](./configuration/node-pools) section.

## AWS

### S3

Each data plane uses an object store (an AWS S3 bucket, GCS bucket, or ABS container) to store data used in the execution of workflows.
As a Union.ai administrator, you can specify retention policies for this data when setting up your data plane
(learn more about the [data categories](./configuration/data-retention) stored by the data plane).

Union recommends the use of two S3 buckets:

1. Metadata bucket: contains workflow execution data such as task inputs and outputs.
2. Fast registration bucket: contains local code artifacts that will be copied into the Flyte task container at runtime when using `union register` or `union run --remote --copy-all`.

Note: You can choose to use a single bucket in your data plane.

#### Data Retention

Union recommends using a lifecycle policy on these buckets to manage storage costs. See [Data retention policy](./configuration/data-retention) for more information.
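
For illustration, an S3 lifecycle configuration that expires objects after 30 days looks like this (the rule ID and the 30-day window are example values; pick a window that matches your retention policy):

```json
{
    "Rules": [
        {
            "ID": "expire-union-execution-data",
            "Status": "Enabled",
            "Filter": { "Prefix": "" },
            "Expiration": { "Days": 30 }
        }
    ]
}
```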

### IAM

You will need to enable access to your S3 buckets from the cluster.

1. Update the EKS Node IAM role for your cluster to allow the data plane nodes to use your S3 buckets.
   This can be done by creating and attaching a new IAM policy which enables access to your S3 buckets.
   Use `union-flyte-worker` as the name of the new policy.
   The permissions for the policy will be:

   ```json
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Sid": "Statement1",
               "Effect": "Allow",
               "Action": [
                   "s3:DeleteObject*",
                   "s3:GetObject*",
                   "s3:ListBucket",
                   "s3:PutObject*"
               ],
               "Resource": [
                   "arn:aws:s3:::<bucket-name>",
                   "arn:aws:s3:::<bucket-name>/*"
               ]
           }
       ]
   }
   ```

2. Attach this policy to your node group IAM role
3. Create an [IAM OIDC provider for your EKS cluster](https://docs.aws.amazon.com/eks/latest/userguide/enable-iam-roles-for-service-accounts.html#_create_oidc_provider_eksctl).
4. Create a new role named `union-flyte-role` to enable applications in a Pod’s containers to make API requests to AWS services using AWS Identity and Access Management (IAM) permissions.

   The Trust Policy for this role will be:

   ```json
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Principal": {
                   "Federated": "arn:aws:iam::$account_id:oidc-provider/$oidc_provider"
               },
               "Action": "sts:AssumeRoleWithWebIdentity",
               "Condition": {
                   "StringLike": {
                       "$oidc_provider:aud": "sts.amazonaws.com",
                       "$oidc_provider:sub": "system:serviceaccount:*:*"
                   }
               }
           }
       ]
   }
   ```
   where `$account_id` is your AWS account ID and `$oidc_provider` is the OIDC provider you created above.

   You can obtain the OIDC issuer URL using the AWS CLI:

   ```bash
   aws eks describe-cluster --region $cloud_region --name $cluster_name --query "cluster.identity.oidc.issuer" --output text
   ```

5. Attach the `union-flyte-worker` policy created above to this new role.
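
The trust policy's `$account_id` and `$oidc_provider` placeholders can be derived as in this small shell sketch (the account ID and issuer URL below are made-up examples):

```bash
# Made-up example values; substitute your own account ID and the issuer URL
# returned by `aws eks describe-cluster ... --query "cluster.identity.oidc.issuer"`.
account_id="123456789012"
issuer="https://oidc.eks.us-east-2.amazonaws.com/id/EXAMPLED539D4633E53DE1B7"

# Trust policies reference the OIDC provider without the https:// scheme.
oidc_provider="${issuer#https://}"

echo "arn:aws:iam::${account_id}:oidc-provider/${oidc_provider}"
```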

### EKS configuration

Union recommends installing the following EKS add-ons:
  - CoreDNS
  - Amazon VPC CNI
  - Kube-proxy

Union supports autoscaling and the use of spot (interruptible) instances.

## AKS

### Secure access

Union recommends using [Microsoft Entra Workload ID](https://learn.microsoft.com/en-us/azure/aks/workload-identity-overview) to securely access Azure resources.

Ensure your AKS cluster is [enabled as OIDC Issuer](https://learn.microsoft.com/en-us/azure/aks/use-oidc-issuer).

Create a User Assigned Managed Identity with Federated Credentials that map to the following Kubernetes Service Accounts:

**Subject Identifier**

- `system:serviceaccount:<NAMESPACE>:flytepropeller-system`
- `system:serviceaccount:<NAMESPACE>:flytepropeller-webhook-system`
- `system:serviceaccount:<NAMESPACE>:operator-system`
- `system:serviceaccount:<NAMESPACE>:proxy-system`
- `system:serviceaccount:<NAMESPACE>:executor`

Here, `<NAMESPACE>` is the namespace where you plan to install the Union operator (`union` by default).

Assign the `Storage Blob Data Owner` role to this Identity at the Storage Account level.
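
Creating the federated credentials for the service accounts listed above can be scripted with the Azure CLI; a sketch (the identity name, resource group, and issuer variable are example values — this loop only prints the commands so you can review them before running):

```bash
# Example values; replace with your managed identity, resource group, and
# the OIDC issuer URL of your AKS cluster.
namespace="union"
identity="union-dataplane-identity"
resource_group="my-rg"

for sa in flytepropeller-system flytepropeller-webhook-system operator-system proxy-system executor; do
  # Print (rather than run) each `az identity federated-credential create` call for review.
  echo az identity federated-credential create \
    --name "union-${sa}" \
    --identity-name "${identity}" \
    --resource-group "${resource_group}" \
    --issuer "\$AKS_OIDC_ISSUER" \
    --subject "system:serviceaccount:${namespace}:${sa}" \
    --audiences api://AzureADTokenExchange
done
```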

#### Workers

This is the Identity that the Pods created for each execution will use to access Azure resources. Those Pods use the `default` K8s Service Account in each project-domain namespace, unless otherwise specified.

Create a User Assigned Managed Identity with Federated Credentials that map to the `default` K8s Service Account:

**Subject Identifier**

- `system:serviceaccount:development:default`
- `system:serviceaccount:staging:default`
- `system:serviceaccount:production:default`

Assign the `Storage Blob Data Owner` role to this Identity at the Storage Account level.

### Azure Key Vault

Union ships with an embedded secrets manager. Alternatively, you can enable Union to consume secrets from Azure Key Vault by adding the following to your Helm values file:

```yaml
config:

  ## Optional integration with Azure Key Vault secrets manager
  core:
    webhook:
      embeddedSecretManagerConfig:
        enabled: true
        type: Azure
        azureConfig:
          vaultURI: "https://kv-myorg-prod.vault.azure.net/" # full Key Vault URI
      secretManagerTypes:
        - Azure
        - Embedded

```
### Node pools

By default, the Union installation requests the following resources:

|          | CPU (vCPUs)| Memory (GiB) |
|----------|------------|--------------|
| Requests |          14|          27.1|
| Limits   |          17|            32|

For GPU access, Union injects tolerations and label selectors into execution Pods.
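
As an illustration, a GPU node pool is typically tainted so that only GPU workloads land on it, and the execution Pods carry a matching toleration of roughly this shape (the taint key below is an example; the actual keys depend on how your node pool is configured):

```yaml
tolerations:
  - key: "nvidia.com/gpu"   # example taint key commonly used on GPU node pools
    operator: Exists
    effect: NoSchedule
```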

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/selfmanaged-generic ===

# Data plane setup on generic Kubernetes

Union.ai’s modular architecture allows for great flexibility and control.
The customer can decide how many clusters to have, their shape, and who has access to what.
All communication is encrypted.
The Union architecture is described on the [Architecture](./architecture/_index) page.

> [!NOTE] These instructions cover installing Union.ai in an on-premise Kubernetes cluster.
> If you are installing at a cloud provider, use the cloud provider specific instructions: [AWS](./selfmanaged-aws/_index), [Azure](./selfmanaged-azure), [OCI](./selfmanaged-oci).

## Assumptions

* You have a Union.ai organization, and you know the control plane URL for your organization (e.g. https://your-org-name.us-east-2.unionai.cloud).
* You have a cluster name provided by or coordinated with Union.
* You have a Kubernetes cluster, running one of the most recent three minor Kubernetes versions. [Learn more](https://kubernetes.io/releases/version-skew-policy/).
* You have object storage provided by a vendor or an S3-compatible platform (such as [Minio](https://min.io)).

## Prerequisites

* Install [Helm 3](https://helm.sh/docs/intro/install/).
* Install [uctl](https://www.union.ai/docs/v2/union/deployment/api-reference/uctl-cli/_index).

## Deploy the Union.ai operator

1. Add the Union.ai Helm repo:

   ```bash
   helm repo add unionai https://unionai.github.io/helm-charts/
   helm repo update
   ```

2. Use the `uctl selfserve provision-dataplane-resources` command to generate a new client ID and client secret for communicating with your Union control plane, provision authorization permissions for the app to operate on the Union cluster name you have selected, generate a values file for installing the data plane in your Kubernetes cluster, and print follow-up instructions:

   ```bash
   uctl config init --host=<YOUR_UNION_CONTROL_PLANE_URL>
   uctl selfserve provision-dataplane-resources --clusterName <YOUR_SELECTED_CLUSTERNAME>  --provider metal
   ```

   * The command will output the ID, name, and a secret that will be used by the Union services to communicate with your control plane.
     It will also generate a YAML file specific to the provider that you specify, in this case `metal`, meaning "bare metal", or generic:

   ```bash
    -------------- ------------------------------------ ---------------------------- ------------------------------------------------- ------------------------------------------------------------------ ----------
   | ORGANIZATION | HOST                               | CLUSTER                    | CLUSTERAUTHCLIENTID                             | CLUSTERAUTHCLIENTSECRET                                          | PROVIDER |
    -------------- ------------------------------------ ---------------------------- ------------------------------------------------- ------------------------------------------------------------------ ----------
   | xxxxxxxxxxx  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | xxxxxxxxxxxxxxxxxxxxxxxxxx | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | xxxxx    |
    -------------- ------------------------------------ ---------------------------- ------------------------------------------------- ------------------------------------------------------------------ ----------
   1 rows

   ✅ Generated <ORGNAME>-values.yaml
   ======================================================================
   Installation Instructions
   ======================================================================

   Step 1: Prepare your Kubernetes cluster.

   Step 2: Clone and navigate to helm-charts repository
     git clone https://github.com/unionai/helm-charts && cd helm-charts

   Step 3: Configure your S3-compatible storage endpoint & credentials in the values file

   Step 4: Install the data plane CRDs
     helm upgrade --install unionai-dataplane-crds charts/dataplane-crds

   Step 5: Install the data plane
     helm upgrade --install unionai-dataplane charts/dataplane \
       --namespace union \
       --values <ORGNAME>-values.yaml

   Step 6: Verify installation
     kubectl get pods -n union

   Step 7: Once you have your dataplane up and running, create API keys for your organization. If you have already created them, run the same command again to propagate the keys to the new cluster:
     uctl create apikey --keyName EAGER_API_KEY --org <your-org-name>

   Step 8: You can now trigger v2 executions on this dataplane.
   ```

   * Save the secret that is displayed. Union does not store the credentials; rerunning the same command will display the same secret again, streamed through the OAuth Apps provider.
   * Create the `EAGER_API_KEY` as instructed in Step 7 of the command output. This step is required for every dataplane you plan to use for v2 executions.

3. Update the placeholder values in the values file.
   For example, `<UNION_FLYTE_ROLE_ARN>` is the ARN of the new IAM role created in the [AWS Cluster Recommendations](./cluster-recommendations#iam).

4. Optionally configure the resource `limits` and `requests` for the different services.
   By default, these will be set minimally, will vary depending on usage, and follow the Kubernetes `ResourceRequirements` specification.

   * `clusterresourcesync.resources`
   * `flytepropeller.resources`
   * `flytepropellerwebhook.resources`
   * `operator.resources`
   * `proxy.resources`
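
As an illustration, overriding one of these values in the values file follows the standard Kubernetes `ResourceRequirements` shape (the numbers below are placeholders, not recommendations):

```yaml
flytepropeller:
  resources:
    requests:
      cpu: 500m
      memory: 512Mi
    limits:
      cpu: "1"
      memory: 1Gi
```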

5. Once deployed you can check to see if the cluster has been successfully registered to the control plane:

   ```bash
   uctl get cluster
    ----------- ------- --------------- -----------
   | NAME      | ORG   | STATE         | HEALTH    |
    ----------- ------- --------------- -----------
   | <cluster> | <org> | STATE_ENABLED | HEALTHY   |
    ----------- ------- --------------- -----------
   1 rows
   ```

6. You can then register and run some example workflows through your cluster to ensure that it is working correctly.

   ```bash
   uctl register examples --project=union-health-monitoring --domain=development
   uctl validate snacks --project=union-health-monitoring --domain=development
    ---------------------- ----------------------------------- ---------- -------------------------------- -------------- ----------- ---------------
   | NAME                 | LAUNCH PLAN NAME                  | VERSION  | STARTED AT                     | ELAPSED TIME | RESULT    | ERROR MESSAGE |
    ---------------------- ----------------------------------- ---------- -------------------------------- -------------- ----------- ---------------
   | alskkhcd6wx5m6cqjlwm | basics.hello_world.hello_world_wf | v0.3.341 | 2025-05-09T18:30:02.968183352Z | 4.452440953s | SUCCEEDED |               |
    ---------------------- ----------------------------------- ---------- -------------------------------- -------------- ----------- ---------------
   1 rows
   ```

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/selfmanaged-aws ===

# Data plane setup on AWS


To set up your Union.ai data plane on Amazon Web Services (AWS), you provision and manage the compute resources in your own AWS account.

### **Self-managed deployment > Data plane setup on AWS > Manual setup on AWS**

Set up the data plane manually using AWS CloudFormation or the AWS console

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/selfmanaged-aws/manual ===

# Manual setup on AWS

Union.ai's modular architecture allows for great flexibility and control.
The customer can decide how many clusters to have, their shape, and who has access to what.
All communication is encrypted.  The Union architecture is described on the [Architecture](../architecture/_index) page.

## Assumptions

* You have a Union.ai organization, and you know the control plane URL for your organization.
* You have a cluster name provided by or coordinated with Union.
* You have a Kubernetes cluster, running one of the most recent three minor K8s versions.
  [Learn more](https://kubernetes.io/releases/version-skew-policy/)
* You have configured an S3 bucket.
* You have an IAM Role, Trust Policy and OIDC provider configured as indicated in the [AWS section in Cluster Recommendations](../cluster-recommendations#aws) section.

## Prerequisites

* Install [Helm 3](https://helm.sh/docs/intro/install/).
* Install [uctl](https://www.union.ai/docs/v2/union/deployment/api-reference/uctl-cli/_index).

## Deploy the Union.ai operator

1. Add the Union.ai Helm repo:

   ```bash
   helm repo add unionai https://unionai.github.io/helm-charts/
   helm repo update
   ```

2. Use the `uctl selfserve provision-dataplane-resources` command to generate a new client ID and client secret for communicating with your Union control plane, provision authorization permissions for the app to operate on the Union cluster name you have selected, generate a values file for installing the data plane in your Kubernetes cluster, and print follow-up instructions:

   ```bash
   uctl config init --host=<YOUR_UNION_CONTROL_PLANE_URL>
   uctl selfserve provision-dataplane-resources --clusterName <YOUR_SELECTED_CLUSTERNAME>  --provider aws
   ```

   * The command will output the ID, name, and a secret that will be used by the Union services to communicate with your control plane.
     It will also generate a YAML file specific to the provider that you specify, in this case `aws`:

   ```bash
    -------------- ------------------------------------ ---------------------------- ------------------------------------------------- ------------------------------------------------------------------ ----------
   | ORGANIZATION | HOST                               | CLUSTER                    | CLUSTERAUTHCLIENTID                             | CLUSTERAUTHCLIENTSECRET                                          | PROVIDER |
    -------------- ------------------------------------ ---------------------------- ------------------------------------------------- ------------------------------------------------------------------ ----------
   | xxxxxxxxxxx  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | xxxxxxxxxxxxxxxxxxxxxxxxxx | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | xxxxx    |
    -------------- ------------------------------------ ---------------------------- ------------------------------------------------- ------------------------------------------------------------------ ----------
   1 rows

   ✅ Generated <ORGNAME>-values.yaml
   ======================================================================
   Installation Instructions
   ======================================================================

   Step 1: Set up the infrastructure on AWS. Our team can share Terraform scripts to help with this.

   Step 2: Clone and navigate to helm-charts repository
     git clone https://github.com/unionai/helm-charts && cd helm-charts

   Step 3: Ensure S3 bucket & IAM roles are configured; set role ARN(s) in values

   Step 4: Install the data plane CRDs
     helm upgrade --install unionai-dataplane-crds charts/dataplane-crds

   Step 5: Install the data plane
     helm upgrade --install unionai-dataplane charts/dataplane \
       --namespace union \
       --values <ORGNAME>-values.yaml

   Step 6: Verify installation
     kubectl get pods -n union

   Step 7: Once you have your dataplane up and running, create API keys for your organization. If you have already created them, run the same command again to propagate the keys to the new cluster:
     uctl create apikey --keyName EAGER_API_KEY --org <your-org-name>

   Step 8: You can now trigger v2 executions on this dataplane.
   ```
   * Save the secret that is displayed. Union does not store the credentials; rerunning the same command will display the same secret again, streamed through the OAuth Apps provider.
   * Create the `EAGER_API_KEY` as instructed in Step 7 of the command output. This step is required for every dataplane you plan to use for v2 executions.

3. Update the placeholder values in the values file.
   For example, `<UNION_FLYTE_ROLE_ARN>` is the ARN of the new IAM role created in the [AWS Cluster Recommendations](../cluster-recommendations#iam).

4. Optionally configure the resource `limits` and `requests` for the different services.
   By default, these will be set minimally, will vary depending on usage, and follow the Kubernetes `ResourceRequirements` specification.

   * `clusterresourcesync.resources`
   * `flytepropeller.resources`
   * `flytepropellerwebhook.resources`
   * `operator.resources`
   * `proxy.resources`

5. Once deployed you can check to see if the cluster has been successfully registered to the control plane:

   ```bash
   uctl get cluster
    ----------- ------- --------------- -----------
   | NAME      | ORG   | STATE         | HEALTH    |
    ----------- ------- --------------- -----------
   | <cluster> | <org> | STATE_ENABLED | HEALTHY   |
    ----------- ------- --------------- -----------
   1 rows
   ```

6. You can then register and run some example workflows through your cluster to ensure that it is working correctly.

   ```bash
   uctl register examples --project=union-health-monitoring --domain=development
   uctl validate snacks --project=union-health-monitoring --domain=development
    ---------------------- ----------------------------------- ---------- -------------------------------- -------------- ----------- ---------------
   | NAME                 | LAUNCH PLAN NAME                  | VERSION  | STARTED AT                     | ELAPSED TIME | RESULT    | ERROR MESSAGE |
    ---------------------- ----------------------------------- ---------- -------------------------------- -------------- ----------- ---------------
   | alskkhcd6wx5m6cqjlwm | basics.hello_world.hello_world_wf | v0.3.341 | 2025-05-09T18:30:02.968183352Z | 4.452440953s | SUCCEEDED |               |
    ---------------------- ----------------------------------- ---------- -------------------------------- -------------- ----------- ---------------
   1 rows
   ```

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/selfmanaged-gcp ===

# Data plane setup on GKE (GCP)

Union.ai’s modular architecture allows for great flexibility and control.
The customer can decide how many clusters to have, their shape, and who has access to what.
All communication is encrypted.
The Union architecture is described on the [Architecture](./architecture/_index) page.

> [!NOTE] These instructions cover installing Union.ai on Google Kubernetes Engine (GKE).
> If you are installing on a different cloud provider, use the provider-specific instructions: [AWS](./selfmanaged-aws/_index), [Azure](./selfmanaged-azure), [OCI](./selfmanaged-oci).

## Assumptions

* You have a Union.ai organization, and you know the control plane URL for your organization (e.g. `https://your-org-name.us-east-2.unionai.cloud`).
* You have a Kubernetes cluster, running one of the three most recent minor Kubernetes versions. [Learn more](https://kubernetes.io/releases/version-skew-policy/).
* You have a GCS bucket and a Google service account with access to it.
* You have existing Kubernetes service accounts with access to the bucket, or permissions to create service account bindings.

## Prerequisites

* Install [Helm 3](https://helm.sh/docs/intro/install/).
* Install [uctl](https://www.union.ai/docs/v2/union/deployment/api-reference/uctl-cli/_index).

## Deploy the Union.ai operator

1. Add the Union.ai Helm repo:

   ```bash
   helm repo add unionai https://unionai.github.io/helm-charts/
   helm repo update
   ```

2. Use the `uctl selfserve provision-dataplane-resources` command to generate a new client and client secret for communicating with your Union control plane, provision authorization permissions for the app to operate on the Union cluster name you have selected, generate a values file for installing the dataplane in your Kubernetes cluster, and print follow-up instructions:

   ```bash
   uctl config init --host=<YOUR_UNION_CONTROL_PLANE_URL>
   uctl selfserve provision-dataplane-resources --clusterName <YOUR_SELECTED_CLUSTERNAME>  --provider gcp
   ```

   * The command will output the ID, name, and a secret that will be used by the Union services to communicate with your control plane.
     It will also generate a YAML file specific to the provider that you specify, in this case `gcp`:

   ```bash
    -------------- ------------------------------------ ---------------------------- ------------------------------------------------- ------------------------------------------------------------------ ----------
   | ORGANIZATION | HOST                               | CLUSTER                    | CLUSTERAUTHCLIENTID                             | CLUSTERAUTHCLIENTSECRET                                          | PROVIDER |
    -------------- ------------------------------------ ---------------------------- ------------------------------------------------- ------------------------------------------------------------------ ----------
   | xxxxxxxxxxx  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | xxxxxxxxxxxxxxxxxxxxxxxxxx | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | xxxxx    |
    -------------- ------------------------------------ ---------------------------- ------------------------------------------------- ------------------------------------------------------------------ ----------
   1 rows

   ✅ Generated <ORGNAME>-values.yaml
   ======================================================================
   Installation Instructions
   ======================================================================

   Step 1: Prepare your Kubernetes cluster.

   Step 2: Clone and navigate to helm-charts repository
     git clone https://github.com/unionai/helm-charts && cd helm-charts

   Step 3: Configure your GCS bucket & service account credentials in the values file

   Step 4: Install the data plane CRDs
     helm upgrade --install unionai-dataplane-crds charts/dataplane-crds

   Step 5: Install the data plane
     helm upgrade --install unionai-dataplane charts/dataplane \
       --namespace union \
       --values <ORGNAME>-values.yaml

   Step 6: Verify installation
     kubectl get pods -n union

   Step 7: Once your dataplane is up and running, create API keys for your organization. If you have already created them, run the same command again to propagate the keys to the new cluster:
     uctl create apikey --keyName EAGER_API_KEY --org <your-org-name>

   Step 8: You can now trigger v2 executions on this dataplane.
   ```

   * Save the secret that is displayed. Union does not store the credentials; rerunning the same command will display the same secret again via the OAuth Apps provider.
   * Create the `EAGER_API_KEY` as instructed in Step 7 of the command output. This step is required for every dataplane you plan to use for v2 executions.

3. Update the placeholder values in the values file with the GCS bucket and Google service account details you configured for this cluster (see [Cluster Recommendations](./cluster-recommendations)).

4. Optionally configure the resource `limits` and `requests` for the different services.
   By default, these are set minimally; appropriate values vary with usage and follow the Kubernetes `ResourceRequirements` specification.

   * `clusterresourcesync.resources`
   * `flytepropeller.resources`
   * `flytepropellerwebhook.resources`
   * `operator.resources`
   * `proxy.resources`

5. Once deployed, you can check whether the cluster has been successfully registered with the control plane:

   ```bash
   uctl get cluster
    ----------- ------- --------------- -----------
   | NAME      | ORG   | STATE         | HEALTH    |
    ----------- ------- --------------- -----------
   | <cluster> | <org> | STATE_ENABLED | HEALTHY   |
    ----------- ------- --------------- -----------
   1 rows
   ```

6. You can then register and run some example workflows through your cluster to ensure that it is working correctly.

   ```bash
   uctl register examples --project=union-health-monitoring --domain=development
   uctl validate snacks --project=union-health-monitoring --domain=development
    ---------------------- ----------------------------------- ---------- -------------------------------- -------------- ----------- ---------------
   | NAME                 | LAUNCH PLAN NAME                  | VERSION  | STARTED AT                     | ELAPSED TIME | RESULT    | ERROR MESSAGE |
    ---------------------- ----------------------------------- ---------- -------------------------------- -------------- ----------- ---------------
   | alskkhcd6wx5m6cqjlwm | basics.hello_world.hello_world_wf | v0.3.341 | 2025-05-09T18:30:02.968183352Z | 4.452440953s | SUCCEEDED |               |
    ---------------------- ----------------------------------- ---------- -------------------------------- -------------- ----------- ---------------
   1 rows
   ```

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/selfmanaged-azure ===

# Data plane setup on Azure

Union.ai’s modular architecture allows for great flexibility and control.
The customer can decide how many clusters to have, their shape, and who has access to what.
All communication is encrypted.  The Union architecture is described on the [Architecture](./architecture/_index) page.

## Assumptions

* You have a Union.ai organization, and you know the control plane URL for your organization.
* You have a cluster name provided by or coordinated with Union.
* You have a Kubernetes cluster, running one of the most recent three minor K8s versions.
  [Learn more](https://kubernetes.io/releases/version-skew-policy/).
* You have configured a storage bucket.
* You have configured your AKS cluster as indicated in the [Cluster Recommendations](./cluster-recommendations#aks) section.

## Prerequisites

* Install [Helm 3](https://helm.sh/docs/intro/install/).
* Install [uctl](https://www.union.ai/docs/v2/union/deployment/api-reference/uctl-cli/_index).

## Deploy the Union.ai operator

1. Add the Union.ai Helm repo:

   ```bash
   helm repo add unionai https://unionai.github.io/helm-charts/
   helm repo update
   ```

2. Use the `uctl selfserve provision-dataplane-resources` command to generate a new client and client secret for communicating with your Union control plane, provision authorization permissions for the app to operate on the Union cluster name you have selected, generate a values file for installing the dataplane in your Kubernetes cluster, and print follow-up instructions:

   ```bash
   uctl config init --host=<YOUR_UNION_CONTROL_PLANE_URL>
   uctl selfserve provision-dataplane-resources --clusterName <YOUR_SELECTED_CLUSTERNAME>  --provider azure
   ```

   * The command will output the ID, name, and a secret that will be used by the Union services to communicate with your control plane.
     It will also generate a YAML file specific to the provider that you specify, in this case `azure`:

   ```bash
     -------------- ------------------------------------ ---------------------------- ------------------------------------------------- ------------------------------------------------------------------ ----------
   | ORGANIZATION | HOST                               | CLUSTER                    | CLUSTERAUTHCLIENTID                             | CLUSTERAUTHCLIENTSECRET                                          | PROVIDER |
    -------------- ------------------------------------ ---------------------------- ------------------------------------------------- ------------------------------------------------------------------ ----------
   | xxxxxxxxxxx  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | xxxxxxxxxxxxxxxxxxxxxxxxxx | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | xxxxx    |
    -------------- ------------------------------------ ---------------------------- ------------------------------------------------- ------------------------------------------------------------------ ----------
   1 rows

   ✅ Generated <ORGNAME>-values.yaml
   ======================================================================
   Installation Instructions
   ======================================================================

   Step 1: Set up the infrastructure on Azure. Our team can share Terraform scripts to help with this.

   Step 2: Clone and navigate to helm-charts repository
     git clone https://github.com/unionai/helm-charts && cd helm-charts

   Step 3: Configure Azure Blob (stow) & Workload Identity client IDs in values

   Step 4: Install the data plane CRDs
     helm upgrade --install unionai-dataplane-crds charts/dataplane-crds

   Step 5: Install the data plane
     helm upgrade --install unionai-dataplane charts/dataplane \
       --namespace union \
       --values <ORGNAME>-values.yaml

   Step 6: Verify installation
     kubectl get pods -n union

   Step 7: Once your dataplane is up and running, create API keys for your organization. If you have already created them, run the same command again to propagate the keys to the new cluster:
     uctl create apikey --keyName EAGER_API_KEY --org <your-org-name>

   Step 8: You can now trigger v2 executions on this dataplane.
   ```

   * Save the secret that is displayed. Union does not store the credentials; rerunning the same command will display the same secret again via the OAuth Apps provider.
   * Create the `EAGER_API_KEY` as instructed in Step 7 of the command output. This step is required for every dataplane you plan to use for v2 executions.

3. Update the placeholder values in the values file with the Azure storage account and Workload Identity details you configured for this cluster (see [Cluster Recommendations](./cluster-recommendations#aks)).

4. Optionally configure the resource `limits` and `requests` for the different services.
   By default, these are set minimally; appropriate values vary with usage and follow the Kubernetes `ResourceRequirements` specification.

   * `clusterresourcesync.resources`
   * `flytepropeller.resources`
   * `flytepropellerwebhook.resources`
   * `operator.resources`
   * `proxy.resources`

5. Once deployed, you can check whether the cluster has been successfully registered with the control plane:

   ```bash
   uctl get cluster
    ----------- ------- --------------- -----------
   | NAME      | ORG   | STATE         | HEALTH    |
    ----------- ------- --------------- -----------
   | <cluster> | <org> | STATE_ENABLED | HEALTHY   |
    ----------- ------- --------------- -----------
   1 rows
   ```

6. You can then register and run some example workflows through your cluster to ensure that it is working correctly.

   ```bash
   uctl register examples --project=union-health-monitoring --domain=development
   uctl validate snacks --project=union-health-monitoring --domain=development
    ---------------------- ----------------------------------- ---------- -------------------------------- -------------- ----------- ---------------
   | NAME                 | LAUNCH PLAN NAME                  | VERSION  | STARTED AT                     | ELAPSED TIME | RESULT    | ERROR MESSAGE |
    ---------------------- ----------------------------------- ---------- -------------------------------- -------------- ----------- ---------------
   | alskkhcd6wx5m6cqjlwm | basics.hello_world.hello_world_wf | v0.3.341 | 2025-05-09T18:30:02.968183352Z | 4.452440953s | SUCCEEDED |               |
    ---------------------- ----------------------------------- ---------- -------------------------------- -------------- ----------- ---------------
   1 rows
   ```

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/selfmanaged-oci ===

# Data plane setup on OCI

Union.ai’s modular architecture allows for great flexibility and control.
The customer can decide how many clusters to have, their shape, and who has access to what.
All communication is encrypted.
The Union architecture is described on the [Architecture](./architecture/_index) page.

## Assumptions

* You have a Union.ai organization, and you know the control plane URL for your organization.
* You have a cluster name provided by or coordinated with Union.
* You have a Kubernetes cluster, running one of the most recent three minor Kubernetes versions.
  [Learn more](https://kubernetes.io/releases/version-skew-policy/).
* You have configured a storage bucket.
* You have configured your OKE cluster as indicated in [Cluster Recommendations](./cluster-recommendations).

## Prerequisites

* Install [Helm 3](https://helm.sh/docs/intro/install/).
* Install [uctl](https://www.union.ai/docs/v2/union/deployment/api-reference/uctl-cli/_index).

## Deploy the Union.ai operator

1. Add the Union.ai Helm repo:

   ```bash
   helm repo add unionai https://unionai.github.io/helm-charts/
   helm repo update
   ```

2. Use the `uctl selfserve provision-dataplane-resources` command to generate a new client and client secret for communicating with your Union control plane, provision authorization permissions for the app to operate on the Union cluster name you have selected, generate a values file for installing the dataplane in your Kubernetes cluster, and print follow-up instructions:

   ```bash
   uctl config init --host=<YOUR_UNION_CONTROL_PLANE_URL>
   uctl selfserve provision-dataplane-resources --clusterName <YOUR_SELECTED_CLUSTERNAME>  --provider oci
   ```

   * The command will output the ID, name, and a secret that will be used by the Union services to communicate with your control plane.
     It will also generate a YAML file specific to the provider that you specify, in this case `oci`:

   ```bash
     -------------- ------------------------------------ ---------------------------- ------------------------------------------------- ------------------------------------------------------------------ ----------
   | ORGANIZATION | HOST                               | CLUSTER                    | CLUSTERAUTHCLIENTID                             | CLUSTERAUTHCLIENTSECRET                                          | PROVIDER |
    -------------- ------------------------------------ ---------------------------- ------------------------------------------------- ------------------------------------------------------------------ ----------
   | xxxxxxxxxxx  | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | xxxxxxxxxxxxxxxxxxxxxxxxxx | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | xxxxx    |
    -------------- ------------------------------------ ---------------------------- ------------------------------------------------- ------------------------------------------------------------------ ----------
   1 rows

   ✅ Generated <ORGNAME>-values.yaml
   ======================================================================
   Installation Instructions
   ======================================================================

   Step 1: Set up the infrastructure on OCI. Our team can share Terraform scripts to help with this.

   Step 2: Clone and navigate to helm-charts repository
     git clone https://github.com/unionai/helm-charts && cd helm-charts

   Step 3: Ensure the storage bucket & access keys are configured in the values file

   Step 4: Install the data plane CRDs
     helm upgrade --install unionai-dataplane-crds charts/dataplane-crds

   Step 5: Install the data plane
     helm upgrade --install unionai-dataplane charts/dataplane \
       --namespace union \
       --values <ORGNAME>-values.yaml

   Step 6: Verify installation
     kubectl get pods -n union

   Step 7: Once your dataplane is up and running, create API keys for your organization. If you have already created them, run the same command again to propagate the keys to the new cluster:
     uctl create apikey --keyName EAGER_API_KEY --org <your-org-name>

   Step 8: You can now trigger v2 executions on this dataplane.
   ```
   * Save the secret that is displayed. Union does not store the credentials; rerunning the same command will display the same secret again via the OAuth Apps provider.
   * Create the `EAGER_API_KEY` as instructed in Step 7 of the command output. This step is required for every dataplane you plan to use for v2 executions.

3. Update the placeholder values in the values file with the storage bucket and access key details you configured for this cluster (see [Cluster Recommendations](./cluster-recommendations)).

4. Optionally configure the resource `limits` and `requests` for the different services.
   By default, these are set minimally; appropriate values vary with usage and follow the Kubernetes `ResourceRequirements` specification.

   * `clusterresourcesync.resources`
   * `flytepropeller.resources`
   * `flytepropellerwebhook.resources`
   * `operator.resources`
   * `proxy.resources`

5. Once deployed, you can check whether the cluster has been successfully registered with the control plane:

   ```bash
   uctl get cluster
    ----------- ------- --------------- -----------
   | NAME      | ORG   | STATE         | HEALTH    |
    ----------- ------- --------------- -----------
   | <cluster> | <org> | STATE_ENABLED | HEALTHY   |
    ----------- ------- --------------- -----------
   1 rows
   ```

6. You can then register and run some example workflows through your cluster to ensure that it is working correctly.

   ```bash
   uctl register examples --project=union-health-monitoring --domain=development
   uctl validate snacks --project=union-health-monitoring --domain=development
    ---------------------- ----------------------------------- ---------- -------------------------------- -------------- ----------- ---------------
   | NAME                 | LAUNCH PLAN NAME                  | VERSION  | STARTED AT                     | ELAPSED TIME | RESULT    | ERROR MESSAGE |
    ---------------------- ----------------------------------- ---------- -------------------------------- -------------- ----------- ---------------
   | alskkhcd6wx5m6cqjlwm | basics.hello_world.hello_world_wf | v0.3.341 | 2025-05-09T18:30:02.968183352Z | 4.452440953s | SUCCEEDED |               |
    ---------------------- ----------------------------------- ---------- -------------------------------- -------------- ----------- ---------------
   1 rows
   ```

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/configuration ===

# Advanced Configurations

> **📝 Note**
>
> An LLM-optimized bundle of this entire section is available at [`section.md`](section.md).
> This single file contains all pages in this section, optimized for AI coding agent context.

This section covers the configuration of Union.ai platform features on your cluster.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/configuration/node-pools ===

# Configuring Service and Worker Node Pools

As a best practice, we recommend using separate node pools for the Union services and the Union worker pods. This allows
you to guard against resource contention between Union services and other tasks running in your cluster.

Start by creating two node pools in your cluster: one for the Union services and one for the Union worker pods.
Label the node pool for the Union services with `union.ai/node-role: services` and the worker pool with the
`union.ai/node-role: worker` label. You will also need to taint the nodes in the service and
worker pools to ensure that only the appropriate pods are scheduled on them.
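
If your provisioning tool does not apply labels at node-pool creation time, you can set them manually with `kubectl` (the node names below are placeholders):

```bash
kubectl label nodes <services-node-name> union.ai/node-role=services
kubectl label nodes <worker-node-name> union.ai/node-role=worker
```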

The nodes for Union services should be tainted with:

```bash
kubectl taint nodes <node-name> union.ai/node-role=services:NoSchedule
```
The nodes for execution workers should be tainted with:

```bash
kubectl taint nodes <node-name> union.ai/node-role=worker:NoSchedule
```

Vendor interfaces and provisioning tools may support tainting nodes automatically through configuration options.

Set the scheduling constraints for the Union services in your values file:

```yaml
scheduling:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: union.ai/node-role
            operator: In
            values:
            - services
  tolerations:
    - effect: NoSchedule
      key: union.ai/node-role
      operator: Equal
      value: services
```

To ensure that your worker processes are scheduled on the worker node pool, set the following for the Flyte kubernetes plugin:

```yaml
config:
  k8s:
    plugins:
      k8s:
        default-affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: union.ai/node-role
                  operator: In
                  values:
                  - worker
        default-tolerations:
          - effect: NoSchedule
            key: union.ai/node-role
            operator: Equal
            value: worker
```

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/configuration/authentication ===

# Authentication

Union.ai uses [OpenID Connect (OIDC)](https://openid.net/specs/openid-connect-core-1_0.html) for user authentication and [OAuth 2.0](https://tools.ietf.org/html/rfc6749) for service-to-service authorization. You must configure an external Identity Provider (IdP) to enable authentication on your deployment.

## Overview

Authentication is enforced at two layers:

1. **Ingress layer** — The control plane nginx ingress validates every request to protected routes via an auth subrequest to the `/me` endpoint.
2. **Application layer** — `flyteadmin` manages browser sessions, validates tokens, and exposes OIDC discovery endpoints.

The following diagram shows how these layers interact for browser-based authentication:

```mermaid
sequenceDiagram
    participant B as Browser
    participant N as Nginx Ingress
    participant F as Flyteadmin
    participant IdP as Identity Provider
    B->>N: Request protected route
    N->>F: Auth subrequest (GET /me)
    F-->>N: 401 (no session)
    N-->>B: 302 → /login
    B->>F: GET /login (unprotected)
    F-->>B: 302 → IdP authorize endpoint
    B->>IdP: Authenticate (PKCE)
    IdP-->>B: 302 → /callback?code=...
    B->>F: GET /callback (exchange code)
    F->>IdP: Exchange code for tokens
    F-->>B: Set-Cookie + 302 → original URL
    B->>N: Retry with session cookie
    N->>F: Auth subrequest (GET /me)
    F-->>N: 200 OK
    N-->>B: Forward to backend service
```

## Prerequisites

- A Union.ai deployment with the control plane installed.
- An OIDC-compliant Identity Provider (IdP).
- Access to create OAuth applications in your IdP.
- A secret management solution for delivering client secrets to pods (e.g., External Secrets Operator with AWS Secrets Manager, HashiCorp Vault, or native Kubernetes secrets).

## Configuring your Identity Provider

You must create three OAuth applications in your IdP:

| Application | Type | Grant Types | Purpose |
|---|---|---|---|
| Web app (browser login) | Web | `authorization_code` | Console/web UI authentication |
| Native app (SDK/CLI) | Native (PKCE) | `authorization_code`, `device_code` | SDK and CLI authentication |
| Service app (internal) | Service | `client_credentials` | All service-to-service communication |

> [!NOTE]
> A single service app is shared by both control plane and dataplane services. If your security policy requires separate credentials per component, you can create additional service apps, but the configuration below assumes a single shared client.

### Authorization server setup

1. Create a custom authorization server in your IdP (or use the default).
2. Add a scope named `all`.
3. Add an access policy that allows all registered clients listed above.
4. Add a policy rule that permits `authorization_code`, `client_credentials`, and `device_code` grant types.
5. Note the **Issuer URI** (e.g., `https://your-idp.example.com/oauth2/<server-id>`).
6. Note the **Token endpoint** (e.g., `https://your-idp.example.com/oauth2/<server-id>/v1/token`).
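
To sanity-check the authorization server, you can fetch its OIDC discovery document; `/.well-known/openid-configuration` is a standard OIDC endpoint, and the host and server ID below are placeholders:

```bash
# The response should report the issuer and token_endpoint noted above.
curl -s "https://your-idp.example.com/oauth2/<server-id>/.well-known/openid-configuration"
```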

### Application details

#### 1. Web application (browser login)

- **Type**: Web Application
- **Sign-on method**: OIDC
- **Grant types**: `authorization_code`
- **Sign-in redirect URI**: `https://<your-domain>/callback`
- **Sign-out redirect URI**: `https://<your-domain>/logout`
- Note the **Client ID** → used as `OIDC_CLIENT_ID`
- Note the **Client Secret** → stored in `flyte-admin-secrets` (see **Self-managed deployment > Advanced Configurations > Authentication > Secret delivery**)

#### 2. Native application (SDK/CLI)

- **Type**: Native Application
- **Sign-on method**: OIDC
- **Grant types**: `authorization_code`, `urn:ietf:params:oauth:grant-type:device_code`
- **Sign-in redirect URI**: `http://localhost:53593/callback`
- **Require PKCE**: Always
- **Consent**: Trusted (skip consent screen)
- Note the **Client ID** → used as `CLI_CLIENT_ID` (no secret needed for public clients)

#### 3. Service application (internal)

- **Type**: Service (machine-to-machine)
- **Grant types**: `client_credentials`
- Note the **Client ID** → used as `INTERNAL_CLIENT_ID` (control plane) and `AUTH_CLIENT_ID` (dataplane)
- Note the **Client Secret** → stored in multiple Kubernetes secrets (see **Self-managed deployment > Advanced Configurations > Authentication > Secret delivery**)
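
To verify the service app before wiring it into the Helm charts, you can request a token directly using the standard OAuth 2.0 `client_credentials` grant (all values below are placeholders):

```bash
curl -s -X POST "https://your-idp.example.com/oauth2/<server-id>/v1/token" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "grant_type=client_credentials" \
  -d "client_id=<INTERNAL_CLIENT_ID>" \
  -d "client_secret=<client-secret>" \
  -d "scope=all"
```

A successful response is a JSON document containing an `access_token` field.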

## Control plane Helm configuration

The control plane Helm chart requires auth configuration in several sections. All examples below use the global variables defined in `values.<cloud>.selfhosted-intracluster.yaml`.

### Global variables

Set these in your customer overrides file:

```yaml
global:
  OIDC_BASE_URL: "<issuer-uri>"             # e.g. "https://your-idp.example.com/oauth2/default"
  OIDC_CLIENT_ID: "<web-app-client-id>"     # Browser login
  CLI_CLIENT_ID: "<native-app-client-id>"   # SDK/CLI
  INTERNAL_CLIENT_ID: "<service-client-id>" # Service-to-service
  AUTH_TOKEN_URL: "<token-endpoint>"         # e.g. "https://your-idp.example.com/oauth2/default/v1/token"
```

### Flyteadmin OIDC configuration

Configure `flyteadmin` to act as the OIDC relying party. This enables the `/login`, `/callback`, `/me`, and `/logout` endpoints:

```yaml
flyte:
  configmap:
    adminServer:
      server:
        security:
          useAuth: true
      auth:
        grpcAuthorizationHeader: flyte-authorization
        httpAuthorizationHeader: flyte-authorization
        authorizedUris:
          - "http://flyteadmin:80"
          - "http://flyteadmin.<namespace>.svc.cluster.local:80"
        appAuth:
          authServerType: External
          externalAuthServer:
            baseUrl: "<issuer-uri>"
          thirdPartyConfig:
            flyteClient:
              clientId: "<native-app-client-id>"
              redirectUri: "http://localhost:53593/callback"
              scopes:
                - all
        userAuth:
          openId:
            baseUrl: "<issuer-uri>"
            clientId: "<web-app-client-id>"
            scopes:
              - profile
              - openid
              - offline_access
          cookieSetting:
            sameSitePolicy: LaxMode
            domain: ""
          idpQueryParameter: idp
```

Key settings:

- `useAuth: true` — registers the `/login`, `/callback`, `/me`, and `/logout` HTTP endpoints. **Required** for auth to function.
- `authServerType: External` — use your IdP as the authorization server (not flyteadmin's built-in server).
- `grpcAuthorizationHeader: flyte-authorization` — the header name used for bearer tokens. Both the SDK and internal services use this header.

### Flyteadmin and scheduler admin SDK client

Flyteadmin and the scheduler use the admin SDK to communicate with other control plane services. Configure client credentials so these calls are authenticated:

```yaml
flyte:
  configmap:
    adminServer:
      admin:
        clientId: "<service-client-id>"
        clientSecretLocation: "/etc/secrets/client_secret"
```

The secret is mounted from the `flyte-admin-secrets` Kubernetes secret (see **Self-managed deployment > Advanced Configurations > Authentication > Secret delivery**).

### Scheduler auth secret

The flyte-scheduler mounts a separate Kubernetes secret (`flyte-secret-auth`) at `/etc/secrets/`. Enable this mount:

```yaml
flyte:
  secrets:
    adminOauthClientCredentials:
      enabled: true
      clientSecret: "placeholder"
```

> [!NOTE]
> Setting `clientSecret: "placeholder"` causes the subchart to render the `flyte-secret-auth` Kubernetes Secret. Use External Secrets Operator with `creationPolicy: Merge` to overwrite the placeholder with the real credential, or create the secret directly before installing the chart.
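
As a minimal sketch (the resource names, namespace, secret-store reference, and remote key below are assumptions for your environment), an External Secrets Operator `ExternalSecret` using `creationPolicy: Merge` might look like:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: flyte-secret-auth
  namespace: union                       # assumed control plane namespace
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: my-secret-store                # hypothetical SecretStore
    kind: SecretStore
  target:
    name: flyte-secret-auth              # the chart-rendered Secret to merge into
    creationPolicy: Merge                # overwrite only the placeholder key
  data:
    - secretKey: client_secret           # key name assumed; match your chart's mount
      remoteRef:
        key: union/oauth-client-secret   # hypothetical key in your secret backend
```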

### Service-to-service authentication

Control plane services communicate through nginx and need OAuth tokens. Configure the admin SDK client credentials and the union service auth:

```yaml
configMap:
  admin:
    clientId: "<service-client-id>"
    clientSecretLocation: "/etc/secrets/union/client_secret"
  union:
    auth:
      enable: true
      type: ClientSecret
      clientId: "<service-client-id>"
      clientSecretLocation: "/etc/secrets/union/client_secret"
      tokenUrl: "<token-endpoint>"
      authorizationMetadataKey: flyte-authorization
      scopes:
        - all
```

The secret is mounted from the control plane service secret (see **Self-managed deployment > Advanced Configurations > Authentication > Secret delivery**).

### Executions service

The executions service has its own admin client connection that also needs auth:

```yaml
services:
  executions:
    configMap:
      executions:
        app:
          adminClient:
            connection:
              authorizationHeader: flyte-authorization
              clientId: "<service-client-id>"
              clientSecretLocation: "/etc/secrets/union/client_secret"
              tokenUrl: "<token-endpoint>"
              scopes:
                - all
```

### Ingress auth annotations

The control plane ingress uses nginx auth subrequests to enforce authentication. These annotations are set on protected ingress routes:

```yaml
ingress:
  protectedIngressAnnotations:
    nginx.ingress.kubernetes.io/auth-url: "https://$host/me"
    nginx.ingress.kubernetes.io/auth-signin: "https://$host/login?redirect_url=$escaped_request_uri"
    nginx.ingress.kubernetes.io/auth-response-headers: "Set-Cookie"
    nginx.ingress.kubernetes.io/auth-cache-key: "$http_flyte_authorization$http_cookie"
  protectedIngressAnnotationsGrpc:
    nginx.ingress.kubernetes.io/auth-url: "https://$host/me"
    nginx.ingress.kubernetes.io/auth-response-headers: "Set-Cookie"
    nginx.ingress.kubernetes.io/auth-cache-key: "$http_authorization$http_flyte_authorization$http_cookie"
```

For every request to a protected route, nginx makes a subrequest to `/me`. If flyteadmin returns 200 (valid session or token), the request is forwarded. If 401, the user is redirected to `/login` for browser clients, or the 401 is returned directly for API clients.
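The routing decision above can be sketched as a small function. This is a simplified model of nginx's behavior for illustration, not actual nginx code:

```python
# Simplified model of the nginx auth-subrequest decision described above.
# me_status is the HTTP status flyteadmin's /me endpoint returned for the
# subrequest; is_browser distinguishes browser clients from API clients.
def route_request(me_status: int, is_browser: bool) -> str:
    if me_status == 200:            # valid session or token
        return "forward to upstream"
    if is_browser:                  # browser clients get the sign-in redirect
        return "redirect to /login"
    return "return 401"             # API clients receive the 401 directly
```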

## Dataplane Helm configuration

When the control plane has OIDC enabled, the dataplane must also authenticate. All dataplane services use the same service app credentials (`AUTH_CLIENT_ID`), which is the same client as `INTERNAL_CLIENT_ID` on the control plane.

### Dataplane global variables

```yaml
global:
  AUTH_CLIENT_ID: "<service-client-id>"  # Same as INTERNAL_CLIENT_ID
```

### Cluster resource sync

```yaml
clusterresourcesync:
  config:
    union:
      auth:
        enable: true
        type: ClientSecret
        clientId: "<service-client-id>"
        clientSecretLocation: "/etc/union/secret/client_secret"
        authorizationMetadataKey: flyte-authorization
        tokenRefreshWindow: 5m
```

### Operator (union service auth)

```yaml
config:
  union:
    auth:
      enable: true
      type: ClientSecret
      clientId: "<service-client-id>"
      clientSecretLocation: "/etc/union/secret/client_secret"
      authorizationMetadataKey: flyte-authorization
      tokenRefreshWindow: 5m
```

### Propeller admin client

```yaml
config:
  admin:
    admin:
      clientId: "<service-client-id>"
      clientSecretLocation: "/etc/union/secret/client_secret"
```

### Executor (eager mode)

Injects the `EAGER_API_KEY` secret into task pods for authenticated eager-mode execution:

```yaml
executor:
  config:
    unionAuth:
      injectSecret: true
      secretName: EAGER_API_KEY
```

### Dataplane secrets

Enable the `union-secret-auth` Kubernetes secret mount for dataplane pods:

```yaml
secrets:
  admin:
    enable: true
    create: false
    clientId: "<service-client-id>"
    clientSecret: "placeholder"
```

> [!NOTE]
> `create: false` means the chart does not create the `union-secret-auth` Kubernetes Secret. You must provision it externally (see **Self-managed deployment > Advanced Configurations > Authentication > Secret delivery**). Setting `clientSecret: "placeholder"` with `create: true` is also supported if you want the chart to create the secret and then overwrite it via External Secrets Operator.

## Secret delivery

Client secrets must be delivered to pods as files mounted into the container filesystem. The table below lists the required Kubernetes secrets, their mount paths, and which components use them:

| Kubernetes Secret | Mount Path | Components | Namespace |
| --- | --- | --- | --- |
| `flyte-admin-secrets` | `/etc/secrets/` | flyteadmin | `union-cp` |
| `flyte-secret-auth` | `/etc/secrets/` | flyte-scheduler | `union-cp` |
| Control plane service secret | `/etc/secrets/union/` | executions, cluster, usage, and other CP services | `union-cp` |
| `union-secret-auth` | `/etc/union/secret/` | operator, propeller, CRS | `union` |

All secrets must contain a key named `client_secret` with the service app's OAuth client secret value.
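As a reminder of the mechanics: Secret `data` values are base64-encoded in the manifest, but the kubelet mounts the decoded bytes as a plain file that services read from their configured `clientSecretLocation`. A small sketch, using a temp directory in place of a real mount path:

```python
import base64
import tempfile
from pathlib import Path

# In a Secret manifest, data values are base64-encoded strings...
encoded = base64.b64encode(b"s3cr3t").decode()

# ...but pods see the decoded bytes as a plain file at the mount path.
# A temp directory stands in for e.g. /etc/secrets/union/ here.
mount = Path(tempfile.mkdtemp())
(mount / "client_secret").write_bytes(base64.b64decode(encoded))

# Services read the file named after the secret key.
secret = (mount / "client_secret").read_text().strip()
```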

### Option A: External Secrets Operator (recommended)

If you use [External Secrets Operator (ESO)](https://external-secrets.io/) with a cloud secret store, create `ExternalSecret` resources that sync the client secret into each Kubernetes secret:

```yaml
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: flyte-admin-secrets-auth
  namespace: union-cp
spec:
  secretStoreRef:
    name: default
    kind: SecretStore
  refreshInterval: 1h
  target:
    name: flyte-admin-secrets
    creationPolicy: Merge
    deletionPolicy: Retain
  data:
    - secretKey: client_secret
      remoteRef:
        key: "<your-secret-store-key>"
---
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: flyte-secret-auth
  namespace: union-cp
spec:
  secretStoreRef:
    name: default
    kind: SecretStore
  refreshInterval: 1h
  target:
    name: flyte-secret-auth
    creationPolicy: Merge
    deletionPolicy: Retain
  data:
    - secretKey: client_secret
      remoteRef:
        key: "<your-secret-store-key>"
---
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: union-secret-auth
  namespace: union
spec:
  secretStoreRef:
    name: default
    kind: SecretStore
  refreshInterval: 1h
  target:
    name: union-secret-auth
    creationPolicy: Merge
    deletionPolicy: Retain
  data:
    - secretKey: client_secret
      remoteRef:
        key: "<your-secret-store-key>"
```

> [!NOTE]
> `creationPolicy: Merge` ensures the ExternalSecret adds the `client_secret` key alongside any existing keys in the target secret.

### Option B: Direct Kubernetes secrets

If you manage secrets directly:

```bash
# Control plane — flyteadmin
kubectl create secret generic flyte-admin-secrets \
  --from-literal=client_secret='<SERVICE_CLIENT_SECRET>' \
  -n union-cp

# Control plane — scheduler
kubectl create secret generic flyte-secret-auth \
  --from-literal=client_secret='<SERVICE_CLIENT_SECRET>' \
  -n union-cp

# Control plane — union services (add to existing secret)
kubectl create secret generic union-controlplane-secrets \
  --from-literal=pass.txt='<DB_PASSWORD>' \
  --from-literal=client_secret='<SERVICE_CLIENT_SECRET>' \
  -n union-cp --dry-run=client -o yaml | kubectl apply -f -

# Dataplane — operator, propeller, CRS
kubectl create secret generic union-secret-auth \
  --from-literal=client_secret='<SERVICE_CLIENT_SECRET>' \
  -n union
```

## SDK and CLI authentication

The SDK and CLI use PKCE (Proof Key for Code Exchange) for interactive authentication:

1. The SDK calls `AuthMetadataService/GetPublicClientConfig` (an unprotected endpoint) to discover the `flytectl` client ID and redirect URI.
2. The SDK opens a browser to the IdP's authorize endpoint with a PKCE challenge.
3. The user authenticates in the browser.
4. The IdP redirects to `localhost:53593/callback` with an authorization code.
5. The SDK exchanges the code for tokens and stores them locally.
6. Subsequent requests include the token in the `flyte-authorization` header.
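The verifier/challenge pair used in steps 2 and 5 follows RFC 7636 (S256 method); a minimal sketch of how a client derives it:

```python
import base64
import hashlib
import secrets

# The client keeps the verifier private and sends only the challenge with
# the authorize request; presenting the verifier at the token exchange
# proves the same client started the flow.
verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
challenge = base64.urlsafe_b64encode(
    hashlib.sha256(verifier.encode("ascii")).digest()
).rstrip(b"=").decode()
```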

No additional SDK configuration is required beyond the standard `uctl` or Union config:

```yaml
admin:
  endpoint: dns:///<your-domain>
  authType: Pkce
  insecure: false
```

For headless environments (CI/CD), use the **Self-managed deployment > Advanced Configurations > Authentication > SDK and CLI authentication > Client credentials for CI/CD** flow instead.

### Client credentials for CI/CD

For automated pipelines, create a service app in your IdP and configure:

```yaml
admin:
  endpoint: dns:///<your-domain>
  authType: ClientSecret
  clientId: "<your-ci-client-id>"
  clientSecretLocation: "/path/to/client_secret"
```

Or use environment variables:

```bash
export FLYTE_CREDENTIALS_CLIENT_ID="<your-ci-client-id>"
export FLYTE_CREDENTIALS_CLIENT_SECRET="<your-ci-client-secret>"
export FLYTE_CREDENTIALS_AUTH_MODE=basic
```
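The `basic` auth mode generally corresponds to the OAuth2 client-credentials grant with HTTP Basic client authentication at the token endpoint. A sketch of how that `Authorization` header is formed (the credential values are hypothetical):

```python
import base64

# Hypothetical CI credentials; in practice these come from your IdP.
client_id = "my-ci-client"
client_secret = "s3cr3t"

# client_secret_basic: base64("<id>:<secret>") in the Authorization header.
token = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
auth_header = f"Basic {token}"
```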

## Troubleshooting

### Browser login redirects in a loop

Verify that `useAuth: true` is set in `flyte.configmap.adminServer.server.security`. Without this, the `/login`, `/callback`, and `/me` endpoints are not registered.

### SDK gets 401 Unauthenticated

1. Check that the `AuthMetadataService` routes are in the **unprotected** ingress (no auth-url annotation).
2. Verify the SDK can reach the token endpoint. The SDK discovers it via `AuthMetadataService/GetOAuth2Metadata`.
3. Check that `grpcAuthorizationHeader` matches the header name used by the SDK (`flyte-authorization`).

### Internal services get 401

1. Verify that `configMap.union.auth.enable: true` and the `client_secret` file exists at the configured `clientSecretLocation`.
2. Check `ExternalSecret` sync status: `kubectl get externalsecret -n <namespace>`.
3. Verify the secret contains the correct key: `kubectl get secret <secret-name> -n <namespace> -o jsonpath='{.data.client_secret}' | base64 -d`.

### Operator or propeller cannot authenticate

1. Verify `union-secret-auth` exists in the dataplane namespace and contains `client_secret`.
2. Check operator logs for auth errors: `kubectl logs -n union -l app.kubernetes.io/name=operator --tail=50 | grep -i auth`.
3. Verify the `AUTH_CLIENT_ID` matches the control plane's `INTERNAL_CLIENT_ID`.
4. Verify the service app is included in the authorization server's access policy.

### Scheduler fails to start

1. Verify `flyte-secret-auth` exists in the control plane namespace: `kubectl get secret flyte-secret-auth -n union-cp`.
2. Check that `flyte.secrets.adminOauthClientCredentials.enabled: true` is set.
3. Check scheduler logs: `kubectl logs -n union-cp deploy/flytescheduler --tail=50`.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/configuration/code-viewer ===

# Code Viewer

The Union UI allows you to view the exact code that ran for a specific task. Union securely transfers the [code bundle](https://www.union.ai/docs/v2/union/user-guide/run-scaling/life-of-a-run) directly to your browser without routing it through the control plane.

![Code Viewer](https://www.union.ai/docs/v2/union/_static/images/deployment/configuration/code-viewer/demo.png)

## Enable CORS policy on your fast registration bucket

To support this feature securely, your bucket must allow CORS access from Union. The configuration steps vary depending on your cloud provider.

### AWS S3 Console

1. Open the AWS Console.
2. Navigate to the S3 dashboard.
3. Select your fast registration bucket. By default, this is the same as the metadata bucket configured during initial deployment.
4. Click the **Permissions** tab and scroll to **Cross-origin resource sharing (CORS)**.
5. Click **Edit** and enter the following policy:
![S3 CORS Policy](https://www.union.ai/docs/v2/union/_static/images/deployment/configuration/code-viewer/s3.png)

```json
[
    {
        "AllowedHeaders": [
            "*"
        ],
        "AllowedMethods": [
            "GET",
            "HEAD"
        ],
        "AllowedOrigins": [
            "https://*.unionai.cloud"
        ],
        "ExposeHeaders": [
            "ETag"
        ],
        "MaxAgeSeconds": 3600
    }
]
```

For more details, see the [AWS S3 CORS documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/cors.html).

### Google GCS

Google Cloud Storage requires CORS configuration via the command line.

1. Create a `cors.json` file with the following content:
    ```json
    [
        {
        "origin": ["https://*.unionai.cloud"],
        "method": ["HEAD", "GET"],
        "responseHeader": ["ETag"],
        "maxAgeSeconds": 3600
        }
    ]
    ```
2. Apply the CORS configuration to your bucket:
    ```bash
    gcloud storage buckets update gs://<fast_registration_bucket> --cors-file=cors.json
    ```
3. Verify the configuration was applied:
   ```bash
   gcloud storage buckets describe gs://<fast_registration_bucket> --format="default(cors_config)"

   cors_config:
   - maxAgeSeconds: 3600
     method:
     - GET
     - HEAD
     origin:
     - https://*.unionai.cloud
     responseHeader:
     - ETag
   ```

For more details, see the [Google Cloud Storage CORS documentation](https://docs.cloud.google.com/storage/docs/using-cors#command-line).

### Azure Storage

For Azure Storage CORS configuration, see the [Azure Storage CORS documentation](https://learn.microsoft.com/en-us/rest/api/storageservices/cross-origin-resource-sharing--cors--support-for-the-azure-storage-services).

## Troubleshooting

| Error Message | Cause | Fix |
|---------------|-------|-----|
| `Not available: No code available for this action.` | The task does not have a code bundle. This occurs when the code is baked into the Docker image or the task is not a code-based task. | This is expected behavior for tasks without code bundles. |
| `Not Found: The code bundle file could not be found. This may be due to your organization's data retention policy.` | The code bundle was deleted from the bucket, likely due to a retention policy. | Review your fast registration bucket's retention policy settings. |
| `Error: Code download is blocked by your storage bucket's configuration. Please contact your administrator to enable access.` | CORS is not configured on the bucket. | Configure CORS on your bucket using the instructions above. |

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/configuration/image-builder ===

# Image Builder

Union Image Builder builds container images within the dataplane. This enables the use of the `remote` builder type for any defined [Container Image](https://www.union.ai/docs/v2/union/user-guide/task-configuration/container-images).

Configure the use of remote image builder:
```bash
flyte create config --builder=remote --endpoint...
```

Write custom [container images](https://www.union.ai/docs/v2/union/user-guide/task-configuration/container-images):
```python
env = flyte.TaskEnvironment(
    name="hello_v2",
    image=flyte.Image.from_debian_base()
        .with_pip_packages("<package 1>", "<package 2>")
)
```

> By default, Image Builder is disabled. Enable it by setting the builder type to `remote` in your flyte config.

## Requirements

* The image building process runs in the target run's project and domain. Any image push secrets needed to publish images to the registry must be accessible from the project and domain where the build happens.

## Configuration

Image Builder is configured directly through Helm values.

```yaml
imageBuilder:

  # Enable Image Builder
  enabled: true

  # -- The config map build-image container task attempts to reference.
  # -- Should not change unless coordinated with Union technical support.
  targetConfigMapName: "build-image-config"

  # -- The URI of the buildkitd service. Used for externally managed buildkitd services.
  # -- Leaving empty and setting imageBuilder.buildkit.enabled to true will create a buildkitd service and configure the Uri appropriately.
  # -- E.g. "tcp://buildkitd.buildkit.svc.cluster.local:1234"
  buildkitUri: ""

  # -- The default repository to publish images to when "registry" is not specified in ImageSpec.
  # -- Note, the build-image task will fail unless "registry" is specified or a default repository is provided.
  defaultRepository: ""

  # -- How the build-image task and operator proxy authenticate against the default repository.
  # -- Supported values are "noop", "google", "aws", "azure"
  # -- "noop" no authentication is attempted
  # -- "google" uses docker-credential-gcr to authenticate to the default registry
  # -- "aws" uses docker-credential-ecr-login to authenticate to the default registry
  # -- "azure" uses az acr login to authenticate to the default registry. Requires Azure Workload Identity to be enabled.
  authenticationType: "noop"

  buildkit:

    # -- Enable buildkit service within this release.
    enabled: true

    # Configuring Union managed buildkitd Kubernetes resources.
    ...
```

## Authentication

### AWS

By default, Union is designed to use [IAM roles for service accounts (IRSA)](https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html) for authentication. Setting `authenticationType` to `aws` configures the Union image builder services to use the AWS default credential chain. Additionally, Union image builder uses [`docker-credential-ecr-login`](https://github.com/awslabs/amazon-ecr-credential-helper) to authenticate to the ECR repository configured with `defaultRepository`.

`defaultRepository` should be the fully qualified ECR repository name, e.g. `<AWS_ACCOUNT_ID>.dkr.ecr.<AWS_REGION>.amazonaws.com/<REPOSITORY_NAME>`.

The user role must therefore be granted the following permissions.

```json
{
  "Effect": "Allow",
  "Action": [
    "ecr:GetAuthorizationToken"
  ],
  "Resource": "*"
},
{
  "Effect": "Allow",
  "Action": [
    "ecr:BatchCheckLayerAvailability",
    "ecr:BatchGetImage",
    "ecr:GetDownloadUrlForLayer"
  ],
  "Resource": "*"
  // Or
  // "Resource": "arn:aws:ecr:<AWS_REGION>:<AWS_ACCOUNT_ID>:repository/<REPOSITORY>"
}
```

Similarly, the `operator-proxy` requires the following permissions:

```json
{
  "Effect": "Allow",
  "Action": [
    "ecr:GetAuthorizationToken"
  ],
  "Resource": "*"
},
{
  "Effect": "Allow",
  "Action": [
    "ecr:DescribeImages"
  ],
  "Resource": "arn:aws:ecr:<AWS_REGION>:<AWS_ACCOUNT_ID>:repository/<REPOSITORY>"
}
```

#### AWS Cross Account access

Access to repositories that do not exist in the same AWS account as the data plane requires additional ECR resource-based permissions. An ECR policy like the following is required if the configured `defaultRepository` or `ImageSpec`'s `registry` exists in an AWS account different from the dataplane's.

```json
{
  "Statement": [
    {
      "Sid": "AllowPull",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::<DATAPLANE_AWS_ACCOUNT>:role/<user-role>",
          "arn:aws:iam::<DATAPLANE_AWS_ACCOUNT>:role/<node-role>",
          // ... Additional roles that require image pulls
        ]
      },
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:BatchGetImage",
        "ecr:GetDownloadUrlForLayer"
      ]
    },
    {
      "Sid": "AllowDescribeImages",
      "Action": [
        "ecr:DescribeImages"
      ],
      "Principal": {
        "AWS": [
          "arn:aws:iam::<DATAPLANE_AWS_ACCOUNT>:role/<operator-proxy-role>"
        ]
      },
      "Effect": "Allow"
    },
    {
      "Sid": "ManageRepositoryContents"
      // ...
    }
  ],
  "Version": "2012-10-17"
}
```

To support a private ImageSpec `base_image`, the following permissions are required.

```json
{
  "Statement": [
    {
      "Sid": "AllowPull",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::<DATAPLANE_AWS_ACCOUNT>:role/<user-role>",
          "arn:aws:iam::<DATAPLANE_AWS_ACCOUNT>:role/<node-role>",
          // ... Additional roles that require image pulls
        ]
      },
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:BatchGetImage",
        "ecr:GetDownloadUrlForLayer"
      ]
    }
  ]
}
```

### Google Cloud Platform

By default, GCP deployments use [Kubernetes Service Account to GCP IAM bindings](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity#kubernetes-sa-to-iam) for authentication. Setting `authenticationType` to `google` configures the Union image builder services to use the GCP default credential chain. Additionally, Union image builder uses [`docker-credential-gcr`](https://github.com/GoogleCloudPlatform/docker-credential-gcr) to authenticate to the Google Artifact Registry repositories referenced by `defaultRepository`.

`defaultRepository` should be the full repository name, optionally followed by an image name prefix: `<GCP_LOCATION>-docker.pkg.dev/<GCP_PROJECT_ID>/<REPOSITORY_NAME>/<IMAGE_PREFIX>`.

It is necessary to configure the GCP user service account with `iam.serviceAccounts.signBlob` project level permissions.

#### GCP Cross Project access

Access to registries that do not exist in the same GCP project as the data plane requires additional GCP permissions.

* Configure the user "role" service account with the `Artifact Registry Writer` role.
* Configure the GCP worker node and union-operator-proxy service accounts with the `Artifact Registry Reader` role.

### Azure

By default, Union is designed to use Azure [Workload Identity Federation](https://learn.microsoft.com/en-us/azure/aks/workload-identity-deploy-cluster) for authentication using [user-assigned managed identities](https://learn.microsoft.com/en-us/entra/identity/managed-identities-azure-resources/how-manage-user-assigned-managed-identities?pivots=identity-mi-methods-azp) in place of AWS IAM roles.

* Configure the user "role" user-assigned managed identity with the `AcrPush` role.
* Configure the Azure kubelet identity and operator-proxy user-assigned managed identities with the `AcrPull` role.

### Private registries

Follow guidance in this section to integrate Image Builder with private registries:

#### GitHub Container Registry

1. Follow the [GitHub guide](https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry) to log in to the registry locally.
2. Create a Union secret:
```bash
flyte create secret --type image_pull --from-docker-config --registries ghcr.io SECRET_NAME
```

> This secret will be available to all projects and domains in your tenant. [Learn more about Union Secrets](./union-secrets)
> Check alternative ways to create image pull secrets in the [API reference](https://www.union.ai/docs/v2/union/api-reference/flyte-cli)

3. Reference this secret in the Image object:

```python
env = flyte.TaskEnvironment(
    name="hello_v2",
    # Allow Image Builder to pull from and push to the private registry. The `registry` field isn't
    # required if it's configured as the default registry in the imageBuilder section of the Helm chart values file.
    image=flyte.Image.from_debian_base(registry="<my registry url>", name="private", registry_secret="<YOUR_SECRET_NAME>")
        .with_pip_packages("<package 1>", "<package 2>"),
    # Mount the same secret to allow tasks to pull that image
    secrets=["<YOUR_SECRET_NAME>"]
)
```

This enables Image Builder to push images and layers to a private GHCR, and allows pods for this task environment to pull the image at runtime.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/configuration/multi-cluster ===

# Multiple Clusters

Union enables you to integrate multiple Kubernetes clusters into a single Union control plane using the `clusterPool` abstraction.

Currently, the clusterPool configuration is performed by Union in the control plane. You provide the mapping from clusterPool names to cluster names using the following structure:

```yaml
clusterPoolName:
  - clusterName
```

Here `clusterName` matches the name you used when installing the Union operator Helm chart.

You can have as many cluster pools as needed:

**Example**

```yaml
default: # this is the clusterPool where executions will run, unless another mapping is specified
  - my-dev-cluster
development-cp:
  - my-dev-cluster
staging-cp:
  - my-staging-cluster
production-cp:
  - production-cluster-1
  - production-cluster-2
dr-region:
  - dr-site-cluster
```

## Using cluster pools

Once the Union team configures the clusterPools in the control plane, you can proceed to configure mappings:

### project-domain-clusterPool mapping

1. Create a YAML file that includes the project, domain, and clusterPool:

**Example: cpa-dev.yaml**

```yaml
domain: development
project: flytesnacks
clusterPoolName: development-cp
```

2. Update the control plane with this mapping:

```bash
uctl update cluster-pool-attributes --attrFile cpa-dev.yaml
```
3. New executions in `flytesnacks-development` should now run on `my-dev-cluster`.

### project-domain-workflow-clusterPool mapping

1. Create a YAML file that includes the project, domain, workflow, and clusterPool:

**Example: cpa-prod.yaml**

```yaml
domain: production
project: flytesnacks
workflow: my_critical_wf
clusterPoolName: production-cp
```

2. Update the control plane with this mapping:

```bash
uctl update cluster-pool-attributes --attrFile cpa-prod.yaml
```
3. New executions of the `my_critical_wf` workflow in `flytesnacks-production` should now run on any of the clusters under `production-cp`.

## Data sharing between cluster pools

The sharing of metadata is controlled by the cluster pool to which a cluster belongs. If two clusters are in the same cluster pool, then they must share the same metadata bucket, defined in the Helm values as `storage.bucketName`.

If they are in different cluster pools, then they **must** have different metadata buckets. You could, for example, have a single metadata bucket for all your development clusters, and a separate one for all your production clusters, by grouping the clusters into cluster pools accordingly.

Alternatively, you could have a separate metadata bucket for each cluster by putting each cluster in its own cluster pool.
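These two rules can be checked mechanically; a sketch with hypothetical pool and bucket names:

```python
# Hypothetical clusterPool -> clusters and cluster -> storage.bucketName maps.
pools = {
    "development-cp": ["my-dev-cluster"],
    "production-cp": ["production-cluster-1", "production-cluster-2"],
}
buckets = {
    "my-dev-cluster": "union-meta-dev",
    "production-cluster-1": "union-meta-prod",
    "production-cluster-2": "union-meta-prod",
}

def pool_bucket(pool: str) -> str:
    # Clusters in the same pool must share one metadata bucket.
    names = {buckets[c] for c in pools[pool]}
    assert len(names) == 1, f"clusters in {pool} must share a bucket"
    return names.pop()

# Different pools must use different metadata buckets.
pool_buckets = [pool_bucket(p) for p in pools]
assert len(set(pool_buckets)) == len(pool_buckets), "pools must not share buckets"
```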

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/configuration/persistent-logs ===

# Persistent logs

Persistent logging is enabled by default. The data plane deploys [FluentBit](https://fluentbit.io/) as a DaemonSet that collects container logs from every node and writes them to the `persisted-logs/` path in the object store configured for your data plane.

FluentBit runs under the `fluentbit-system` Kubernetes service account. This service account must have write access to the storage bucket so FluentBit can push logs. The sections below describe how to grant that access on each cloud provider.

## AWS (IRSA)

On EKS, use [IAM Roles for Service Accounts (IRSA)](https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html) to grant the FluentBit service account permission to write to S3.

### 1. Create an IAM policy

Create an IAM policy that allows writing to your metadata S3 bucket:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:PutObjectAcl",
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<BUCKET_NAME>",
        "arn:aws:s3:::<BUCKET_NAME>/persisted-logs/*"
      ]
    }
  ]
}
```

Replace `<BUCKET_NAME>` with the name of your data plane metadata bucket.

### 2. Create an IAM role with a trust policy

Create an IAM role that trusts the EKS OIDC provider and is scoped to the `fluentbit-system` service account:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<ACCOUNT_ID>:oidc-provider/<OIDC_PROVIDER>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "<OIDC_PROVIDER>:sub": "system:serviceaccount:<NAMESPACE>:fluentbit-system",
          "<OIDC_PROVIDER>:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
```

Replace:

- `<ACCOUNT_ID>` with your AWS account ID
- `<OIDC_PROVIDER>` with your EKS cluster's OIDC provider (e.g. `oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE`)
- `<NAMESPACE>` with the namespace where the data plane is installed (default: `union`)

You can retrieve the OIDC provider URL with:

```bash
aws eks describe-cluster --name <CLUSTER_NAME> --region <REGION> \
  --query "cluster.identity.oidc.issuer" --output text
```

Attach the IAM policy from step 1 to this role.
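If you script the role creation, the trust policy can be assembled from the same parameters. The function name and example values below are illustrative:

```python
import json

def irsa_trust_policy(account_id: str, oidc_provider: str, namespace: str = "union") -> dict:
    """Build the IRSA trust policy for the fluentbit-system service account."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        f"{oidc_provider}:sub": f"system:serviceaccount:{namespace}:fluentbit-system",
                        f"{oidc_provider}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }

# Example values; substitute your account ID and OIDC provider.
policy = json.dumps(
    irsa_trust_policy("123456789012", "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE"),
    indent=2,
)
```

The rendered JSON can then be passed to `aws iam create-role --assume-role-policy-document`.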

### 3. Configure the Helm values

Set the IRSA annotation on the FluentBit service account in your data plane Helm values:

```yaml
fluentbit:
  serviceAccount:
    name: fluentbit-system
    annotations:
      eks.amazonaws.com/role-arn: "arn:aws:iam::<ACCOUNT_ID>:role/<FLUENTBIT_ROLE_NAME>"
```

## Azure (Workload Identity Federation)

On AKS, use [Microsoft Entra Workload Identity](https://learn.microsoft.com/en-us/azure/aks/workload-identity-overview) to grant the FluentBit service account access to Azure Blob Storage.

### Azure prerequisites

- Your AKS cluster must be [enabled as an OIDC Issuer](https://learn.microsoft.com/en-us/azure/aks/use-oidc-issuer)
- The [Azure Workload Identity](https://learn.microsoft.com/en-us/azure/aks/workload-identity-deploy-cluster) mutating webhook must be installed on your cluster

### 1. Create or reuse a Managed Identity

Create a User Assigned Managed Identity (or reuse an existing one):

```bash
az identity create \
  --name fluentbit-identity \
  --resource-group <RESOURCE_GROUP> \
  --location <LOCATION>
```

Note the `clientId` from the output.

### 2. Add a federated credential

Create a federated credential that maps the `fluentbit-system` Kubernetes service account to the managed identity:

```bash
az identity federated-credential create \
  --name fluentbit-federated-credential \
  --identity-name fluentbit-identity \
  --resource-group <RESOURCE_GROUP> \
  --issuer <AKS_OIDC_ISSUER_URL> \
  --subject "system:serviceaccount:<NAMESPACE>:fluentbit-system" \
  --audiences "api://AzureADTokenExchange"
```

Replace:

- `<RESOURCE_GROUP>` with your Azure resource group
- `<AKS_OIDC_ISSUER_URL>` with the OIDC issuer URL of your AKS cluster
- `<NAMESPACE>` with the namespace where the data plane is installed (default: `union`)

You can retrieve the OIDC issuer URL with:

```bash
az aks show --name <CLUSTER_NAME> --resource-group <RESOURCE_GROUP> \
  --query "oidcIssuerProfile.issuerUrl" --output tsv
```

### 3. Assign a storage role

Assign the `Storage Blob Data Contributor` role to the managed identity at the storage account level:

```bash
az role assignment create \
  --assignee <CLIENT_ID> \
  --role "Storage Blob Data Contributor" \
  --scope "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Storage/storageAccounts/<STORAGE_ACCOUNT>"
```

### 4. Configure the Azure Helm values

Set the Workload Identity annotation on the FluentBit service account in your data plane Helm values:

```yaml
fluentbit:
  serviceAccount:
    name: fluentbit-system
    annotations:
      azure.workload.identity/client-id: "<CLIENT_ID>"
```

You must also ensure the FluentBit pods have the Workload Identity label. If you have already set `additionalPodLabels` for your data plane, confirm the following label is present:

```yaml
additionalPodLabels:
  azure.workload.identity/use: "true"
```

## GCP (Workload Identity)

On GKE, use [GKE Workload Identity](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity) to grant the FluentBit service account access to GCS.

### GCP prerequisites

- [Workload Identity](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity#enable) must be enabled on your GKE cluster

### 1. Create or reuse a GCP service account

Create a GCP service account (or reuse an existing one):

```bash
gcloud iam service-accounts create fluentbit-gsa \
  --display-name "FluentBit logging service account" \
  --project <PROJECT_ID>
```

### 2. Grant storage permissions

Grant the service account write access to the metadata bucket:

```bash
gcloud storage buckets add-iam-policy-binding gs://<BUCKET_NAME> \
  --member "serviceAccount:fluentbit-gsa@<PROJECT_ID>.iam.gserviceaccount.com" \
  --role "roles/storage.objectAdmin"
```

### 3. Bind the Kubernetes service account to the GCP service account

Allow the `fluentbit-system` Kubernetes service account to impersonate the GCP service account:

```bash
gcloud iam service-accounts add-iam-policy-binding \
  fluentbit-gsa@<PROJECT_ID>.iam.gserviceaccount.com \
  --role "roles/iam.workloadIdentityUser" \
  --member "serviceAccount:<PROJECT_ID>.svc.id.goog[<NAMESPACE>/fluentbit-system]"
```

Replace:

- `<PROJECT_ID>` with your GCP project ID
- `<BUCKET_NAME>` with the name of your data plane metadata bucket
- `<NAMESPACE>` with the namespace where the data plane is installed (default: `union`)

### 4. Configure the GCP Helm values

Set the Workload Identity annotation on the FluentBit service account in your data plane Helm values:

```yaml
fluentbit:
  serviceAccount:
    name: fluentbit-system
    annotations:
      iam.gke.io/gcp-service-account: "fluentbit-gsa@<PROJECT_ID>.iam.gserviceaccount.com"
```

## Disabling persistent logs

To disable persistent logging entirely, set the following in your Helm values:

```yaml
fluentbit:
  enabled: false
```

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/configuration/monitoring ===

# Monitoring

The Union.ai data plane deploys a static [Prometheus](https://prometheus.io/) instance that collects metrics required for platform features like cost tracking, task-level resource monitoring, and execution observability. This Prometheus instance is pre-configured and requires no additional setup.

For operational monitoring of the cluster itself (node health, API server metrics, CoreDNS, etc.), the data plane chart includes an optional [kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack) instance that can be enabled separately.

## Architecture overview

The data plane supports two independent monitoring concerns:

| Concern | What it monitors | How it's deployed | Configurable |
|---------|-----------------|-------------------|--------------|
| **Union features** | Task execution metrics, cost tracking, GPU utilization, container resources | Static Prometheus with pre-built scrape config | Retention, resources, scheduling |
| **Cluster health** (optional) | Kubernetes components, node health, alerting, Grafana dashboards | `kube-prometheus-stack` via `monitoring.enabled` | Full kube-prometheus-stack values |

```
                    ┌─────────────────────────────────────┐
                    │         Data Plane Cluster          │
                    │                                     │
                    │  ┌──────────────────────┐           │
                    │  │  Static Prometheus   │           │
                    │  │  (Union features)    │           │
                    │  │  ┌────────────────┐  │           │
                    │  │  │ Scrape targets │  │           │
                    │  │  │ - kube-state   │  │           │
                    │  │  │ - cAdvisor     │  │           │
                    │  │  │ - propeller    │  │           │
                    │  │  │ - opencost     │  │           │
                    │  │  │ - dcgm (GPU)   │  │           │
                    │  │  │ - envoy        │  │           │
                    │  │  └────────────────┘  │           │
                    │  └──────────────────────┘           │
                    │                                     │
                    │  ┌──────────────────────┐           │
                    │  │  kube-prometheus     │           │
                    │  │  -stack (optional)   │           │
                    │  │  - Prometheus        │           │
                    │  │  - Alertmanager      │           │
                    │  │  - Grafana           │           │
                    │  │  - node-exporter     │           │
                    │  └──────────────────────┘           │
                    └─────────────────────────────────────┘
```

## Union features Prometheus

The static Prometheus instance is always deployed and pre-configured to scrape the metrics that Union.ai requires. No Prometheus Operator or CRDs are needed. This instance is a platform dependency and should not be replaced or reconfigured.

### Scrape targets

The following targets are scraped automatically:

| Job | Metrics collected | Used for |
|-----|--------|------------------|
| `kube-state-metrics` | Pod/node resource requests, limits, status, capacity | Cost calculations, resource tracking |
| `kubernetes-cadvisor` | Container CPU and memory usage via kubelet | Task-level resource monitoring |
| `flytepropeller` | Execution round info, fast task duration | Execution observability |
| `opencost` | Node hourly cost rates (CPU, RAM, GPU) | Cost tracking |
| `gpu-metrics` | DCGM exporter metrics (when `dcgm-exporter.enabled`) | GPU utilization |
| `serving-envoy` | Envoy upstream request counts and latency (when `serving.enabled`) | Inference serving metrics |
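
As an illustration of what these targets enable, the cAdvisor job makes standard container metrics queryable from the Prometheus UI or API. A sketch, assuming the default cAdvisor metric names and the example `flytesnacks-development` project namespace:

```promql
# Per-pod memory working set for tasks in one project namespace
sum by (pod) (
  container_memory_working_set_bytes{namespace="flytesnacks-development", container!=""}
)
```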

### Configuration

The static Prometheus instance is configured under the `prometheus` key in your data plane values:

```yaml
prometheus:
  image:
    repository: prom/prometheus
    tag: v3.3.1
  # Data retention period
  retention: 3d
  # Route prefix for the web UI and API
  routePrefix: /prometheus/
  resources:
    limits:
      cpu: "3"
      memory: "3500Mi"
    requests:
      cpu: "1"
      memory: "1Gi"
  serviceAccount:
    create: true
    annotations: {}
  priorityClassName: system-cluster-critical
  nodeSelector: {}
  tolerations: []
  affinity: {}
```

> [!NOTE] Retention and storage
> The default 3-day retention is sufficient for Union.ai features. Increase `retention` if you query historical feature metrics directly.

### Internal service endpoint

Other data plane components reach Prometheus at:

```
http://union-operator-prometheus.<NAMESPACE>.svc:80/prometheus
```

OpenCost is pre-configured to use this endpoint. You do not need to change it unless you rename the Helm release.

## Enabling cluster health monitoring

To enable operational monitoring with Prometheus Operator, Alertmanager, Grafana, and node-exporter:

```yaml
monitoring:
  enabled: true
```

This deploys a full [kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack) instance with sensible defaults:

- Prometheus with 7-day retention
- Grafana with admin credentials (override `monitoring.grafana.adminPassword` in production)
- Node exporter, kube-state-metrics, kubelet, CoreDNS, API server, etcd, and scheduler monitoring
- Default alerting and recording rules

### Prometheus Operator CRDs

The `kube-prometheus-stack` uses the Prometheus Operator, which discovers scrape targets and alerting rules through Kubernetes CRDs (ServiceMonitor, PodMonitor, PrometheusRule, etc.). If you prefer to use static scrape configs with your own Prometheus instead, see **Self-managed deployment > Advanced Configurations > Monitoring > Scraping Union services from your own Prometheus**.

To install the CRDs, use the `dataplane-crds` chart:

```yaml
# dataplane-crds values
crds:
  flyte: true
  prometheusOperator: true  # Install Prometheus Operator CRDs
```

Then install or upgrade the CRDs chart before the data plane chart:

```shell
helm upgrade --install union-dataplane-crds unionai/dataplane-crds \
  --namespace union \
  --set crds.prometheusOperator=true
```

> [!NOTE] CRD installation order
> CRDs must be installed before the data plane chart. The `dataplane-crds` chart should be deployed first, and the monitoring stack's own CRD installation is disabled (`monitoring.crds.enabled: false`) to avoid conflicts.

### Customizing the monitoring stack

The monitoring stack accepts all [kube-prometheus-stack values](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack#configuration) under the `monitoring` key. Common overrides:

```yaml
monitoring:
  enabled: true

  # Grafana
  grafana:
    enabled: true
    adminPassword: "my-secure-password"
    ingress:
      enabled: true
      ingressClassName: nginx
      hosts:
        - grafana.example.com

  # Prometheus retention and resources
  prometheus:
    prometheusSpec:
      retention: 30d
      resources:
        requests:
          memory: "2Gi"

  # Alertmanager
  alertmanager:
    enabled: true
    # Configure receivers, routes, etc.
```

The monitoring stack's Prometheus supports [remote write](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write) for forwarding metrics to external time-series databases (Amazon Managed Prometheus, Grafana Cloud, Thanos, etc.):

```yaml
monitoring:
  prometheus:
    prometheusSpec:
      remoteWrite:
        - url: "https://aps-workspaces.<REGION>.amazonaws.com/workspaces/<WORKSPACE_ID>/api/v1/remote_write"
          sigv4:
            region: <REGION>
```

For the full set of configurable values, see the [kube-prometheus-stack chart documentation](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack).

## Scraping Union services from your own Prometheus

If you already run Prometheus in your cluster, you can scrape Union.ai data plane services for operational visibility. All services expose metrics on standard ports.

> [!NOTE] Union features Prometheus
> The built-in static Prometheus handles all metrics required for Union.ai platform features. Scraping from your own Prometheus is for additional operational visibility only -- it does not replace the built-in instance.

### Static scrape configs

Add these jobs to your Prometheus configuration:

```yaml
scrape_configs:
  # Data plane service metrics (operator, propeller, etc.)
  - job_name: union-dataplane-services
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names: [union]
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_instance]
        regex: union-dataplane
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        regex: debug
        action: keep
```

### ServiceMonitor (Prometheus Operator)

If you run the Prometheus Operator, create a ServiceMonitor instead:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: union-dataplane-services
  namespace: union
spec:
  selector:
    matchLabels:
      app.kubernetes.io/instance: union-dataplane
  namespaceSelector:
    matchNames:
      - union
  endpoints:
    - port: debug
      path: /metrics
      interval: 30s
```

This requires the Prometheus Operator CRDs. Install them via the `dataplane-crds` chart with `crds.prometheusOperator: true`.

## Further reading

- [Prometheus documentation](https://prometheus.io/docs/introduction/overview/) -- comprehensive guide to Prometheus configuration, querying, and operation
- [Prometheus remote write](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write) -- forwarding metrics to external storage
- [Prometheus `kubernetes_sd_config`](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config) -- Kubernetes service discovery for scrape targets
- [kube-prometheus-stack chart](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack) -- full monitoring stack with Grafana and alerting
- [OpenCost documentation](https://www.opencost.io/docs/) -- cost allocation and tracking

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/configuration/union-secrets ===

# Secrets

[Union Secrets](https://www.union.ai/docs/v2/union/user-guide/task-configuration/secrets) are enabled by default. Union Secrets are managed secrets created through the native Kubernetes secret manager.

The only configurable option is the namespace where the secret is stored. To override the default behavior, set `proxy.secretManager.namespace` in the values file used by the Helm chart. If this is not specified, the `union` namespace is used by default.

Example:
```yaml
proxy:
  secretManager:
    # -- Set the namespace for union managed secrets created through the native Kubernetes secret manager. If the namespace is not set,
    # the release namespace will be used.
    namespace: "secret"
```

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/configuration/data-retention ===

Implications of object storage retention or lifecycle policies on the default bucket and metadata.

# Data retention policies

Union.ai relies on object storage for both **metadata** and **raw data** (your data that is passing through the workflow). Bucket-level retention and lifecycle policies (such as S3 lifecycle rules) that affect the metadata store can cause execution failures, broken history, and data loss.

## How Union.ai uses the default bucket

The platform uses a **default object store bucket** in the data plane for two distinct purposes:

1. **Metadata store** — References, execution state, and pointers to task outputs. The control plane and UI use this metadata to schedule workflows, resolve task dependencies, display execution history, and resolve output locations. This data is required for the correct operation of the platform.

2. **Raw data store** — Large task inputs and outputs or complex types (for example `FlyteFile`, dataframes, etc.). The metadata store holds only pointers to these blobs; the actual bytes live in the raw data store.

Because the **default bucket contains the metadata store**, it must be treated as **durable storage**. Retention or lifecycle policies that delete or overwrite objects in this bucket are **not supported** and can lead to data loss and system failure. There is **no supported way** to recover from metadata loss.

## Impact of metadata loss

| Area | Impact |
|------|--------|
| **UI and APIs** | Execution list or detail views may show errors or "resource not found." Output previews may fail to load. |
| **Execution engine** | In-flight or downstream tasks that depend on a node's output can fail. Retry state may be lost. |
| **Caching** | Pointers to cached outputs may be lost, resulting in cache misses; tasks may re-run or fail. |
| **Traces** | [Trace](https://www.union.ai/docs/v2/union/user-guide/task-programming/traces) checkpoint data (used by `@flyte.trace` for fine-grained recovery from system failures) may be lost, preventing resume-from-checkpoint. |
| **Data** | Raw blobs may still exist, but without metadata the system has no pointers to them. That data becomes **orphaned**. Downstream tasks that consume outputs by reference will fail at runtime. |
| **Operations** | Audit trails and the record of what ran, when, and with what outputs are lost. |

## Retention on a separate raw-data location

If you separate raw data from metadata, you can apply retention policies **only to the raw data location** while keeping metadata durable. This is the only supported approach for applying retention. You can do this either by configuring separate buckets using `configuration.storage.metadataContainer` and `configuration.storage.userDataContainer` in the [data plane chart](https://github.com/unionai/helm-charts/blob/master/charts/dataplane/values.yaml), or by using a metadata prefix within the same bucket (see **Self-managed deployment > Advanced Configurations > Data retention policies > Customizing the metadata path** below).
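
A separate-bucket layout might look like the following sketch in the data plane values (the bucket names are illustrative; confirm key placement against the chart's `values.yaml`):

```yaml
configuration:
  storage:
    # Durable metadata bucket: no lifecycle or retention rules here
    metadataContainer: my-union-metadata
    # Raw task inputs/outputs: retention policies may be applied here
    userDataContainer: my-union-raw-data
```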

Be aware of the trade-offs:

- **Historical executions** that reference purged raw data will fail.
- **Cached task outputs** stored as raw data will be lost, causing cache misses and task re-execution.
- **Trace checkpoints** stored in the raw-data location will be purged, preventing resume-from-checkpoint for affected executions.

Data correctness is not silently violated, but the benefits of caching and trace-based recovery are lost for purged data.

## Customizing the metadata path

You can control where metadata is stored within the bucket via the **`config.core.propeller.metadata-prefix`** setting (e.g. `metadata/propeller` in the [data plane chart values](https://github.com/unionai/helm-charts/blob/master/charts/dataplane/values.yaml)). This lets you design lifecycle rules that **exclude** the metadata prefix (for example, in S3 lifecycle rules, apply expiration only to prefixes that do not include the metadata path) so that only non-metadata paths are subject to retention.

Confirm the exact prefix and bucket layout for your deployment from the chart configuration, and validate any retention rules in a non-production environment before applying them broadly.
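
As an illustration, an S3 lifecycle configuration that expires only a hypothetical `raw-data/` prefix, leaving the metadata prefix untouched, could look like:

```json
{
  "Rules": [
    {
      "ID": "expire-raw-data-only",
      "Status": "Enabled",
      "Filter": { "Prefix": "raw-data/" },
      "Expiration": { "Days": 30 }
    }
  ]
}
```

The `raw-data/` prefix is an assumption for illustration; substitute the actual non-metadata prefixes of your bucket, and never attach an expiration rule whose filter matches the metadata prefix.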

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/configuration/namespace-mapping ===

# Namespace mapping

By default, Union.ai maps each project-domain pair to a Kubernetes namespace using the pattern `{project}-{domain}`. For example, the project `flytesnacks` in domain `development` runs workloads in namespace `flytesnacks-development`.

You can customize this mapping by setting the `namespace_mapping.template` value in your Helm configuration.

## Template syntax

The template uses Go template syntax with two variables:

- `{{ project }}` — the project name
- `{{ domain }}` — the domain name (e.g., `development`, `staging`, `production`)

### Examples

| Template | Project | Domain | Resulting namespace |
|----------|---------|--------|---------------------|
| `{{ project }}-{{ domain }}` (default) | `flytesnacks` | `development` | `flytesnacks-development` |
| `{{ domain }}` | `flytesnacks` | `development` | `development` |
| `myorg-{{ project }}-{{ domain }}` | `flytesnacks` | `development` | `myorg-flytesnacks-development` |

> [!WARNING]
> Changing namespace mapping after workflows have run will cause existing data in old namespaces to become inaccessible. Plan your namespace mapping before initial deployment.
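
To sanity-check a template before deploying, the substitution can be sketched in plain bash (this is not the platform's renderer, just the same two-variable replacement):

```shell
# Hypothetical helper: replace the two template variables by hand.
render_namespace() {
  local template=$1 project=$2 domain=$3
  local out=${template//'{{ project }}'/$project}
  out=${out//'{{ domain }}'/$domain}
  printf '%s\n' "$out"
}

render_namespace '{{ project }}-{{ domain }}' flytesnacks development
# -> flytesnacks-development
render_namespace 'myorg-{{ project }}-{{ domain }}' flytesnacks development
# -> myorg-flytesnacks-development
```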

## Data plane configuration

Set the `namespace_mapping` value at the top level of your dataplane Helm values. This single value cascades to all services that need it: clusterresourcesync, propeller, operator, and executor.

```yaml
namespace_mapping:
  template: "myorg-{{ '{{' }} project {{ '}}' }}-{{ '{{' }} domain {{ '}}' }}"
```

> [!NOTE]
> The literal `{{ project }}` and `{{ domain }}` delimiters must be escaped so that Helm does not try to render them itself. In your values file, emit each literal `{{` and `}}` using `{{ '{{' }}` and `{{ '}}' }}`, as shown above.

## How it works

Namespace mapping controls several components:

| Component | Role |
|-----------|------|
| **Clusterresourcesync** | Creates Kubernetes namespaces and per-namespace resources (service accounts, resource quotas) based on the mapping |
| **Propeller** | Resolves the target namespace when scheduling workflow pods |
| **Operator** | Resolves the target namespace for operator-managed resources |
| **Executor** | Resolves the target namespace for task execution |
| **Flyteadmin** (control plane) | Determines the target namespace when creating V1 executions |

All components must agree on the mapping. The dataplane chart's top-level `namespace_mapping` value is the canonical source that cascades to clusterresourcesync, propeller, operator, and executor automatically. You should **not** set per-service overrides.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/helm-chart-reference ===

# Helm chart reference

> **📝 Note**
>
> An LLM-optimized bundle of this entire section is available at [`section.md`](section.md).
> This single file contains all pages in this section, optimized for AI coding agent context.

A full list of Helm values available for configuration can be found here:

* **Self-managed deployment > Helm chart reference > Dataplane**
* **Self-managed deployment > Helm chart reference > Dataplane CRDs**

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/helm-chart-reference/dataplane ===

Deploys the Union dataplane components to onboard a Kubernetes cluster to the Union Cloud.

## Chart info

| | |
|---|---|
| **Chart version** | 2026.3.12 |
| **App version** | 2026.3.9 |
| **Kubernetes version** | `>= 1.28.0-0` |

## Dependencies

| Repository | Name | Version |
|------------|------|---------|
| https://fluent.github.io/helm-charts | fluentbit(fluent-bit) | 0.48.9 |
| https://kubernetes-sigs.github.io/metrics-server/ | metrics-server(metrics-server) | 3.12.2 |
| https://kubernetes.github.io/ingress-nginx | ingress-nginx | 4.12.3 |
| https://nvidia.github.io/dcgm-exporter/helm-charts | dcgm-exporter | 4.7.1 |
| https://opencost.github.io/opencost-helm-chart | opencost | 1.42.0 |
| https://prometheus-community.github.io/helm-charts | monitoring(kube-prometheus-stack) | 80.8.0 |
| https://prometheus-community.github.io/helm-charts | kube-state-metrics | 5.30.1 |
| https://unionai.github.io/helm-charts | knative-operator(knative-operator) | 2025.5.0 |

## Values

| Key | Type | Description | Default |
|-----|------|-------------|---------|
| additionalPodAnnotations | object | Define additional pod annotations for all of the Union pods. | `{}` |
| additionalPodEnvVars | object | Define additional pod environment variables for all of the Union pods. | `{}` |
| additionalPodLabels | object | Define additional pod labels for all of the Union pods. | `{}` |
| additionalPodSpec | object | Define additional PodSpec values for all of the Union pods. | `{}` |
| clusterName | string | Cluster name should be shared with Union for proper functionality. | `"{{ .Values.global.CLUSTER_NAME }}"` |
| clusterresourcesync | object | clusterresourcesync contains the configuration information for the syncresources service. | `(see values.yaml)` |
| clusterresourcesync.additionalTemplates | list | Additional cluster resource templates to create per project namespace. Use this instead of overriding `templates` to avoid accidentally removing the default namespace, service account, and resource quota templates. Each entry has a `key` (filename stem) and `value` (Kubernetes manifest). | `[]` |
| clusterresourcesync.additionalVolumeMounts | list | Appends additional volume mounts to the main container's spec. May include template values. | `[]` |
| clusterresourcesync.additionalVolumes | list | Appends additional volumes to the deployment spec. May include template values. | `[]` |
| clusterresourcesync.affinity | object | affinity configurations for the syncresources pods | `{}` |
| clusterresourcesync.config | object | Syncresources service configuration | `(see values.yaml)` |
| clusterresourcesync.config.clusterResourcesPrivate | object | Additional configuration for the cluster resources service | `{"app":{"isServerless":false}}` |
| clusterresourcesync.config.clusterResourcesPrivate.app | object | Configuration of app serving services. | `{"isServerless":false}` |
| clusterresourcesync.config.cluster_resources.clusterName | string | The name of the cluster.  This should always be the same as the cluster name in the config. | `"{{ include \"getClusterName\" . }}"` |
| clusterresourcesync.config.cluster_resources.refreshInterval | string | How frequently to sync the cluster resources | `"5m"` |
| clusterresourcesync.config.cluster_resources.standaloneDeployment | bool | Start the cluster resource manager in standalone mode. | `true` |
| clusterresourcesync.config.cluster_resources.templatePath | string | The path to the templates used to configure project resource quotas. | `"/etc/flyte/clusterresource/templates"` |
| clusterresourcesync.config.union | object | Connection information for the sync resources service to connect to the Union control plane. | `(see values.yaml)` |
| clusterresourcesync.config.union.connection.host | string | Host to connect to | `"dns:///{{ tpl .Values.host . }}"` |
| clusterresourcesync.enabled | bool | Enable or disable the syncresources service | `true` |
| clusterresourcesync.nodeName | string | nodeName constraints for the syncresources pods | `""` |
| clusterresourcesync.nodeSelector | object | nodeSelector constraints for the syncresources pods | `{}` |
| clusterresourcesync.podAnnotations | object | Additional pod annotations for the syncresources service | `{}` |
| clusterresourcesync.podEnv | object | Additional pod environment variables for the syncresources service | `{}` |
| clusterresourcesync.resources | object | Kubernetes resource configuration for the syncresources service | `{"limits":{"cpu":"1","memory":"500Mi"},"requests":{"cpu":"500m","memory":"100Mi"}}` |
| clusterresourcesync.serviceAccount | object | Override service account values for the syncresources service | `{"annotations":{},"name":""}` |
| clusterresourcesync.serviceAccount.annotations | object | Additional annotations for the syncresources service account | `{}` |
| clusterresourcesync.serviceAccount.name | string | Override the service account name for the syncresources service | `""` |
| clusterresourcesync.templates | list | The templates that are used to create and/or update kubernetes resources for Union projects. | `(see values.yaml)` |
| clusterresourcesync.templates[0] | object | Template for namespaces resources | `(see values.yaml)` |
| clusterresourcesync.templates[1] | object | Patch default service account | `(see values.yaml)` |
| clusterresourcesync.tolerations | list | tolerations for the syncresources pods | `[]` |
| clusterresourcesync.topologySpreadConstraints | object | topologySpreadConstraints for the syncresources pods | `{}` |
| config | object | Global configuration settings for all Union services. | `(see values.yaml)` |
| config.admin | object | Admin Client configuration [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/subworkflow/launchplan#AdminConfig) | `(see values.yaml)` |
| config.catalog | object | Catalog Client configuration [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/catalog#Config) Additional advanced Catalog configuration [here](https://pkg.go.dev/github.com/lyft/flyteplugins/go/tasks/pluginmachinery/catalog#Config) | `(see values.yaml)` |
| config.configOverrides | object | Override any configuration settings. | `{"cache":{"identity":{"enabled":false}}}` |
| config.copilot | object | Copilot configuration | `(see values.yaml)` |
| config.copilot.plugins.k8s.co-pilot | object | Structure documented [here](https://pkg.go.dev/github.com/lyft/flyteplugins@v0.5.28/go/tasks/pluginmachinery/flytek8s/config#FlyteCoPilotConfig) | `(see values.yaml)` |
| config.core | object | Core propeller configuration | `(see values.yaml)` |
| config.core.propeller | object | follows the structure specified [here](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/config). | `(see values.yaml)` |
| config.domain | object | Domains configuration for Union projects. This enables the specified number of domains across all projects in Union. | `(see values.yaml)` |
| config.enabled_plugins.tasks | object | Tasks specific configuration [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#GetConfig) | `(see values.yaml)` |
| config.enabled_plugins.tasks.task-plugins | object | Plugins configuration, [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#TaskPluginConfig) | `(see values.yaml)` |
| config.enabled_plugins.tasks.task-plugins.enabled-plugins | list | [Enabled Plugins](https://pkg.go.dev/github.com/lyft/flyteplugins/go/tasks/config#Config). Enable sagemaker*, athena if you install the backend plugins | `["container","sidecar","k8s-array","echo","fast-task","connector-service"]` |
| config.k8s | object | Kubernetes specific Flyte configuration | `{"plugins":{"k8s":{"default-cpus":"100m","default-env-vars":[],"default-memory":"100Mi"}}}` |
| config.k8s.plugins.k8s | object | Configuration section for all K8s specific plugins [Configuration structure](https://pkg.go.dev/github.com/lyft/flyteplugins/go/tasks/pluginmachinery/flytek8s/config) | `{"default-cpus":"100m","default-env-vars":[],"default-memory":"100Mi"}` |
| config.logger | object | Logging configuration | `{"level":4,"show-source":true}` |
| config.operator | object | Configuration for the Union operator service | `(see values.yaml)` |
| config.operator.apps | object | Enable app serving | `{"enabled":"{{ .Values.serving.enabled }}"}` |
| config.operator.billing | object | Billing model: None, Legacy, or ResourceUsage. | `{"model":"Legacy"}` |
| config.operator.clusterData | object | Dataplane cluster configuration. | `(see values.yaml)` |
| config.operator.clusterData.appId | string | The client id used to authenticate to the control plane.  This will be provided by Union. | `"{{ tpl .Values.secrets.admin.clientId . }}"` |
| config.operator.clusterData.bucketName | string | The bucket name for object storage. | `"{{ tpl .Values.storage.bucketName . }}"` |
| config.operator.clusterData.bucketRegion | string | The bucket region for object storage. | `"{{ tpl .Values.storage.region . }}"` |
| config.operator.clusterData.cloudHostName | string | The host name for control plane access. This will be provided by Union. | `"{{ tpl .Values.host . }}"` |
| config.operator.clusterData.gcpProjectId | string | For GCP only, the project id for object storage. | `"{{ tpl .Values.storage.gcp.projectId . }}"` |
| config.operator.clusterData.metadataBucketPrefix | string | The prefix for constructing object storage URLs. | `"{{ include \"storage.metadata-prefix\" . }}"` |
| config.operator.clusterId | object | Set the cluster information for the operator service | `{"organization":"{{ tpl .Values.orgName . }}"}` |
| config.operator.clusterId.organization | string | The organization name for the cluster.  This should match your organization name that you were provided. | `"{{ tpl .Values.orgName . }}"` |
| config.operator.collectUsages | object | Configuration for the usage reporting service. | `{"enabled":true}` |
| config.operator.collectUsages.enabled | bool | Enable usage collection in the operator service. | `true` |
| config.operator.dependenciesHeartbeat | object | Heartbeat check configuration. | `(see values.yaml)` |
| config.operator.dependenciesHeartbeat.prometheus | object | Define the prometheus health check endpoint. | `{"endpoint":"{{ include \"prometheus.health.url\" . }}"}` |
| config.operator.dependenciesHeartbeat.propeller | object | Define the propeller health check endpoint. | `{"endpoint":"{{ include \"propeller.health.url\" . }}"}` |
| config.operator.dependenciesHeartbeat.proxy | object | Define the operator proxy health check endpoint. | `{"endpoint":"{{ include \"proxy.health.url\" . }}"}` |
| config.operator.enableTunnelService | bool | Enable the cloudflare tunnel service for secure communication with the control plane. | `true` |
| config.operator.enabled | bool | Enables the operator service | `true` |
| config.operator.syncClusterConfig | object | Sync the configuration from the control plane. This will overwrite any configuration values set as part of the deploy. | `{"enabled":false}` |
| config.proxy | object | Configuration for the operator proxy service. | `(see values.yaml)` |
| config.proxy.smConfig | object | Secret manager configuration | `(see values.yaml)` |
| config.proxy.smConfig.enabled | string | Enable or disable secret manager support for the Union dataplane. | `"{{ .Values.proxy.secretManager.enabled }}"` |
| config.proxy.smConfig.k8sConfig | object | Kubernetes specific secret manager configuration. | `{"namespace":"{{ include \"proxy.secretsNamespace\" . }}"}` |
| config.proxy.smConfig.type | string | The type of secret manager to use. | `"{{ .Values.proxy.secretManager.type }}"` |
| config.resource_manager | object | Resource manager configuration | `{"propeller":{"resourcemanager":{"type":"noop"}}}` |
| config.resource_manager.propeller | object | Propeller resource manager configuration | `{"resourcemanager":{"type":"noop"}}` |
| config.sharedService | object | Section that configures shared union services | `{"features":{"gatewayV2":true},"port":8081}` |
| config.task_logs | object | Section that configures how the Task logs are displayed on the UI. This has to be changed based on your actual logging provider. Refer to [structure](https://pkg.go.dev/github.com/lyft/flyteplugins/go/tasks/logs#LogConfig) to understand how to configure various logging engines | `(see values.yaml)` |
| config.task_logs.plugins.logs.cloudwatch-enabled | bool | One option is to enable cloudwatch logging for EKS, update the region and log group accordingly | `false` |
| config.task_resource_defaults | object | Task default resources configuration Refer to the full [structure](https://pkg.go.dev/github.com/lyft/flyteadmin@v0.3.37/pkg/runtime/interfaces#TaskResourceConfiguration). | `(see values.yaml)` |
| config.task_resource_defaults.task_resources | object | Task default resources parameters | `{"defaults":{"cpu":"100m","memory":"500Mi"},"limits":{"cpu":4096,"gpu":256,"memory":"2Ti"}}` |
| config.union.connection | object | Connection information to the union control plane. | `{"host":"dns:///{{ tpl .Values.host . }}"}` |
| config.union.connection.host | string | Host to connect to | `"dns:///{{ tpl .Values.host . }}"` |
| cost.enabled | bool | Enable or disable the cost service resources. This does not include opencost or other compatible monitoring services. | `true` |
| cost.serviceMonitor.matchLabels | object | Match labels for the ServiceMonitor. | `{"app.kubernetes.io/name":"opencost"}` |
| cost.serviceMonitor.name | string | The name of the ServiceMonitor. | `"cost"` |
| databricks | object | Databricks integration configuration | `{"enabled":false,"plugin_config":{}}` |
| dcgm-exporter | object | Dcgm exporter configuration | `(see values.yaml)` |
| dcgm-exporter.enabled | bool | Enable or disable the dcgm exporter | `false` |
| dcgm-exporter.serviceMonitor | object | ServiceMonitor configuration for the dcgm exporter. GPU nodes are commonly tainted and labeled so that the dcgm exporter does not run on all nodes; use the chart's `affinity`, `nodeSelector`, and `tolerations` settings to ensure it runs only on GPU nodes. | `{"enabled":false}` |
| executor.additionalVolumeMounts | list | Appends additional volume mounts to the main container's spec. May include template values. | `[]` |
| executor.additionalVolumes | list | Appends additional volumes to the deployment spec. May include template values. | `[]` |
| executor.affinity | object | affinity for executor deployment | `{}` |
| executor.config.cluster | string |  | `"{{ tpl .Values.clusterName . }}"` |
| executor.config.evaluatorCount | int |  | `64` |
| executor.config.maxActions | int |  | `2000` |
| executor.config.organization | string |  | `"{{ tpl .Values.orgName . }}"` |
| executor.config.unionAuth.injectSecret | bool |  | `true` |
| executor.config.unionAuth.secretName | string |  | `"EAGER_API_KEY"` |
| executor.config.workerName | string |  | `"worker1"` |
| executor.enabled | bool |  | `true` |
| executor.idl2Executor | bool |  | `false` |
| executor.nodeName | string | nodeName constraints for executor deployment | `""` |
| executor.nodeSelector | object | nodeSelector for executor deployment | `{}` |
| executor.plugins.fasttask | object | Configuration section for all K8s specific plugins [Configuration structure](https://pkg.go.dev/github.com/lyft/flyteplugins/go/tasks/pluginmachinery/flytek8s/config) | `(see values.yaml)` |
| executor.plugins.ioutils.remoteFileOutputPaths.deckFilename | string |  | `"report.html"` |
| executor.plugins.k8s.disable-inject-owner-references | bool |  | `true` |
| executor.podEnv | list | Appends additional environment variables to the executor container's spec. | `[]` |
| executor.podLabels.app | string |  | `"executor"` |
| executor.propeller.node-config.disable-input-file-writes | bool |  | `true` |
| executor.raw_config | object |  | `{}` |
| executor.resources.limits.cpu | int |  | `4` |
| executor.resources.limits.memory | string |  | `"8Gi"` |
| executor.resources.requests.cpu | int |  | `1` |
| executor.resources.requests.memory | string |  | `"1Gi"` |
| executor.serviceAccount.annotations | object |  | `{}` |
| executor.sharedService.metrics.scope | string |  | `"executor:"` |
| executor.sharedService.security.allowCors | bool |  | `true` |
| executor.sharedService.security.allowLocalhostAccess | bool |  | `true` |
| executor.sharedService.security.allowedHeaders[0] | string |  | `"Content-Type"` |
| executor.sharedService.security.allowedOrigins[0] | string |  | `"*"` |
| executor.sharedService.security.secure | bool |  | `false` |
| executor.sharedService.security.useAuth | bool |  | `false` |
| executor.task_logs.plugins.logs.cloudwatch-enabled | bool | One option is to enable cloudwatch logging for EKS, update the region and log group accordingly | `false` |
| executor.task_logs.plugins.logs.dynamic-log-links[0].vscode.displayName | string |  | `"VS Code Debugger"` |
| executor.task_logs.plugins.logs.dynamic-log-links[0].vscode.linkType | string |  | `"ide"` |
| executor.task_logs.plugins.logs.dynamic-log-links[0].vscode.templateUris[0] | string |  | `(see values.yaml)` |
| executor.task_logs.plugins.logs.dynamic-log-links[1].wandb-execution-id.displayName | string |  | `"Weights & Biases"` |
| executor.task_logs.plugins.logs.dynamic-log-links[1].wandb-execution-id.linkType | string |  | `"dashboard"` |
| executor.task_logs.plugins.logs.dynamic-log-links[1].wandb-execution-id.templateUris[0] | string |  | `(see values.yaml)` |
| executor.task_logs.plugins.logs.dynamic-log-links[2].wandb-custom-id.displayName | string |  | `"Weights & Biases"` |
| executor.task_logs.plugins.logs.dynamic-log-links[2].wandb-custom-id.linkType | string |  | `"dashboard"` |
| executor.task_logs.plugins.logs.dynamic-log-links[2].wandb-custom-id.templateUris[0] | string |  | `(see values.yaml)` |
| executor.task_logs.plugins.logs.dynamic-log-links[3].comet-ml-execution-id.displayName | string |  | `"Comet"` |
| executor.task_logs.plugins.logs.dynamic-log-links[3].comet-ml-execution-id.linkType | string |  | `"dashboard"` |
| executor.task_logs.plugins.logs.dynamic-log-links[3].comet-ml-execution-id.templateUris | string |  | `(see values.yaml)` |
| executor.task_logs.plugins.logs.dynamic-log-links[4].comet-ml-custom-id.displayName | string |  | `"Comet"` |
| executor.task_logs.plugins.logs.dynamic-log-links[4].comet-ml-custom-id.linkType | string |  | `"dashboard"` |
| executor.task_logs.plugins.logs.dynamic-log-links[4].comet-ml-custom-id.templateUris | string |  | `(see values.yaml)` |
| executor.task_logs.plugins.logs.dynamic-log-links[5].neptune-scale-run.displayName | string |  | `"Neptune Run"` |
| executor.task_logs.plugins.logs.dynamic-log-links[5].neptune-scale-run.linkType | string |  | `"dashboard"` |
| executor.task_logs.plugins.logs.dynamic-log-links[5].neptune-scale-run.templateUris[0] | string |  | `"https://scale.neptune.ai/{{`{{ .taskConfig.project }}`}}/-/run/?customId={{`{{ .podName }}`}}"` |
| executor.task_logs.plugins.logs.dynamic-log-links[6].neptune-scale-custom-id.displayName | string |  | `"Neptune Run"` |
| executor.task_logs.plugins.logs.dynamic-log-links[6].neptune-scale-custom-id.linkType | string |  | `"dashboard"` |
| executor.task_logs.plugins.logs.dynamic-log-links[6].neptune-scale-custom-id.templateUris[0] | string |  | `(see values.yaml)` |
| executor.task_logs.plugins.logs.kubernetes-enabled | bool |  | `true` |
| executor.tolerations | list | tolerations for executor deployment | `[]` |
| executor.topologySpreadConstraints | object | topologySpreadConstraints for executor deployment | `{}` |
| extraObjects | list |  | `[]` |
| fluentbit | object | Configuration for Fluent Bit, used for the persistent logging feature. Fluent Bit runs as a DaemonSet and ships container logs to the `persisted-logs/` path in the configured object store. The `fluentbit-system` service account must have write access to the storage bucket; grant access using cloud-native identity federation via service account annotations: AWS (IRSA) with `eks.amazonaws.com/role-arn: "arn:aws:iam::<ACCOUNT_ID>:role/<ROLE_NAME>"`, Azure (Workload Identity) with `azure.workload.identity/client-id: "<CLIENT_ID>"`, or GCP (Workload Identity) with `iam.gke.io/gcp-service-account: "<GSA_NAME>@<PROJECT_ID>.iam.gserviceaccount.com"`. See https://www.union.ai/docs/v1/selfmanaged/deployment/configuration/persistent-logs/ | `(see values.yaml)` |
| flyteagent | object | Flyteagent configuration | `{"enabled":false,"plugin_config":{}}` |
| flyteconnector.additionalContainers | list | Appends additional containers to the deployment spec. May include template values. | `[]` |
| flyteconnector.additionalEnvs | list | Appends additional envs to the deployment spec. May include template values | `[]` |
| flyteconnector.additionalVolumeMounts | list | Appends additional volume mounts to the main container's spec. May include template values. | `[]` |
| flyteconnector.additionalVolumes | list | Appends additional volumes to the deployment spec. May include template values. | `[]` |
| flyteconnector.affinity | object | affinity for flyteconnector deployment | `{}` |
| flyteconnector.autoscaling.maxReplicas | int |  | `5` |
| flyteconnector.autoscaling.minReplicas | int |  | `2` |
| flyteconnector.autoscaling.targetCPUUtilizationPercentage | int |  | `80` |
| flyteconnector.autoscaling.targetMemoryUtilizationPercentage | int |  | `80` |
| flyteconnector.configPath | string | Default glob string for searching configuration files | `"/etc/flyteconnector/config/*.yaml"` |
| flyteconnector.enabled | bool |  | `false` |
| flyteconnector.extraArgs | object | Appends extra command line arguments to the main command | `{}` |
| flyteconnector.image.pullPolicy | string | Docker image pull policy | `"IfNotPresent"` |
| flyteconnector.image.repository | string | Docker image for flyteconnector deployment | `"ghcr.io/flyteorg/flyte-connectors"` |
| flyteconnector.image.tag | string |  | `"py3.13-2.0.0b50.dev3-g695bb1db3.d20260122"` |
| flyteconnector.nodeSelector | object | nodeSelector for flyteconnector deployment | `{}` |
| flyteconnector.podAnnotations | object | Annotations for flyteconnector pods | `{}` |
| flyteconnector.ports.containerPort | int |  | `8000` |
| flyteconnector.ports.name | string |  | `"grpc"` |
| flyteconnector.priorityClassName | string | Sets priorityClassName for flyteconnector pod(s). | `""` |
| flyteconnector.prometheusPort.containerPort | int |  | `9090` |
| flyteconnector.prometheusPort.name | string |  | `"metric"` |
| flyteconnector.replicaCount | int | Replicas count for flyteconnector deployment | `2` |
| flyteconnector.resources | object | Default resources requests and limits for flyteconnector deployment | `(see values.yaml)` |
| flyteconnector.service | object | Service settings for flyteconnector | `{"clusterIP":"None","type":"ClusterIP"}` |
| flyteconnector.serviceAccount | object | Configuration for service accounts for flyteconnector | `{"annotations":{},"create":true,"imagePullSecrets":[]}` |
| flyteconnector.serviceAccount.annotations | object | Annotations for ServiceAccount attached to flyteconnector pods | `{}` |
| flyteconnector.serviceAccount.create | bool | Should a service account be created for flyteconnector | `true` |
| flyteconnector.serviceAccount.imagePullSecrets | list | ImagePullSecrets to automatically assign to the service account | `[]` |
| flyteconnector.tolerations | list | tolerations for flyteconnector deployment | `[]` |
| flytepropeller | object | Flytepropeller configuration | `(see values.yaml)` |
| flytepropeller.additionalVolumeMounts | list | Appends additional volume mounts to the main container's spec. May include template values. | `[]` |
| flytepropeller.additionalVolumes | list | Appends additional volumes to the deployment spec. May include template values. | `[]` |
| flytepropeller.affinity | object | affinity for Flytepropeller deployment | `{}` |
| flytepropeller.configPath | string | Default glob string for searching configuration files | `"/etc/flyte/config/*.yaml"` |
| flytepropeller.extraArgs | object | extra arguments to pass to propeller. | `{}` |
| flytepropeller.nodeName | string | nodeName constraints for Flytepropeller deployment | `""` |
| flytepropeller.nodeSelector | object | nodeSelector for Flytepropeller deployment | `{}` |
| flytepropeller.podAnnotations | object | Annotations for Flytepropeller pods | `{}` |
| flytepropeller.podLabels | object | Labels for the Flytepropeller pods | `{}` |
| flytepropeller.replicaCount | int | Replicas count for Flytepropeller deployment | `1` |
| flytepropeller.resources | object | Default resources requests and limits for Flytepropeller deployment | `{"limits":{"cpu":"3","memory":"3Gi"},"requests":{"cpu":"1","memory":"1Gi"}}` |
| flytepropeller.serviceAccount | object | Configuration for service accounts for FlytePropeller | `{"annotations":{},"imagePullSecrets":[]}` |
| flytepropeller.serviceAccount.annotations | object | Annotations for ServiceAccount attached to FlytePropeller pods | `{}` |
| flytepropeller.serviceAccount.imagePullSecrets | list | ImagePullSecrets to automatically assign to the service account | `[]` |
| flytepropeller.tolerations | list | tolerations for Flytepropeller deployment | `[]` |
| flytepropeller.topologySpreadConstraints | object | topologySpreadConstraints for Flytepropeller deployment | `{}` |
| flytepropellerwebhook | object | Configuration for the Flytepropeller webhook | `(see values.yaml)` |
| flytepropellerwebhook.additionalVolumeMounts | list | Appends additional volume mounts to the main container's spec. May include template values. | `[]` |
| flytepropellerwebhook.additionalVolumes | list | Appends additional volumes to the deployment spec. May include template values. | `[]` |
| flytepropellerwebhook.affinity | object | affinity for webhook deployment | `{}` |
| flytepropellerwebhook.enabled | bool | enable or disable secrets webhook | `true` |
| flytepropellerwebhook.nodeName | string | nodeName constraints for webhook deployment | `""` |
| flytepropellerwebhook.nodeSelector | object | nodeSelector for webhook deployment | `{}` |
| flytepropellerwebhook.podAnnotations | object | Annotations for webhook pods | `{}` |
| flytepropellerwebhook.podEnv | object | Additional webhook container environment variables | `{}` |
| flytepropellerwebhook.podLabels | object | Labels for webhook pods | `{}` |
| flytepropellerwebhook.priorityClassName | string | Sets priorityClassName for webhook pod | `""` |
| flytepropellerwebhook.replicaCount | int | Replicas | `1` |
| flytepropellerwebhook.securityContext | object | Sets securityContext for webhook pod(s). | `(see values.yaml)` |
| flytepropellerwebhook.service | object | Service settings for the webhook | `(see values.yaml)` |
| flytepropellerwebhook.service.port | int | HTTPS port for the webhook service | `443` |
| flytepropellerwebhook.service.targetPort | int | Target port for the webhook service (container port) | `9443` |
| flytepropellerwebhook.serviceAccount | object | Configuration for service accounts for the webhook | `{"imagePullSecrets":[]}` |
| flytepropellerwebhook.serviceAccount.imagePullSecrets | list | ImagePullSecrets to automatically assign to the service account | `[]` |
| flytepropellerwebhook.tolerations | list | tolerations for webhook deployment | `[]` |
| flytepropellerwebhook.topologySpreadConstraints | object | topologySpreadConstraints for webhook deployment | `{}` |
| fullnameOverride | string | Override the chart fullname. | `""` |
| global.CLIENT_ID | string |  | `""` |
| global.CLUSTER_NAME | string |  | `""` |
| global.FAST_REGISTRATION_BUCKET | string |  | `""` |
| global.METADATA_BUCKET | string |  | `""` |
| global.ORG_NAME | string |  | `""` |
| global.UNION_CONTROL_PLANE_HOST | string |  | `""` |
| host | string | Set the control plane host for your Union dataplane installation.  This will be provided by Union. | `"{{ .Values.global.UNION_CONTROL_PLANE_HOST }}"` |
| image.flytecopilot | object | flytecopilot repository and tag. | `{"pullPolicy":"IfNotPresent","repository":"cr.flyte.org/flyteorg/flytecopilot","tag":"v1.14.1"}` |
| image.kubeStateMetrics | object | Kubestatemetrics repository and tag. | `(see values.yaml)` |
| image.union | object | Image repository for the operator and union services | `{"pullPolicy":"IfNotPresent","repository":"public.ecr.aws/p0i0a9q8/unionoperator","tag":""}` |
| imageBuilder.authenticationType | string | Authentication type for the image registry. "azure" uses `az acr login` to authenticate to the default registry and requires Azure Workload Identity to be enabled. | `"noop"` |
| imageBuilder.buildkit.additionalVolumeMounts | list | Additional volume mounts to add to the buildkit container | `[]` |
| imageBuilder.buildkit.additionalVolumes | list | Additional volumes to add to the pod | `[]` |
| imageBuilder.buildkit.autoscaling | object | buildkit HPA configuration | `{"enabled":false,"maxReplicas":2,"minReplicas":1,"targetCPUUtilizationPercentage":60}` |
| imageBuilder.buildkit.autoscaling.targetCPUUtilizationPercentage | int | Target CPU utilization percentage for the buildkit HPA; adjust as needed. | `60` |
| imageBuilder.buildkit.deploymentStrategy | string | deployment strategy for buildkit deployment | `"Recreate"` |
| imageBuilder.buildkit.enabled | bool | Enable buildkit service within this release. | `true` |
| imageBuilder.buildkit.fullnameOverride | string | The name to use for the buildkit deployment, service, configmap, etc. | `""` |
| imageBuilder.buildkit.image.pullPolicy | string | Pull policy | `"IfNotPresent"` |
| imageBuilder.buildkit.image.repository | string | Image name | `"docker.io/moby/buildkit"` |
| imageBuilder.buildkit.image.tag | string | Image tag. When rootless mode is enabled, "-rootless" is appended (e.g. "buildx-stable-1" becomes "buildx-stable-1-rootless") unless the tag already contains "rootless". | `"buildx-stable-1"` |
| imageBuilder.buildkit.log | object | Enable debug logging | `{"debug":false,"format":"text"}` |
| imageBuilder.buildkit.nodeSelector | object | Node selector | `{}` |
| imageBuilder.buildkit.oci | object | Buildkitd service configuration | `{"maxParallelism":0}` |
| imageBuilder.buildkit.oci.maxParallelism | int | maxParallelism limits the number of concurrent builds; the default of 0 means unbounded. | `0` |
| imageBuilder.buildkit.pdb.minAvailable | int | Minimum available pods | `1` |
| imageBuilder.buildkit.podAnnotations | object | Pod annotations | `{}` |
| imageBuilder.buildkit.podEnv | list | Appends additional environment variables to the buildkit container's spec. | `[]` |
| imageBuilder.buildkit.replicaCount | int | Replicas count for Buildkit deployment | `1` |
| imageBuilder.buildkit.resources | object | Resource definitions | `{"requests":{"cpu":1,"ephemeral-storage":"20Gi","memory":"1Gi"}}` |
| imageBuilder.buildkit.rootless | bool | Run buildkit in rootless mode. Requires kernel >= 5.11 with unprivileged user namespace support. | `true` |
| imageBuilder.buildkit.service.annotations | object | Service annotations | `{}` |
| imageBuilder.buildkit.service.loadbalancerIp | string | Static ip address for load balancer | `""` |
| imageBuilder.buildkit.service.port | int | Service port | `1234` |
| imageBuilder.buildkit.service.type | string | Service type | `"ClusterIP"` |
| imageBuilder.buildkit.serviceAccount | object | Service account configuration for buildkit | `{"annotations":{},"create":true,"imagePullSecret":"","name":"union-imagebuilder"}` |
| imageBuilder.buildkit.tolerations | list | Tolerations | `[]` |
| imageBuilder.buildkitUri | string | URI of the buildkit service, e.g. "tcp://buildkitd.buildkit.svc.cluster.local:1234" | `""` |
| imageBuilder.defaultRepository | string | Default repository for built images. Note, the build-image task will fail unless "registry" is specified or a default repository is provided. | `""` |
| imageBuilder.enabled | bool |  | `true` |
| imageBuilder.targetConfigMapName | string | Should not change unless coordinated with Union technical support. | `"build-image-config"` |
| ingress-nginx.controller.admissionWebhooks.enabled | bool |  | `false` |
| ingress-nginx.controller.allowSnippetAnnotations | bool |  | `true` |
| ingress-nginx.controller.config.annotations-risk-level | string |  | `"Critical"` |
| ingress-nginx.controller.config.grpc-connect-timeout | string |  | `"1200"` |
| ingress-nginx.controller.config.grpc-read-timeout | string |  | `"604800"` |
| ingress-nginx.controller.config.grpc-send-timeout | string |  | `"604800"` |
| ingress-nginx.controller.ingressClassResource.controllerValue | string |  | `"union.ai/dataplane"` |
| ingress-nginx.controller.ingressClassResource.default | bool |  | `false` |
| ingress-nginx.controller.ingressClassResource.enabled | bool |  | `true` |
| ingress-nginx.controller.ingressClassResource.name | string |  | `"dataplane"` |
| ingress-nginx.enabled | bool |  | `false` |
| ingress.dataproxy | object | Dataproxy specific ingress configuration. | `{"annotations":{},"class":"","hostOverride":"","tls":{}}` |
| ingress.dataproxy.annotations | object | Annotations to apply to the ingress resource. | `{}` |
| ingress.dataproxy.class | string | Ingress class name | `""` |
| ingress.dataproxy.hostOverride | string | Ingress host | `""` |
| ingress.dataproxy.tls | object | Ingress TLS configuration | `{}` |
| ingress.enabled | bool |  | `false` |
| ingress.host | string |  | `""` |
| ingress.serving | object | Serving specific ingress configuration. | `{"annotations":{},"class":"","hostOverride":"","tls":{}}` |
| ingress.serving.annotations | object | Annotations to apply to the ingress resource. | `{}` |
| ingress.serving.class | string | Ingress class name | `""` |
| ingress.serving.hostOverride | string | Optional host override for the serving ingress rule. Defaults to *.apps.{{ .Values.host }}. | `""` |
| ingress.serving.tls | object | Ingress TLS configuration | `{}` |
| knative-operator.crds.install | bool |  | `true` |
| knative-operator.enabled | bool |  | `false` |
| kube-state-metrics | object | Standalone kube-state-metrics for Union features (cost tracking, pod resource metrics). Metric filtering is handled in the Prometheus static scrape config. | `{}` |
| low_privilege | bool | Scopes the deployment, permissions and actions created into a single namespace | `false` |
| metrics-server.enabled | bool |  | `false` |
| monitoring.alerting.enabled | bool |  | `false` |
| monitoring.alertmanager.enabled | bool |  | `false` |
| monitoring.coreDns.enabled | bool |  | `true` |
| monitoring.crds.enabled | bool |  | `false` |
| monitoring.dashboards.enabled | bool |  | `true` |
| monitoring.dashboards.label | string |  | `"grafana_dashboard"` |
| monitoring.dashboards.labelValue | string |  | `"1"` |
| monitoring.defaultRules.create | bool |  | `true` |
| monitoring.enabled | bool |  | `false` |
| monitoring.fullnameOverride | string |  | `"monitoring"` |
| monitoring.grafana.adminPassword | string |  | `"admin"` |
| monitoring.grafana.enabled | bool |  | `true` |
| monitoring.grafana.fullNameOverride | string |  | `"monitoring-grafana"` |
| monitoring.kube-state-metrics.fullnameOverride | string |  | `"monitoring-kube-state-metrics"` |
| monitoring.kube-state-metrics.nameOverride | string |  | `"monitoring-kube-state-metrics"` |
| monitoring.kubeApiServer.enabled | bool |  | `true` |
| monitoring.kubeControllerManager.enabled | bool |  | `true` |
| monitoring.kubeEtcd.enabled | bool |  | `true` |
| monitoring.kubeProxy.enabled | bool |  | `true` |
| monitoring.kubeScheduler.enabled | bool |  | `true` |
| monitoring.kubeStateMetrics.enabled | bool |  | `true` |
| monitoring.kubelet.enabled | bool |  | `true` |
| monitoring.nameOverride | string |  | `"monitoring"` |
| monitoring.nodeExporter.enabled | bool |  | `true` |
| monitoring.prometheus.agentMode | bool |  | `false` |
| monitoring.prometheus.enabled | bool |  | `true` |
| monitoring.prometheus.prometheusSpec.maximumStartupDurationSeconds | int |  | `600` |
| monitoring.prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues | bool |  | `false` |
| monitoring.prometheus.prometheusSpec.resources.limits.cpu | string |  | `"2"` |
| monitoring.prometheus.prometheusSpec.resources.limits.memory | string |  | `"4Gi"` |
| monitoring.prometheus.prometheusSpec.resources.requests.cpu | string |  | `"500m"` |
| monitoring.prometheus.prometheusSpec.resources.requests.memory | string |  | `"1Gi"` |
| monitoring.prometheus.prometheusSpec.retention | string |  | `"7d"` |
| monitoring.prometheus.prometheusSpec.ruleSelectorNilUsesHelmValues | bool |  | `false` |
| monitoring.prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues | bool |  | `false` |
| monitoring.prometheus.service.port | int |  | `80` |
| monitoring.prometheusOperator.enabled | bool |  | `true` |
| monitoring.prometheusRules.enabled | bool |  | `true` |
| monitoring.serviceMonitors.enabled | bool |  | `true` |
| monitoring.slos.alerting.enabled | bool |  | `false` |
| monitoring.slos.enabled | bool |  | `false` |
| monitoring.slos.targets.availability | float |  | `0.999` |
| monitoring.slos.targets.latencyP99 | int |  | `5` |
| nameOverride | string | Override the chart name. | `""` |
| namespace_mapping | object | Namespace mapping template for mapping Union runs to Kubernetes namespaces. This is the canonical source of truth. All dataplane services (propeller, clusterresourcesync, operator, executor) will inherit this value unless explicitly overridden in their service-specific config sections (config.namespace_config, config.operator.org, executor.raw_config). | `{}` |
| namespaces.enabled | bool |  | `true` |
| nodeobserver | object | nodeobserver contains the configuration information for the node observer service. | `(see values.yaml)` |
| nodeobserver.additionalVolumeMounts | list | Appends additional volume mounts to the main container's spec. May include template values. | `[]` |
| nodeobserver.additionalVolumes | list | Appends additional volumes to the daemonset spec. May include template values. | `[]` |
| nodeobserver.affinity | object | affinity configurations for the pods associated with nodeobserver services | `{}` |
| nodeobserver.enabled | bool | Enable or disable nodeobserver | `false` |
| nodeobserver.nodeName | string | nodeName constraints for the pods associated with nodeobserver services | `""` |
| nodeobserver.nodeSelector | object | nodeSelector constraints for the pods associated with nodeobserver services | `{}` |
| nodeobserver.podAnnotations | object | Additional pod annotations for the nodeobserver services | `{}` |
| nodeobserver.podEnv | list | Additional pod environment variables for the nodeobserver services | `(see values.yaml)` |
| nodeobserver.resources | object | Kubernetes resource configuration for the nodeobserver service | `{"limits":{"cpu":"1","memory":"500Mi"},"requests":{"cpu":"500m","memory":"100Mi"}}` |
| nodeobserver.tolerations | list | tolerations for the pods associated with nodeobserver services | `[{"effect":"NoSchedule","operator":"Exists"}]` |
| nodeobserver.topologySpreadConstraints | object | topologySpreadConstraints for the pods associated with nodeobserver services | `{}` |
| objectStore | object | Union Object Store configuration | `{"service":{"grpcPort":8089,"httpPort":8080}}` |
| opencost.enabled | bool | Enable or disable the opencost installation. | `true` |
| opencost.opencost.exporter.resources.limits.cpu | string |  | `"1000m"` |
| opencost.opencost.exporter.resources.limits.memory | string |  | `"4Gi"` |
| opencost.opencost.exporter.resources.requests.cpu | string |  | `"500m"` |
| opencost.opencost.exporter.resources.requests.memory | string |  | `"1Gi"` |
| opencost.opencost.metrics.serviceMonitor.enabled | bool |  | `false` |
| opencost.opencost.prometheus.external.enabled | bool |  | `true` |
| opencost.opencost.prometheus.external.url | string |  | `"http://union-operator-prometheus.{{.Release.Namespace}}.svc:80/prometheus"` |
| opencost.opencost.prometheus.internal.enabled | bool |  | `false` |
| opencost.opencost.ui.enabled | bool |  | `false` |
| operator.additionalVolumeMounts | list | Appends additional volume mounts to the main container's spec. May include template values. | `[]` |
| operator.additionalVolumes | list | Appends additional volumes to the deployment spec. May include template values. | `[]` |
| operator.affinity | object | affinity configurations for the operator pods | `{}` |
| operator.autoscaling.enabled | bool |  | `false` |
| operator.enableTunnelService | bool |  | `true` |
| operator.imagePullSecrets | list |  | `[]` |
| operator.nodeName | string | nodeName constraints for the operator pods | `""` |
| operator.nodeSelector | object | nodeSelector constraints for the operator pods | `{}` |
| operator.podAnnotations | object |  | `{}` |
| operator.podEnv | object |  | `{}` |
| operator.podLabels | object |  | `{}` |
| operator.podSecurityContext | object |  | `{}` |
| operator.priorityClassName | string |  | `""` |
| operator.replicas | int |  | `1` |
| operator.resources.limits.cpu | string |  | `"2"` |
| operator.resources.limits.memory | string |  | `"3Gi"` |
| operator.resources.requests.cpu | string |  | `"1"` |
| operator.resources.requests.memory | string |  | `"1Gi"` |
| operator.secretName | string |  | `"union-secret-auth"` |
| operator.securityContext | object |  | `{}` |
| operator.serviceAccount.annotations | object |  | `{}` |
| operator.serviceAccount.create | bool |  | `true` |
| operator.serviceAccount.name | string |  | `"operator-system"` |
| operator.tolerations | list | tolerations for the operator pods | `[]` |
| operator.topologySpreadConstraints | object | topologySpreadConstraints for the operator pods | `{}` |
| orgName | string | Organization name should be provided by Union. | `"{{ .Values.global.ORG_NAME }}"` |
| prometheus | object | Union features Prometheus configuration. Deploys a static Prometheus instance (no Prometheus Operator required) for Union features like cost tracking and task-level monitoring. | `(see values.yaml)` |
| prometheus.affinity | object | Affinity rules for the Prometheus pod. | `{}` |
| prometheus.nodeSelector | object | Node selector for the Prometheus pod. | `{}` |
| prometheus.priorityClassName | string | Priority class for the Prometheus pod. | `"system-cluster-critical"` |
| prometheus.resources | object | Resource limits and requests. | `{"limits":{"cpu":"3","memory":"3500Mi"},"requests":{"cpu":"1","memory":"1Gi"}}` |
| prometheus.retention | string | Data retention period. | `"3d"` |
| prometheus.routePrefix | string | Route prefix for Prometheus web UI and API. | `"/prometheus/"` |
| prometheus.serviceAccount | object | Service account configuration. | `{"annotations":{},"create":true}` |
| prometheus.tolerations | list | Tolerations for the Prometheus pod. | `[]` |
| proxy | object | Union operator proxy configuration | `(see values.yaml)` |
| proxy.additionalVolumeMounts | list | Appends additional volume mounts to the main container's spec. May include template values. | `[]` |
| proxy.additionalVolumes | list | Appends additional volumes to the deployment spec. May include template values. | `[]` |
| proxy.affinity | object | affinity configurations for the proxy pods | `{}` |
| proxy.nodeName | string | nodeName constraint for the proxy pods | `""` |
| proxy.nodeSelector | object | nodeSelector constraints for the proxy pods | `{}` |
| proxy.secretManager.namespace | string | Set the namespace for union managed secrets created through the native Kubernetes secret manager. If the namespace is not set, the release namespace will be used. | `""` |
| proxy.tolerations | list | tolerations for the proxy pods | `[]` |
| proxy.topologySpreadConstraints | object | topologySpreadConstraints for the proxy pods | `{}` |
| resourcequota | object | Create global resource quotas for the cluster. | `{"create":false}` |
| scheduling | object | Global Kubernetes scheduling constraints applied to all pods. Application-specific constraints always take precedence. | `{"affinity":{},"nodeName":"","nodeSelector":{},"tolerations":[],"topologySpreadConstraints":{}}` |
| scheduling.affinity | object | See https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node | `{}` |
| scheduling.nodeSelector | object | See https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node | `{}` |
| scheduling.tolerations | list | See https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration | `[]` |
| scheduling.topologySpreadConstraints | object | See https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints | `{}` |
| secrets | object | Connection secrets for the Union control plane services. | `{"admin":{"clientId":"dataplane-operator","clientSecret":"","create":true,"enable":true}}` |
| secrets.admin.clientId | string | The client id used to authenticate to the control plane.  This will be provided by Union. | `"dataplane-operator"` |
| secrets.admin.clientSecret | string | The client secret used to authenticate to the control plane.  This will be provided by Union. | `""` |
| secrets.admin.create | bool | Create the secret resource containing the client id and secret.  If set to false the user is responsible for creating the secret before the installation. | `true` |
| secrets.admin.enable | bool | Enable or disable the admin secret.  This is used to authenticate to the control plane. | `true` |
| serving | object | Configure app serving and knative. | `(see values.yaml)` |
| serving.auth | object | Union authentication and authorization configuration. | `{"enabled":true}` |
| serving.auth.enabled | bool | Disabling is common if not leveraging Union Cloud SSO. | `true` |
| serving.enabled | bool | Enables the serving components and installs Knative Serving. The Knative Operator must be running in the cluster for this to work. Also enables app serving in the operator. | `false` |
| serving.extraConfig | object | Additional configuration for Knative serving | `{}` |
| serving.metrics | bool | Enables scraping of metrics from the serving component | `true` |
| serving.replicas | int | The number of replicas to create for all components for high availability. | `2` |
| serving.resources | object | Resources for serving components | `(see values.yaml)` |
| sparkoperator.enabled | bool | Enable the Spark operator integration. | `false` |
| sparkoperator.plugin_config | object | Plugin configuration passed to the Spark operator. | `{}` |
| storage | object | Object storage configuration used by all Union services. | `(see values.yaml)` |
| storage.accessKey | string | The access key used for object storage. | `""` |
| storage.authType | string | The authentication type.  Currently supports "accesskey" and "iam". | `"accesskey"` |
| storage.bucketName | string | The bucket name used for object storage. | `"{{ .Values.global.METADATA_BUCKET }}"` |
| storage.cache | object | Cache configuration for objects retrieved from object storage. | `{"maxSizeMBs":0,"targetGCPercent":70}` |
| storage.custom | object | Define custom configurations for the object storage.  Only used if the provider is set to "custom". | `{}` |
| storage.disableSSL | bool | Disable SSL for object storage. This should only be used for local/sandbox installations. | `false` |
| storage.endpoint | string | Define or override the endpoint used for the object storage service. | `""` |
| storage.fastRegistrationBucketName | string | The bucket name used for fast registration uploads. | `"{{ .Values.global.FAST_REGISTRATION_BUCKET }}"` |
| storage.fastRegistrationURL | string | Override the URL for signed fast registration uploads.  This is only used for local/sandbox installations. | `""` |
| storage.gcp | object | Define GCP specific configuration for object storage. | `{"projectId":""}` |
| storage.injectPodEnvVars | bool | Injects the object storage access information into the pod environment variables.  Needed for providers that only support access and secret key based authentication. | `true` |
| storage.limits | object | Internal service limits for object storage access. | `{"maxDownloadMBs":1024}` |
| storage.metadataPrefix | string | The prefix under which metadata is stored. Example for Azure: "abfs://my-container@mystorageaccount.dfs.core.windows.net" | `""` |
| storage.provider | string | The storage provider to use.  Currently supports "compat", "aws", "oci", and "custom". | `"compat"` |
| storage.region | string | The bucket region used for object storage. | `"us-east-1"` |
| storage.s3ForcePathStyle | bool | Use path-style instead of domain-style URLs to access the object storage service. | `true` |
| storage.secretKey | string | The secret key used for object storage. | `""` |
| userRoleAnnotationKey | string | This is the annotation key that is added to service accounts.  Used with GCP and AWS. | `"eks.amazonaws.com/role-arn"` |
| userRoleAnnotationValue | string | This is the value of the annotation key that is added to service accounts. Used with GCP and AWS. | `"arn:aws:iam::ACCOUNT_ID:role/flyte_project_role"` |
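
As an illustration of how the `storage` and `secrets` values above compose, the sketch below shows a minimal `values.yaml` excerpt for access-key authentication against an S3-compatible object store. The endpoint, bucket names, and credential placeholders are hypothetical; substitute your own values and the client secret provided by Union.

```yaml
# Hypothetical minimal excerpt -- endpoint, buckets, and credentials are placeholders.
storage:
  provider: compat                 # S3-compatible object storage
  authType: accesskey              # or "iam" for role-based access
  endpoint: "https://s3.example.internal"
  bucketName: "union-metadata"
  fastRegistrationBucketName: "union-fast-registration"
  region: "us-east-1"
  accessKey: "<ACCESS_KEY>"
  secretKey: "<SECRET_KEY>"
  injectPodEnvVars: true           # required for access/secret key based providers

secrets:
  admin:
    create: true
    clientId: "dataplane-operator"
    clientSecret: "<PROVIDED_BY_UNION>"
```

If `secrets.admin.create` is set to `false`, you must create the Kubernetes secret containing the client ID and secret yourself before installing the chart, as noted in the table above.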

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/helm-chart-reference/knative-operator ===

Deploys Knative Operator

## Chart info

| | |
|---|---|
| **Chart version** | 2025.6.3 |
| **App version** | 1.16.0 |
| **Kubernetes version** | `>= 1.28.0-0` |

## Values

| Key | Type | Description | Default |
|-----|------|-------------|---------|
| crds.install | bool | Install the Knative Operator CRDs. | `true` |

