# BYOC deployment
> This bundle contains all pages in the BYOC deployment section.
> Source: https://www.union.ai/docs/v2/union/deployment/byoc/

=== PAGE: https://www.union.ai/docs/v2/union/deployment/byoc ===

# BYOC deployment

> **📝 Note**
>
> An LLM-optimized bundle of this entire section is available at [`section.md`](section.md).
> This single file contains all pages in this section, optimized for AI coding agent context.

In a BYOC (Bring Your Own Cloud) deployment, Union.ai manages the data plane infrastructure in your cloud account.
You provide the cloud account and network configuration; Union.ai handles Kubernetes cluster operations, upgrades, and monitoring.

Your code, data, container images, and logs remain entirely in your data plane.
The Union.ai control plane orchestrates workflow execution but has no access to your proprietary data.

## Getting started

1. Review the **BYOC deployment > Platform architecture** to understand the control plane and data plane split.
2. Set up your data plane on your cloud provider:
   - **BYOC deployment > Data plane setup on AWS**
   - **BYOC deployment > Data plane setup on Azure**
   - **BYOC deployment > Data plane setup on GCP**
3. **BYOC deployment > Configuring your data plane** with your specific requirements (regions, node groups, networking).

## Cloud resource integration

Connect your data plane to cloud-native services:

- [AWS resources](./enabling-aws-resources/_index) (S3, ECR, Secrets Manager)
- [Azure resources](./enabling-azure-resources/_index) (Blob Storage, Container Registry, Key Vault)
- [GCP resources](./enabling-gcp-resources/_index) (Cloud Storage, Artifact Registry, BigQuery)

## Additional configuration

- [Single sign-on setup](./single-sign-on-setup/_index) for OAuth2/OIDC-based authentication
- **BYOC deployment > Multi-cluster and multi-cloud** for domain and project isolation
- **BYOC deployment > Data retention policy** for controlling stored data lifecycle

=== PAGE: https://www.union.ai/docs/v2/union/deployment/byoc/platform-architecture ===

# Platform architecture

The Union.ai architecture consists of two virtual private clouds, referred to as planes—the control plane and the data plane.

![](../../_static/images/user-guide/platform-architecture/union-architecture.png)

## Control plane

The control plane:
  * Runs within the Union.ai AWS account.
  * Provides the user interface through which users can access authentication, authorization, observation, and management functions.
  * Is responsible for placing executions onto data plane clusters and performing other cluster control and management functions.

## Data plane

All your workflow and task executions are performed in the data plane, which runs within your AWS, Azure, or GCP account. The data plane's clusters are provisioned and managed by the control plane through a resident Union.ai operator with minimal required permissions.

Union.ai operates one control plane for each supported region, which supports all data planes within that region. You can choose the region in which to locate your data plane. Currently, Union.ai supports the `us-west`, `us-east`, `eu-west`, and `eu-central` regions, and more are being added.

### Data plane nodes

Once the data plane is deployed in your cloud account, different kinds of nodes with different responsibilities run in your cluster. In Union.ai, we distinguish between default nodes and worker nodes.

Default nodes guarantee the basic operation of the data plane and are always running. Services that run on these nodes include worker-node autoscaling, monitoring, the Union.ai operator, and others.

Worker nodes are responsible for executing your workloads. You have full control over the configuration of your [worker nodes](./configuring-your-data-plane#worker-node-groups).

When worker nodes are not in use, they automatically scale down to the configured minimum. (The default is zero.)

## Union.ai operator

The Union.ai hybrid architecture lets you maintain ultimate ownership and control of your data and compute infrastructure while enabling Union.ai to handle the details of managing that infrastructure.

Management of the data plane is mediated by a dedicated operator (the Union.ai operator) resident on that plane.
This operator is designed to perform its functions with only the very minimum set of required permissions.
It allows the control plane to spin up and down clusters and provides Union.ai's support engineers with access to system-level logs and the ability to apply changes as per customer requests.
It _does not_ provide direct access to secrets or data.

In addition, communication is always initiated by the Union.ai operator in the data plane toward the Union.ai control plane, not the other way around.
This further enhances the security of your data plane.

Union.ai is SOC-2 Type 2 certified. A copy of the audit report is available upon request.

## Registry data

Registry data is composed of:

* Names of workflows, tasks, launch plans, and artifacts
* Input and output types for workflows and tasks
* Execution status, start time, end time, and duration of workflows and tasks
* Version information for workflows, tasks, launch plans, and artifacts
* Artifact definitions

This type of data is stored in the control plane and is used to manage the execution of your workflows.
This does not include any workflow or task code, nor any data that is processed by your workflows or tasks.

## Execution data

Execution data is composed of:

* Event data
* Workflow inputs
* Workflow outputs
* Data passed between tasks (task inputs and outputs)

This data is divided into two categories: *raw data* and *literal data*.

### Raw data

Raw data is composed of:

* Files and directories
* Dataframes
* Models
* Python-pickled types

These are passed by reference between tasks and are always stored in an object store in your data plane.
This type of data is read (and may be temporarily cached) by the control plane as needed, but is never stored there.

### Literal data

Literal data is composed of:

* Primitive execution inputs (int, string, etc.)
* JSON-serializable dataclasses

These are passed by value, not by reference, and may be stored in the Union.ai control plane.
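As a conceptual sketch (not Union.ai's actual implementation), the difference between passing by reference and passing by value can be illustrated with an in-memory stand-in for the data plane's object store; the bucket name and URI scheme below are hypothetical:

```python
import uuid

# Hypothetical in-memory stand-in for the data plane's object store.
object_store: dict[str, bytes] = {}

def pass_raw(value: bytes) -> str:
    """Raw data (files, dataframes, models) is written to the data plane's
    object store; only a reference travels between tasks."""
    ref = f"s3://data-plane-bucket/{uuid.uuid4()}"
    object_store[ref] = value
    return ref  # the control plane only ever sees this reference

def pass_literal(value: int) -> int:
    """Literal data (primitives, small dataclasses) is passed by value
    and may be stored in the control plane."""
    return value

ref = pass_raw(b"model weights ...")
assert ref.startswith("s3://")                    # reference, not the data itself
assert object_store[ref] == b"model weights ..."  # data stays in your data plane
assert pass_literal(42) == 42                     # the value itself crosses the boundary
```

This is why the data-privacy note below applies only to literal data: only the by-value payloads ever reach the control plane.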

## Data privacy

If you are concerned with maintaining strict data privacy, be sure not to pass private information in literal form between tasks.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/byoc/configuring-your-data-plane ===

# Configuring your data plane

After you set up your data plane account(s), the next step is to specify the infrastructure you want to deploy.
You will need to send the following details to the Union.ai team:

* Which **BYOC deployment > Configuring your data plane > Cloud provider** will you use?
* Will this be a **BYOC deployment > Configuring your data plane > Multi-cluster** setup?
    * If so, how will Flyte domains and/or Flyte projects be mapped to clusters?
    * Additionally, how will clusters be grouped into cluster pools? (Each cluster pool will have its own metadata bucket)
* For each cluster:
    * **BYOC deployment > Configuring your data plane > Account ID** for this cluster (each cluster must be in its own account on your cloud provider)
    * **BYOC deployment > Configuring your data plane > Region** in which the cluster will be deployed.
    * **BYOC deployment > Configuring your data plane > VPC** setup (will you use your own VPC or have Union.ai provision one for you?)
    * **BYOC deployment > Configuring your data plane > Data retention policy** for workflow execution data stored in this cloud provider account.
    * For each **BYOC deployment > Configuring your data plane > Worker node groups > Node group name**:
        * **BYOC deployment > Configuring your data plane > Worker node groups > Node type**
        * **BYOC deployment > Configuring your data plane > Worker node groups > Minimum**
        * **BYOC deployment > Configuring your data plane > Worker node groups > Maximum**
        * **BYOC deployment > Configuring your data plane > Worker node groups > Interruptible instances**
        * **BYOC deployment > Configuring your data plane > Worker node groups > Taints**
        * **BYOC deployment > Configuring your data plane > Worker node groups > Disk**

## Cloud provider

You can choose AWS, Azure, or GCP as your cloud provider.
If you choose to have multiple clusters, they must all be in the same provider.

## Multi-cluster

You can choose a single or multi-cluster configuration.

In a multi-cluster configuration, you have separate clusters for each of your Flyte domains and/or Flyte projects.

A cluster in this context refers to a distinct EKS (in AWS), AKS (in Azure), or GKE (in GCP) instance in its own AWS account, Azure subscription, or GCP project.

The most common setup is to have a separate cluster for each Flyte domain: development, staging, and production.

You can further partition your deployment so that each Flyte domain-project pair has its own cluster in its own account.

In addition, clusters are grouped into cluster pools. Each cluster pool will have its own metadata bucket. You can group your clusters into pools based on your own criteria, for example, by region or by the type of workloads that will run on them.

See [Multi-cluster](./multi-cluster) for more information.

## Account ID

Provide the ID of the AWS account, Azure subscription, or GCP project in which each cluster will reside.

## Region

For each cluster, specify the region. Available regions are `us-west`, `us-east`, `eu-west`, and `eu-central`.

## VPC

Specify whether you want to set up your own VPC or use one provided by Union.ai.
If you are provisioning your own VPC, provide the VPC ID.

## Data retention policy

Each cluster has its own internal object store that is used to store data used in the execution of workflows.
This includes task input-output metadata, task input-output raw data, Flyte Decks data, and fast registration data.
For each cluster, you can choose to enable a data retention policy that defines a maximum time for this data to be stored, after which it will be automatically deleted.
Alternatively, you can set this to `unlimited` to disable automatic data deletion.
See [Data retention policy](./data-retention-policy) for more details.

## Worker node groups

Specify the worker node groups (in AWS) or worker node pools (in Azure and GCP) that you wish to have, with the following details for each. For more information about worker nodes, see [Platform architecture](./platform-architecture).

### Node group name

The name of the node group. This will be used as the node group name in the EKS, AKS, or GKE console.

### Node type

The instance type name, for example, `p3d.4xlarge`. (See [AWS instance types](https://aws.amazon.com/ec2/instance-types), [Azure VM sizes](https://learn.microsoft.com/en-us/azure/virtual-machines/sizes), or [GCP machine types](https://cloud.google.com/compute/docs/machine-types) for more information. Also see **BYOC deployment > Configuring your data plane > Resources held back** below.)

### Minimum

The minimum number of nodes. The default is `0`.

Setting a minimum of `0` means that an execution may take longer to schedule, since a node may have to be spun up.
If you want to ensure that at least one node is always available, set the minimum to `1`.

Note, however, that a setting of `1` only avoids the zero-to-one spin-up delay.
It does not help when you have `1` node available but need `2`, and so on.
Ultimately, the minimum should be determined by your expected workload pattern.
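The interplay between demand and the configured bounds can be sketched as a simple clamp (an illustration of the behavior, not the actual autoscaler code):

```python
def target_nodes(needed: int, minimum: int, maximum: int) -> int:
    """Clamp the desired node count to the configured bounds
    (a sketch of the scaling behavior, not the real autoscaler)."""
    return max(minimum, min(needed, maximum))

# minimum 0: the group scales to zero when idle, so the first
# execution waits for a node to spin up
assert target_nodes(needed=0, minimum=0, maximum=5) == 0
# minimum 1 keeps one warm node, but going from 1 to 2 still needs a spin-up
assert target_nodes(needed=0, minimum=1, maximum=5) == 1
assert target_nodes(needed=2, minimum=1, maximum=5) == 2
# demand beyond the maximum is capped
assert target_nodes(needed=9, minimum=1, maximum=5) == 5
```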

### Maximum

The maximum number of nodes. This setting must be explicitly set to a value greater than `0`.

### Interruptible instances

> [!NOTE]
> In AWS, the term *spot instance* is used; in Azure and GCP, the equivalent term is *spot VM*.
> Here we use the term *interruptible instance* generically for all providers.

Specify whether this will be an **interruptible instance** or an **on-demand instance** node group.

Note that for each interruptible node group, an identical on-demand group will be configured as a fallback.
This fallback group will be identical in all respects to the interruptible group (instance type, taints, disk size, etc.), apart from being on-demand instead of interruptible.
The fallback group will be used when the retries on the interruptible group have been exhausted.

For more information on interruptible instances, see the interruptible instances documentation.
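The fallback behavior described above can be sketched as follows; `run_with_fallback` and `flaky_task` are hypothetical names used only for illustration, not Union.ai functions:

```python
def run_with_fallback(task, retries: int) -> str:
    """Sketch of the fallback behavior: retry on the interruptible group,
    then run once on the identical on-demand group."""
    for _attempt in range(retries + 1):
        try:
            return task(interruptible=True)
        except InterruptedError:
            continue  # spot capacity was reclaimed; retry
    return task(interruptible=False)  # retries exhausted: on-demand fallback

def flaky_task(interruptible: bool) -> str:
    # Simulates a task whose spot capacity is always reclaimed.
    if interruptible:
        raise InterruptedError("spot instance reclaimed")
    return "completed on on-demand fallback"

assert run_with_fallback(flaky_task, retries=3) == "completed on on-demand fallback"
```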

### Taints

Specify whether this node group will be a specialized node group reserved for specific tasks (typically with specialized hardware requirements).

If so, it will be configured with a *taint* so that only tasks configured with a *toleration* for that taint will be able to run on it.

Typically, only GPU node groups fall into this specialized category, and they will always be assigned taints in any case. It is not common to place taints on other types of node groups, but you can do so if you wish.
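A simplified model of how taints gate scheduling (real Kubernetes taints also carry effects and matching operators; the GPU taint string below is illustrative):

```python
def can_schedule(node_taints: set[str], pod_tolerations: set[str]) -> bool:
    """A pod can land on a node only if it tolerates every taint on that
    node (simplified: effects like NoSchedule are folded into the string)."""
    return node_taints <= pod_tolerations  # taints must be a subset of tolerations

gpu_node = {"nvidia.com/gpu=present:NoSchedule"}  # illustrative taint
plain_node: set[str] = set()

# an ordinary task cannot land on the tainted GPU node...
assert can_schedule(gpu_node, set()) is False
# ...but it can land on an untainted node
assert can_schedule(plain_node, set()) is True
# a task carrying the matching toleration can use the GPU node
assert can_schedule(gpu_node, {"nvidia.com/gpu=present:NoSchedule"}) is True
```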

<!-- TODO ADD: For more detail on how taints and tolerations work see [Taints and tolerations](). -->

### Disk

Specify the disk size for the nodes in GiB. The default is `500 GiB`.

## Resources held back

When specifying node types and other resource parameters, keep in mind that the nominally quoted amount of a given resource is not always available to Flyte tasks.
For example, on a node rated at `16 GiB` of memory, some of that is held back for system overhead and is not available to Flyte task processes.
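As a rough illustration of the arithmetic (the reserved amounts below are invented for the example, not Union.ai's actual values):

```python
def allocatable_mem_gib(node_mem_gib: float,
                        system_reserved_gib: float = 1.0,
                        kubelet_reserved_gib: float = 1.0,
                        eviction_threshold_gib: float = 0.5) -> float:
    """Rough sketch of memory actually schedulable for tasks on a node.
    The reserved amounts are illustrative defaults, not real platform values."""
    return (node_mem_gib
            - system_reserved_gib
            - kubelet_reserved_gib
            - eviction_threshold_gib)

# on a nominal 16 GiB node, only ~13.5 GiB is schedulable in this sketch
assert allocatable_mem_gib(16.0) == 13.5
```

The practical upshot: request slightly less than the node's nominal capacity, or your tasks may never be schedulable.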

## Example specification
Values provided by you are in single quotes (').

```yaml
- Cloud provider: 'AWS'
- Multi-cluster: 'True'
    - Mapping: 'domain -> cluster'
- Clusters:
    - 'development'
        - Account ID: 'account-id-1'
        - Region: 'us-west'
        - VPC: 'vpc-id-1'
        - Data retention policy: '30 days'
        - Node groups:
            - 'node-group-1'
                - Node type: 'p3d.4xlarge'
                - Min: '2'
                - Max: '5'
                - Spot: 'True'
                - Taints: 'False'
                - Disk: '1500 GiB'
            - 'node-group-2'
                - Node type: 't4.24xlarge'
                - Min: '2'
                - Max: '5'
                - Spot: 'True'
                - Taints: 'False'
                - Disk: '1500 GiB'
    - 'staging'
        - Account ID: 'account-id-2'
        - Region: 'us-west'
        - VPC: 'vpc-id-2'
        - Data retention policy: '30 days'
        - Node groups:
            - 'node-group-1'
                - Node type: 'p3d.4xlarge'
                - Min: '2'
                - Max: '5'
                - Spot: 'True'
                - Taints: 'False'
                - Disk: '1500 GiB'
            - 'node-group-2'
                - Node type: 't4.24xlarge'
                - Min: '2'
                - Max: '5'
                - Spot: 'True'
                - Taints: 'False'
                - Disk: '1500 GiB'
    - 'production'
        - Account ID: 'account-id-3'
        - Region: 'us-west'
        - VPC: 'vpc-id-3'
        - Data retention policy: 'unlimited'
        - Node groups:
            - 'node-group-1'
                - Node type: 'p3d.4xlarge'
                - Min: '2'
                - Max: '5'
                - Spot: 'False'
                - Taints: 'False'
                - Disk: '1500 GiB'
            - 'node-group-2'
                - Node type: 't4.24xlarge'
                - Min: '2'
                - Max: '5'
                - Spot: 'False'
                - Taints: 'False'
                - Disk: '1500 GiB'
```

## After deployment

Once Union.ai has configured and deployed your cluster(s), you will be able to see your data plane setup in **Usage > Compute**.

## Adjusting your configuration

To make changes to your cluster configuration, go to the [Union.ai Support Portal](https://support.union.ai).
The portal is also accessible from **Usage > Compute** through the **Adjust Configuration** button.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/byoc/multi-cluster ===

# Multi-cluster and multi-cloud

When [configuring your data plane](./configuring-your-data-plane), you can map each domain or project to its own GCP project or AWS subaccount. You can even mix cloud providers: Some of your domains and/or projects can be mapped to AWS subaccounts while others can be mapped to GCP projects.

## Domain isolation

If you choose domain isolation, then you would have one GCP project or AWS subaccount for each domain. For example:

| Domain        | GCP project or AWS subaccount     |
| ------------- | --------------------------------- |
| `development` | `gcp-project-union-development`   |
| `staging`     | `gcp-project-union-staging`       |
| `production`  | `aws-subaccount-union-production` |

## Project isolation

If you choose project isolation, then you would have one GCP project or AWS subaccount for each Union.ai project-domain pair. For example:

| Domain/Project          | GCP Project or AWS Subaccount               |
| ----------------------- | ------------------------------------------- |
| `development/project-1` | `gcp-project-union-development-project-1`   |
| `development/project-2` | `gcp-project-union-development-project-2`   |
| `development/project-3` | `gcp-project-union-development-project-3`   |
| `staging/project-1`     | `gcp-project-union-staging-project-1`       |
| `staging/project-2`     | `gcp-project-union-staging-project-2`       |
| `staging/project-3`     | `gcp-project-union-staging-project-3`       |
| `production/project-1`  | `aws-subaccount-union-production-project-1` |
| `production/project-2`  | `aws-subaccount-union-production-project-2` |
| `production/project-3`  | `aws-subaccount-union-production-project-3` |

The precise set of GCP projects and/or AWS subaccounts depends on the number of Union.ai domains and projects that you have.

> [!NOTE] Limitations of project per GCP project/AWS subaccount
> Note that if you choose to map each Union.ai project to its own GCP project/AWS subaccount,
> you will need to define the set of such projects up front. This is because the Union.ai project will have to be
> created when the GCP project/AWS subaccount is set up.
>
> If you also want the ability to create projects on demand, this can be supported by having an additional
> _default_ GCP project/AWS subaccount. Any projects created _after_ onboarding will be created in that
> default GCP project/AWS subaccount.

## Data and metadata isolation

Each domain or project is isolated within its own AWS account or Google project, and therefore provides the level of compute and data isolation intrinsic to that arrangement. Specifically, execution-time isolation per domain or project is maintained for both compute and user data stored in blob store (or other configured storage).

In addition, metadata specific to the internals of Union.ai can be either isolated or shared across clusters, depending on the configuration you choose.

Specifically, the sharing of metadata is controlled by the cluster pool to which a cluster belongs. If two clusters are in the same cluster pool, then they _must_ share the same metadata bucket. If they are in different cluster pools, then they _must_ have different metadata buckets. You could, for example, have a single metadata bucket for all your development clusters, and a separate one for all your production clusters, by grouping the clusters into cluster pools accordingly. Alternatively you could have a separate metadata bucket for each cluster, by putting each cluster in its own cluster pool.
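The pool-to-bucket relationship can be sketched with a hypothetical mapping (the pool, cluster, and bucket names here are invented for illustration):

```python
# Illustrative: clusters grouped into pools, one metadata bucket per pool.
cluster_pools = {
    "dev-pool":  {"clusters": ["development", "staging"],
                  "metadata_bucket": "union-meta-dev"},
    "prod-pool": {"clusters": ["production"],
                  "metadata_bucket": "union-meta-prod"},
}

def metadata_bucket(cluster: str) -> str:
    """Resolve a cluster's metadata bucket via its pool."""
    for pool in cluster_pools.values():
        if cluster in pool["clusters"]:
            return pool["metadata_bucket"]
    raise KeyError(cluster)

# clusters in the same pool share a bucket; clusters in different pools never do
assert metadata_bucket("development") == metadata_bucket("staging")
assert metadata_bucket("development") != metadata_bucket("production")
```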

You specify the cluster pool to which a cluster belongs when you [configure your data plane](./configuring-your-data-plane) with the help of the Union.ai team.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/byoc/data-plane-setup-on-aws ===

# Data plane setup on AWS

To set up your data plane on Amazon Web Services (AWS) you must allow Union.ai to provision and maintain compute resources under your AWS account.
You will need to set up an IAM role for Union.ai to use that has sufficient permissions to do this provisioning.
Setting the permissions can be done either through CloudFormation or the AWS console.

Additionally, if you wish to manage your own Virtual Private Cloud (VPC) then you will need to set up the VPC according to the guidelines described below.
If you do not wish to manage your own VPC then no additional configuration is needed.

## Setting permissions through CloudFormation

You can do the setup quickly using AWS CloudFormation.

### Click the Launch Stack button

Ensure that you are logged into the desired AWS account and then select the appropriate region and launch the corresponding CloudFormation stack:

| Region         | Launch Stack                                                                                                                                                                                                                                                                                                                                                                                                         |
| -------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `us-east-1`    | [![Launch AWS CloudFormation Stack](https://www.union.ai/docs/v2/union/_static/images/deployment/data-plane-setup-on-aws/cloudformation-launch-stack.png)](https://us-east-1.console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/quickcreate?templateURL=https%3A%2F%2Funion-public.s3.amazonaws.com%2Ftemplates%2Fv0.13%2Funion-ai-admin-role.template.yaml&stackName=UnionCloudAccess&param_CrossAccountRoleName=union-ai-admin)       |
| `us-east-2`    | [![Launch AWS CloudFormation Stack](https://www.union.ai/docs/v2/union/_static/images/deployment/data-plane-setup-on-aws/cloudformation-launch-stack.png)](https://us-east-2.console.aws.amazon.com/cloudformation/home?region=us-east-2#/stacks/quickcreate?templateURL=https%3A%2F%2Funion-public.s3.amazonaws.com%2Ftemplates%2Fv0.13%2Funion-ai-admin-role.template.yaml&stackName=UnionCloudAccess&param_CrossAccountRoleName=union-ai-admin)       |
| `us-west-2`    | [![Launch AWS CloudFormation Stack](https://www.union.ai/docs/v2/union/_static/images/deployment/data-plane-setup-on-aws/cloudformation-launch-stack.png)](https://us-west-2.console.aws.amazon.com/cloudformation/home?region=us-west-2#/stacks/quickcreate?templateURL=https%3A%2F%2Funion-public.s3.amazonaws.com%2Ftemplates%2Fv0.13%2Funion-ai-admin-role.template.yaml&stackName=UnionCloudAccess&param_CrossAccountRoleName=union-ai-admin)       |
| `eu-west-1`    | [![Launch AWS CloudFormation Stack](https://www.union.ai/docs/v2/union/_static/images/deployment/data-plane-setup-on-aws/cloudformation-launch-stack.png)](https://eu-west-1.console.aws.amazon.com/cloudformation/home?region=eu-west-1#/stacks/quickcreate?templateURL=https%3A%2F%2Funion-public.s3.amazonaws.com%2Ftemplates%2Fv0.13%2Funion-ai-admin-role.template.yaml&stackName=UnionCloudAccess&param_CrossAccountRoleName=union-ai-admin)       |
| `eu-west-2`    | [![Launch AWS CloudFormation Stack](https://www.union.ai/docs/v2/union/_static/images/deployment/data-plane-setup-on-aws/cloudformation-launch-stack.png)](https://eu-west-2.console.aws.amazon.com/cloudformation/home?region=eu-west-2#/stacks/quickcreate?templateURL=https%3A%2F%2Funion-public.s3.amazonaws.com%2Ftemplates%2Fv0.13%2Funion-ai-admin-role.template.yaml&stackName=UnionCloudAccess&param_CrossAccountRoleName=union-ai-admin)       |
| `eu-central-1` | [![Launch AWS CloudFormation Stack](https://www.union.ai/docs/v2/union/_static/images/deployment/data-plane-setup-on-aws/cloudformation-launch-stack.png)](https://eu-central-1.console.aws.amazon.com/cloudformation/home?region=eu-central-1#/stacks/quickcreate?templateURL=https%3A%2F%2Funion-public.s3.amazonaws.com%2Ftemplates%2Fv0.13%2Funion-ai-admin-role.template.yaml&stackName=UnionCloudAccess&param_CrossAccountRoleName=union-ai-admin) |

> [!NOTE] CloudFormation template
> All of these buttons launch the same CloudFormation template, just in different regions.
> The CloudFormation template itself is available at this URL:
>
> [https://union-public.s3.amazonaws.com/templates/v0.13/union-ai-admin-role.template.yaml](https://union-public.s3.amazonaws.com/templates/v0.13/union-ai-admin-role.template.yaml)
>
> For details on the functionality enabled by each of the permissions,
> see the [release notes](https://github.com/unionai/union-cloud-infrastructure/releases).

### Confirm the details

Once you have selected **Launch Stack**, you will be taken to the CloudFormation interface. Do the following:

1. Check the profile name in the top right corner to confirm that you are in the correct account.
2. Leave the default values in place:
   - `UnionCloudAccess` for the **Stack Name**.
   - `union-ai-admin` for **Cross Account Role Name**.
3. Enter the external ID provided by the Union.ai team for **ExternalId**.
4. Select the checkbox indicating that you acknowledge that AWS CloudFormation may create IAM resources with custom names.
5. Select **Create Stack**.

### Share the role ARN

Once the above steps are completed, you will need to get the ARN of the newly created role (`union-ai-admin`) and send it to the Union.ai team:

1. In the navigation pane of the IAM console, choose **Roles**.
1. In the list of roles, choose the `union-ai-admin` role.
1. In the **Summary** section of the details pane, copy the **role ARN** value.
1. Share the ARN with the Union.ai team.
1. The Union.ai team will get back to you to verify that they are able to assume the role.

### Updating permissions through CloudFormation

From time to time Union.ai may need to update the `union-ai-admin` role to support new or improved functionality.

If you used CloudFormation to set up your stack in the first place, you will have to perform the update by replacing your CloudFormation template with a new one.

When an update is required:

- The Union.ai team will inform you that you need to perform the update.
- The URL of the template will be published above, in the **CloudFormation template** info box. This is always kept up to date with the latest template.

To perform the update on your system, copy the template URL and follow these directions:

### Update your CloudFormation template

1. Log in to the AWS web console and navigate to **CloudFormation** for the region within which your data plane is deployed.
2. Select the `UnionCloudAccess` stack.
3. Select **Stack Actions > Create change set for current stack**.
4. Select **Replace current template**.
5. Input the new CloudFormation template URL provided to you by the Union.ai team (and published above in the **CloudFormation template** info box).
6. Select **Next**.
7. On the **Specify stack details** page, accept the defaults and select **Next**.
8. On the **Configure stack options** page, accept the defaults and select **Next**.
9. On the **Review UnionCloudAccess** page, accept the acknowledgment at the bottom of the page and select **Submit**.
10. Wait for the changeset to be generated by AWS (refresh the page if necessary).
11. Select **Execute change set**.

## Setting permissions manually

If you want to perform the setup manually, instead of using the CloudFormation method described above, do the following.

### Prepare the policy documents

First, copy the policy document `UnionIAMPolicy.json` below to an editor and replace `${AWS::Region}` with the correct region and `${AWS::AccountID}` with your account ID.

You will use this policy in a later step.
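If you prefer to script the placeholder substitution, a minimal sketch (the helper name and example values are arbitrary):

```python
def fill_placeholders(template: str, region: str, account_id: str) -> str:
    """Substitute the two CloudFormation-style placeholders in the policy text."""
    return (template
            .replace("${AWS::Region}", region)
            .replace("${AWS::AccountID}", account_id))

snippet = '"arn:aws:eks:${AWS::Region}:${AWS::AccountID}:cluster/opta-*"'
filled = fill_placeholders(snippet, "us-west-2", "123456789012")
assert filled == '"arn:aws:eks:us-west-2:123456789012:cluster/opta-*"'
```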

```json
{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Action":[
            "logs:ListTagsLogGroup",
            "logs:TagLogGroup",
            "logs:UntagLogGroup",
            "logs:DescribeLogGroups",
            "rds:DescribeDBSubnetGroups",
            "logs:DeleteLogGroup",
            "eks:CreateNodegroup",
            "eks:UpdateNodegroupConfig",
            "rds:CreateDBSubnetGroup",
            "logs:CreateLogGroup",
            "ec2:AllocateAddress",
            "eks:DeleteCluster",
            "rds:DeleteDBSubnetGroup",
            "kms:CreateAlias",
            "eks:DescribeCluster",
            "logs:PutRetentionPolicy",
            "kms:DeleteAlias"
         ],
         "Resource":[
            "arn:aws:kms:${AWS::Region}:${AWS::AccountID}:alias/*",
            "arn:aws:rds:${AWS::Region}:${AWS::AccountID}:subgrp:*",
            "arn:aws:ec2:${AWS::Region}:${AWS::AccountID}:elastic-ip/*",
            "arn:aws:eks:${AWS::Region}:${AWS::AccountID}:cluster/opta-*",
            "arn:aws:logs:${AWS::Region}:${AWS::AccountID}:log-group:opta-*",
            "arn:aws:logs:${AWS::Region}:${AWS::AccountID}:log-group::log-stream*",
            "arn:aws:logs:${AWS::Region}:${AWS::AccountID}:log-group:/aws/eks/opta-*:*"
         ],
         "Effect":"Allow",
         "Sid":"0"
      },
      {
         "Action":[
            "sqs:CreateQueue",
            "sqs:DeleteQueue",
            "sqs:SetQueueAttributes",
            "sqs:TagQueue",
            "sqs:UntagQueue"
         ],
         "Resource":[
            "arn:aws:sqs:${AWS::Region}:${AWS::AccountID}:Karpenter*"
         ],
         "Effect":"Allow"
      },
      {
         "Action":[
            "events:DescribeRule",
            "events:DeleteRule",
            "events:ListTargetsByRule",
            "events:PutRule",
            "events:PutTargets",
            "events:RemoveTargets",
            "events:TagResource"
         ],
         "Resource":[
            "arn:aws:events:${AWS::Region}:${AWS::AccountID}:rule/Karpenter*"
         ],
         "Effect":"Allow"
      },
      {
         "Action":[
            "eks:TagResource",
            "eks:UntagResource",
            "eks:ListTagsForResource",
            "eks:CreateAccessEntry",
            "eks:DescribeAccessEntry",
            "eks:UpdateAccessEntry",
            "eks:DeleteAccessEntry"
         ],
         "Resource":[
            "arn:aws:eks:${AWS::Region}:${AWS::AccountID}:cluster/opta-*"
         ],
         "Effect":"Allow",
         "Sid":"112"
      },
      {
         "Action":[
            "kms:EnableKeyRotation",
            "kms:PutKeyPolicy",
            "kms:GetKeyPolicy",
            "ec2:AttachInternetGateway",
            "kms:ListResourceTags",
            "kms:TagResource",
            "kms:UntagResource",
            "ec2:DetachInternetGateway",
            "eks:DescribeNodegroup",
            "kms:GetKeyRotationStatus",
            "eks:DeleteNodegroup",
            "ec2:CreateInternetGateway",
            "kms:ScheduleKeyDeletion",
            "kms:CreateAlias",
            "kms:DescribeKey",
            "ec2:DeleteInternetGateway",
            "kms:DeleteAlias",
            "kms:CreateGrant"
         ],
         "Resource":[
            "arn:aws:eks:${AWS::Region}:${AWS::AccountID}:nodegroup/*",
            "arn:aws:ec2:${AWS::Region}:${AWS::AccountID}:internet-gateway/*",
            "arn:aws:kms:${AWS::Region}:${AWS::AccountID}:key/*"
         ],
         "Effect":"Allow",
         "Sid":"1"
      },
      {
         "Action":[
            "ec2:CreateNatGateway",
            "ec2:DeleteNatGateway"
         ],
         "Resource":[
            "arn:aws:ec2:${AWS::Region}:${AWS::AccountID}:natgateway/*"
         ],
         "Effect":"Allow",
         "Sid":"2"
      },
      {
         "Action":[
            "ec2:CreateRoute",
            "ec2:DeleteRoute",
            "ec2:CreateRouteTable",
            "ec2:DeleteRouteTable",
            "ec2:AssociateRouteTable"
         ],
         "Resource":[
            "arn:aws:ec2:${AWS::Region}:${AWS::AccountID}:route-table/*",
            "arn:aws:ec2:${AWS::Region}:${AWS::AccountID}:subnet/subnet-*"
         ],
         "Effect":"Allow",
         "Sid":"3"
      },
      {
         "Action":[
            "ec2:AuthorizeSecurityGroupEgress",
            "ec2:AuthorizeSecurityGroupIngress"
         ],
         "Resource":[
            "arn:aws:ec2:${AWS::Region}:${AWS::AccountID}:security-group-rule/*"
         ],
         "Effect":"Allow",
         "Sid":"4"
      },
      {
         "Action":[
            "ec2:RevokeSecurityGroupIngress",
            "ec2:AuthorizeSecurityGroupEgress",
            "ec2:AuthorizeSecurityGroupIngress",
            "ec2:CreateSecurityGroup",
            "ec2:RevokeSecurityGroupEgress",
            "ec2:DeleteSecurityGroup"
         ],
         "Resource":[
            "arn:aws:ec2:${AWS::Region}:${AWS::AccountID}:security-group/*",
            "arn:aws:ec2:${AWS::Region}:${AWS::AccountID}:vpc/vpc-*"
         ],
         "Effect":"Allow",
         "Sid":"5"
      },
      {
         "Action":[
            "ec2:DeleteSubnet",
            "ec2:CreateNatGateway",
            "ec2:CreateSubnet",
            "ec2:ModifySubnetAttribute"
         ],
         "Resource":[
            "arn:aws:ec2:${AWS::Region}:${AWS::AccountID}:subnet/*"
         ],
         "Effect":"Allow",
         "Sid":"6"
      },
      {
         "Action":[
            "ec2:CreateNatGateway"
         ],
         "Resource":[
            "arn:aws:ec2:${AWS::Region}:${AWS::AccountID}:elastic-ip/eipalloc-*"
         ],
         "Effect":"Allow",
         "Sid":"7"
      },
      {
         "Action":[
            "ec2:DeleteFlowLogs",
            "ec2:CreateFlowLogs"
         ],
         "Resource":[
            "arn:aws:ec2:${AWS::Region}:${AWS::AccountID}:vpc-flow-log/*",
            "arn:aws:ec2:${AWS::Region}:${AWS::AccountID}:vpc/vpc*"
         ],
         "Effect":"Allow",
         "Sid":"8"
      },
      {
         "Action":[
            "ec2:CreateVpc",
            "ec2:CreateRouteTable",
            "ec2:AttachInternetGateway",
            "ec2:ModifyVpcAttribute",
            "ec2:DetachInternetGateway",
            "ec2:DeleteVpc",
            "ec2:CreateSubnet",
            "ec2:DescribeVpcAttribute",
            "ec2:AssociateVpcCidrBlock"
         ],
         "Resource":[
            "arn:aws:ec2:${AWS::Region}:${AWS::AccountID}:vpc/*"
         ],
         "Effect":"Allow",
         "Sid":"VisualEditor8"
      },
      {
         "Action":[
            "iam:DeleteOpenIDConnectProvider",
            "iam:GetOpenIDConnectProvider",
            "iam:CreateOpenIDConnectProvider",
            "iam:TagOpenIDConnectProvider",
            "iam:UntagOpenIDConnectProvider",
            "iam:ListOpenIDConnectProviderTags"
         ],
         "Resource":[
            "arn:aws:iam::${AWS::AccountID}:oidc-provider/*"
         ],
         "Effect":"Allow",
         "Sid":"VisualEditor9"
      },
      {
         "Action":[
            "iam:CreatePolicy",
            "iam:CreatePolicyVersion",
            "iam:DeletePolicyVersion",
            "iam:GetPolicyVersion",
            "iam:GetPolicy",
            "iam:ListPolicyVersions",
            "iam:DeletePolicy",
            "iam:ListPolicyTags",
            "iam:TagPolicy",
            "iam:UntagPolicy"
         ],
         "Resource":[
            "arn:aws:iam::${AWS::AccountID}:policy/*"
         ],
         "Effect":"Allow",
         "Sid":"VisualEditor10"
      },
      {
         "Action":[
            "iam:GetRole",
            "iam:TagRole",
            "iam:UntagRole",
            "iam:ListRoleTags",
            "iam:CreateRole",
            "iam:DeleteRole",
            "iam:AttachRolePolicy",
            "iam:PutRolePolicy",
            "iam:ListInstanceProfilesForRole",
            "iam:PassRole",
            "iam:CreateServiceLinkedRole",
            "iam:DetachRolePolicy",
            "iam:ListAttachedRolePolicies",
            "iam:DeleteRolePolicy",
            "iam:ListRolePolicies",
            "iam:GetRolePolicy"
         ],
         "Resource":[
            "arn:aws:iam::${AWS::AccountID}:role/*"
         ],
         "Effect":"Allow",
         "Sid":"VisualEditor111"
      },
      {
         "Action":[
            "ec2:DescribeAddresses",
            "ec2:EnableEbsEncryptionByDefault",
            "ec2:GetEbsEncryptionByDefault",
            "ec2:DescribeFlowLogs",
            "ec2:ResetEbsDefaultKmsKeyId",
            "ec2:DescribeInternetGateways",
            "ec2:DescribeNetworkInterfaces",
            "ec2:DescribeAvailabilityZones",
            "ec2:GetEbsDefaultKmsKeyId",
            "ec2:DescribeAccountAttributes",
            "kms:CreateKey",
            "ec2:DescribeNetworkAcls",
            "ec2:DescribeRouteTables",
            "ec2:ModifyEbsDefaultKmsKeyId",
            "eks:CreateCluster",
            "eks:UpdateClusterVersion",
            "eks:UpdateClusterConfig",
            "ec2:ReleaseAddress",
            "rds:AddTagsToResource",
            "rds:RemoveTagsFromResource",
            "rds:ListTagsForResource",
            "ec2:DescribeVpcClassicLinkDnsSupport",
            "ec2:CreateTags",
            "ec2:DescribeNatGateways",
            "ec2:DisassociateRouteTable",
            "ec2:DescribeSecurityGroups",
            "ec2:DescribeVpcClassicLink",
            "ec2:DescribeVpcs",
            "kms:ListAliases",
            "ec2:DisableEbsEncryptionByDefault",
            "sts:GetCallerIdentity",
            "ec2:DescribeSubnets",
            "ec2:DescribeSecurityGroupRules",
            "ec2:AllocateAddress",
            "ec2:AssociateAddress",
            "ec2:DisassociateAddress",
            "ec2:DescribeInstanceTypeOfferings",
            "logs:DescribeLogStreams",
            "iam:ListRoles",
            "iam:ListPolicies",
            "ec2:DescribeInstanceTypes",
            "servicequotas:GetServiceQuota",
            "cloudwatch:GetMetricStatistics"
         ],
         "Resource":"*",
         "Effect":"Allow",
         "Sid":"VisualEditor12"
      },
      {
         "Action":"dynamodb:*",
         "Resource":[
            "arn:aws:dynamodb:${AWS::Region}:${AWS::AccountID}:table/opta-*"
         ],
         "Effect":"Allow",
         "Sid":"VisualEditor13"
      },
      {
         "Action":"s3:*",
         "Resource":[
            "arn:aws:s3:::opta-*",
            "arn:aws:s3:::opta-*/",
            "arn:aws:s3:::union-*",
            "arn:aws:s3:::union-*/"
         ],
         "Effect":"Allow",
         "Sid":"VisualEditor14"
      },
      {
         "Action":[
            "events:DescribeRule",
            "events:ListTargetsByRule",
            "events:ListTagsForResource",
            "events:UntagResource"
         ],
         "Resource":[
            "arn:aws:events:${AWS::Region}:${AWS::AccountID}:rule/Karpenter*"
         ],
         "Effect":"Allow"
      },
      {
         "Action":[
            "sqs:GetQueueAttributes",
            "sqs:ListQueueTags"
         ],
         "Resource":[
            "arn:aws:sqs:${AWS::Region}:${AWS::AccountID}:Karpenter*"
         ],
         "Effect":"Allow"
      },
      {
         "Action":[
            "elasticache:CreateCacheSubnetGroup",
            "elasticache:AddTagsToResource",
            "elasticache:RemoveTagsFromResource",
            "elasticache:ListTagsForResource",
            "elasticache:DescribeCacheSubnetGroups",
            "elasticache:DeleteCacheSubnetGroup"
         ],
         "Resource":[
            "arn:aws:elasticache:${AWS::Region}:${AWS::AccountID}:subnetgroup:opta-*"
         ],
         "Effect":"Allow",
         "Sid":"ElastiCache"
      },
      {
         "Action":[
            "iam:CreateInstanceProfile",
            "iam:AddRoleToInstanceProfile",
            "iam:RemoveRoleFromInstanceProfile",
            "iam:DeleteInstanceProfile",
            "iam:TagInstanceProfile",
            "iam:UntagInstanceProfile",
            "iam:ListInstanceProfileTags",
            "iam:GetInstanceProfile",
            "iam:UpdateAssumeRolePolicy"
         ],
         "Resource":[
            "arn:aws:iam::${AWS::AccountID}:instance-profile/*"
         ],
         "Effect":"Allow",
         "Sid":"self0"
      },
      {
         "Action":[
            "ec2:RunInstances",
            "ec2:CreateTags",
            "ec2:DescribeTags",
            "ec2:DeleteTags",
            "ec2:DescribeImages",
            "ec2:CreateLaunchTemplate",
            "ec2:CreateLaunchTemplateVersion",
            "ec2:DescribeLaunchTemplates",
            "ec2:DescribeLaunchTemplateVersions",
            "ec2:DeleteLaunchTemplate",
            "ec2:DeleteLaunchTemplateVersions",
            "ec2:ModifyLaunchTemplate"
         ],
         "Resource":"*",
         "Effect":"Allow",
         "Sid":"self1"
      },
      {
         "Action":[
            "autoscaling:CreateAutoScalingGroup",
            "autoscaling:DeleteAutoScalingGroup",
            "autoscaling:DescribeAutoScalingGroups",
            "autoscaling:UpdateAutoScalingGroup",
            "autoscaling:CreateLaunchConfiguration",
            "autoscaling:SetInstanceProtection",
            "autoscaling:DescribeScalingActivities",
            "autoscaling:CreateOrUpdateTags",
            "autoscaling:DescribeTags",
            "autoscaling:DeleteTags"
         ],
         "Resource":"*",
         "Effect":"Allow",
         "Sid":"self2"
      },
      {
         "Action":[
            "eks:UpdateNodegroupConfig",
            "eks:ListNodegroups",
            "eks:UpdateNodegroupVersion",
            "eks:TagResource",
            "eks:UntagResource",
            "eks:ListTagsForResource",
            "eks:DescribeUpdate",
            "eks:DeleteNodegroup"
         ],
         "Resource":[
            "arn:aws:eks:${AWS::Region}:${AWS::AccountID}:nodegroup/opta-*/opta-*/*",
            "arn:aws:eks:${AWS::Region}:${AWS::AccountID}:nodegroup/opta-*",
            "arn:aws:eks:${AWS::Region}:${AWS::AccountID}:nodegroup/*",
            "arn:aws:eks:${AWS::Region}:${AWS::AccountID}:cluster/opta-*",
            "arn:aws:eks:${AWS::Region}:${AWS::AccountID}:addon/opta-*/*/*"
         ],
         "Effect":"Allow",
         "Sid":"AllowUpdateNodegroupConfig"
      },
      {
         "Action":[
            "eks:CreateAddon",
            "eks:UpdateAddon",
            "eks:DeleteAddon",
            "eks:DescribeAddonVersions",
            "eks:DescribeAddon",
            "eks:ListAddons"
         ],
         "Resource":[
            "arn:aws:eks:${AWS::Region}:${AWS::AccountID}:cluster/opta-*",
            "arn:aws:eks:${AWS::Region}:${AWS::AccountID}:addon/opta-*/*/*"
         ],
         "Effect":"Allow",
         "Sid":"AllowUpdateEKSAddonConfig"
      },
      {
         "Action":[
            "ec2:CreateVpcEndpoint",
            "ec2:ModifyVpcEndpoint",
            "ec2:DeleteVpcEndpoints"
         ],
         "Resource":[
            "arn:aws:ec2:${AWS::Region}:${AWS::AccountID}:vpc/vpc*",
            "arn:aws:ec2:${AWS::Region}:${AWS::AccountID}:vpc-endpoint/*",
            "arn:aws:ec2:${AWS::Region}:${AWS::AccountID}:route-table/*",
            "arn:aws:ec2:${AWS::Region}:${AWS::AccountID}:subnet/*",
            "arn:aws:ec2:${AWS::Region}:${AWS::AccountID}:security-group/*"
         ],
         "Effect":"Allow",
         "Sid":"AllowVpcEndpoints"
      },
      {
         "Action":[
            "ec2:DescribeVpcEndpoints",
            "ec2:DescribePrefixLists"
         ],
         "Resource":"*",
         "Effect":"Allow",
         "Sid":"AllowVpcEndpointReadPermissions"
      },
      {
         "Action":[
            "ecr:CreateRepository",
            "ecr:DeleteRepository",
            "ecr:TagResource",
            "ecr:UntagResource",
            "ecr:PutLifecyclePolicy",
            "ecr:DeleteLifecyclePolicy",
            "ecr:PutImageTagMutability",
            "ecr:PutImageScanningConfiguration",
            "ecr:BatchDeleteImage",
            "ecr:DeleteRepositoryPolicy",
            "ecr:SetRepositoryPolicy",
            "ecr:GetRepositoryPolicy",
            "ecr:PutReplicationConfiguration",
            "ecr:DescribeRepositories",
            "ecr:ListTagsForResource",
            "ecr:GetLifecyclePolicy",
            "ecr:GetRepositoryPolicy",
            "ecr:DescribeImages"
         ],
         "Resource":[
            "arn:aws:ecr:*:${AWS::AccountID}:repository/union/*"
         ],
         "Effect":"Allow",
         "Sid":"UnionImageBuilderRepoAdmin"
      },
      {
         "Action":[
            "ecr:GetAuthorizationToken"
         ],
         "Resource":"*",
         "Effect":"Allow",
         "Sid":"UnionAdminAuthToken"
      }
   ]
}
```

### Create the role manually

Next, create the role by following these steps:

1. Sign in to the **AWS Management Console** as an administrator of your account, and open the **IAM console**.
2. Choose **Roles** and then select **Create role**.
3. Under **Select trusted entity**, choose **AWS account**.
4. Under **An AWS account**, select **Another AWS account**.
5. In the **Account ID** field, enter the Union.ai account ID: `479331373192`.
6. Under **Options**, you will see two items: **Require external ID** and **Require MFA**. At this point in the process, you can leave these unchecked.
7. Select **Next**. This will take you to the **Add permissions** page.
8. Select **Next**. We will set up permissions in a later step.
9. Enter the role name `union-ai-admin`.
10. (Optional) For **Description**, enter a description for the new role.
11. (Optional) Under **Tags**, add tags as key-value pairs. For more information about using tags in IAM, see [Tagging IAM resources](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_tags.html).
12. After reviewing the role, choose **Create role**.
13. Search for the `union-ai-admin` role in the IAM Roles list and click on it.
14. Click **Add permissions** and select **Create inline policy** from the drop down menu.
15. On the Create policy screen, click the **JSON** tab.
16. Replace the contents of the policy editor with the contents of the **UnionIAMPolicy.json** file that you edited earlier.
17. Click **Review policy**.
18. Name the policy **UnionIAMPolicyManual** and click **Create policy**.

### Share the role ARN

Now you must obtain the Amazon Resource Name (ARN) of the role, a unique identifier for the role:

1. In the navigation pane of the IAM console, choose **Roles**.
2. In the list of roles, choose the `union-ai-admin` role.
3. In the **Summary** section of the details pane, copy the **role ARN** value.

Share the ARN with the Union.ai team.
The Union.ai team will get back to you to verify that they are able to assume the role.

### Updating permissions manually

From time to time Union.ai may need to update the `union-ai-admin` role to support new or improved functionality.
If you set up your role manually in the first place (as opposed to using CloudFormation), you will have to perform the update manually as well.
To do so, follow these directions:

1. Sign in to the **AWS Management Console** as an administrator of your account, and open the **IAM console**.
2. Choose **Roles**.
3. Search for the `union-ai-admin` role in the IAM Roles list and click on it.
4. Under **Permissions policies**, select the previously created policy (if you followed the above directions, it should be called **UnionIAMPolicyManual**).
5. The next screen will display the JSON for the current policy.
6. Replace the current policy JSON with the updated copy of **UnionIAMPolicy.json** and click **Next**.
7. On the next page, review the new policy and click **Save changes**.
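
Equivalently, the inline policy can be updated from the command line with a single `put-role-policy` call, which overwrites the policy document in place. This sketch assumes the updated document is saved as `UnionIAMPolicy.json` and that administrator credentials are configured:

```shell
role="union-ai-admin"
policy="UnionIAMPolicyManual"
if command -v aws >/dev/null && aws sts get-caller-identity >/dev/null 2>&1; then
    # put-role-policy replaces the inline policy document in a single call
    aws iam put-role-policy --role-name "$role" --policy-name "$policy" \
        --policy-document file://UnionIAMPolicy.json
else
    echo "AWS CLI not configured; update the $policy policy on $role manually"
fi
```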

## Setting up and managing your own VPC (optional)

If you decide to manage your own VPC instead of leaving it to Union.ai, you will need to set it up yourself.
The VPC should be configured with the following characteristics:

- **Multiple availability zones**:
  - We recommend a minimum of 3.
- **A sufficiently large CIDR range**:
  - We recommend a /16 for the VPC, /28 for each public subnet, and /18 for each private subnet.
  - With most CNIs, a safe assumption is one IP allocated per pod. Small subnets can limit the number of pods that can be spun up when projects scale.
- **A public subnet** with:
  - An internet gateway configured for internet access.
- **A private subnet** with:
  - A NAT gateway setup for internet access.
- **(Recommended) VPC endpoints** to reduce unnecessary NAT gateway traffic:
  - Enable an [S3 VPC gateway endpoint with the appropriate route table association](https://docs.aws.amazon.com/vpc/latest/privatelink/vpc-endpoints-s3.html).
  - Enable [VPC interface endpoints](https://docs.aws.amazon.com/vpc/latest/privatelink/create-interface-endpoint.html) for the following services: `com.amazonaws.<REGION>.logs`, `com.amazonaws.<REGION>.ecr.dkr`, and `com.amazonaws.<REGION>.ec2`.
    - Ensure the service names include the region that contains the aforementioned availability zones.
    - Ensure the subnet IDs are configured to include all the aforementioned availability zones.
    - Ensure the security groups allow all traffic from within the VPC.
    - Enable [Private DNS](https://docs.aws.amazon.com/vpc/latest/privatelink/vpc-endpoints-s3.html#private-dns-s3) for out-of-the-box compatibility with data plane services.
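
As a quick sanity check on the recommended CIDR sizing, the address counts work out as follows (a /N IPv4 block contains 2^(32−N) addresses):

```shell
# A /N IPv4 block contains 2^(32-N) addresses.
vpc=$((2 ** (32 - 16)))    # recommended /16 for the VPC
pub=$((2 ** (32 - 28)))    # recommended /28 per public subnet
priv=$((2 ** (32 - 18)))   # recommended /18 per private subnet
echo "VPC /16:             $vpc addresses"
echo "public /28 subnet:   $pub addresses"
echo "private /18 subnet:  $priv addresses (~one pod per IP with most CNIs)"
```

With one IP per pod, each /18 private subnet supports on the order of sixteen thousand pods, which leaves ample headroom as projects scale.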

Once your VPC is set up, you will need to provide the Union.ai team with the following information:

- **VPC ID**
  - Example: `vpc-8580ec61d96caf837`
- **Public subnet IDs** (one per availability zone)
  - Example: `subnet-d7d3ce57d1a546401`
- **Private subnet IDs** (one per availability zone)
  - Example: `subnet-bc2eafd5c11180be0`

## Private EKS endpoint

The requirements described so far enable Union to operate with a `Public` or `Public and Private` EKS endpoint.

To deploy the Union operator in your EKS cluster and to perform troubleshooting at the Kubernetes layer, Union requires access to the [EKS endpoint](https://docs.aws.amazon.com/eks/latest/userguide/cluster-endpoint.html). 

> This connection is not used for executions, only for cluster onboarding, upgrades and support.

For additional security, the EKS endpoint can be configured as `Private` only. In that case, Union implements a VPC endpoint connection over [AWS PrivateLink](https://docs.aws.amazon.com/vpc/latest/userguide/endpoint-services-overview.html), a lightweight yet robust mechanism that ensures management traffic doesn't leave the AWS network.

When AWS rolls out changes to the EKS endpoint, its IP address might change. To handle this and prevent disconnection, the Union automation sets up a "jumper" ECS container in the customer account that forwards incoming requests to the EKS endpoint, acting as a reverse proxy, while a Network Load Balancer exposes a stable endpoint address. In this way, you get the security of a fully private connection and a reliable channel for Union staff to manage your cluster proactively or troubleshoot issues when needed.

![](../../_static/images/deployment/data-plane-setup-on-aws/aws_private_link_architecture.png)

For this setup, there are additional requirements you'll need to complete in your AWS account:

### Create additional roles for ECS

#### ECS Task Execution role
- **Role name**: `unionai-access-<REGION>-ecs-execution-role` 
- **Attached policy**: `AmazonECSTaskExecutionRolePolicy` (built-in policy)
- **Trust Relationship**:
```json
 {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "ecs-tasks.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
```

#### ECS Task Definition role
- **Role name**: `unionai-access-<REGION>-ecs-task-role`  
- **Attached policy**:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowSSMMessageChannels",
            "Effect": "Allow",
            "Action": [
                "ssmmessages:OpenDataChannel",
                "ssmmessages:OpenControlChannel",
                "ssmmessages:CreateDataChannel",
                "ssmmessages:CreateControlChannel"
            ],
            "Resource": "*"
        },
        {
            "Sid": "UpdateInstanceInfo",
            "Effect": "Allow",
            "Action": "ssm:UpdateInstanceInformation",
            "Resource": "*"
        }
    ]
}
```
- **Trust Relationship**:
```json
 {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "ecs-tasks.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
```
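
As a command-line alternative, both ECS roles can be sketched with the AWS CLI. The trust policy is the one shown above (identical for both roles); the file name `ecs-trust-policy.json` and the `REGION` value are illustrative assumptions, and the task role's inline SSM policy would still need to be attached afterwards.

```shell
# Shared trust policy for both ECS roles (same document as shown above).
cat > ecs-trust-policy.json <<'EOF'
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": { "Service": "ecs-tasks.amazonaws.com" },
            "Action": "sts:AssumeRole"
        }
    ]
}
EOF

REGION="us-east-1"   # placeholder: use your data plane region
if command -v aws >/dev/null && aws sts get-caller-identity >/dev/null 2>&1; then
    aws iam create-role --role-name "unionai-access-${REGION}-ecs-execution-role" \
        --assume-role-policy-document file://ecs-trust-policy.json
    aws iam attach-role-policy --role-name "unionai-access-${REGION}-ecs-execution-role" \
        --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
    aws iam create-role --role-name "unionai-access-${REGION}-ecs-task-role" \
        --assume-role-policy-document file://ecs-trust-policy.json
    # Attach the SSM inline policy shown above to the task role, e.g. via put-role-policy.
else
    echo "AWS CLI not configured; run the commands above once credentials are available"
fi
```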
### Attach a new IAM policy to the Union role

Add the following permissions as a new IAM policy attached to the `union-ai-admin` role (described in the **BYOC deployment > Data plane setup on AWS > Setting permissions manually > Prepare the policy documents** section), replacing `<<REGION>>` and `<<ACCOUNT_ID>>` to match your environment:

```json
{
    "Statement": [
        {
            "Action": [
                "iam:GetRole"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:iam::<<ACCOUNT_ID>>:role/unionai-access-<<REGION>>-ecs-execution-role",
                "arn:aws:iam::<<ACCOUNT_ID>>:role/unionai-access-<<REGION>>-ecs-task-role"
            ],
            "Sid": "ECSTaskRoles"
        },
        {
            "Action": [
                "application-autoscaling:DescribeScalableTargets",
                "application-autoscaling:DescribeScalingActivities",
                "application-autoscaling:DescribeScalingPolicies",
                "cloudwatch:GetMetricData",
                "cloudwatch:GetMetricStatistics",
                "cloudwatch:ListMetrics",
                "ec2:DescribeNetworkInterfaces",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSubnets",
                "ec2:DescribeVpcAttribute",
                "ec2:DescribeVpcEndpoints",
                "ec2:DescribeVpcEndpointConnections",
                "ec2:DescribeVpcEndpointServiceConfigurations",
                "ec2:DescribeVpcs",
                "ec2:DescribeInstances",
                "ec2:DescribeInstanceStatus",
                "ec2:GetConsoleOutput",
                "ecs:DeregisterTaskDefinition",
                "ecs:DescribeContainerInstances",
                "ecs:DescribeServiceDeployments",
                "ecs:DescribeServices",
                "ecs:DescribeTaskDefinition",
                "ecs:DescribeTasks",
                "ecs:GetTaskProtection",
                "ecs:ListClusters",
                "ecs:ListServices",
                "ecs:ListTaskDefinitionFamilies",
                "ecs:ListTaskDefinitions",
                "ecs:ListTasks",
                "eks:DescribeClusterVersions",
                "elasticloadbalancing:DescribeListeners",
                "elasticloadbalancing:DescribeLoadBalancerAttributes",
                "elasticloadbalancing:DescribeLoadBalancers",
                "elasticloadbalancing:DescribeTags",
                "elasticloadbalancing:DescribeTargetGroupAttributes",
                "elasticloadbalancing:DescribeTargetGroups",
                "elasticloadbalancing:DescribeTargetHealth",
                "logs:DescribeLogGroups",
                "servicediscovery:ListNamespaces",
                "iam:SimulatePrincipalPolicy",
                "ssm:StartSession"
            ],
            "Effect": "Allow",
            "Resource": "*",
            "Sid": "GlobalPermissions"
        },
        {
            "Action": [
                "ec2:AcceptVpcEndpointConnections",
                "ec2:CreateTags",
                "ec2:CreateVpcEndpointServiceConfiguration",
                "ec2:DeleteVpcEndpointServiceConfigurations",
                "ec2:DescribeVpcEndpointServicePermissions",
                "ec2:ModifyVpcEndpointServiceConfiguration",
                "ec2:ModifyVpcEndpointServicePermissions",
                "ec2:RejectVpcEndpointConnections",
                "ec2:StartVpcEndpointServicePrivateDnsVerification",
                "vpce:AllowMultiRegion"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:ec2:<<REGION>>:<<ACCOUNT_ID>>:vpc-endpoint-service/*",
            "Sid": "EC2ResourceSpecific"
        },
        {
            "Action": [
                "ec2:AuthorizeSecurityGroupEgress",
                "ec2:AuthorizeSecurityGroupIngress",
                "ec2:CreateSecurityGroup",
                "ec2:CreateTags",
                "ec2:DeleteSecurityGroup",
                "ec2:RevokeSecurityGroupEgress"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:ec2:<<REGION>>:<<ACCOUNT_ID>>:security-group/*",
                "arn:aws:ec2:<<REGION>>:<<ACCOUNT_ID>>:vpc/*"
            ],
            "Sid": "EC2SecurityGroups"
        },
        {
            "Action": [
                "eks:AccessKubernetesApi",
                "eks:DeleteNodegroup",
                "eks:DescribeCluster",
                "eks:DescribeNodegroup"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:eks:<<REGION>>:<<ACCOUNT_ID>>:cluster/*",
            "Sid": "EKSClusters"
        },
        {
            "Action": [
                "acm:AddTagsToCertificate",
                "acm:DeleteCertificate",
                "acm:DescribeCertificate",
                "acm:ListTagsForCertificate",
                "acm:RequestCertificate"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:acm:<<REGION>>:<<ACCOUNT_ID>>:certificate/*",
            "Sid": "ACMCertificates"
        },
        {
            "Action": [
                "logs:CreateLogGroup",
                "logs:DeleteLogGroup",
                "logs:DescribeLogGroups",
                "logs:FilterLogEvents",
                "logs:GetLogEvents",
                "logs:ListTagsForResource",
                "logs:PutRetentionPolicy",
                "logs:TagResource",
                "logs:UntagResource"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:logs:<<REGION>>:<<ACCOUNT_ID>>:log-group:/ecs/unionai/proxy-*",
                "arn:aws:logs:<<REGION>>:<<ACCOUNT_ID>>:log-group::log-stream"
            ],
            "Sid": "LogGroups"
        },
        {
            "Action": [
                "elasticloadbalancing:AddTags",
                "elasticloadbalancing:CreateListener",
                "elasticloadbalancing:CreateLoadBalancer",
                "elasticloadbalancing:CreateTargetGroup",
                "elasticloadbalancing:DescribeListeners",
                "elasticloadbalancing:DescribeLoadBalancerAttributes",
                "elasticloadbalancing:DescribeLoadBalancers",
                "elasticloadbalancing:DescribeTargetGroups",
                "elasticloadbalancing:DescribeTargetGroupAttributes",
                "elasticloadbalancing:DescribeTags",
                "elasticloadbalancing:DeleteListener",
                "elasticloadbalancing:DeleteLoadBalancer",
                "elasticloadbalancing:DeleteTargetGroup",
                "elasticloadbalancing:ModifyLoadBalancerAttributes",
                "elasticloadbalancing:ModifyTargetGroup",
                "elasticloadbalancing:ModifyTargetGroupAttributes"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:elasticloadbalancing:<<REGION>>:<<ACCOUNT_ID>>:loadbalancer/net/unionai-access-*/*",
                "arn:aws:elasticloadbalancing:<<REGION>>:<<ACCOUNT_ID>>:targetgroup/unionai-access-*/*",
                "arn:aws:elasticloadbalancing:<<REGION>>:<<ACCOUNT_ID>>:listener/net/unionai-access-*/*"
            ],
            "Sid": "LoadBalancer"
        },
        {
            "Action": [
                "ecs:CreateCluster",
                "ecs:CreateService",
                "ecs:DeleteCluster",
                "ecs:DeleteService",
                "ecs:DescribeClusters",
                "ecs:DescribeContainerInstances",
                "ecs:DescribeServices",
                "ecs:DescribeServiceDeployments",
                "ecs:DescribeServiceRevisions",
                "ecs:DescribeTaskDefinition",
                "ecs:ExecuteCommand",
                "ecs:ListClusters",
                "ecs:ListTagsForResource",
                "ecs:ListTaskDefinitions",
                "ecs:ListServices",
                "ecs:RegisterTaskDefinition",
                "ecs:TagResource",
                "ecs:UntagResource",
                "ecs:UpdateService",
                "ecs:StartTask",
                "ecs:StopTask"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:ecs:<<REGION>>:<<ACCOUNT_ID>>:cluster/unionai-access-*",
                "arn:aws:ecs:<<REGION>>:<<ACCOUNT_ID>>:service/unionai-access-*/*",
                "arn:aws:ecs:<<REGION>>:<<ACCOUNT_ID>>:task/unionai-access-*/*",
                "arn:aws:ecs:<<REGION>>:<<ACCOUNT_ID>>:task-definition/unionai-access-*:*"
            ],
            "Sid": "ECSClusterServiceTask"
        },
        {
            "Action": [
                "logs:DescribeLogGroups",
                "logs:DescribeLogStreams",
                "logs:GetLogEvents",
                "logs:GetQueryResults",
                "logs:StartQuery",
                "logs:StopQuery"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:logs:<<REGION>>:<<ACCOUNT_ID>>:log-group:/aws/ecs/containerinsights/unionai-access-*/*",
            "Sid": "ContainerInsights"
        }
    ],
    "Version": "2012-10-17"
}
```

Share the ARNs of the two roles with the Union.ai team.
The Union.ai team will get back to you to verify that they are able to assume them.

### Configure VPC Endpoints

Ensure your VPC includes these endpoints so that when the Union stack needs to connect to the corresponding AWS services, it can do so without leaving the AWS network:

- `com.amazonaws.<REGION>.autoscaling`
- `com.amazonaws.<REGION>.xray`
- `com.amazonaws.<REGION>.s3`
- `com.amazonaws.<REGION>.sts`
- `com.amazonaws.<REGION>.ecr.api`
- `com.amazonaws.<REGION>.ssm`
- `com.amazonaws.<REGION>.ec2messages`
- `com.amazonaws.<REGION>.ec2`
- `com.amazonaws.<REGION>.ssmmessages`
- `com.amazonaws.<REGION>.ecr.dkr`
- `com.amazonaws.<REGION>.logs`
- `com.amazonaws.<REGION>.eks-auth`
- `com.amazonaws.<REGION>.eks`
- `com.amazonaws.<REGION>.elasticloadbalancing`
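
The list above can be provisioned as interface endpoints with the AWS CLI. This is a sketch: the `REGION` value and the `VPC_ID`/`SUBNET_IDS`/`SG_ID` variables are placeholders you would set for your environment, and `s3` is omitted from the loop because it is typically provisioned as a Gateway endpoint, as described in the VPC section above.

```shell
REGION="us-east-1"   # placeholder: use your data plane region
# Interface endpoints from the list above; s3 is typically a Gateway endpoint instead.
services="autoscaling xray sts ecr.api ssm ec2messages ec2 ssmmessages ecr.dkr logs eks-auth eks elasticloadbalancing"
for svc in $services; do
    name="com.amazonaws.${REGION}.${svc}"
    echo "$name"
    # Uncomment once VPC_ID, SUBNET_IDS, and SG_ID are set and credentials are configured:
    # aws ec2 create-vpc-endpoint --vpc-id "$VPC_ID" --vpc-endpoint-type Interface \
    #     --service-name "$name" --subnet-ids $SUBNET_IDS --security-group-ids "$SG_ID" \
    #     --private-dns-enabled
done
```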

=== PAGE: https://www.union.ai/docs/v2/union/deployment/byoc/data-plane-setup-on-gcp ===

# Data plane setup on GCP

To set up your data plane on Google Cloud Platform (GCP), you must allow Union.ai to provision and maintain compute resources under your GCP account.
To do this you will need to provision a service account with sufficient permissions to perform these tasks.

## Select or create a project

The first step is to select an existing project or create a new one.
This is where Union.ai will provision all resources for your data plane.
Below, we use the placeholder `<ProjectID>` for the project ID.
The actual ID can be whatever you choose.
In addition, you will need the project number associated with your project.
Below we use the placeholder `<ProjectNumber>`.
The project number is visible on your project's [welcome page](https://console.cloud.google.com/welcome).
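
If you have the `gcloud` CLI installed, the project number can also be retrieved on the command line (the `PROJECT_ID` value below is a placeholder):

```shell
PROJECT_ID="my-union-dataplane"   # placeholder: use your actual project ID
if command -v gcloud >/dev/null; then
    # Prints the numeric project number for the given project (requires gcloud auth).
    gcloud projects describe "$PROJECT_ID" --format='value(projectNumber)'
else
    echo "gcloud not installed; read the project number from the welcome page instead"
fi
```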

## Ensure billing is linked

Before your data plane can be deployed, a billing account must be linked to your project:
Go to the [billing page](https://console.cloud.google.com/billing/linkedaccount) of your `<ProjectID>` project and confirm that one is linked.

## Create a workload identity pool and provider

Though your data plane will be in your project in GCP, the Union.ai control plane is still run in AWS.
To allow the control plane to interact with your data plane you must create a _workload identity pool_ and add Union.ai's AWS account as a workload provider.
For more details see the Google Cloud guide for [setting up workload identity federation](https://cloud.google.com/iam/docs/configuring-workload-identity-federation).

### In the GCP web console

1. In your project `<ProjectId>`, under **IAM & Admin > Workload Identity Federation**, select **+ CREATE POOL** to [create a new workload provider and pool](https://console.cloud.google.com/iam-admin/workload-identity-pools/create).
If you have not done so already, you will be guided to [enable the required APIs](https://console.cloud.google.com/flows/enableapi?apiid=iam.googleapis.com,cloudresourcemanager.googleapis.com,iamcredentials.googleapis.com,sts.googleapis.com).
2. **Pool Name**: `unionai` (you can also fill in the description if you like).
3. Under **Add a provider to pool**:
  * For **Select a provider**, choose **AWS**.
  * For **Provider name**, enter `unionai-aws`.
  * The **Provider ID** should be automatically set to `unionai-aws` as well. If not, select **EDIT** and enter it manually.
4. For **AWS Account ID**, enter `479331373192` (Union.ai's management account ID).
5. **Continue** with the default attribute mappings and conditions.

### On the command line using `gcloud`

Assuming you have the [`gcloud` tool](https://cloud.google.com/sdk/gcloud) installed locally and are logged into `<ProjectId>`, you can check the existing workload identity pools in your project with:

```bash
gcloud iam workload-identity-pools list --location="global"
```

To create the workload identity pool, do:

```bash
gcloud iam workload-identity-pools create unionai \
    --location="global" \
    --description="Union AI WIF" \
    --display-name="unionai"
```

To add the provider, do:

```bash
gcloud iam workload-identity-pools providers create-aws unionai-aws \
    --location="global"  \
    --workload-identity-pool="unionai" \
    --account-id="479331373192"
```

## Create a role for Union.ai admin

To ensure that the Union.ai team has all the privileges needed to deploy the data plane, _but no more than strictly necessary_, you will need to create a custom role that the Union.ai service account will assume.

To avoid having to manually select each separate required privilege we recommend that you perform this step on the command-line with `gcloud`.

First, you will need to download the following YAML file to the directory where you are running your `gcloud` commands.
This file is the role definition. It is a list of the privileges that will make up the new role.

- [`union-ai-admin-role.yaml`](https://github.com/unionai/union-cloud-infrastructure/blob/main/union-ai-admin/gcp/union-ai-admin-role.yaml)

Assuming you have the above file (`union-ai-admin-role.yaml`) in your current directory and substituting your project ID, do:

```bash
gcloud iam roles create UnionaiAdministrator \
    --project=<ProjectId> \
    --file=union-ai-admin-role.yaml
```

## Create the Union.ai admin service account

### In the GCP web console

1. Go to **IAM & Admin >** [**Service Accounts**](https://console.cloud.google.com/iam-admin/serviceaccounts).
2. Select **Create Service Account**
3. For **Name**, enter `Union.ai Administrator`.
4. For **ID**, enter `unionai-administrator`.
_Note that the setup process used by the Union.ai team depends on the ID being this precise string_.
_If you use a different ID (though this is not recommended) then you must inform the Union.ai team of this change._
5. You can enter a **Description** if you wish.
6. Grant this service account access to your project `<ProjectId>` with the role created above, `UnionaiAdministrator`.

### On the command line using `gcloud`

Create the service account like this:

```bash
gcloud iam service-accounts create unionai-administrator \
    --project <ProjectId>
```

Bind the service account to the project and add the Union.ai Administrator role like this (again, substituting your project ID):

```bash
gcloud projects add-iam-policy-binding <ProjectId> \
    --member="serviceAccount:unionai-administrator@<ProjectId>.iam.gserviceaccount.com" \
    --role="projects/<ProjectId>/roles/UnionaiAdministrator"
```

## Grant access for the Workload Identity Pool to the Service Account

### In the GCP web console

1. Go to the newly created [workload identity pool](https://console.cloud.google.com/iam-admin/workload-identity-pools/pool/unionai) page.
2. Select **Grant Access**.
3. Choose the newly created service account.
4. Select **Save**.

### On the command line using `gcloud`

To grant the workload identity pool access to the service account, run the following.
Notice that you must substitute your `<ProjectId>` and your `<ProjectNumber>`.

```bash
gcloud iam service-accounts add-iam-policy-binding unionai-administrator@<ProjectId>.iam.gserviceaccount.com \
      --project=<ProjectId> \
      --role="roles/iam.workloadIdentityUser" \
      --member="principalSet://iam.googleapis.com/projects/<ProjectNumber>/locations/global/workloadIdentityPools/unionai/*"
```

## Enable service APIs

You will need to enable the following service APIs.

| Name | Endpoint |
|------|----------|
| Artifact Registry API | `artifactregistry.googleapis.com` |
| Cloud Autoscaling API | `autoscaling.googleapis.com` |
| Cloud Key Management Service (KMS) API | `cloudkms.googleapis.com` |
| Cloud Resource Manager API | `cloudresourcemanager.googleapis.com` |
| Compute Engine API | `compute.googleapis.com` |
| Kubernetes Engine API | `container.googleapis.com` |
| Container File System API | `containerfilesystem.googleapis.com` |
| Container Registry API | `containerregistry.googleapis.com` |
| Identity and Access Management (IAM) APIs | `iam.googleapis.com` |
| IAM Service Account Credentials API | `iamcredentials.googleapis.com` |
| Cloud Logging API | `logging.googleapis.com` |
| Cloud Monitoring API | `monitoring.googleapis.com` |
| Secret Manager API | `secretmanager.googleapis.com` |
| Service Networking API | `servicenetworking.googleapis.com` |
| Security Token Service API | `sts.googleapis.com` |
| Cloud SQL Admin API | `sqladmin.googleapis.com` |
| Cloud Storage Services API | `storage-api.googleapis.com` |

### In the GCP web console

Go to [Google Cloud API library](https://console.cloud.google.com/apis/library) and enable each of these by searching for it and clicking **ENABLE**.

### On the command line using `gcloud`

Perform the following `gcloud` commands:

```bash
gcloud services enable artifactregistry.googleapis.com
gcloud services enable autoscaling.googleapis.com
gcloud services enable cloudkms.googleapis.com
gcloud services enable cloudresourcemanager.googleapis.com
gcloud services enable compute.googleapis.com
gcloud services enable container.googleapis.com
gcloud services enable containerfilesystem.googleapis.com
gcloud services enable containerregistry.googleapis.com
gcloud services enable iam.googleapis.com
gcloud services enable iamcredentials.googleapis.com
gcloud services enable logging.googleapis.com
gcloud services enable monitoring.googleapis.com
gcloud services enable secretmanager.googleapis.com
gcloud services enable servicenetworking.googleapis.com
gcloud services enable sts.googleapis.com
gcloud services enable sqladmin.googleapis.com
gcloud services enable storage-api.googleapis.com
```
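Since `gcloud services enable` accepts multiple service names in a single invocation, the whole list can also be enabled in one command:

```shell
# Enable all required service APIs in a single call.
gcloud services enable \
    artifactregistry.googleapis.com \
    autoscaling.googleapis.com \
    cloudkms.googleapis.com \
    cloudresourcemanager.googleapis.com \
    compute.googleapis.com \
    container.googleapis.com \
    containerfilesystem.googleapis.com \
    containerregistry.googleapis.com \
    iam.googleapis.com \
    iamcredentials.googleapis.com \
    logging.googleapis.com \
    monitoring.googleapis.com \
    secretmanager.googleapis.com \
    servicenetworking.googleapis.com \
    sts.googleapis.com \
    sqladmin.googleapis.com \
    storage-api.googleapis.com
```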

## Setting up and managing your own VPC (optional)

If you decide to manage your own VPC instead of leaving it to Union.ai, then you will need to set it up yourself.
The VPC should be configured with the following characteristics:

* We recommend using a VPC that resides in the same project as the Union.ai Data Plane Kubernetes cluster. If you want to use a [shared VPC](https://cloud.google.com/vpc/docs/shared-vpc), contact Union.ai support.
* Create a single VPC subnet with:
  * A primary IPv4 range with /18 CIDR mask. This is used for cluster node IP addresses.
  * A secondary range with /15 CIDR mask. This is used for Kubernetes Pod IP addresses. We recommend associating the name with pods, e.g. `gke-pods`.
  * A secondary range with /18 CIDR mask. This is used for Kubernetes service IP address. We recommend associating the name with services, e.g. `gke-services`.
  * Identify a /28 CIDR block that will be used for the Kubernetes Master IP addresses. Note this CIDR block is not reserved within the subnet. Google Kubernetes Engine requires this /28 block to be available.

Once your VPC is set up, provide the following to Union.ai:

* VPC name
* Subnet region and name
* The secondary range names for the /15 CIDR mask (pods) and the /18 CIDR mask (services)
* The /28 CIDR block that was left unallocated for the Kubernetes Master

### Example VPC CIDR Block allocation

* 10.0.0.0/18 Subnet 1 primary IPv4 range → Used for GCP Nodes
* 10.32.0.0/14 Cluster secondary IPv4 range named `gke-pods` → Used for Kubernetes Pods
* 10.64.0.0/18 Service secondary IPv4 range named `gke-services` → Used for Kubernetes Services
* 10.65.0.0/28 Unallocated for Kubernetes Master
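The example allocation above could be provisioned roughly as follows with `gcloud`. This is a sketch, not a definitive recipe: the network and subnet names are placeholders of your choosing, and `<Region>` is the region agreed with Union.ai.

```shell
# Custom-mode VPC (no auto-created subnets).
gcloud compute networks create union-dataplane \
    --project=<ProjectId> \
    --subnet-mode=custom

# Single subnet: primary range for nodes, secondary ranges for pods and services.
# The /28 block for the Kubernetes master is NOT created here; it is simply
# left unallocated and communicated to Union.ai.
gcloud compute networks subnets create union-dataplane-subnet-1 \
    --project=<ProjectId> \
    --network=union-dataplane \
    --region=<Region> \
    --range=10.0.0.0/18 \
    --secondary-range=gke-pods=10.32.0.0/14,gke-services=10.64.0.0/18
```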

=== PAGE: https://www.union.ai/docs/v2/union/deployment/byoc/data-plane-setup-on-azure ===

# Data plane setup on Azure

To set up your data plane on Azure, you must allow Union.ai to provision and maintain compute resources under your Azure subscription. To do this, you will need to provision an Azure app registration with sufficient permissions to an Azure subscription.

## Selecting Azure tenant and subscription

- Select the tenant ID for your organization. Refer to [Microsoft Entra ID service page](https://portal.azure.com/#view/Microsoft_AAD_IAM/ActiveDirectoryMenuBlade/~/Overview) from the Azure portal.
- We highly recommend creating a new subscription for Union.ai-specific services. This helps isolate permissions, service quotas, and costs for Union.ai managed Azure resources.
  - Ensure the subscription is tied to an active billing account.
- Provide the Tenant and Subscription ID to Union.ai.

## Create a Microsoft Entra Application Registration

Union.ai requires permissions to manage Azure and Microsoft Entra resources to create a data plane. This step involves
creating a Union.ai-specific application and granting it sufficient permissions to manage the data plane.

### Create a Microsoft Entra ID Application for Union.ai Access

Union.ai manages Azure resources through a [Microsoft Entra ID Application](https://learn.microsoft.com/en-us/entra/identity-platform/quickstart-register-app) via [Workload Identity Federation](https://learn.microsoft.com/en-us/entra/workload-id/workload-identity-federation-create-trust?pivots=identity-wif-apps-methods-azp).

1. Navigate to the [Application Registrations](https://entra.microsoft.com/#view/Microsoft_AAD_RegisteredApps/ApplicationsListBlade/quickStartType~/null/sourceType/Microsoft_AAD_IAM) page.
2. Create a new registration.
3. Enter a name of your choice (we recommend `union`). Leave it at the "Single Tenant" account type and do not add any redirect URIs.
4. Navigate to your target [Azure Subscription](https://portal.azure.com/#view/Microsoft_Azure_Billing/SubscriptionsBladeV2).
5. Within the Subscription page select Access Control (IAM). Select Add Role Assignment and add the following roles scoped against the subscription:

- Contributor
- Role Based Access Control Administrator

6. Provide the Application Client ID to Union.ai.
7. Go to the application registration page for the app you created.
8. Select "Certificates & secrets."
9. Select the "Federated Credentials" tab, then select "Add credential", and choose "Other issuer".
10. Set "Issuer" to `https://cognito-identity.amazonaws.com`
11. Set "Subject identifier" to `us-east-2:6f9a6050-887a-c4cc-0625-120a4805bc34`
12. "Name" is your choice, but we recommend `union-access`
13. Set "Audience" to `us-east-2:ad71bce5-161b-4430-85a5-7ea84a941e6a`
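Steps 4 through 13 can also be scripted with the Azure CLI. This is an illustrative sketch: `<AppId>` and `<SubscriptionId>` are placeholders, and the `federated-credential` subcommand assumes Azure CLI 2.37 or later.

```shell
# Assign the required roles to the application, scoped to the subscription.
az role assignment create --assignee <AppId> \
  --role "Contributor" \
  --scope "/subscriptions/<SubscriptionId>"
az role assignment create --assignee <AppId> \
  --role "Role Based Access Control Administrator" \
  --scope "/subscriptions/<SubscriptionId>"

# Add the federated credential trusting Union.ai's Cognito identity pool.
az ad app federated-credential create --id <AppId> --parameters '{
  "name": "union-access",
  "issuer": "https://cognito-identity.amazonaws.com",
  "subject": "us-east-2:6f9a6050-887a-c4cc-0625-120a4805bc34",
  "audiences": ["us-east-2:ad71bce5-161b-4430-85a5-7ea84a941e6a"]
}'
```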

### Create Microsoft Entra ID Applications for Union.ai cost allocation

Union.ai requires new roles and applications to support Union's cost allocation feature.
This can be done by granting the `union` application additional permissions, or you can choose to create the roles and applications yourself.

#### Union managed cost allocation roles

- Assign `User Access Administrator` role to the `union` application against the subscription. This enables Union.ai role creation.
- Assign `Application Administrator` role to the `union` application within Microsoft Entra ID. This allows Union to create applications.

#### Create cost allocation roles and applications manually

Union.ai requires a role and service principal for the internal OpenCost subsystem.

Create the OpenCost role for retrieving pricing data (name and subscription can be changed):

```bash
az role definition create --role-definition '{
  "Name": "UnionOpenCostRole",
  "Description": "Role used by OpenCost pod",
  "Actions": [
    "Microsoft.Compute/virtualMachines/vmSizes/read",
    "Microsoft.Resources/subscriptions/locations/read",
    "Microsoft.Resources/providers/read",
    "Microsoft.ContainerService/containerServices/read",
    "Microsoft.Commerce/RateCard/read"
  ],
  "NotActions": [],
  "AssignableScopes": [
    "/subscriptions/YOUR_SUBSCRIPTION_ID"
  ]
}'
```

Create the OpenCost service principal. This creates an application registration, service principal, client secret, and role assignment:

```bash
az ad sp create-for-rbac \
  --name "UnionOpenCost" \
  --role "UnionOpenCostRole" \
  --scopes "/subscriptions/YOUR_SUBSCRIPTION_ID" \
  --years 2
```

Share the output of the above `az ad sp create-for-rbac` command with Union.ai.

## (Recommended) Create a Microsoft Entra group for cluster administration

We recommend [creating a Microsoft Entra group](https://learn.microsoft.com/en-us/training/modules/create-users-and-groups-in-azure-active-directory/) for AKS cluster admin access.
AKS Cluster admin access is commonly provided to individuals that need direct (e.g. `kubectl`) access to the cluster.

Provide the group `Object ID` to Union.ai.
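If you prefer the CLI, a minimal sketch (the group name is your choice, not a required value):

```shell
# Create the admin group and print its object ID to share with Union.ai.
az ad group create \
  --display-name "union-aks-admins" \
  --mail-nickname "union-aks-admins" \
  --query id --output tsv
```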

## (Optional) Setting up and managing your own VNet

If you decide to manage your own VNet instead of leaving it to Union.ai, you will need to set it up yourself.

### Required Union.ai VNet permissions

Union.ai requires permissions to read Azure network resources and assign the `Network Contributor` role to the underlying Union.ai Kubernetes cluster.

[Create a role assignment](https://learn.microsoft.com/en-us/azure/role-based-access-control/role-assignments-portal) to allow Union.ai to read VNet resources and assign roles. These permissions should be scoped to the target Virtual Network (VNet). Follow these steps to set up the required access:

1. Navigate to the Azure portal and locate the target VNet.
2. In the VNet's access control (IAM) section, create a new role assignment.
3. For the 'Assigned to' field, select the Union.ai application's service principal.
4. For the 'Role' field, you have two options:
   - Simplest approach: Assign the built-in Azure roles `Reader` and `User Access Administrator`.
   - Advanced approach: Create a custom role with the following specific permissions:
     - `Microsoft.Network/*/read`
     - `Microsoft.Authorization/roleAssignments/write`
     - `Microsoft.Authorization/roleAssignments/delete`
     - `Microsoft.Authorization/roleAssignments/read`
     - `Microsoft.Authorization/roleDefinitions/read`
5. Ensure the 'Scope' is set to the target VNet.
6. Complete the role assignment process.

This configuration will provide the Union.ai application with the necessary permissions to interact with and manage resources within the specified VNet.

> [!NOTE] Creating Azure role assignments
>
> For more detailed instructions on creating role assignments, refer to the
> [official Azure documentation](https://learn.microsoft.com/en-us/azure/role-based-access-control/role-assignments-portal).

### Required VNet properties

We recommend using a VNet within the same Azure tenant as your Union.ai data plane. It should be configured with the following characteristics:

- A single subnet with an address prefix with `/19` CIDR mask. This is used for Kubernetes nodes.
- One to five subnets with an address prefix with `/14` to `/18` CIDR mask. This is used for Kubernetes pods. `/14` is preferable to mitigate IP exhaustion. It is common to start with one subnet for initial clusters and add more subnets as workloads scale.
- An unallocated (i.e., no subnet) `/19` CIDR range that will be retained for service CIDRs.
- Within the CIDR range, choose a single IP address that will be used for internal DNS. This IP address should not be the first address within the CIDR range.
- (Recommended): Enable [virtual network service endpoints](https://learn.microsoft.com/en-us/azure/virtual-network/virtual-network-service-endpoints-overview) `Microsoft.Storage`, `Microsoft.ContainerRegistry`, and `Microsoft.KeyVault`.
- (Recommended) Create a [NAT gateway for virtual network](https://learn.microsoft.com/en-us/azure/nat-gateway/quickstart-create-nat-gateway-portal) egress traffic. This allows scaling out public IP addresses and limits potential external rate-limiting scenarios.

Once your VNet is set up, provide the following to Union.ai:

- The Virtual Network's subscription ID.
- The Virtual Network's name.
- The Virtual Network's resource group name.
- The Virtual Network's subnet name used for Kubernetes nodes.
- The Virtual Network's subnet names used for Kubernetes pods.
- The CIDR range intended to use for Kubernetes services.
- The IP address to be used for internal DNS.

### Example VNet CIDR Block allocation

- `10.0.0.0/8` for the VNet CIDR block.
- `10.0.0.0/19` for the Kubernetes node specific subnet.
- `10.4.0.0/14` for the initial Kubernetes pods specific subnet.
  - `10.8.0.0/14`, `10.12.0.0/14`, `10.16.0.0/14`, `10.20.0.0/14` for any future Kubernetes pod specific subnets.
- `10.0.96.0/19` unallocated for Kubernetes services.
- `10.0.96.10` for internal DNS.
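The example allocation above could be provisioned roughly like this with the Azure CLI. The resource group, names, and location are placeholders; this is a sketch under those assumptions, not a definitive layout.

```shell
# VNet with the node subnet.
az network vnet create \
  --resource-group <ResourceGroup> \
  --name union-vnet \
  --location <Location> \
  --address-prefixes 10.0.0.0/8 \
  --subnet-name nodes \
  --subnet-prefixes 10.0.0.0/19

# Initial pod subnet, with the recommended service endpoints enabled.
az network vnet subnet create \
  --resource-group <ResourceGroup> \
  --vnet-name union-vnet \
  --name pods-1 \
  --address-prefixes 10.4.0.0/14 \
  --service-endpoints Microsoft.Storage Microsoft.ContainerRegistry Microsoft.KeyVault

# 10.0.96.0/19 is deliberately left without a subnet (Kubernetes services),
# and 10.0.96.10 within it is reserved as the internal DNS address.
```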

## Union.ai Maintenance Windows

Union.ai configures a four-hour maintenance window that runs monthly on the first Sunday at 3 AM in the Azure location's local timezone.

> [!NOTE] Setting up Tasks for Fault Tolerance
> During this time window Flyte execution pods could be potentially interrupted.
> We recommend leveraging
> [Flyte fault tolerance](https://docs.flyte.org/en/latest/concepts/tasks.html#fault-tolerance) and
> [checkpointing](https://docs.flyte.org/en/latest/user_guide/advanced_composition/intratask_checkpoints.html)
> to efficiently minimize failed executions.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/byoc/data-retention-policy ===

# Data retention policy

Data retention policies allow you to control what data is stored in your data plane and for how long.
This allows you to reduce costs by ensuring that you only keep data that you actually need.

Each data plane has its own Union.ai-internal object store (an AWS S3 bucket, GCS bucket or ABS container) that is used to store data used in the execution of workflows.
As a Union.ai administrator, you can specify retention policies for this data when setting up your data plane.
The policies are specified in discussion with the Union.ai team when you set up your Union.ai instance.
They are not adjustable through the UI or CLI.

## Data categories

The retention policy system distinguishes three categories of data:

1. Workflow execution data:
   - Task inputs and outputs (that is, primitive type literals)
   - `FlyteFile`/`FlyteDirectory` and other large offloaded data objects (like `DataFrame`s) both in their default locations and in any custom `raw-data-prefix` locations that may have been specified at execution time
   - Flyte `Deck` data.
   - Artifact data.
   - Internal metadata used by Union.ai.
2. Fast-registered code:
   - Local code artifacts that will be copied into the Flyte task container at runtime when using `union register` or `union run --remote --copy-all`.
3. Flyte plugin metadata (for example, Spark history server data).

Each category of data is stored in a separate Union.ai-managed object store bucket and versioning is enabled on these buckets.
This means that two separate retention policies can be specified for each data category: one for current versions and one for non-current versions.
The result is that there are six distinct retention policies to specify (though in most cases you can stick with the defaults, see below).

> [!NOTE] Object versions are not the same as Union.ai entity versions
> The versions discussed here are at the object level and are not related to the versions of workflows,
> tasks and other Union.ai entities that you see in the Union.ai UI.

## How policies are specified

A policy determines how long data in a given category and version-state (current vs. non-current) will be retained in the object store before it is automatically deleted.

A policy is specified as a time period in days, or `unlimited` (in which case automatic data deletion is disabled for that category and version-state).

## Deletion of current versions

For current versions, deletion due to a retention period running out means moving the object to a non-current version, which we refer to as _soft-deletion_.

## Deletion of non-current versions

For non-current versions, deletion due to a retention period running out means permanent deletion.

## Defaults

|                     | Workflow execution data | Fast-registered code | Flyte-plugin metadata |
| ------------------- | ----------------------- | -------------------- | --------------------- |
| Current version     | unlimited               | unlimited            | unlimited             |
| Non-current version | 7 days                  | 7 days               | 7 days                |

By default:

- The retention policy for _current versions in all categories_ is `unlimited`, meaning that auto-deletion is disabled.

  - If you change this to a specified number of days, then auto-deletion will occur after that time period, but because it applies to current versions the data object will be soft-deleted (that is, moved to a non-current version), not permanently deleted.

- The retention policy for _non-current versions in all categories_ is `7 days`, meaning that auto-deletion will occur after 7 days and that the data will be permanently deleted.

## Attempting to access deleted data

If you attempt to access deleted data, you will receive an error:

- When workflow node input/output data is deleted, the Input/Output tabs in the UI will display a _Not Found_ error.
- When `Deck` data is deleted, the `Deck` view in the UI will display a _Not Found_ error.
- When artifacts are deleted, the artifacts UI will still work, but it will display a URL that points to a no-longer-existing artifact.

To remedy these types of errors, you will have to re-run the workflow that generated the data in question.

- When fast registered code data is deleted, the workflow execution will fail.

To remedy this type of error, you will have to both re-register and re-run the workflow.

## Separate sets of policies per cluster

If you have a multi-cluster setup, you can specify a different set of retention policies (one per category) for each cluster.

## Data retention and task caching

When enabling data retention, task caching will be adjusted accordingly. To avoid attempts to retrieve cache data that has already been deleted, the `age` of the cache will always be configured to be less than the sum of both retention periods.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/byoc/enabling-aws-resources ===

# Enabling AWS resources

> **📝 Note**
>
> An LLM-optimized bundle of this entire section is available at [`section.md`](section.md).
> This single file contains all pages in this section, optimized for AI coding agent context.

Components of your Union.ai data plane will need to connect to and communicate with other resources in your cloud environment such as **BYOC deployment > Enabling AWS resources > Enabling AWS S3**, **BYOC deployment > Enabling AWS resources > Enabling AWS ECR**, and so forth.

> [!NOTE] Secret management
> We strongly recommend using the [Union.ai secrets manager](https://www.union.ai/docs/v2/union/user-guide/task-configuration/secrets/page.md) to manage secrets rather than AWS Secrets Manager. If your organization must use AWS Secrets Manager, however, see **BYOC deployment > Enabling AWS resources > Enabling AWS Secrets Manager**.

As much as possible, access to the resources you need will be pre-configured by the Union.ai team when they set up your data plane.
For example, if you want your task code to have access to a specific S3 bucket or database, this can be pre-configured.
**You just have to inform the team of your specific requirements before the setup process begins**.

As your projects evolve, your needs may change.
You can always contact the Union.ai team for help enabling additional resources as required.

**There are also some cases where you may want to configure things on your own.**
**Below we give a general overview of these self-configuration options.**
**The sub-pages of this section give examples for specific resources.**

## Types of access

Broadly speaking, there are two categories of access that you are likely to have to deal with:

* **Infrastructure access**:
  Enabling access to a resource for your data plane infrastructure.
  The most common case occurs when you are using **BYOC deployment > Enabling AWS resources > Enabling AWS ECR** for your task container images, and it resides in an AWS account other than the one containing your data plane.
  In that case, some configuration is required to enable the Union.ai operator on your data plane to pull images from the registry when registering your workflows and tasks.
  **If you are using an ECR instance within the same AWS account as your data plane, then access is enabled by default and no further configuration is needed.**
* **Task code access**:
  Enabling access to a resource for your task code.
  For example, your task code might need to access **BYOC deployment > Enabling AWS resources > Enabling AWS S3** or **BYOC deployment > Enabling AWS resources > Enabling AWS Secrets Manager** at runtime.
  This involves granting permission to roles that are attached to the Kubernetes cluster within which your task code runs.

## Infrastructure-level access

The only infrastructure-level access issue you are likely to encounter is around access to an AWS Elastic Container Registry (ECR) _in an AWS account other than the one in which your data plane resides_.

**If your task container images are stored in an AWS Elastic Container Registry in the same AWS account as your data plane, then access is already enabled. You do not have to do anything.**

If your task container images reside in an ECR instance in **another AWS account**, you will need to configure that ECR instance to allow access from your data plane.
See **BYOC deployment > Enabling AWS resources > Enabling AWS ECR** for details.

## Task code access

When your task code runs, it executes within a pod in the Kubernetes cluster in your data plane.
To enable your task code to access cloud resources you must grant the appropriate permissions to a role that is attached to the Kubernetes cluster.

There are two main options for setting this up:

* **Project-domain-scoped access**: With this arrangement, you define the permissions you want to grant to your task code, and those permissions are applied only to specific project-domain pairs.
* **Global access**: With this arrangement, you define the permissions you want to grant to your task code, and those permissions are then applied to code in all your projects and domains.

Global access is recommended for most use cases since it is simpler, but if you have a compelling reason to restrict access, project-domain-scoped access is available at the cost of some additional setup complexity.

> [!NOTE] Relationship with RBAC
> The permissions being discussed here are attached to a project and domain.
> This is independent of the permissions granted to users and machine applications through Union.ai's role-based access control (see the user management documentation).
> But, the two types of permissions are related.
>
> For example, for a user (or machine application) to have read access to an S3 bucket, two things are required:
>
> * The user (or machine application) must have **execute** permission for the project and domain where the code that does the reading resides.
> * The project and domain must have read permission for the S3 bucket.

## Background

As you know, your workflows and tasks run in a Kubernetes cluster within your data plane.
Within that cluster, the Kubernetes pods allocated to run your task code are organized as follows:

* The set of task pods is partitioned into namespaces where each namespace corresponds to a project-domain pair.
* All workflows running in a given project and domain are run on pods within that namespace.
  For example, code in the `development` domain of project `foo` runs in the namespace `foo-development` while code in the `staging` domain of project `bar` runs in the namespace `bar-staging`, and so forth.
* By default, all project-domain namespaces are bound to a common IAM role which we will refer to as `<UserFlyteRole>`.
  Its actual name differs from organization to organization. **The actual name will have the form `<YourOrgPrefix>-userflyterole`**.
* The role `<UserFlyteRole>` has an attached policy called `userflyterole`.
  This policy contains all the permissions granted to your task code when your data plane was set up.
  If you requested permissions for resources specific to your organization at set up time, they will have been added here.
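You can inspect the managed policies currently attached to this role with the AWS CLI; substitute your organization's actual role name:

```shell
# Lists the managed policies attached to the role,
# which should include the "userflyterole" policy.
aws iam list-attached-role-policies --role-name <YourOrgPrefix>-userflyterole
```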

> [!NOTE] `<UserFlyteRole>` vs `userflyterole`
> The entity that we refer to here as `<UserFlyteRole>` is an IAM role.
> As mentioned, the actual name of this role in your system will be of the form `<YourOrgPrefix>-userflyterole`.
>
> By default, this role has an attached IAM policy called `userflyterole`.
> This is the literal name used in all AWS-based data planes.
>
> **Be aware of the difference and don't get these two things confused!**

> [!NOTE] `<UserFlyteRole>`vs `<AdminFlyteRole>`
> In addition to the task pods, your cluster also contains pods that run Union.ai services, which are used to manage tasks and to connect your cluster to the control plane.
> These pods are bound to a different default role, `<AdminFlyteRole>` (again, its actual name differs from organization to organization).
> The separation of this role from `<UserFlyteRole>` serves to provide isolation between Union.ai administrative logic and your workflow logic.
>
> **You should not alter any settings associated with `<AdminFlyteRole>`**.

## Enabling access

To enable your task code to access a resource:

* **BYOC deployment > Enabling AWS resources > Enabling access > Creating a custom policy** that grants the appropriate permissions for your resource.
  This is the step where you define exactly which permissions you want to grant (read-only, read/write, list, etc.).
  The name of this policy is yours to determine.
  Here we will refer to it as `<CustomPolicy>`.

You can then choose whether to enable **global access** or **project-domain-scoped access**:

* **BYOC deployment > Enabling AWS resources > Enabling access > Setting up global access** to the resource: simply attach `<CustomPolicy>` to the existing `<UserFlyteRole>`.
* **BYOC deployment > Enabling AWS resources > Enabling access > Setting up project-domain-scoped access** to your resource:
  * Create your own custom role (let's refer to it as `<CustomRole>`).
  * Attach `<CustomPolicy>` to `<CustomRole>`.
  * Also, attach the policy called `userflyterole` to `<CustomRole>` (this will ensure that `<CustomRole>` has all the default permissions needed to allow tasks to run).
  * Attach `<CustomRole>` to the desired project-domain namespace.

![](../../../_static/images/user-guide/integrations/enabling-aws-resources/union-roles.png)

### Creating a custom policy

Regardless of which route you take (global vs project-domain-scoped), the first step is to create a policy that grants the desired permissions on your resource.

To create a new policy:

* Go to **IAM > Access management > Policies**.
* Select **Create policy**.
* Go through the sections of the visual editor to define the permissions you wish to grant.
  * Alternatively, you can paste a JSON definition directly into the JSON editor.
  * The details of what permissions to grant depend on the resource in question and the access you wish to grant.
    Specific examples are covered in **BYOC deployment > Enabling AWS resources > Enabling AWS S3** and **BYOC deployment > Enabling AWS resources > Enabling AWS Secrets Manager**.
* Proceed through the steps of the wizard, give your policy a name (which we will call `<CustomPolicy>`), and select **Create policy**.
* Record the name and ARN of your policy.
  Here we will refer to the ARN as `<CustomPolicyArn>`.

### Setting up global access

To set up global access, you must bind the `<CustomPolicy>` that you created above to the role `<UserFlyteRole>`.

> [!NOTE]
> As mentioned above, the actual name of `<UserFlyteRole>` has the form:
>
> **`<YourOrgPrefix>-userflyterole`**
>
> You should be able to find the role by searching in your AWS IAM console for roles with names that follow that pattern.

* Go to **IAM > Access management > Roles**.
* Find `<UserFlyteRole>` and select the checkbox beside it.
* In the **Add Permissions** drop-down menu, select **Attach Policies**.
* In the displayed list find `<CustomPolicy>` and select its checkbox, then select **Add permissions**.

> [!NOTE]
> Alternatively, you can perform the binding from the command line like this:
>
> ```bash
> $ aws iam attach-role-policy \
>  --policy-arn <CustomPolicyArn> \
>  --role-name <UserFlyteRole>
> ```
>
> Notice that on the command line you must use `<CustomPolicyArn>` rather than `<CustomPolicy>`.

**At this point, all task code in your organization will have access to the cloud resource as defined by your custom policy.**

### Setting up project-domain-scoped access

To set up project-domain-scoped access, do the following:

In AWS:

* Create the IAM role, `<CustomRole>`.
* Add the `userflyterole` policy to `<CustomRole>`.
* Add `<CustomPolicy>` to `<CustomRole>`.

In Union.ai (using `uctl`):

* Bind `<CustomRole>` to the desired project-domain pair.

### Create the IAM role

1. Sign in to the AWS Management Console as an administrator of your account, and open the IAM console.
2. In the navigation pane, choose **Roles** and then choose **Create role**.
3. Choose the **Web identity** role type.
4. In the **Identity provider** dropdown, select `oidc.eks.<Suffix>`. Record this name.
5. Choose `sts.amazonaws.com` as the **Audience** and select **Next**.
6. On the **Add permissions** page, search for the `userflyterole` policy and check the box beside it and select **Next**.
7. Enter a name and description for this role.
8. Under **Step 1: Select trusted entities**, select **Edit** and _replace_ the `Condition` block with the following, where `oidc.eks.<Suffix>` is the value from step 4, and `<Project>` and `<Domain1>`/`<Domain2>` are the Union.ai project and domains you want to set custom permissions for. Repeat for each project-domain pair.

```json
"Condition": {
    "StringEquals": {
        "oidc.eks.<Suffix>:sub": [
            "system:serviceaccount:<Project>-<Domain1>:default",
            "system:serviceaccount:<Project>-<Domain2>:default"
        ]
    }
}
```

9. Add additional permissions as needed, following [these steps](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html).
10. Select **Create role**.
11. In the **Summary** section of the new role's details pane, note the ARN value.
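
If you have several project-domain pairs, the `Condition` block from step 8 can be generated with a short script rather than by hand. This is a minimal stdlib-only sketch; the OIDC suffix, project, and domain names below are placeholders, not values from your installation:

```python
import json

def make_trust_condition(oidc_suffix: str, project: str, domains: list[str]) -> dict:
    """Build the Condition block that restricts the role to the default
    Kubernetes service account of each <Project>-<Domain> namespace."""
    subjects = [f"system:serviceaccount:{project}-{d}:default" for d in domains]
    return {"Condition": {"StringEquals": {f"oidc.eks.{oidc_suffix}:sub": subjects}}}

# Example with placeholder values:
condition = make_trust_condition(
    "us-east-2.amazonaws.com/id/EXAMPLE",  # hypothetical <Suffix>
    "myproject",
    ["development", "production"],
)
print(json.dumps(condition, indent=4))
```

You would paste the printed block into the role's trust policy in place of the existing `Condition`.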

### Configure the cluster to use the new IAM role

Repeat the following steps for each project-domain pair:

1.  Create a file named `cluster_resource_attributes.yaml` with the following contents:

```yaml
attributes:
  defaultUserRoleValue: <ARN from step 11 above>
domain: <domain>
project: <project>
```

2.  Run the following command to override the IAM role used for Union.ai tasks in this project-domain pair:

```bash
uctl update cluster-resource-attribute --attrFile cluster_resource_attributes.yaml
```

3.  You can verify the overrides by running:

```bash
uctl get cluster-resource-attribute -p <project> -d <domain>
```

**At this point, only code in your chosen project-domain pairs will have access to the cloud resource as defined by your custom policy.**
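
If you are overriding the role for many project-domain pairs, a short script can render one attrFile per pair before you run the `uctl update` command for each. A sketch, with a placeholder role ARN and hypothetical project/domain names:

```python
def make_attr_file(role_arn: str, project: str, domain: str) -> str:
    """Render the cluster_resource_attributes.yaml body for one project-domain pair."""
    return (
        "attributes:\n"
        f"  defaultUserRoleValue: {role_arn}\n"
        f"domain: {domain}\n"
        f"project: {project}\n"
    )

ROLE_ARN = "arn:aws:iam::123456789012:role/my-custom-role"  # placeholder
for domain in ["development", "production"]:
    path = f"cluster_resource_attributes_{domain}.yaml"
    with open(path, "w") as f:
        f.write(make_attr_file(ROLE_ARN, "myproject", domain))
    # then run, for each generated file:
    #   uctl update cluster-resource-attribute --attrFile <path>
```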

=== PAGE: https://www.union.ai/docs/v2/union/deployment/byoc/enabling-aws-resources/enabling-aws-s3 ===

# Enabling AWS S3

For Union.ai customers whose data plane is in AWS, we walk through setting up access to your own AWS S3 bucket.

> [!NOTE] AWS S3 in the Union.ai environment
> Your data plane is set up with a Kubernetes cluster and other resources.
> Among these are a number of S3 buckets used internally by the Union.ai operator running in the cluster (see [Platform architecture](../platform-architecture)) to store things like workflow metadata.
>
> **These are _not_ the S3 buckets we are talking about in this section.**
>
> **We are discussing the case where you have _your own S3 bucket_ that you set up to store input and output data used by your workflows.**

## Add permissions to your custom policy

In order to enable access to an AWS resource (in this case S3) you need to create a custom policy in AWS IAM with the required permissions and attach it to either the existing _User Flyte Role_ associated with your data plane Kubernetes cluster or to a custom role which you have created and attached to the cluster.
The general procedure is covered in **BYOC deployment > Enabling AWS resources > Enabling access**.

To enable S3 access in particular, when [adding permissions to your custom policy](./enabling-aws-s3#add-permissions-to-your-custom-policy) you must specify the needed permissions. For example:

- `s3:ListBucket` - This permission allows you to list the objects in the bucket.
- `s3:GetObject` - This permission allows you to retrieve objects from the bucket.
- `s3:PutObject` - This permission allows you to upload objects to the bucket.

Here is a sample JSON policy document that grants these permissions:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowReadWriteBucket",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::<BucketName>/*",
        "arn:aws:s3:::<BucketName>"
      ]
    }
  ]
}
```

In the `Resource` field, replace `<BucketName>` with the actual name of your S3 bucket.

## Accessing S3 from your task code

Once you have enabled access to your S3 bucket, you can use the standard [AWS SDK for Python (Boto3)](https://aws.amazon.com/sdk-for-python/) in your task code to read and write to it.
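
As a minimal sketch (not a definitive pattern), a task body might look like the following. The bucket name and object key are placeholders, `boto3` is assumed to be present in your task image, and the import is deferred into the function so the module loads without AWS credentials:

```python
def object_uri(bucket: str, key: str) -> str:
    """Build the s3:// URI of an object, e.g. for logging or returning a reference."""
    return f"s3://{bucket}/{key}"

def write_and_read(bucket: str = "my-bucket") -> str:
    # Inside a @union.task-decorated function this would run in your data plane,
    # where the attached role supplies the s3:PutObject/GetObject/ListBucket permissions.
    import boto3

    s3 = boto3.client("s3")
    key = "outputs/result.txt"
    s3.put_object(Bucket=bucket, Key=key, Body=b"hello from a task")
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    assert body == b"hello from a task"
    return object_uri(bucket, key)
```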

=== PAGE: https://www.union.ai/docs/v2/union/deployment/byoc/enabling-aws-resources/enabling-aws-ecr ===

# Enabling AWS ECR

## Access to ECR in the same account is enabled by default

When registering tasks and workflows, the Union.ai infrastructure in your data plane must have access to the container registry that holds the task container images you will be using.
If your data plane is on AWS then you may want to use AWS Elastic Container Registry (ECR) to store these images.

For details on how to use ECR when building and deploying your workflows, see the ImageSpec with ECR documentation.

**In most cases, you will be using an ECR instance in the same AWS account as your data plane.**
**If this is the case, then you do not need to configure anything.**
**Access to ECR in the same account is enabled by default.**

## Enabling cross-account access to ECR

If you want to store your task container images in an ECR instance in an AWS account _other than the one that holds your data plane_, then you will have to configure that ECR instance to permit access from your data plane.
Here are the details:

* Your Union.ai data plane comes pre-configured with a specific role, which we will refer to here as `<FlyteWorkerNodeGroupRole>`.
* The actual name of this role depends on your organization's name. It will be of the form `unionai-<YourOrganizationName>-flyteworker-node-group`.

To enable access to the ECR instance in the other account, do the following:

* In your data plane AWS account, go to **IAM > Roles**.
  Find the role `<FlyteWorkerNodeGroupRole>` and copy its ARN.
  We will call this `<FlyteWorkerNodeGroupRoleARN>`.
* In the other AWS account (the one that contains the ECR instance), go to **Amazon ECR > Repositories**.
* Find the ECR repository you want to enable and under **Permissions**, select **Edit**, then **Add Statement**.
* Specify the `<FlyteWorkerNodeGroupRoleARN>` as a **Principal** and add (at least) the following permissions:
  * `ecr:BatchCheckLayerAvailability`: This permission allows your data plane to check the availability of image layers in the registry.
  * `ecr:GetDownloadUrlForLayer`: This permission allows your data plane to retrieve a pre-signed URL that is required to download the image layers.
  * `ecr:BatchGetImage`: This permission allows your data plane to retrieve image manifests and image layer information from the registry.
* To specify the above parameters via JSON, select **Edit policy JSON** and use the following policy document:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowPull",
      "Effect": "Allow",
      "Principal": {
        "AWS": "<FlyteWorkerNodeGroupRoleARN>"
      },
      "Action": [
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage",
        "ecr:BatchCheckLayerAvailability"
      ]
    }
  ]
}
```

* Select **Save**.

Your Union.ai data plane infrastructure should now be able to pull images from the ECR instance. For more information, see [How can I allow a secondary account to push or pull images in my Amazon ECR image repository?](https://repost.aws/knowledge-center/secondary-account-access-ecr).
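
If you need to apply this policy to several repositories, the document can be templated. A stdlib-only sketch; the role ARN below is a made-up example:

```python
import json

def make_ecr_pull_policy(role_arn: str) -> dict:
    """Build a repository policy statement that lets the given role pull images."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowPull",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": [
                    "ecr:GetDownloadUrlForLayer",
                    "ecr:BatchGetImage",
                    "ecr:BatchCheckLayerAvailability",
                ],
            }
        ],
    }

# Hypothetical role ARN, for illustration only:
print(json.dumps(make_ecr_pull_policy("arn:aws:iam::123456789012:role/example-flyteworker-node-group"), indent=2))
```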

=== PAGE: https://www.union.ai/docs/v2/union/deployment/byoc/enabling-aws-resources/enabling-aws-secrets-manager ===

# Enabling AWS Secrets Manager

> [!NOTE]
> This documentation is for customers who must use AWS Secrets Manager for organizational reasons. For everyone else, we strongly recommend using the
> [Union.ai secrets manager](https://www.union.ai/docs/v2/union/user-guide/task-configuration/secrets) to manage secrets rather than AWS Secrets Manager.

To enable your code to access secrets from AWS Secrets Manager you will need to

* Make sure AWS Secrets Manager is enabled.
* Create your secrets in AWS Secrets Manager.
* Create an AWS policy granting access to your secrets.
* Bind that policy to the User Flyte Role in your Union.ai data plane.
* Retrieve your secrets from within your workflow code.

## Ensure that AWS Secrets Manager is enabled

The first step is to make sure that AWS Secrets Manager is enabled in your AWS environment.
Contact the Union.ai team if you are unsure.

## Create your secrets

> [!NOTE]
> Secrets must be defined within the same region as your Union.ai data plane.
> For example, if your Union.ai data plane is located in `us-west-2`, ensure that the secrets are also in `us-west-2`.

Create your secrets in **AWS Secrets Manager** (see the [AWS documentation](https://docs.aws.amazon.com/secretsmanager/latest/userguide/create_secret.html) for details):

* Go to **AWS Secrets Manager**.
* Select **Store a new secret**.
* Under **Choose Secret type**:
  * Select **Other type of secret**.
  * Select **Plaintext** (**Key/value** is not supported).
  * Enter your **secret value**.
  * For **Encryption key**, leave the default setting: `aws/secretsmanager`.
  * Select **Next**.
* Under **Configure secret**:
  * For **Secret name**, enter a string (this string will form part of the `SECRET_KEY` that you will use to access your secret from within your code).
  * Select **Next**.
* Under **Configure rotation** adjust the settings if needed, or skip the section if not. Then select **Next**.
* Under **Review** check that everything is correct and then select **Store**.

## Get the secret ARN

Once you have created a secret, navigate to **AWS Secrets Manager > Secrets** and select the secret you just created.
From there select **Secret ARN** and record the ARN.
Do this for each secret that you create.

A secret ARN looks like this:

```bash
arn:aws:secretsmanager:<Region>:<AccountId>:secret:<SecretName>-<SixRandomCharacters>
```

> [!NOTE]
> You will need your secret ARN when you access your secret from within your code.
> Specifically, you will need to divide it into two strings:
>
> * **`SECRET_GROUP`**: The part of the ARN up to and including `:secret:`
> Above, it is `arn:aws:secretsmanager:<Region>:<AccountId>:secret:`.
>
> * **`SECRET_KEY`**: The part of the ARN after `:secret:`
> Above, it is `<SecretName>-<SixRandomCharacters>`.
>
> See [Using AWS secrets in your task code](./enabling-aws-secrets-manager#using-aws-secrets-in-your-task-code) for details on how these are used.
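
The split of the ARN into these two strings is mechanical, so you can do it in code rather than by hand. A minimal sketch; the ARN below is a made-up example:

```python
def split_secret_arn(arn: str) -> tuple[str, str]:
    """Split a Secrets Manager ARN into (SECRET_GROUP, SECRET_KEY):
    everything up to and including ':secret:' and everything after it."""
    marker = ":secret:"
    end = arn.index(marker) + len(marker)
    return arn[:end], arn[end:]

group, key = split_secret_arn(
    "arn:aws:secretsmanager:us-east-2:123456789012:secret:my-secret-Ab12Cd"
)
assert group == "arn:aws:secretsmanager:us-east-2:123456789012:secret:"
assert key == "my-secret-Ab12Cd"
```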

## Create a policy providing access to your secrets

To provide access to your newly created secrets in your code, you will first need to create a policy that grants read access to those secrets:

* Go to **IAM > Access management > Policies**.
* Select **Create Policy**.
* Open the **JSON** tab and paste in the following definition:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "secretsmanager:GetSecretValue",
      "Resource": "arn:aws:secretsmanager:<Region>:<AccountId>:secret:*"
    }
  ]
}
```

> [!NOTE]
> The `Resource` entry takes a wildcard string that must match the ARNs of the secrets in your environment that you want to grant access to.
> This can be all the secrets in your environment (as shown above) or some subset (achieved by making the wildcard match more specific).
> Be sure to substitute the appropriate `<Region>` and `<AccountId>`.

* Select **Next: Tags** and add tags if you wish.
* Select **Next: Review** and enter a **Name** for the policy.
* Select **Create Policy**.
* Find your newly created policy in the policy list that comes up next and select it.
* Record the **Policy Name** and **Policy ARN** of your newly created policy.
It should be at the top of the policy summary page.
We will refer to the name as `<SecretManagerPolicyName>` and the ARN as `<SecretManagerPolicyArn>`.

> [!NOTE]
> Alternatively, you can create the policy from the command line like this (remember to substitute the `<Region>` and `<AccountId>` appropriately):
>
> ```bash
> $ aws iam create-policy \
>       --policy-name <YourPolicyName> \
>       --policy-document '{
>         "Version": "2012-10-17",
>         "Statement": [
>           {
>             "Effect": "Allow",
>             "Action": "secretsmanager:GetSecretValue",
>             "Resource": "arn:aws:secretsmanager:<Region>:<AccountId>:secret:*"
>           }
>         ]
>       }'
> ```

## Bind the policy to the User Flyte Role

To grant your code the permissions defined in the policy above, you must bind that policy to the `<UserFlyteRole>` used in your Union.ai data plane.
The precise name of this role differs by organization.
You will need this name as well as the ARN of the policy (`<SecretManagerPolicyArn>`, above) to perform the binding.
See **BYOC deployment > Enabling AWS resources > Enabling access > Setting up global access** for directions. Once the binding is done, your secrets are accessible from within your Flyte code.

## Using AWS secrets in your task code

To use an AWS secret in your task code, do the following:

* Define a `Secret` object using the `SECRET_GROUP` and `SECRET_KEY` derived from the secret ARN, above, and pass it in the `secret_requests` parameter of the `@union.task` decorator.
* Inside the task code, retrieve the value of the secret with a call to\
  `union.current_context().secrets.get(SECRET_GROUP, SECRET_KEY)`.

Here is an example:

```python
import union

SECRET_GROUP = "arn:aws:secretsmanager:<Region>:<AccountId>:secret:"
SECRET_KEY = "<SecretName>-<SixRandomCharacters>"
SECRET_REQUEST = union.Secret(
  group=SECRET_GROUP,
  key=SECRET_KEY,
  mount_requirement=union.Secret.MountType.FILE
)

@union.task(secret_requests=[SECRET_REQUEST])
def t1():
    secret_val = union.current_context().secrets.get(
        SECRET_GROUP,
        SECRET_KEY
    )
    # do something with the secret. For example, communication with an external API.
    ...
```

> [!WARNING]
> Do not return secret values from tasks, as this will expose secrets to the control plane.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/byoc/enabling-gcp-resources ===

# Enabling GCP resources

> **📝 Note**
>
> An LLM-optimized bundle of this entire section is available at [`section.md`](section.md).
> This single file contains all pages in this section, optimized for AI coding agent context.

Components of your Union.ai data plane will need to connect to and communicate with other resources in your cloud environment such as **BYOC deployment > Enabling GCP resources > Enabling Google Cloud Storage**, **BYOC deployment > Enabling GCP resources > Enabling Google Artifact Registry**, **BYOC deployment > Enabling GCP resources > Enabling BigQuery**, and so forth.

> [!NOTE] Secret management
> We strongly recommend using the [Union.ai secrets manager](https://www.union.ai/docs/v2/union/user-guide/task-configuration/secrets) to manage secrets rather than Google Secret Manager. If your organization must use Google Secret Manager, however, see **BYOC deployment > Enabling GCP resources > Enabling Google Secret Manager**.

As much as possible, access to the resources you need will be pre-configured by the Union.ai team when they set up your data plane.
For example, if you want your task code to have access to a specific Cloud Storage bucket or BigQuery, this can be pre-configured.
**You just have to inform the team of your specific requirements before the setup process begins**.

As your projects evolve, your needs may change.
You can always contact the Union.ai team for help enabling additional resources as required.

**There are also some cases where you may want to configure things on your own.**
**Below we give a general overview of these self-configuration options.**
**The sub-pages of this section give examples for specific resources.**

## Types of access

Broadly speaking, there are two categories of access that you are likely to have to deal with:

* **Infrastructure access**:
  Enabling access to a resource for your data plane infrastructure.
  The most common case occurs when you are using Artifact Registry for your task container images and it resides in a project other than the one containing your data plane.
  In that case, some configuration is required to enable the Union.ai operator on your data plane to pull images from the registry when registering your workflows and tasks.
  **If you are using an Artifact Registry instance within the same project as your data plane, then access is enabled by default and no further configuration is needed.**
* **Task code access**:
  Enabling access to a resource for your task code.
  For example, your task code might need to access Cloud Storage or Secret Manager at runtime.
  This involves granting permission to a Google Service Account (GSA) that is attached to the Kubernetes cluster within which your task code runs.

## Infrastructure-level access

The only infrastructure-level access issue you are likely to encounter is around access to an Artifact Registry _in a GCP project other than the one in which your data plane resides_.

**If your task container images are stored in an Artifact Registry in the same GCP project as your data plane, then access is already enabled. You do not have to do anything.**

If your task container images reside in an Artifact Registry instance in **another GCP project**, you will need to configure that instance to allow access from your data plane.
See **BYOC deployment > Enabling GCP resources > Enabling Google Artifact Registry** for details.

## Task code access

When your task code runs, it executes within a pod in the Kubernetes cluster in your data plane.
To enable your task code to access cloud resources you must grant the appropriate permissions to the Google Service Account (GSA) attached to the Kubernetes cluster.

There are two main options for setting this up:

* **Domain-scoped access**: With this arrangement, you define the permissions you want to grant to your task code, and those permissions are applied only to a specific domain.
* **Global access**: With this arrangement, you define the permissions you want to grant to your task code, and those permissions are then applied to code in all your projects and domains.

> [!NOTE] GCP only supports scoping by domain
> In AWS-based data planes, scoping by both project _and_ domain is supported.
> However, due to intrinsic architectural constraints, GCP-based data planes only support scoping by domain.

Global access is recommended for most use cases since it is simpler, but if you have a compelling reason to restrict access, then domain-scoped access is available, at the cost of some additional complexity in setup.

> [!NOTE] Relationship with RBAC
> The permissions being discussed here are attached to a domain.
> This is independent of the permissions granted to users and machine applications through Union.ai's role-based access control (see [User management](https://www.union.ai/docs/v2/union/user-guide/user-management)).
> But, the two types of permissions are related.
>
> For example, for a user (or machine application) to have read access to a Cloud Storage bucket, two things are required:
>
> * The user (or machine application) must have **execute** permission for the project and domain where the code that does the reading resides.
> * The domain must have read permission for the Cloud Storage bucket.

## Domain-scoped access

**Because of the way that GCP works internally, domain-scoped access can only be configured by the Union.ai team.**

Please work directly with the Union.ai team if you have requirements that involve domain-scoped access to cloud resources.

If you need to add or change domain-scoped access after your data plane has been set up, you should also contact the team.

## Globally-scoped access

You can manage the configuration of globally-scoped access to GCP resources yourself without involving the Union.ai team.

In a GCP-based Union.ai data plane, globally-scoped access to resources is mediated by a single Google Service Account (GSA) that is configured as part of the data plane setup.
We refer to it as `<UserFlyteGSA>`.

`<UserFlyteGSA>` is bound to all the pods in your data plane's Kubernetes cluster that run your Flyte code.

To enable access to a resource in GCP, you grant `<UserFlyteGSA>` access to that resource and assign it a role that includes the permissions that you want your code to have.

> [!NOTE] `<UserFlyteGSA>`
> Here we refer to the default global-access GSA as `<UserFlyteGSA>` because the precise name differs across installations.
> This GSA is identified by name and email of the following form:
>
> * Name: `<OrgName>-userflyterol-<Suffix>`
> * Email: `<OrgName>-userflyterol-<Suffix>@<OrgName>-gcp-dataplane.iam.gserviceaccount.com`

> [!NOTE] Google Service Account (GSA)
> We use the term Google Service Account (GSA) to refer to the accounts that are managed in the GCP console under **IAM & Admin > Service Accounts**.
> This is to distinguish them from Kubernetes Service Accounts (KSAs).
> KSAs are a distinct type of service account managed _within_ the Kubernetes cluster. You will not normally encounter these at the data plane level.

## Find the actual name of `<UserFlyteGSA>`

In this section we refer to the default global-access GSA as `<UserFlyteGSA>` because the precise name differs across installations. The actual name and email of this GSA have the following forms:

* Name: `<OrgName>-userflyterol-<Suffix>`
* Email: `<OrgName>-userflyterol-<Suffix>@<OrgName>-gcp-dataplane.iam.gserviceaccount.com`

**You will need to have the email identifier of this role on hand when you enable access to resources for your task code.**

To find the actual name of this GSA do the following:

* In the GCP data plane project, go to **IAM & Admin > Service accounts**.
* In the list of service accounts, find the one whose name and email match the pattern above. For example:

![](../../../_static/images/user-guide/integrations/enabling-gcp-resources/user-flyte-gsa.png)

* Copy this name to a document in an editor.
  You will need it later to configure each specific resource.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/byoc/enabling-gcp-resources/enabling-google-cloud-storage ===

# Enabling Google Cloud Storage

For Union.ai customers whose data plane is in GCP, we walk through setting up access to your own Google Cloud Storage bucket.

> [!NOTE] Google Cloud Storage in the Union.ai environment
> Your data plane is set up with a Kubernetes cluster and other resources.
> Among these are a number of Google Cloud Storage (GCS) buckets used internally by the Union.ai operator running in the cluster (see [Platform architecture](../platform-architecture)) to store things like workflow metadata.
>
> **These are not the GCS buckets we are talking about in this section.**
>
> **We are discussing the case where you have _your own GCS bucket_ that you set up to store input and output data used by your workflows.**

## Grant `<UserFlyteGSA>` access to the bucket

To enable access to a GCS bucket you have to add the `<UserFlyteGSA>` Google Service Account as a principal to that bucket and assign it a role that includes the permissions that you want your code to have.

* Find the actual name and email of the `<UserFlyteGSA>` in your Union.ai data plane GCP project (see [Find the actual name of `<UserFlyteGSA>`](_index#find-the-actual-name-of-userflytegsa)).
* Go to **Cloud Storage > Buckets** and select the bucket to which you want to grant access.
* In the **Bucket details** view select the **Permissions** tab and then select **GRANT ACCESS**:

![](../../../_static/images/user-guide/integrations/enabling-gcp-resources/enabling-google-cloud-storage/bucket-details.png)

* In the **Grant access** panel:
  * Under **Add principals**, paste the actual name (in email form) of the `<UserFlyteGSA>` into the **New principals** field.
  * Under **Assign roles** add as many roles as you need.
    In the example below we add the roles enabling reading and writing: **Storage Object Viewer** and **Storage Object Creator**.

![](../../../_static/images/user-guide/integrations/enabling-gcp-resources/enabling-google-cloud-storage/grant-access-to-bucket.png)

* Click **SAVE**.

Your bucket should now be **globally accessible** to task code in all Flyte projects and domains in your Union.ai organization.
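
Once access is granted, your task code can use the standard Google Cloud Storage client library to read and write the bucket. A minimal sketch; the bucket and object names are placeholders, the `google-cloud-storage` package is assumed to be in your task image, and the import is deferred so the module loads without GCP credentials:

```python
def blob_uri(bucket: str, name: str) -> str:
    """Build the gs:// URI of an object, e.g. for logging or returning a reference."""
    return f"gs://{bucket}/{name}"

def read_blob(bucket_name: str = "my-bucket", blob_name: str = "inputs/data.csv") -> bytes:
    # Inside a @union.task-decorated function this runs in your data plane,
    # where <UserFlyteGSA> carries the roles you granted on the bucket.
    from google.cloud import storage

    client = storage.Client()
    return client.bucket(bucket_name).blob(blob_name).download_as_bytes()
```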

> [!NOTE] Domain-scoped permissions are not self-service
> If you want to assign permissions in a more fine-grained way, per project and/or domain, you need to contact the Union.ai team.
> See [Domain-scoped access](_index#domain-scoped-access).

=== PAGE: https://www.union.ai/docs/v2/union/deployment/byoc/enabling-gcp-resources/enabling-google-artifact-registry ===

# Enabling Google Artifact Registry

## Access to Artifact Registry in the same project is enabled by default

When registering tasks and workflows, the Union.ai infrastructure in your data plane must have access to the container registry that holds the task container images you will be using.
If your data plane is on GCP then you may want to use Google Artifact Registry (GAR) to store these images.

**In most cases, you will be using a GAR repository in the same GCP project as your data plane.**
**If this is the case, then you do not need to configure anything.**
**Access to GAR in the same project is enabled by default.**

## Enabling cross-project access to Artifact Registry

If you want to store your task container images in a GAR repository in a GCP project _other than the one that holds your data plane_, you must enable the node pool of your data plane to access that GAR.
This is the infrastructure-level access that we discussed [earlier](_index#infrastructure-level-access).
It is mediated by a specific Google Service Account (GSA), which we will refer to here as `<FlyteWorkerGSA>`
(recall that this is in contrast to the task code access, which is mediated by a different default GSA, `<UserFlyteGSA>`).

> [!NOTE] `<FlyteWorkerGSA>`
> Here we refer to this GSA as `<FlyteWorkerGSA>` because the precise name differs across installations.
> This GSA is identified by name and email of the following form:
>
> * Name: `<OrgName>-flyteworker-<Suffix>`
> * Email: `<OrgName>-flyteworker-<Suffix>@<OrgName>-gcp-dataplane.iam.gserviceaccount.com`

To enable access to the GAR repository in the other project, do the following:

* In your data plane GCP project, go to **IAM > Service Accounts**.
  Find the GSA `<FlyteWorkerGSA>` and copy its email.
  We will call this `<FlyteWorkerGSAEmail>`.
* In the other GCP project (the one that contains the GAR repository), go to **Artifact Registry > Repositories**.
* Find the GAR repository you want to enable and select the checkbox beside it.
* Under **Permissions** in the side panel, select **Add Principal**.
* Specify the `<FlyteWorkerGSAEmail>` as a **Principal** and assign (at least) the role **Artifact Registry Reader**.
* Select **Save**.

Your Union.ai data plane infrastructure should now be able to pull images from the GAR repository.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/byoc/enabling-gcp-resources/enabling-google-secret-manager ===

# Enabling Google Secret Manager

> [!NOTE]
> This documentation exists for customers who must use Google Secret Manager for organizational reasons. For everyone else, we strongly recommend using the
> [Union.ai secrets manager](https://www.union.ai/docs/v2/union/user-guide/task-configuration/secrets) to manage secrets rather than Google Secret Manager.

Access to a secret stored in Secret Manager in the same GCP project as the data plane is enabled by default.
All you need to do is:

* Create your secrets in Secret Manager.
* Retrieve your secrets from within your task code.

To access a secret stored in Secret Manager in a GCP project _other than the one that holds your data plane_, one additional step is required:
granting the `<UserFlyteGSA>` (see **BYOC deployment > Enabling GCP resources**) access to the secret in the other project.

## Create your secrets

Create your secrets in **Secret Manager** (see the [Secret Manager documentation](https://cloud.google.com/secret-manager/docs) for details):

* Go to **Security > Secret Manager**.
* Select **CREATE SECRET** at the top of the page.
* Fill in the **Name**, **Value**, and (optionally) the other parameters.
* Select **CREATE SECRET** at the bottom of the page.

Your secret should now be on the secrets list:

![](../../../_static/images/user-guide/integrations/enabling-gcp-resources/enabling-google-secret-manager/secret-manager.png)

Above we see a secret named `example-secret`.
Clicking on it will bring us to the **Secret details** page:

![](../../../_static/images/user-guide/integrations/enabling-gcp-resources/enabling-google-secret-manager/secret-details.png)

The secret has three important identifiers:

* The **GCP secret name**, in this case `example-secret`.
  You will need this if you are accessing a secret in the same project as your data plane.
* The **GCP secret path**, in this case `projects/956281974034/secrets/example-secret`.
  You will need this if you are accessing a secret in a different project from your data plane project.
* The **GCP secret version**, in this case `1`.
  This is required for both same- and cross-project cases.
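For illustration, the **GCP secret path** is just the project number and secret name joined in a fixed format (the values below come from the example secret above; substitute your own):

```python
# Values from the example secret above; substitute your own
# project number and secret name.
project_number = "956281974034"
secret_name = "example-secret"

# This is the cross-project form used as the secret group.
gcp_secret_path = f"projects/{project_number}/secrets/{secret_name}"
print(gcp_secret_path)  # → projects/956281974034/secrets/example-secret
```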

## Same-project secrets

If your secret is stored in the Secret Manager of the same project as your data plane then the `<UserFlyteGSA>` will have access to it out-of-the-box.
No further configuration is necessary.

To use a same-project GCP secret in your task code, do the following:

* Define a `union.Secret` object where
  * `union.Secret.group` is the **GCP secret name**, in this case `example-secret` (optionally, you can use the **GCP secret path** instead, but the simple name is sufficient).
  * `union.Secret.group_version` is the **GCP secret version** (in this case `1`)
  * `union.Secret.mount_requirement` is `union.Secret.MountType.FILE`
* Pass that `Secret` object in the `secret_requests` parameter of the `@union.task` decorator.
* Inside the task code, retrieve the value of the secret with a call to
  `union.current_context().secrets.get(SECRET_GROUP, group_version=SECRET_GROUP_VERSION)`.

Here is an example:

```python
import union

SECRET_GROUP = "example-secret"
SECRET_GROUP_VERSION = "1"
SECRET_REQUEST = union.Secret(
    group=SECRET_GROUP,
    group_version=SECRET_GROUP_VERSION,
    mount_requirement=union.Secret.MountType.FILE,
)

@union.task(secret_requests=[SECRET_REQUEST])
def t1():
    secret_val = union.current_context().secrets.get(
        SECRET_GROUP,
        group_version=SECRET_GROUP_VERSION
    )
```

## Cross-project secrets

If your secret is stored in the Secret Manager of a project other than the one containing your data plane, then you will first need to grant the `<UserFlyteGSA>` permission to access it:

* Find the **email identifier** of the `<UserFlyteGSA>` in your data plane GCP project (see **BYOC deployment > Enabling GCP resources** for details).
* Go to **Security > Secret Manager** in the GCP project that contains your secret.
* Select the secret that you want to access and select **GRANT ACCESS**.
* In the subsequent panel, under **Add principals**, paste in the email identifier of the `<UserFlyteGSA>` that you found above.
* Under **Assign roles** add at least the role **Secret Manager Secret Accessor**.
* Save the changes.

At this point, your task code will have access to the secret in the other project. To use that secret in your task code, do the following:

* Define a `union.Secret` object where
  * `union.Secret.group` is the **GCP secret path** (in this case, `projects/956281974034/secrets/example-secret`)
  * `union.Secret.group_version` is the **GCP secret version** (in this case `1`)
  * `union.Secret.mount_requirement` is `union.Secret.MountType.FILE`
* Pass that `union.Secret` object in the `secret_requests` parameter of the `@union.task` decorator.
* Inside the task code, retrieve the value of the secret with a call to\
`union.current_context().secrets.get(SECRET_GROUP, group_version=SECRET_GROUP_VERSION)`

> [!NOTE] GCP secret name vs GCP secret path
> In your task code, the only difference between using a same-project secret and a cross-project secret is
>
> * With a _same-project secret,_ you can use either the **GCP secret name** or the **GCP secret path** as the value of the parameter `union.Secret.group`.
> * With a _cross-project secret,_ you must use the **GCP secret path** as the value of the parameter `union.Secret.group`.

Here is an example:

```python
import union

SECRET_GROUP = "projects/956281974034/secrets/example-secret"
SECRET_GROUP_VERSION = "1"
SECRET_REQUEST = union.Secret(
    group=SECRET_GROUP,
    group_version=SECRET_GROUP_VERSION,
    mount_requirement=union.Secret.MountType.FILE,
)

@union.task(secret_requests=[SECRET_REQUEST])
def t1():
    secret_val = union.current_context().secrets.get(
        SECRET_GROUP,
        group_version=SECRET_GROUP_VERSION
    )
    # do something with the secret. For example, communication with an external API.
    ...
```

> [!WARNING]
> Do not return secret values from tasks, as this will expose secrets to the control plane.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/byoc/enabling-gcp-resources/enabling-bigquery ===

# Enabling BigQuery

For customers using the Google Cloud Platform as the data plane, Union.ai lets you easily pull data from BigQuery into your workflows. For most users on GCP, access to BigQuery should be enabled by default and bound to the service account used by the BigQuery connector.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/byoc/enabling-azure-resources ===

# Enabling Azure resources

> **📝 Note**
>
> An LLM-optimized bundle of this entire section is available at [`section.md`](section.md).
> This single file contains all pages in this section, optimized for AI coding agent context.

Components of your Union.ai data plane will need to connect to and communicate with other resources in your Azure cloud environment, such as Azure [Blob Storage](https://azure.microsoft.com/en-ca/products/storage/blobs/) and [Container Registry](https://azure.microsoft.com/en-us/products/container-registry).

**BYOC deployment > Data plane setup on Azure** provides Union.ai with the necessary permissions to manage underlying Azure resources within your data plane. Access to non-Union.ai Azure resources is subject to Azure limitations and will require additional configuration.

As your projects evolve, your needs may change.
You can always contact the Union.ai team for help enabling additional resources as required.

## Types of access

There are two categories of access that you are likely to have to deal with:

* **Infrastructure access**:
  Enabling access to a resource for your data plane infrastructure.
  The most common case is pulling task container images from your own container registry.
  In that case, refer to **BYOC deployment > Enabling Azure resources > Enabling Azure Container Registry (ACR)** to configure the Union.ai data plane to access that registry.
* **Task code access**:
  Enabling access to a resource for your task code.
  For example, your task code might need to access Azure Blob Storage at runtime.
  This involves granting permission to the [User-assigned managed identity](https://learn.microsoft.com/en-us/entra/identity/managed-identities-azure-resources/overview) attached to the Kubernetes cluster within which your task code runs.

## Infrastructure-level access

Infrastructure access to non-Union.ai-managed Azure resources requires additional configuration. Refer to **BYOC deployment > Enabling Azure resources > Enabling Azure Container Registry (ACR)** if you need access to images within an existing or non-Union.ai-managed container registry.

## Task code access

Union.ai tasks run within a Union.ai-managed Kubernetes pod in your data plane. Union.ai uses [Microsoft Entra Workload ID](https://learn.microsoft.com/en-us/azure/aks/workload-identity-overview?tabs=dotnet) to create user-assigned managed identities and access Union.ai-managed Azure resources. Additional permissions can be granted to the user-assigned managed identity to access Azure resources within the same Tenant.

Union.ai on Azure has two types of access arrangements:

* **Domain-scoped access**: With this arrangement, you define permissions you want to grant to your tasks, which are applied only to a specific Union.ai domain.
* **Global access**: With this arrangement, you define permissions you want to grant to your tasks, which are applied to an entire Azure subscription or resource group.

> [!NOTE] Azure only supports scoping by domain
> In AWS-based data planes, scoping by both project _and_ domain is supported.
> However, due to intrinsic architectural constraints, Azure-based data planes only support scoping by domain.

Global access is recommended for most use cases since it is simpler. If you have a compelling reason to restrict access, domain-scoped access at the subscription or resource group level is available at the cost of additional setup complexity.

> [!NOTE] Relationship with RBAC
> The permissions being discussed here are attached to a domain.
> This is independent of the permissions granted to users and machine applications through Union.ai's role-based access control (see the user management documentation).
> But, the two types of permissions are related.
>
> For example, for a user (or machine application) to have read access to a blob storage container, two things are required:
>
> * The user (or machine application) must have **execute** permission for the project and domain where the code that does the reading resides.
> * The domain must have read permission for the blob storage container.

## Domain-scoped access

**Because of the way that Azure works internally, domain-scoped access can only be configured by the Union.ai team.**

Please work directly with the Union.ai team if you have requirements that involve domain-scoped access to cloud resources.

## Globally-scoped access

Union.ai creates a managed identity prefixed with `flyteuser` within the resource group that contains the other Union.ai-managed data plane Azure resources. Navigate to [Azure portal Managed Identities](https://portal.azure.com/#view/HubsExtension/BrowseResource/resourceType/Microsoft.ManagedIdentity%2FuserAssignedIdentities) to find the corresponding managed identity details.

Follow [Azure's official assigned roles documentation](https://learn.microsoft.com/en-us/azure/role-based-access-control/role-assignments-portal) to assign an appropriate role at the desired scope.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/byoc/enabling-azure-resources/enabling-azure-blob-storage ===

# Enabling Azure Blob Storage

For Union.ai customers whose data plane is in Azure, we walk through setting up access to your own Azure Blob Storage container.

> [!NOTE] Azure Blob Storage in the Union.ai environment
> Your data plane is set up with a Kubernetes cluster and other resources.
> Among these are a number of Azure Storage containers used internally by the Union.ai operator running in the cluster (see [Platform architecture](../platform-architecture)) to store things like workflow metadata.
>
> **These are not the Azure Blob Storage containers we are talking about in this section.**
>
> **We are discussing the case where you have _your own Azure Blob Storage container_ that you set up to store input and output data used by your workflows.**

## Providing permissions to Azure Blob Storage container

Union.ai data plane tasks employ Azure Workload Identity Federation to access Azure resources using an Azure user-assigned identity. Access to Azure Blob Storage containers requires updating permissions to permit this Union.ai-managed user-assigned identity.

### Union.ai-managed permissions

The simplest, most flexible approach is to provide Union.ai the ability to add role assignments against the blob storage container. [Create a role assignment](https://learn.microsoft.com/en-us/azure/role-based-access-control/role-assignments-portal) to allow Union.ai to assign roles to the blob storage container. These permissions should be scoped to the target container. Follow these steps to set up the required access:

1. Navigate to the Azure portal and locate the target storage container.
2. In the storage container's access control (IAM) section, create a new role assignment.
3. For the 'Assigned to' field, select the Union.ai application's service principal.
4. For the 'Role' field, you have two options:
    * Simplest approach: Assign the built-in Azure role `User Access Administrator`.
    * Advanced approach: Create a custom role with the following specific permissions:
      * `Microsoft.Authorization/roleAssignments/write`
      * `Microsoft.Authorization/roleAssignments/delete`
      * `Microsoft.Authorization/roleAssignments/read`
      * `Microsoft.Authorization/roleDefinitions/read`
5. Ensure the 'Scope' is set to the target blob storage container.
6. Complete the role assignment process.
7. Provide the blob storage container [resource ID](https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.management.storage.models.resource.id) to Union.ai support.

### Manage permissions directly

Managing permissions directly is required if you prefer not to grant role-assignment permissions to Union.ai. [Create a role assignment](https://learn.microsoft.com/en-us/azure/role-based-access-control/role-assignments-portal) assigning the `Storage Blob Data Contributor` role to the `userflyterole` user-assigned identity, scoped to the blob storage container.

> [!NOTE] Union.ai managed user-assigned identities
> Refer to [Azure portal's user-assigned managed identities](https://portal.azure.com/#view/HubsExtension/BrowseResource/resourceType/Microsoft.ManagedIdentity%2FuserAssignedIdentities) if you need help identifying the `userflyterole` user-assigned managed identity within the same resource group as the Union.ai data plane.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/byoc/enabling-azure-resources/enabling-azure-container-registry ===

# Enabling Azure Container Registry (ACR)

ACR can be used to store container images within Azure and can be accessed from your Azure-based data plane.

Union.ai leverages Azure Kubernetes Service (AKS) managed identities to authenticate with ACR.

Refer to the [Azure documentation](https://learn.microsoft.com/en-us/azure/container-registry/authenticate-kubernetes-options) for more details.

## Creating a container registry

### Creating a container registry outside of Union.ai

ACR instances that allow anonymous (i.e., public) access don't require additional configuration. Otherwise, the underlying AKS cluster must be granted permission to pull from the container registry.

Private ACR for Union.ai images is only supported for ACRs within the same tenant as the Union.ai data plane. Refer to [Azure documentation](https://learn.microsoft.com/en-us/azure/container-registry/container-registry-get-started-portal?tabs=azure-cli) for creating Container Registries.

### Creating a Union.ai-managed container registry

Upon request, Union.ai can create a container registry within your data plane.

By default, this Union.ai-managed ACR instance:

* Is created within the same subscription and resource group as the Azure Kubernetes cluster instance.
* Has the permissions necessary for the Azure Kubernetes cluster to pull images from the container registry.
* Is created with the **Basic** service tier.
* Includes a weekly [scheduled container registry task](https://learn.microsoft.com/en-us/azure/container-registry/container-registry-tasks-scheduled) that [purges](https://learn.microsoft.com/en-us/azure/container-registry/container-registry-auto-purge#use-the-purge-command) **all** images with last-modified dates older than 7 days, to mitigate excessive storage costs. As a result, some images older than 7 days will be rebuilt.

Upon request, Union.ai can:

* Configure the [Container Registry service tier](https://learn.microsoft.com/en-us/azure/container-registry/container-registry-skus).
* Disable the purge task to prevent automated image deletion.
* Configure the purge task to run daily, weekly, or monthly, deleting images with last-modified dates older than 1, 7, and 30 days respectively.
* Configure a [regexp2 with RE2 compatibility](https://github.com/dlclark/regexp2) regular expression to filter which repositories to purge. For example, `^(?!keep-repo).*` will keep all images in repositories prefixed with `keep-repo`, e.g., `<CONTAINER_REGISTRY_NAME>/keep-repo/my-image:my-tag`.
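The filter syntax above (including the negative lookahead) also parses in Python's `re` module, so you can preview which repositories a candidate filter would purge before requesting it. The repository names below are hypothetical:

```python
import re

# Hypothetical purge filter: purge everything EXCEPT repositories
# prefixed with "keep-repo".
PURGE_FILTER = r"^(?!keep-repo).*"

repos = ["keep-repo/my-image", "scratch/my-image", "ci/builder"]
to_purge = [r for r in repos if re.match(PURGE_FILTER, r)]
print(to_purge)  # → ['scratch/my-image', 'ci/builder']
```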

Union.ai will provide the created container registry's name and login server for Docker authentication.

## Enable access to ACR in a different subscription within the same Azure tenant

Union.ai data plane resources will require permissions to pull images from your container registry.

### Allow Union.ai to manage permissions

The simplest, most flexible approach is to provide Union.ai the ability to add role assignments against the container registry. [Create a role assignment](https://learn.microsoft.com/en-us/azure/role-based-access-control/role-assignments-portal) to allow Union.ai to assign roles to the container registry. These permissions should be scoped to the target container registry. Follow these steps to set up the required access:

1. Navigate to the Azure portal and locate the target container registry.
2. In the container registry's access control (IAM) section, create a new role assignment.
3. For the 'Assigned to' field, select the Union.ai application's service principal.
4. For the 'Role' field, you have two options:
    * Simplest approach: Assign the built-in Azure role `User Access Administrator`.
    * Advanced approach: Create a custom role with the following specific permissions:
      * `Microsoft.Authorization/roleAssignments/write`
      * `Microsoft.Authorization/roleAssignments/delete`
      * `Microsoft.Authorization/roleAssignments/read`
      * `Microsoft.Authorization/roleDefinitions/read`
5. Ensure the 'Scope' is set to the target container registry.
6. Complete the role assignment process.
7. Provide the container registry [resource ID](https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.management.storage.models.resource.id) to Union.ai support.

### Manage permissions directly

Managing permissions directly is required if you prefer not to grant role-assignment permissions to Union.ai. [Create a role assignment](https://learn.microsoft.com/en-us/azure/role-based-access-control/role-assignments-portal) assigning the `AcrPull` role to the underlying AKS cluster's kubelet service principal ID. The service principal ID can be provided by Union.ai support.

Note that this process must be repeated every time the underlying Kubernetes cluster is changed or a new cluster is added.

## Enable access to ACR in a different Azure tenant

Please contact and work directly with Union.ai support.

## References

* [Azure - Authenticate with Azure Container Registry (ACR) from Azure Kubernetes Service (AKS)](https://learn.microsoft.com/en-us/azure/aks/cluster-container-registry-integration?toc=%2Fazure%2Fcontainer-registry%2Ftoc.json&bc=%2Fazure%2Fcontainer-registry%2Fbreadcrumb%2Ftoc.json&tabs=azure-cli)
* [Azure - Pull images from a container registry to an AKS cluster in a different Microsoft Entra tenant](https://learn.microsoft.com/en-us/azure/container-registry/authenticate-aks-cross-tenant)

=== PAGE: https://www.union.ai/docs/v2/union/deployment/byoc/enabling-azure-resources/enabling-azure-key-vault ===

# Enabling Azure Key Vault

> [!NOTE]
> This documentation exists for customers who must use Azure Key Vault for organizational reasons. For everyone else, we strongly recommend using the
> [Union.ai secrets manager](https://www.union.ai/docs/v2/union/user-guide/task-configuration/secrets) to manage secrets rather than Azure Key Vault.

The Union.ai-managed `userflyterole` identity must be granted permission to access [Azure Key Vault secrets](https://learn.microsoft.com/en-us/azure/key-vault/secrets/about-secrets).

> [!NOTE] Managing Azure Key Vault secrets
> Refer to [Azure official documentation](https://learn.microsoft.com/en-us/azure/key-vault/secrets/quick-create-portal) for details on creating and managing secrets.

## Providing permissions to Azure Key Vault

Union.ai data plane tasks employ Azure Workload Identity Federation to access Azure resources using an Azure user-assigned identity. Access to Azure Key Vault secrets requires updating permissions to permit this Union.ai-managed user-assigned identity.

[Create a role assignment](https://learn.microsoft.com/en-us/azure/role-based-access-control/role-assignments-portal) assigning the `Key Vault Secrets User` role to the `userflyterole` user-assigned identity. Make sure it is scoped to the Azure Key Vault Secret.

> [!NOTE] Union.ai managed user-assigned identities
> Refer to [Azure portal's user-assigned managed identities](https://portal.azure.com/#view/HubsExtension/BrowseResource/resourceType/Microsoft.ManagedIdentity%2FuserAssignedIdentities) if you need help identifying the `userflyterole` user-assigned identity within the Union.ai data plane resource group.

## Accessing the secret within Union.ai

* Define a `Secret` object where
  * `Secret.group` is an HTTPS URI of the format `https://<KEY_VAULT_NAME>.vault.azure.net/secrets/<SECRET_NAME>`
  * `Secret.group_version` can be omitted to retrieve the latest version or set to an explicit secret version
  * `Secret.mount_requirement` is `Secret.MountType.FILE`
* Pass that `Secret` object in the `secret_requests` parameter of the `@union.task` decorator.
* Inside the task code, retrieve the value of the secret with:
  * `union.current_context().secrets.get(<SECRET_NAME>)` if `Secret.group_version` was omitted.
  * `union.current_context().secrets.get(<SECRET_NAME>, group_version=SECRET_GROUP_VERSION)` if `Secret.group_version` was specified.

Here are examples:

```python
import union

VAULT_NAME = "examplevault"
SECRET_NAME = "example-secret"

SECRET_GROUP = f"https://{VAULT_NAME}.vault.azure.net/secrets/{SECRET_NAME}"
SECRET_GROUP_VERSION = "12345"

SECRET_REQUEST_WITH_VERSION = union.Secret(
  group=SECRET_GROUP,
  group_version=SECRET_GROUP_VERSION,
  mount_requirement=union.Secret.MountType.FILE
)

@union.task(secret_requests=[SECRET_REQUEST_WITH_VERSION])
def task_with_versioned_secret():
    secret_val = union.current_context().secrets.get(
        SECRET_NAME,
        group_version=SECRET_GROUP_VERSION
    )

SECRET_REQUEST_FOR_LATEST = union.Secret(
  group=SECRET_GROUP,
  mount_requirement=union.Secret.MountType.FILE
)

@union.task(secret_requests=[SECRET_REQUEST_FOR_LATEST])
def task_with_latest_secret():
    secret_val = union.current_context().secrets.get(
        SECRET_NAME
    )
```

=== PAGE: https://www.union.ai/docs/v2/union/deployment/byoc/single-sign-on-setup ===

# Single sign on setup

> **📝 Note**
>
> An LLM-optimized bundle of this entire section is available at [`section.md`](section.md).
> This single file contains all pages in this section, optimized for AI coding agent context.

Union.ai authentication uses OAuth2 with Okta and supports SAML and OIDC-compliant identity providers (IdPs) to configure single sign-on (SSO).

To enable SSO, create an app for your preferred identity provider and provide the associated secrets to the Union.ai team.
The team will then complete the process.

## Google OpenID Connect

To configure Google OpenID Connect, see **BYOC deployment > Single sign on setup > Google OpenID Connect**.

## Microsoft Entra ID (formerly Azure AD)

To configure Entra ID (Azure AD), see **BYOC deployment > Single sign on setup > Microsoft Entra ID (formerly Azure AD)**.

## Other identity providers

To configure other identity providers, see **BYOC deployment > Single sign on setup > Other identity providers**.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/byoc/single-sign-on-setup/google-oidc ===

# Google OpenID Connect

To set up your Union.ai instance to use Google OpenID Connect as the identity provider, follow the directions below.

> [!NOTE] Google Documentation
> In this article, we cover the same steps as in the
> [OpenID Connect](https://developers.google.com/identity/openid-connect/openid-connect) Google documentation,
> but with additional directions specific to Union.ai.

## Setting up OAuth 2.0

First, select an existing project or set up a new project in the
[Google Cloud Console](https://console.cloud.google.com).

1. Navigate to the **Clients** section for [Google Auth Platform](https://console.cloud.google.com/auth/).

2. Click **CREATE CLIENT**. If this is your first client, you might need to provide additional app details. There is no special configuration needed from the Union.ai side.

3. Under **Create OAuth client ID**, select **Web application** as the application type and assign a name.

4. Under **Authorized redirect URIs**, add an entry with the following callback URI:
   `https://signin.hosted.unionai.cloud/oauth2/v1/authorize/callback`.

5. Click **Create**.

## Obtain OAuth 2.0 credentials

Next, retrieve your credentials: Click on your configured client and copy the values for **Client ID** and **Client secret** to a text file on your computer.

![OAuth 2.0 credentials](https://www.union.ai/docs/v2/union/_static/images/user-guide/data-plane-setup/single-sign-on-setup/google-oidc/oauth-credentials.png)

## Share the client ID and client secret securely with Union.ai

Finally, you will need to share the client ID and client secret securely with Union.ai:

1. Copy the public key provided by Union.ai here: 📥 [public-key.txt](/_static/public/public-key.txt)

2. Encrypt the given text file on your computer with a PGP tool of your choice.

3. Share the encrypted message with the Union.ai team over Slack.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/byoc/single-sign-on-setup/microsoft-entra-id ===

# Microsoft Entra ID (formerly Azure AD)

To set up your Union.ai instance to use Microsoft Entra ID as the identity provider, follow the directions below.

> [!NOTE] Microsoft documentation
> In this article, we cover the same steps as the
> [Quickstart: Register an application with the Microsoft identity platform](https://learn.microsoft.com/en-us/entra/identity-platform/quickstart-register-app) Microsoft documentation, but with additional directions specific
> to Union.ai.

## Register an Entra ID application

1. Log into your Azure account as a cloud application administrator or higher permission level.

1. In the identity drop-down at the top right of the page (indicated by the email address you are currently logged in as), select **Switch directory**, then select the directory in which you want to register this application.

1. Browse to **Identity > Applications > App registrations** and select **New registration**.

1. Under **Name**, enter an appropriate display name. For example, `Union.ai Production`.

1. Under **Supported account types**, select **Accounts in this organizational directory only**.

1. Under **Redirect URI (optional)**, select **Web** and enter the following URI:

   `https://signin.hosted.unionai.cloud/oauth2/v1/authorize/callback`

1. Click **Register**.

> [!NOTE] Make the app visible to users
> New app registrations are hidden from users by default. You must enable the app when you are ready for
> users to see the app on their **My Apps** page.
> To enable the app, in the Microsoft Entra admin center, navigate to
> **Identity > Applications > Enterprise applications** and select the app.
> Then, on the **Properties** page, toggle **Visible to users?** to **Yes**.

## Copy the values needed by the Union.ai team

When registration finishes, the Microsoft Entra admin center will display the app registration's **Overview** page, from which you can copy the Application (client) ID, Directory (tenant) ID, and client secret needed by the Union.ai team.

### Application (client) ID and directory (tenant) ID

Copy the **Application (client) ID** and **Directory (tenant) ID** from the overview page to a text file on your computer.

![Application and directory ID](https://www.union.ai/docs/v2/union/_static/images/user-guide/data-plane-setup/single-sign-on-setup/microsoft-entra-id/entra-id-application-and-directory-id.png)

### Client secret

To get the **client secret**, on the overview page, go to **Client credentials** and click **Add a certificate or secret**.

![Client credentials](https://www.union.ai/docs/v2/union/_static/images/user-guide/data-plane-setup/single-sign-on-setup/microsoft-entra-id/entra-id-client-credentials.png)

On the subsequent page, under **Client secrets**, click **New client secret** to generate a new secret.
Copy the **Value** of this secret to a plain text file on your computer.

![Client secret](https://www.union.ai/docs/v2/union/_static/images/user-guide/data-plane-setup/single-sign-on-setup/microsoft-entra-id/entra-id-client-secret.png)

## Share the client secret securely with Union.ai

1. Copy the public key provided by Union.ai here: 📥 [public-key.txt](/_static/public/public-key.txt)

2. Go to [https://pgptool.net](https://pgptool.net/).

3. Click the **Encrypt (+Sign)** tab.

4. Enter the public key in the **Public Key (For Verification)** section.

5. Skip the **Private Key** section.

6. Enter the **client secret** in plain text and encrypt it.

7. Save the encrypted text to a file and share it with the Union.ai team over Slack.

8. Delete the **client secret** from the text file on your computer.

## Share the IDs with Union.ai

Share the **application (client) ID** and **directory (tenant) ID** with the Union.ai team over Slack.
These values do not have to be encrypted.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/byoc/single-sign-on-setup/other-identity-providers ===

# Other identity providers

Depending on the type of identity provider you are using, open the appropriate directions below on the Okta site:

- [Okta-to-Okta](https://developer.okta.com/docs/guides/add-an-external-idp/oktatookta/main/)

- [OpenID Connect (OIDC)](https://developer.okta.com/docs/guides/add-an-external-idp/openidconnect/main/)

- [SAML 2.0](https://developer.okta.com/docs/guides/add-an-external-idp/saml2/main/)

Now, referencing those directions, follow the steps below:

1. Navigate to the section with the heading **Create an app at the Identity Provider**.

1. Complete all the steps in that section and make a note of the **application (client) ID**.

1. Where a callback URI needs to be specified, use `https://signin.hosted.unionai.cloud/oauth2/v1/authorize/callback`.

1. The last step in the setup will generate the **client secret**. Copy this value to a text file on your computer.

## Share the client secret securely with the Union.ai team

1. Copy the public key provided by Union.ai here: 📥 [public-key.txt](/_static/public/public-key.txt)

2. Go to [https://pgptool.net](https://pgptool.net/).

3. Click the **Encrypt (+Sign)** tab.

4. Enter the public key in the **Public Key (For Verification)** section.

5. Skip the **Private Key** section.

6. Enter the **client secret** in plain text and encrypt it.

7. Save the encrypted text to a file and share it with the Union.ai team over Slack.

8. Delete the client secret from the text file on your computer.

## Share the application (client) ID with Union.ai

Share the **application (client) ID** with the Union.ai team over Slack.
This value does not have to be encrypted.

