# Agentic refinement

The core of this example is an agentic refinement loop: generate content, critique
it, revise based on feedback, and repeat until quality meets a threshold. This
pattern is fundamental to building systems that iteratively improve their own output.

## The agentic pattern

Traditional pipelines are linear: input → process → output. Agentic workflows
are iterative: they evaluate their own output and improve it through multiple
cycles.

```mermaid
flowchart TD
    A[Generate] --> B[Critique]
    B -->|score >= threshold| C[Done]
    B -->|score < threshold| D[Revise]
    D --> B
```
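The loop in the diagram can be sketched in plain Python, independent of any orchestration framework. The stub bodies below (a fixed draft string, a pre-supplied score sequence) are stand-ins for the real LLM calls, not the example project's code:

```python
def refine(topic: str, critique_scores: list[int],
           quality_threshold: int = 8, max_iterations: int = 3):
    """Framework-free sketch of the generate-critique-revise loop."""
    draft = f"Draft about {topic}"     # stand-in for the generate step
    scores = iter(critique_scores)
    score = 0
    for i in range(max_iterations):
        score = next(scores)           # stand-in for the critique step
        if score >= quality_threshold:
            return draft, score, i + 1  # early exit: quality is good enough
        draft += " (revised)"          # stand-in for the revise step
    return draft, score, max_iterations

# A draft that needs two revisions before passing the threshold:
final, score, iterations = refine("quantum computing", [5, 7, 9])
```

With scores `[5, 7, 9]`, the loop revises twice and exits on the third critique; with a first score of 9, it exits immediately without revising.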

## Critique function

The critique function evaluates the current draft and returns structured feedback.
It's a traced function (not a separate task) that runs inside `refine_report`:

```python
@flyte.trace
async def critique_content(draft: str) -> Critique:
    """
    Critique the current draft and return structured feedback.

    Uses Pydantic models to parse the LLM's JSON response into
    a typed object for reliable downstream processing.

    Args:
        draft: The current draft to critique

    Returns:
        Structured critique with score, strengths, and improvements
    """
    print("Critiquing current draft...")

    response = await call_llm(
        f"Please critique the following report:\n\n{draft}",
        CRITIC_SYSTEM_PROMPT,
        json_mode=True,
    )

    # Parse the JSON response into our Pydantic model
    critique_data = json.loads(response)
    critique = Critique(**critique_data)

    print(f"Critique score: {critique.score}/10")
    print(f"Strengths: {len(critique.strengths)}, Improvements: {len(critique.improvements)}")

    return critique
```

*Source: https://github.com/unionai/unionai-examples/blob/main/v2/user-guide/advanced-project/generate.py*

Key points:
- Uses `json_mode=True` to ensure the LLM returns valid JSON
- Parses the response into a Pydantic `Critique` model
- Returns a typed object for reliable downstream processing
- `@flyte.trace` provides checkpointing—if the task retries, completed critiques aren't re-run
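The parsing step can be exercised without any LLM. Here a plain dataclass stands in for the Pydantic `Critique` model (the field names follow the docstring above; the JSON payload is made up for illustration):

```python
import json
from dataclasses import dataclass, field

@dataclass
class Critique:
    score: int                                   # 1-10 quality score
    strengths: list = field(default_factory=list)
    improvements: list = field(default_factory=list)

# A hypothetical json_mode response from the critic LLM:
raw = ('{"score": 6, "strengths": ["clear intro"], '
       '"improvements": ["add citations", "tighten conclusion"]}')
critique = Critique(**json.loads(raw))
```

With real Pydantic, `Critique(**json.loads(raw))` additionally validates field types and raises on malformed responses, which a bare dataclass does not.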

## Revise function

The revise function takes the current draft and specific improvements to address:

```python
@flyte.trace
async def revise_content(draft: str, improvements: list[str]) -> str:
    """
    Revise the draft based on critique feedback.

    Args:
        draft: The current draft to revise
        improvements: List of specific improvements to address

    Returns:
        The revised draft
    """
    print(f"Revising draft to address {len(improvements)} improvements...")

    improvements_text = "\n".join(f"- {imp}" for imp in improvements)
    prompt = f"""Please revise the following report to address these improvements:

IMPROVEMENTS NEEDED:
{improvements_text}

CURRENT DRAFT:
{draft}"""

    revised = await call_llm(prompt, REVISER_SYSTEM_PROMPT)

    print(f"Revision complete ({len(revised)} characters)")
    return revised
```

*Source: https://github.com/unionai/unionai-examples/blob/main/v2/user-guide/advanced-project/generate.py*

The prompt includes:
1. The list of improvements from the critique
2. The current draft to revise

This focused approach helps the LLM make targeted changes rather than rewriting
from scratch.
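Rendered with sample values (both the draft and the improvements below are hypothetical), the prompt the reviser receives composes like this:

```python
draft = "AI is transforming healthcare."   # hypothetical current draft
improvements = ["Add concrete examples", "Cite sources for the statistics"]

# Same construction as in revise_content above:
improvements_text = "\n".join(f"- {imp}" for imp in improvements)
prompt = f"""Please revise the following report to address these improvements:

IMPROVEMENTS NEEDED:
{improvements_text}

CURRENT DRAFT:
{draft}"""
```

The improvements appear first as a bullet list, and the full draft follows, so the model sees the requested changes before the text to change.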

## The refinement loop

The `refine_report` task orchestrates the iterative refinement. It runs in the
reusable `llm_env` because it makes multiple LLM calls through traced functions:

```python
@llm_env.task(retries=3)
async def refine_report(
    topic: str,
    max_iterations: int = 3,
    quality_threshold: int = 8,
) -> str:
    """
    Iteratively refine a report until it meets the quality threshold.

    This task runs in a reusable container because it makes multiple LLM calls
    in a loop. The traced helper functions provide checkpointing, so if the
    task fails mid-loop, completed LLM calls won't be re-run on retry.

    Args:
        topic: The topic to write about
        max_iterations: Maximum refinement cycles (default: 3)
        quality_threshold: Minimum score to accept (default: 8)

    Returns:
        The final refined report
    """
    # Generate initial draft
    draft = await generate_initial_draft(topic)

    # Iterative refinement loop
    for i in range(max_iterations):
        with flyte.group(f"refinement_{i + 1}"):
            # Get critique
            critique = await critique_content(draft)

            # Check if we've met the quality threshold
            if critique.score >= quality_threshold:
                print(f"Quality threshold met at iteration {i + 1}!")
                print(f"Final score: {critique.score}/10")
                break

            # Revise based on feedback
            print(f"Score {critique.score} < {quality_threshold}, revising...")
            draft = await revise_content(draft, critique.improvements)
    else:
        print(f"Reached max iterations ({max_iterations})")

    return draft
```

*Source: https://github.com/unionai/unionai-examples/blob/main/v2/user-guide/advanced-project/generate.py*

### How it works

1. **Generate initial draft**: Creates the first version of the report
2. **Enter refinement loop**: Iterates up to `max_iterations` times
3. **Critique**: Evaluates the current draft and assigns a score
4. **Check threshold**: If score meets `quality_threshold`, exit early
5. **Revise**: If below threshold, revise based on improvements
6. **Repeat**: Continue until threshold met or iterations exhausted

All of the LLM calls (generate, critique, revise) happen inside traced functions
within this single task. This keeps the task graph simple while the reusable
container handles the actual LLM work efficiently.

### Early exit

The `if critique.score >= quality_threshold: break` pattern enables early exit
when quality is sufficient. This saves compute costs and time—no need to run
all iterations if the first draft is already good.

## Grouping iterations with flyte.group

Each refinement iteration is wrapped in `flyte.group`:

```python
for i in range(max_iterations):
    with flyte.group(f"refinement_{i + 1}"):
        critique = await critique_content(draft)
        # ...
```

### Why use flyte.group?

Groups provide hierarchical organization in the Flyte UI. Since critique and
revise are traced functions (not separate tasks), groups help organize them:

```
refine_report
├── generate_initial_draft (traced)
├── refinement_1
│   ├── critique_content (traced)
│   └── revise_content (traced)
├── refinement_2
│   ├── critique_content (traced)
│   └── revise_content (traced)
└── [returns refined report]
```

Benefits:
- **Clarity**: See exactly how many iterations occurred
- **Debugging**: Quickly find which iteration had issues
- **Observability**: Track time spent in each refinement cycle

### Group context

Groups are implemented as context managers. All traced calls and nested groups
within the `with flyte.group(...)` block are associated with that group.
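A toy stand-in built on `contextlib` illustrates the mechanism. This is not how `flyte.group` is implemented; it only shows how a context manager can associate nested calls with the enclosing group:

```python
from contextlib import contextmanager

_stack: list = []   # currently open group names, outermost first
events: list = []   # recorded call paths, e.g. "refinement_1/critique_content"

@contextmanager
def group(name):
    """Hypothetical stand-in for flyte.group: push/pop a group name."""
    _stack.append(name)
    try:
        yield
    finally:
        _stack.pop()

def trace(name):
    """Record a traced call under the currently open groups."""
    events.append("/".join(_stack + [name]))

for i in range(2):
    with group(f"refinement_{i + 1}"):
        trace("critique_content")
        trace("revise_content")
```

Each recorded event carries the full group path, mirroring the hierarchy shown in the tree above.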

## Configuring the loop

The refinement loop accepts parameters to tune its behavior:

| Parameter | Default | Description |
|-----------|---------|-------------|
| `max_iterations` | 3 | Upper bound on refinement cycles |
| `quality_threshold` | 8 | Minimum score (1-10) to accept |

### Choosing thresholds

- **Higher threshold** (9-10): More refinement cycles, higher quality, more API costs
- **Lower threshold** (6-7): Faster completion, may accept lower quality
- **More iterations**: Safety net for difficult topics
- **Fewer iterations**: Cost control, faster turnaround

A good starting point is `quality_threshold=8` with `max_iterations=3`. Adjust
based on your quality requirements and budget.

## Best practices for agentic loops

1. **Always set max iterations**: Prevent infinite loops if the quality threshold
   is never reached.

2. **Use structured critiques**: Pydantic models ensure you can reliably extract
   the score and improvements from LLM responses.

3. **Log iteration progress**: Print statements help debug when reviewing logs:
   ```python
   print(f"Iteration {i + 1}: score={critique.score}")
   ```

4. **Consider diminishing returns**: After 3-4 iterations, improvements often
   become marginal. Set `max_iterations` accordingly.

5. **Use groups for observability**: `flyte.group` makes the iterative nature
   visible in the UI, essential for debugging and monitoring.
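One way to act on point 4 is to stop refining when the score gain between consecutive critiques drops below a minimum. The helper below is hypothetical, not part of the example project:

```python
def gain_too_small(scores, min_gain: int = 1) -> bool:
    """Return True when the latest critique improved by less than min_gain."""
    return len(scores) >= 2 and scores[-1] - scores[-2] < min_gain

# Inside the loop, one could track score_history and add after each critique:
# if gain_too_small(score_history):
#     break
```

This would end the loop early when successive revisions stop paying off, even before `max_iterations` is reached.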

## Next steps

With the agentic refinement loop complete, learn how to
[generate multiple outputs in parallel](https://www.union.ai/docs/v2/union/user-guide/advanced-project/agentic-refinement/parallel-outputs).

---
**Source**: https://github.com/unionai/unionai-docs/blob/main/content/user-guide/advanced-project/agentic-refinement.md
**HTML**: https://www.union.ai/docs/v2/union/user-guide/advanced-project/agentic-refinement/
