Directory Structure

Overview

EZStitcher uses a structured approach to directory management that balances automation with flexibility. This document explains how directories are managed, resolved, and customized in EZStitcher.

For information about how pipelines handle directories, see Pipeline. For information about how steps handle directories, see Step.

Basic Directory Concepts

In EZStitcher, several key directories are used during processing:

Plate Path: The original directory containing microscopy images
Workspace Path: A mirror of the plate directory that contains symlinks to the original images, which protects the original data
Input Directory: Where a step reads images from
Output Directory: Where a step saves processed images
Positions Directory: Where position files for stitching are saved
Stitched Directory: Where final stitched images are saved
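
To make the Workspace Path idea concrete, here is a minimal sketch of mirroring a plate directory with symlinks so that later steps never touch the original files. This is not EZStitcher's implementation: the helper name mirror_with_symlinks and the file name A01_s1_w1.tif are hypothetical, and only the standard library is used.

```python
import tempfile
from pathlib import Path


def mirror_with_symlinks(plate_path: Path, workspace_path: Path) -> Path:
    """Recreate plate_path's directory tree under workspace_path, linking each file.

    Hypothetical helper for illustration; workspace_path is assumed to live
    outside plate_path.
    """
    workspace_path.mkdir(parents=True, exist_ok=True)
    for source in sorted(plate_path.rglob("*")):
        target = workspace_path / source.relative_to(plate_path)
        if source.is_dir():
            target.mkdir(parents=True, exist_ok=True)
        else:
            target.parent.mkdir(parents=True, exist_ok=True)
            if not target.exists():
                # The link points back at the original file; no image data is copied.
                target.symlink_to(source.resolve())
    return workspace_path


with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny fake plate with one image file.
    plate = Path(tmp) / "plate1"
    (plate / "TimePoint_1").mkdir(parents=True)
    (plate / "TimePoint_1" / "A01_s1_w1.tif").write_bytes(b"fake image data")

    workspace = mirror_with_symlinks(plate, Path(tmp) / "plate1_workspace")
    linked = workspace / "TimePoint_1" / "A01_s1_w1.tif"
    is_link = linked.is_symlink()
    same_bytes = linked.read_bytes() == b"fake image data"
```

Reading through the link returns the original bytes, while deleting or replacing the link leaves the file in the plate directory untouched.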

Default Directory Structure

When you run a pipeline, EZStitcher creates a directory structure as steps are executed:

/path/to/plate/                  # Original plate path
/path/to/plate_workspace/        # Workspace with symlinks to original images
/path/to/plate_workspace_out/    # Processed images (configurable suffix)
/path/to/plate_workspace_positions/  # Position files for stitching (configurable suffix)
/path/to/plate_workspace_stitched/   # Stitched images (configurable suffix)

This structure ensures that:

Original data is protected (via the workspace)
Processed images are kept separate from original images
Position files are stored in a dedicated directory
Stitched images are stored separately from individual processed tiles

Directory Resolution

EZStitcher automatically resolves directories for steps in a pipeline, minimizing the need for manual directory management. Here’s how it works:

Basic Resolution Logic:
```
Pipeline Input Dir → Step 1 → Step 2 → Step 3 → ... → Pipeline Output Dir
                     |         |         |
                     v         v         v
                  Output 1  Output 2  Output 3
```
- Each step’s output directory becomes the next step’s input directory
- If a step doesn’t specify an output directory, it’s automatically generated
- The pipeline’s output directory is used for the last step if not specified
First Step Special Handling:
- If the first step doesn’t specify an input directory, the pipeline’s input directory is used
- Typically, you should set the first step’s input directory to orchestrator.workspace_path
Default Directory Generation:
- The first step always gets a new output directory (with “_out” suffix) if none is specified
- This ensures we never modify files in the workspace path
- Subsequent steps will use their input directory as their output directory (in-place processing) if no output directory is specified
- This allows for more efficient processing by avoiding unnecessary file copying
ImageStitchingStep Behavior:
- The ImageStitchingStep follows the standard directory resolution logic, using the previous step’s output directory as its input
- You can explicitly set input_dir=orchestrator.workspace_path to use original images for stitching instead of processed images
- By default, its output directory is set to {workspace_path}_stitched
- This ensures stitched images are saved separately from processed individual tiles
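
The resolution rules above can be sketched as a small standalone function. This is an illustrative model only, not EZStitcher's resolver: StepDirs, sibling, and resolve_dirs are hypothetical names. The order in which the specialized-step defaults and the pipeline output directory are checked is an assumption, as is basing the “_positions” directory on the workspace path.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List, Optional


@dataclass
class StepDirs:
    """Hypothetical stand-in for a step; only its directory fields are modelled."""
    name: str
    kind: str = "step"  # "step", "positions", or "stitch"
    input_dir: Optional[PurePosixPath] = None
    output_dir: Optional[PurePosixPath] = None


def sibling(base: PurePosixPath, suffix: str) -> PurePosixPath:
    """Return a directory next to `base` whose name is base's name plus `suffix`."""
    return base.with_name(base.name + suffix)


def resolve_dirs(
    steps: List[StepDirs],
    workspace: PurePosixPath,
    pipeline_output: Optional[PurePosixPath] = None,
) -> List[StepDirs]:
    """Fill in missing directories following the rules described above."""
    previous_output = workspace  # acts as the pipeline's input directory
    last_index = len(steps) - 1
    for index, step in enumerate(steps):
        # A step without an input directory reads the previous step's output.
        if step.input_dir is None:
            step.input_dir = previous_output
        if step.output_dir is None:
            if step.kind == "positions":
                step.output_dir = sibling(workspace, "_positions")
            elif step.kind == "stitch":
                step.output_dir = sibling(workspace, "_stitched")
            elif index == last_index and pipeline_output is not None:
                step.output_dir = pipeline_output
            elif index == 0:
                # The first step never writes into the workspace itself.
                step.output_dir = sibling(workspace, "_out")
            else:
                # Later steps process in place.
                step.output_dir = step.input_dir
        previous_output = step.output_dir
    return steps


workspace = PurePosixPath("/data/plates/plate1_workspace")
resolved = resolve_dirs(
    [
        StepDirs("Z-Stack Flattening"),
        StepDirs("Channel Processing"),
        StepDirs("Position Generation", kind="positions"),
        StepDirs("Image Stitching", kind="stitch"),
    ],
    workspace,
)
for step in resolved:
    print(f"{step.name}: {step.input_dir} -> {step.output_dir}")
```

Setting input_dir or output_dir on any StepDirs before calling resolve_dirs leaves that value untouched, which mirrors how explicit directories override the defaults.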

Example Directory Flow

Here’s an example of how directories flow through a pipeline:

# Starting with a plate path: /data/plates/plate1

orchestrator.workspace_path = /data/plates/plate1_workspace

# Pipeline with 3 steps:

Step 1 (Z-Stack Flattening):
  input_dir = /data/plates/plate1_workspace
  output_dir = /data/plates/plate1_workspace_out  # New directory to protect workspace

Step 2 (Channel Processing):
  input_dir = /data/plates/plate1_workspace_out
  output_dir = /data/plates/plate1_workspace_out  # In-place processing

Step 3 (Position Generation):
  input_dir = /data/plates/plate1_workspace_out
  output_dir = /data/plates/plate1_workspace_positions  # New directory for position files

Step 4 (Image Stitching):
  input_dir = /data/plates/plate1_workspace_positions  # Uses previous step's output by default
  # Alternative: input_dir = /data/plates/plate1_workspace  # Can be set to use original images instead
  positions_dir = /data/plates/plate1_workspace_positions  # Same as input_dir
  output_dir = /data/plates/plate1_workspace_stitched  # New directory for stitched images

This automatic directory resolution simplifies pipeline creation and ensures a consistent directory structure.

Step Initialization Best Practices

When initializing steps, follow these best practices for directory specification:

First Step in a Pipeline:
- Always specify input_dir for the first step, typically using orchestrator.workspace_path
- This ensures that processing happens on the workspace copies, not the original data
- Specify output_dir only if you need a specific directory structure
```
# First step in a pipeline
first_step = Step(
    name="First Step",
    func=IP.stack_percentile_normalize,
    input_dir=orchestrator.workspace_path,  # Always specify for first step
    # output_dir is automatically determined
)
```

Subsequent Steps:
- Don’t specify input_dir for subsequent steps
- Each step’s output directory automatically becomes the next step’s input directory
- Specify output_dir only if you need a specific directory structure

# Subsequent step in a pipeline
subsequent_step = Step(
    name="Subsequent Step",
    func=stack(IP.sharpen),
    # input_dir is automatically set to previous step's output_dir
    # output_dir is automatically determined
)

Specialized Steps:
- For PositionGenerationStep, don’t specify input_dir or output_dir unless needed
- For ImageStitchingStep, don’t specify input_dir, positions_dir, or output_dir unless needed

# Directories are automatically determined
position_step = PositionGenerationStep()

# Directories are automatically determined
stitch_step = ImageStitchingStep(
    # Uncomment to use original images instead of processed images:
    # input_dir=orchestrator.workspace_path
)

Common Mistakes to Avoid:
- Specifying unnecessary directories, making the code more verbose
- Forgetting to use orchestrator.workspace_path for the first step
- Manually managing directories that could be automatically resolved

Following these best practices will make your code more concise and less error-prone, while taking full advantage of EZStitcher’s automatic directory resolution.

Custom Directory Structures

While EZStitcher’s automatic directory resolution works well for most cases, you may sometimes need more control over where files are saved.

You can create custom directory structures by explicitly specifying output directories:

from pathlib import Path

# Create a pipeline with custom directory structure
pipeline = Pipeline(
    steps=[
        # First step: Save to a specific directory
        Step(
            name="Z-Stack Flattening",
            func=(IP.create_projection, {'method': 'max_projection'}),
            variable_components=['z_index'],
            input_dir=orchestrator.workspace_path,
            output_dir=Path("/custom/output/path/flattened")
        ),

        # Second step: Save to another specific directory
        Step(
            name="Channel Processing",
            func=IP.stack_percentile_normalize,
            variable_components=['channel'],
            group_by='channel',
            # input_dir is automatically set to the previous step's output_dir
            output_dir=Path("/custom/output/path/processed")
        ),

        # Image stitching step: Save to a specific directory
        ImageStitchingStep(
            # input_dir is automatically set to the previous step's output_dir
            # positions_dir is automatically determined
            output_dir=Path("/custom/output/path/stitched")
        )
    ],
    name="Custom Directory Pipeline"
)

Customizing ImageStitchingStep Directories

For more control over the ImageStitchingStep directories:

from pathlib import Path

pipeline = Pipeline(
    steps=[
        # Processing steps...

        # Custom position generation step
        PositionGenerationStep(
            # input_dir is automatically set
            output_dir=Path("/custom/positions")  # Custom positions directory
        ),

        # Custom image stitching step
        ImageStitchingStep(
            input_dir=Path("/custom/input"),  # Custom input directory
            positions_dir=Path("/custom/positions"),  # Custom positions directory
            output_dir=Path("/custom/stitched")  # Custom output directory
        )
    ],
    name="Custom Stitching Pipeline"
)

When to Specify Directories Explicitly

Always specify input_dir for the first step:
- Use orchestrator.workspace_path to ensure processing happens on workspace copies
- This protects original data from modification
Specify output_dir only when you need a specific directory structure:
- For example, when you need to save results in a specific location
- When you need to reference the output directory from outside the pipeline
Don’t specify input_dir for subsequent steps:
- Each step’s output directory automatically becomes the next step’s input directory
- This reduces code verbosity and potential for errors
Don’t specify directories for steps unless needed:
- PositionGenerationStep and ImageStitchingStep have intelligent directory handling
- They automatically find the right directories based on the pipeline context
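
The first and third guidelines above can be turned into a quick review checklist. The sketch below is purely illustrative and not part of EZStitcher: StepSpec and check_step_dirs are hypothetical names, and the messages are advisory, since setting input_dir on a later step (for example, to stitch original images) can be intentional.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StepSpec:
    """Hypothetical description of a step's directory arguments."""
    name: str
    input_dir: Optional[str] = None
    output_dir: Optional[str] = None


def check_step_dirs(steps: List[StepSpec]) -> List[str]:
    """Return advisory messages for directory settings that go against the guidelines."""
    messages = []
    for index, step in enumerate(steps):
        if index == 0 and step.input_dir is None:
            messages.append(
                f"{step.name}: first step has no input_dir; "
                "consider orchestrator.workspace_path"
            )
        if index > 0 and step.input_dir is not None:
            messages.append(
                f"{step.name}: input_dir is set explicitly; "
                "it would otherwise come from the previous step"
            )
    return messages


flagged = check_step_dirs([
    StepSpec("Normalize"),                               # missing input_dir
    StepSpec("Sharpen", input_dir="/data/somewhere"),    # redundant input_dir
])
clean = check_step_dirs([
    StepSpec("Normalize", input_dir="/data/plates/plate1_workspace"),
    StepSpec("Sharpen"),
])
for message in flagged:
    print(message)
```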

Configuring Directory Suffixes

EZStitcher allows you to configure the directory suffixes used for different types of steps through the PipelineConfig class:

from ezstitcher.core.config import PipelineConfig

# Create a configuration with custom directory suffixes
config = PipelineConfig(
    out_dir_suffix="_output",           # For regular processing steps (default: "_out")
    positions_dir_suffix="_pos",        # For position generation steps (default: "_positions")
    stitched_dir_suffix="_stitched"     # For stitching steps (default: "_stitched")
)

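# Create an orchestrator with the custom configuration
# (plate_path is the path to your plate directory; PipelineOrchestrator, Pipeline,
# Step, the specialized step classes, and IP are assumed to be imported already)
orchestrator = PipelineOrchestrator(config=config, plate_path=plate_path)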

# Now all pipelines run with this orchestrator will use the custom suffixes
pipeline = Pipeline(
    input_dir=orchestrator.workspace_path,
    name="Basic Pipeline",
    steps=[
        Step(name="First Step", func=IP.stack_percentile_normalize),
        PositionGenerationStep(),
        ImageStitchingStep()
    ]
)

# Run the pipeline
orchestrator.run(pipelines=[pipeline])

This allows you to customize the directory structure to match your organization’s naming conventions or to integrate with existing workflows.
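
To preview which directory names a given set of suffixes would produce, the naming scheme can be modelled in a few lines. This is a sketch under stated assumptions rather than EZStitcher code: the Suffixes dataclass and derived_dirs function are hypothetical, the defaults are the ones listed in the comments above, and each directory is assumed to be the workspace path with the suffix appended, as in the default directory structure.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Dict


@dataclass(frozen=True)
class Suffixes:
    """Hypothetical holder mirroring the three suffix options named above."""
    out_dir_suffix: str = "_out"
    positions_dir_suffix: str = "_positions"
    stitched_dir_suffix: str = "_stitched"


def derived_dirs(workspace: PurePosixPath, suffixes: Suffixes) -> Dict[str, PurePosixPath]:
    """Append each suffix to the workspace directory name."""
    def sibling(suffix: str) -> PurePosixPath:
        return workspace.with_name(workspace.name + suffix)

    return {
        "processed": sibling(suffixes.out_dir_suffix),
        "positions": sibling(suffixes.positions_dir_suffix),
        "stitched": sibling(suffixes.stitched_dir_suffix),
    }


workspace = PurePosixPath("/data/plates/plate1_workspace")
default_dirs = derived_dirs(workspace, Suffixes())
custom_dirs = derived_dirs(
    workspace,
    Suffixes(out_dir_suffix="_output", positions_dir_suffix="_pos"),
)
for label, path in custom_dirs.items():
    print(f"{label}: {path}")
```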

Best Practices

For comprehensive best practices for directory management, see Directory Management Best Practices in the Best Practices guide.