Directory Structure
Overview
EZStitcher uses a structured approach to directory management that balances automation with flexibility. This document explains how directories are managed, resolved, and customized in EZStitcher.
For information about how pipelines handle directories, see Pipeline. For information about how steps handle directories, see Step.
Basic Directory Concepts
In EZStitcher, several key directories are used during processing:
Plate Path: The original directory containing microscopy images
Workspace Path: A mirror of the plate path that contains symlinks to the original images, so the original data is never modified
Input Directory: Where a step reads images from
Output Directory: Where a step saves processed images
Positions Directory: Where position files for stitching are saved
Stitched Directory: Where final stitched images are saved
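To make the workspace idea concrete, here is a minimal sketch of mirroring a plate directory with symlinks. It is not EZStitcher's implementation: the helper name mirror_with_symlinks, the TimePoint_1 folder, and the file name A01_s1_w1.tif are all made up for illustration, and only the Python standard library is used.

```python
import os
import tempfile
from pathlib import Path


def mirror_with_symlinks(plate_path: Path, workspace_path: Path) -> Path:
    """Hypothetical helper: recreate the plate's folder tree and link each file."""
    workspace_path.mkdir(parents=True, exist_ok=True)
    for src in plate_path.rglob("*"):
        dst = workspace_path / src.relative_to(plate_path)
        if src.is_dir():
            # Directories are created for real so the tree can hold links
            dst.mkdir(parents=True, exist_ok=True)
        elif not os.path.lexists(dst):
            # Files are linked, never copied, so the originals stay untouched
            dst.parent.mkdir(parents=True, exist_ok=True)
            dst.symlink_to(src.resolve())
    return workspace_path


# Demo on a throwaway directory with one fake image file
root = Path(tempfile.mkdtemp())
plate = root / "plate"
(plate / "TimePoint_1").mkdir(parents=True)
(plate / "TimePoint_1" / "A01_s1_w1.tif").write_bytes(b"pixels")

workspace = mirror_with_symlinks(plate, root / "plate_workspace")
link = workspace / "TimePoint_1" / "A01_s1_w1.tif"
print(link.is_symlink(), link.read_bytes() == b"pixels")
```

Because the workspace holds links rather than copies, it costs almost no disk space, and a step that writes new files next to the links does not touch the plate directory itself.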
Default Directory Structure
When you run a pipeline, EZStitcher creates a directory structure as steps are executed:
/path/to/plate/ # Original plate path
/path/to/plate_workspace/ # Workspace with symlinks to original images
/path/to/plate_workspace_out/ # Processed images (configurable suffix)
/path/to/plate_workspace_positions/ # Position files for stitching (configurable suffix)
/path/to/plate_workspace_stitched/ # Stitched images (configurable suffix)
This structure ensures that:
Original data is protected (via the workspace)
Processed images are kept separate from original images
Position files are stored in a dedicated directory
Stitched images are stored separately from individual processed tiles
Directory Resolution
EZStitcher automatically resolves directories for steps in a pipeline, minimizing the need for manual directory management. Here’s how it works:
Basic Resolution Logic:
Pipeline Input Dir → Step 1 → Step 2 → Step 3 → ... → Pipeline Output Dir
                       |        |        |
                       v        v        v
                    Output 1 Output 2 Output 3
Each step’s output directory becomes the next step’s input directory
If a step doesn’t specify an output directory, it’s automatically generated
The pipeline’s output directory is used for the last step if not specified
First Step Special Handling:
- If the first step doesn’t specify an input directory, the pipeline’s input directory is used
- Typically, you should set the first step’s input directory to orchestrator.workspace_path
Default Directory Generation:
- The first step always gets a new output directory (with “_out” suffix) if none is specified
- This ensures we never modify files in the workspace path
- Subsequent steps will use their input directory as their output directory (in-place processing) if no output directory is specified
- This allows for more efficient processing by avoiding unnecessary file copying
ImageStitchingStep Behavior:
- The ImageStitchingStep follows the standard directory resolution logic, using the previous step’s output directory as its input
- You can explicitly set input_dir=orchestrator.workspace_path to use original images for stitching instead of processed images
- By default, its output directory is set to {workspace_path}_stitched
- This ensures stitched images are saved separately from processed individual tiles
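The resolution rules above can be sketched in plain Python. This is a simplified model, not EZStitcher's code: StepSpec, append_suffix, and resolve_dirs are hypothetical names, the kind field stands in for the step class, and the order in which the defaults are checked is an assumption.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class StepSpec:
    """Hypothetical stand-in for a pipeline step (not EZStitcher's Step class)."""
    name: str
    kind: str = "process"  # "process", "positions", or "stitch"
    input_dir: Optional[Path] = None
    output_dir: Optional[Path] = None


def append_suffix(path: Path, suffix: str) -> Path:
    """Return a sibling directory whose name is the original name plus suffix."""
    return path.with_name(path.name + suffix)


def resolve_dirs(steps: List[StepSpec], workspace: Path,
                 pipeline_input: Optional[Path] = None,
                 pipeline_output: Optional[Path] = None) -> List[StepSpec]:
    """Fill in missing directories following the rules described above."""
    previous_output = pipeline_input or workspace
    for index, step in enumerate(steps):
        if step.input_dir is None:
            # Each step reads from the previous step's output
            step.input_dir = previous_output
        if step.output_dir is None:
            is_last = index == len(steps) - 1
            if step.kind == "positions":
                step.output_dir = append_suffix(workspace, "_positions")
            elif step.kind == "stitch":
                step.output_dir = append_suffix(workspace, "_stitched")
            elif is_last and pipeline_output is not None:
                step.output_dir = pipeline_output
            elif index == 0:
                # The first step never writes into its input directory
                step.output_dir = append_suffix(step.input_dir, "_out")
            else:
                # Later steps process in place
                step.output_dir = step.input_dir
        previous_output = step.output_dir
    return steps


workspace = Path("/data/plates/plate1_workspace")
steps = resolve_dirs(
    [
        StepSpec("Z-Stack Flattening", input_dir=workspace),
        StepSpec("Channel Processing"),
        StepSpec("Position Generation", kind="positions"),
        StepSpec("Image Stitching", kind="stitch"),
    ],
    workspace,
)
for step in steps:
    print(f"{step.name}: {step.input_dir} -> {step.output_dir}")
```

No directories are created here; the sketch only computes paths, which makes it easy to check a planned pipeline layout before running anything.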
Example Directory Flow
Here’s an example of how directories flow through a pipeline:
# Starting with a plate path: /data/plates/plate1
orchestrator.workspace_path = /data/plates/plate1_workspace
# Pipeline with 3 steps:
Step 1 (Z-Stack Flattening):
    input_dir = /data/plates/plate1_workspace
    output_dir = /data/plates/plate1_workspace_out # New directory to protect workspace
Step 2 (Channel Processing):
    input_dir = /data/plates/plate1_workspace_out
    output_dir = /data/plates/plate1_workspace_out # In-place processing
Step 3 (Position Generation):
    input_dir = /data/plates/plate1_workspace_out
    output_dir = /data/plates/plate1_workspace_positions # New directory for position files
Step 4 (Image Stitching):
    input_dir = /data/plates/plate1_workspace_positions # Uses previous step's output by default
    # Alternative: input_dir = /data/plates/plate1_workspace # Can be set to use original images instead
    positions_dir = /data/plates/plate1_workspace_positions # Same as input_dir
    output_dir = /data/plates/plate1_workspace_stitched # New directory for stitched images
This automatic directory resolution simplifies pipeline creation and ensures a consistent directory structure.
Step Initialization Best Practices
When initializing steps, follow these best practices for directory specification:
First Step in a Pipeline:
- Always specify input_dir for the first step, typically using orchestrator.workspace_path
- This ensures that processing happens on the workspace copies, not the original data
- Specify output_dir only if you need a specific directory structure
# First step in a pipeline
first_step = Step(
    name="First Step",
    func=IP.stack_percentile_normalize,
    input_dir=orchestrator.workspace_path,  # Always specify for first step
    # output_dir is automatically determined
)
Subsequent Steps:
- Don’t specify input_dir for subsequent steps
- Each step’s output directory automatically becomes the next step’s input directory
- Specify output_dir only if you need a specific directory structure
# Subsequent step in a pipeline
subsequent_step = Step(
    name="Subsequent Step",
    func=stack(IP.sharpen),
    # input_dir is automatically set to previous step's output_dir
    # output_dir is automatically determined
)
Specialized Steps:
- For PositionGenerationStep, don’t specify input_dir or output_dir unless needed
- For ImageStitchingStep, don’t specify input_dir, positions_dir, or output_dir unless needed
# Directories are automatically determined
position_step = PositionGenerationStep()
# Directories are automatically determined
stitch_step = ImageStitchingStep(
    # Uncomment to use original images instead of processed images:
    # input_dir=orchestrator.workspace_path
)
Common Mistakes to Avoid:
- Specifying unnecessary directories, making the code more verbose
- Forgetting to use orchestrator.workspace_path for the first step
- Manually managing directories that could be automatically resolved
Following these best practices will make your code more concise and less error-prone, while taking full advantage of EZStitcher’s automatic directory resolution.
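As an illustration of these guidelines, the sketch below flags the two most mechanical mistakes in a list of step descriptions. The function lint_step_dirs and the dictionary layout are hypothetical and are not part of EZStitcher; real steps would be Step objects rather than dictionaries.

```python
from typing import Dict, List, Optional


def lint_step_dirs(steps: List[Dict[str, Optional[str]]]) -> List[str]:
    """Hypothetical check: return one warning per questionable directory setting."""
    warnings = []
    for index, step in enumerate(steps):
        name = step.get("name") or f"step {index + 1}"
        has_input = step.get("input_dir") is not None
        if index == 0 and not has_input:
            warnings.append(
                f"{name}: first step should set input_dir "
                "(typically orchestrator.workspace_path)"
            )
        if index > 0 and has_input:
            # Sometimes intentional, e.g. stitching from the original images
            warnings.append(
                f"{name}: input_dir is normally resolved from the previous step"
            )
    return warnings


steps = [
    {"name": "Z-Stack Flattening"},
    {"name": "Channel Processing", "input_dir": "/data/plates/plate1_workspace_out"},
]
warnings = lint_step_dirs(steps)
for message in warnings:
    print(message)
```

The second warning is advisory only, since setting input_dir on a later step is legitimate when a step should read from the workspace instead of the previous output.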
Custom Directory Structures
While EZStitcher’s automatic directory resolution works well for most cases, you may sometimes need more control over where files are saved.
You can create custom directory structures by explicitly specifying output directories:
from pathlib import Path

# Assumes Pipeline, Step, ImageStitchingStep, IP, and an orchestrator are already set up
# Create a pipeline with custom directory structure
pipeline = Pipeline(
    steps=[
        # First step: Save to a specific directory
        Step(
            name="Z-Stack Flattening",
            func=(IP.create_projection, {'method': 'max_projection'}),
            variable_components=['z_index'],
            input_dir=orchestrator.workspace_path,
            output_dir=Path("/custom/output/path/flattened")
        ),
        # Second step: Save to another specific directory
        Step(
            name="Channel Processing",
            func=IP.stack_percentile_normalize,
            variable_components=['channel'],
            group_by='channel',
            # input_dir is automatically set to the previous step's output_dir
            output_dir=Path("/custom/output/path/processed")
        ),
        # Image stitching step: Save to a specific directory
        ImageStitchingStep(
            # input_dir is automatically set to the previous step's output_dir
            # positions_dir is automatically determined
            output_dir=Path("/custom/output/path/stitched")
        )
    ],
    name="Custom Directory Pipeline"
)
Customizing ImageStitchingStep Directories
For more control over the ImageStitchingStep directories:
pipeline = Pipeline(
    steps=[
        # Processing steps...
        # Custom position generation step
        PositionGenerationStep(
            # input_dir is automatically set
            output_dir=Path("/custom/positions")  # Custom positions directory
        ),
        # Custom image stitching step
        ImageStitchingStep(
            input_dir=Path("/custom/input"),  # Custom input directory
            positions_dir=Path("/custom/positions"),  # Custom positions directory
            output_dir=Path("/custom/stitched")  # Custom output directory
        )
    ],
    name="Custom Stitching Pipeline"
)
When to Specify Directories Explicitly
Always specify input_dir for the first step:
- Use orchestrator.workspace_path to ensure processing happens on workspace copies
- This protects original data from modification
Specify output_dir only when you need a specific directory structure:
- For example, when you need to save results in a specific location
- When you need to reference the output directory from outside the pipeline
Don’t specify input_dir for subsequent steps:
- Each step’s output directory automatically becomes the next step’s input directory
- This reduces code verbosity and potential for errors
Don’t specify directories for steps unless needed:
- PositionGenerationStep and ImageStitchingStep have intelligent directory handling
- They automatically find the right directories based on the pipeline context
Configuring Directory Suffixes
EZStitcher allows you to configure the directory suffixes used for different types of steps through the PipelineConfig class:
from ezstitcher.core.config import PipelineConfig
# Create a configuration with custom directory suffixes
config = PipelineConfig(
    out_dir_suffix="_output",  # For regular processing steps (default: "_out")
    positions_dir_suffix="_pos",  # For position generation steps (default: "_positions")
    stitched_dir_suffix="_stitched"  # For stitching steps (default: "_stitched")
)
# Create an orchestrator with the custom configuration
orchestrator = PipelineOrchestrator(config=config, plate_path=plate_path)
# Now all pipelines run with this orchestrator will use the custom suffixes
pipeline = Pipeline(
    input_dir=orchestrator.workspace_path,
    name="Basic Pipeline",
    steps=[
        Step(name="First Step", func=IP.stack_percentile_normalize),
        PositionGenerationStep(),
        ImageStitchingStep()
    ]
)
# Run the pipeline
orchestrator.run(pipelines=[pipeline])
This allows you to customize the directory structure to match your organization’s naming conventions or to integrate with existing workflows.
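To see what a suffix change means on disk, the sketch below derives the three sibling directories from a workspace path. The helper derived_dirs is hypothetical and only mirrors the naming pattern shown in the default directory structure; its keyword names are borrowed from PipelineConfig for readability, and it does not call EZStitcher.

```python
from pathlib import Path
from typing import Dict, Union


def derived_dirs(workspace_path: Union[str, Path],
                 out_dir_suffix: str = "_out",
                 positions_dir_suffix: str = "_positions",
                 stitched_dir_suffix: str = "_stitched") -> Dict[str, Path]:
    """Hypothetical helper: build sibling directory paths from the workspace name."""
    workspace = Path(workspace_path)

    def sibling(suffix: str) -> Path:
        # Append the suffix to the last path component, keeping the parent
        return workspace.with_name(workspace.name + suffix)

    return {
        "processed": sibling(out_dir_suffix),
        "positions": sibling(positions_dir_suffix),
        "stitched": sibling(stitched_dir_suffix),
    }


default_dirs = derived_dirs("/path/to/plate_workspace")
custom_dirs = derived_dirs(
    "/path/to/plate_workspace",
    out_dir_suffix="_output",
    positions_dir_suffix="_pos",
)
for label, path in custom_dirs.items():
    print(f"{label}: {path}")
```

Under these assumptions, the configuration shown above would place processed images in plate_workspace_output and position files in plate_workspace_pos, while stitched images stay in plate_workspace_stitched.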
Best Practices
For comprehensive best practices for directory management, see Directory Management Best Practices in the Best Practices guide.