Skip to main content

HydroGym Data Manager

General Hugging Face data manager for all HydroGym solvers.

Supports three caching strategies:

  • use_clean_cache=True: Create symlinks into HF cache (recommended — clean paths, no duplication)
  • use_clean_cache='copy': Copy to ~/.cache/hydrogym/ (clean paths, duplicated storage)
  • use_clean_cache=False: Use HF cache paths directly (no duplication, messy paths)

Local fallback optimization: When local_fallback_dir is provided and contains the requested environment, it is used directly without copying/linking to the cache (cache layer is only used for HF downloads)

Solver profiles

Each entry in :data:SOLVER_PROFILES describes one solver backend:

  • sentinel: Hidden file stored alongside environment data on HF that identifies the solver (e.g. .MAIA_LB). Used for automatic profile detection.
  • required_files: Files that must exist for validation to pass.
  • required_dirs: Directories that must exist and be non-empty.
  • optional_files: Files that are logged but not required.
  • workspace_files: {source_filename: target_rel_path} mapping used by :meth:``5 to create solver-specific symlinks in a run directory.
  • workspace_dirs: Same idea for directories.

To add a new solver, add an entry here. No other file needs to change for basic support (download + validation + workspace prep).

Usage::

Symlinks (recommended)

dm = HFDataManager(use_clean_cache=True)

Copy files (if you need to modify them)

dm = HFDataManager(use_clean_cache='copy')

Use HF cache directly

dm = HFDataManager(use_clean_cache=False)

Get environment path (downloads if needed)

env_path = dm.get_environment_path('Cylinder_2D_Re200')

Prepare a solver-specific working directory

paths = dm.prepare_working_directory(env_path, './run_dir')

HFDataManager Objects

class HFDataManager()

Manages CFD environment data from Hugging Face Hub.

Handles downloading, caching (via symlinks, copies, or direct HF paths), solver-profile detection, validation, and working-directory preparation.

Solver profiles are detected automatically from sentinel files (e.g. .MAIA_LB, .MAIA_STRCTRD) stored alongside environment data on HF. The local cache is checked first; if no sentinel is found the HF file listing is queried (lightweight — no full download). fallback_profile is used when neither source yields a result (e.g. legacy environments or offline mode).

__init__

def __init__(repo_id: str = "dynamicslab/HydroGym-environments",
cache_dir: Optional[str] = None,
local_fallback_dir: Optional[str] = None,
use_clean_cache: Union[bool, str] = True,
fallback_profile: str = "MAIA_LB")

Initialize the HF Data Manager.

Arguments:

  • repo_id - Hugging Face repository ID.
  • cache_dir - Clean local cache directory (default: ~/.cache/hydrogym). Only used for HF downloads; local_fallback_dir is used directly.
  • local_fallback_dir - Local directory with environment files used when HF is unreachable. When available, used directly without cache layer. use_clean_cache:
    • True: Create symlinks into HF cache (recommended).
    • 'copy': Copy files to clean cache.
    • False: Use HF cache paths directly.
  • cache_dir1 - cache_dir is only used for HF downloads, not for local_fallback.
  • cache_dir2 - Solver profile to use when no sentinel file is found. Defaults to 'MAIA_LB'. Pass the environment class's SOLVER_TYPE attribute to get the right fallback in offline / legacy scenarios.

get_available_environments

def get_available_environments() -> List[str]

Get list of available environments from HF Hub or local fallback.

Returns:

Sorted list of environment names.

get_environment_path

def get_environment_path(env_name: str, force_download: bool = False) -> str

Get path to environment files, downloading from HF Hub if necessary.

The solver profile is detected automatically from sentinel files before any download occurs.

Arguments:

  • env_name - Environment name (e.g. 'Cylinder_2D_Re200').
  • force_download - Force re-download even if cached.

Returns:

Path to the local environment directory.

prepare_working_directory

def prepare_working_directory(env_path: str,
work_dir: str,
profile: Optional[str] = None) -> Dict[str, str]

Create work_dir and populate it with solver-specific symlinks.

The mapping of source files/directories to their target locations inside work_dir is defined by the workspace_files and workspace_dirs entries in :data:SOLVER_PROFILES.

If profile is None the profile is auto-detected from sentinel files already present in env_path, then falls back to self.fallback_profile.

Arguments:

  • env_path - Path to the cached environment data (as returned by :meth:``0).
  • ``1 - Target working directory (created if it does not exist).
  • 2 - Solver profile key (e.g. 'MAIA_LB'). Auto-detected when None``.

Returns:

Dictionary with at least 'work_dir' and 'env_data_path' keys. Additional keys may be present depending on the profile (e.g. 'properties_file' for MAIA_LB).

download_environment

def download_environment(env_name: str, force_download: bool = False) -> str

Alias for :meth:get_environment_path (backwards compatibility).

clear_cache

def clear_cache(env_name: Optional[str] = None)

Clear cached environment files.

Arguments:

  • env_name - Specific environment to clear, or None for all.