def load_cmip6(region="india", columns=None):
    """
    Load a CMIP6 dataset for a region as a DataFrame.

    If the dataset is not already cached locally, it will be downloaded
    from the remote source (e.g., HuggingFace or Zenodo).

    Parameters
    ----------
    region : str
        Region to load: "india" or "south_asia".
    columns : list of str, optional
        Columns to read from the Parquet file. If None, all columns are read.

    Returns
    -------
    pandas.DataFrame
        Contents of the cached Parquet file.

    Raises
    ------
    ValueError
        If region is not "india" or "south_asia", or if the mapped
        dataset name is not defined in the registry.

    Notes
    -----
    - Datasets are cached under ~/.climaid/datasets/
    - Subsequent calls read the cached file without re-downloading
    """
    mapping = {"india": "cmip6_india", "south_asia": "cmip6_south_asia"}
    if region not in mapping:
        raise ValueError("region must be 'india' or 'south_asia'")
    dataset_name = mapping[region]
    path = _manager.fetch(dataset_name)
    return pd.read_parquet(path, columns=columns)
Dataset Management (Advanced)
Handles dataset downloading, caching, and retrieval.
Manages retrieval and caching of external datasets.
class DatasetManager:
    """
    Manages retrieval and caching of external datasets.

    This class ensures that:
    - datasets are downloaded only once
    - files are stored locally for reuse
    - users can work offline after first download
    """

    def __init__(self):
        self.base_dir = CACHE_DIR
        self.base_dir.mkdir(parents=True, exist_ok=True)

    def fetch(self, dataset_name: str):
        if dataset_name not in DATASETS:
            raise ValueError(f"Dataset '{dataset_name}' not found")

        meta = DATASETS[dataset_name]
        dataset_dir = self.base_dir / dataset_name / meta["version"]
        dataset_dir.mkdir(parents=True, exist_ok=True)
        file_path = dataset_dir / meta["filename"]

        # already cached
        if file_path.exists():
            return file_path

        print(f"\nDownloading {dataset_name}...")
        print(f"Saving to: {file_path}\n")
        path = pooch.retrieve(
            url=meta["url"],
            fname=meta["filename"],
            path=dataset_dir,
            progressbar=True,
            known_hash=None,
        )
        return Path(path)
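The download-once behaviour described above can be illustrated without network access. The following is a minimal sketch, not the library's implementation: `fetch_cached` and `fake_download` are hypothetical names, and the download step is replaced by a callable that writes a placeholder file, so the cache-hit path can be observed directly.

```python
import tempfile
from pathlib import Path
from typing import Callable


def fetch_cached(
    base_dir: Path,
    name: str,
    version: str,
    filename: str,
    download: Callable[[Path], None],
) -> Path:
    """Return the cached file, calling `download` only if it is missing."""
    # Same layout as DatasetManager.fetch: <base>/<name>/<version>/<filename>
    dataset_dir = base_dir / name / version
    dataset_dir.mkdir(parents=True, exist_ok=True)
    file_path = dataset_dir / filename
    if not file_path.exists():
        download(file_path)
    return file_path


calls = []


def fake_download(target: Path) -> None:
    """Hypothetical stand-in for the real download; records each invocation."""
    calls.append(target)
    target.write_bytes(b"placeholder")


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    first = fetch_cached(base, "cmip6_india", "v1", "data.parquet", fake_download)
    second = fetch_cached(base, "cmip6_india", "v1", "data.parquet", fake_download)
    same_path = first == second
    download_count = len(calls)
```

Both calls return the same path, and the download callable runs only on the first call. Because the version is part of the path, changing a dataset's version in the registry leads to a fresh download into a new directory rather than reuse of the old file.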
Available Datasets (Internal)
Defines dataset metadata such as source URLs and versions.
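As a rough illustration of what such a registry might look like, the sketch below uses the three keys that `DatasetManager.fetch` reads (`url`, `version`, `filename`). The dataset names come from `load_cmip6`; the URLs, version strings, and filenames are placeholders invented for this example, and `cache_path` is a hypothetical helper, not part of the library.

```python
from pathlib import Path

# Cache location as stated in the load_cmip6 notes.
CACHE_DIR = Path.home() / ".climaid" / "datasets"

# Hypothetical registry entries: every value below is a placeholder.
DATASETS = {
    "cmip6_india": {
        "url": "https://example.org/cmip6_india.parquet",
        "version": "v1",
        "filename": "cmip6_india.parquet",
    },
    "cmip6_south_asia": {
        "url": "https://example.org/cmip6_south_asia.parquet",
        "version": "v1",
        "filename": "cmip6_south_asia.parquet",
    },
}


def cache_path(dataset_name: str, base_dir: Path = CACHE_DIR) -> Path:
    """Compute where a dataset would be cached, without downloading it."""
    if dataset_name not in DATASETS:
        raise ValueError(f"Dataset '{dataset_name}' not found")
    meta = DATASETS[dataset_name]
    return base_dir / dataset_name / meta["version"] / meta["filename"]


example = cache_path("cmip6_india", Path("/tmp/cache"))
```

Keeping the version inside each entry means a new release of a dataset only needs a registry update, and older cached versions remain on disk under their own directory.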