Metadata-Version: 2.1
Name: multi-storage-client
Version: 0.12.2
Summary: Unified high-performance Python client for object and file stores.
Home-page: https://github.com/NVIDIA/multi-storage-client
License: Apache-2.0
Author: NVIDIA Multi-Storage Client Team
Requires-Python: >=3.8.1,<4.0.0
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Provides-Extra: aistore
Provides-Extra: azure-storage-blob
Provides-Extra: boto3
Provides-Extra: fsspec
Provides-Extra: google-cloud-storage
Provides-Extra: oci
Provides-Extra: torch
Provides-Extra: xarray
Provides-Extra: zarr
Requires-Dist: aistore (>=1.8.1) ; extra == "aistore"
Requires-Dist: azure-storage-blob (>=12.0.0) ; extra == "azure-storage-blob"
Requires-Dist: boto3 (>=1.28.17) ; extra == "boto3"
Requires-Dist: datasketches (>=5.1.0,<6.0.0)
Requires-Dist: filelock (>=3.14.0,<4.0.0)
Requires-Dist: fsspec (>=2022.8.2) ; extra == "fsspec"
Requires-Dist: google-cloud-storage (>=2.10.0) ; extra == "google-cloud-storage"
Requires-Dist: jsonschema (>=4,<5)
Requires-Dist: numpy (<1.25.0) ; python_version >= "3.8" and python_version < "3.9"
Requires-Dist: numpy (>=1.24.0) ; python_version >= "3.9" and python_version < "4.0"
Requires-Dist: oci (>=2.117.0) ; extra == "oci"
Requires-Dist: opentelemetry-api (>=1.24.0,<2.0.0)
Requires-Dist: opentelemetry-exporter-otlp-proto-http (>=1.24.0,<2.0.0)
Requires-Dist: opentelemetry-sdk (>=1.24.0,<2.0.0)
Requires-Dist: python-dateutil (>=2.7.0,<3.0.0)
Requires-Dist: pyyaml (>=6,<7)
Requires-Dist: torch (>=2.0.0,<3.0.0) ; extra == "torch"
Requires-Dist: xarray (<=2023.1.0) ; (python_version < "3.9") and (extra == "xarray")
Requires-Dist: xarray (>=2023.10.0) ; (python_version >= "3.9" and python_version < "4.0") and (extra == "xarray")
Requires-Dist: zarr (>=2.15.0) ; extra == "zarr"
Project-URL: Repository, https://github.com/NVIDIA/multi-storage-client
Description-Content-Type: text/markdown

# Multi-Storage Client

The Multi-Storage Client (MSC) is a unified high-performance Python client for object and file stores such as AWS S3, Azure Blob Storage, Google Cloud Storage (GCS), NVIDIA AIStore, Oracle Cloud Infrastructure (OCI) Object Storage, POSIX file systems, and more.

It provides a generic interface to interact with objects and files across various storage services. This lets you spend less time learning each storage service's unique interface and lets you change where data is stored without having to change how your code accesses it.

See the [documentation](https://nvidia.github.io/multi-storage-client) to get started.

## Layout

Important landmarks:

```text
Key:
🤖 = Generated

.
│   # GitHub templates and pipelines.
├── .github/
│   └── ...
│
│   # GitLab templates and pipelines.
├── .gitlab/
│   └── ...
│
│   # Release notes.
├── .release_notes/
│   └── ...
│
│   # Python package build outputs.
├── dist/ 🤖
│   └── ...
│
│   # Python documentation configuration.
├── docs/
│   │
│   │   # Python documentation build outputs.
│   ├── dist/ 🤖
│   │   └── ...
│   │
│   │   # Python documentation source.
│   ├── src/
│   │   └── ...
│   │
│   │   # Python documentation configuration.
│   └── conf.py
│
│   # Python package source.
├── src/
│   └── ...
│
│   # Python package test source.
├── tests/
│   │
│   │   # Unit tests.
│   ├── unit/
│   │   └── ...
│   │
│   │   # Integration tests.
│   ├── integration/
│   │   └── ...
│   │
│   │   # End-to-end (E2E) tests.
│   └── e2e/
│       └── ...
│
│   # GitLab pipeline entrypoint.
├── .gitlab-ci.yml
│
│   # Integration test containers.
├── docker-compose.yml
│
│   # Reproducible shell configuration.
├── flake.nix
├── flake.lock 🤖
│
│   # Build recipes.
├── justfile
│
│   # Python package configuration.
├── pyproject.toml
└── poetry.lock 🤖
```

## Tools

### Nix

[Nix](https://nixos.org) is a package manager and build system centered around reproducibility.

For us, Nix's most useful feature is its ability to create reproducible + isolated CLI shells on the same machine which use different versions of the same package (e.g. Java 17 and 21). Shell configurations can be encapsulated in Nix configurations which can be shared across multiple computers.

The best way to install Nix is with the [Determinate Nix Installer](https://github.com/DeterminateSystems/nix-installer) ([guide](https://zero-to-nix.com/start/install)).

Once installed, running `nix develop` in a directory with a `flake.nix` will create a nested Bash shell defined by the flake.

> If you're on a network with lots of GitHub traffic, you may get a rate limiting error. To work around this, you can either switch networks (e.g. turn off VPN) or add a GitHub personal access token (classic) to your `/etc/nix/nix.conf` (system) or `~/.config/nix/nix.conf` (user).
>
> ```text
> # https://nixos.org/manual/nix/stable/command-ref/conf-file
> access-tokens = github.com=ghp_{rest of token}
> ```

### direnv

[direnv](https://direnv.net) ([🍺](https://formulae.brew.sh/formula/direnv)) is a shell extension which can automatically load and unload environment variables when you enter or leave a specific directory.

It can automatically load and unload a Nix environment when we enter and leave a project directory.

__Unlike `nix develop` which drops you in a nested Bash shell, direnv extracts the environment variables from the nested Bash shell into your current shell (e.g. Bash, Zsh, Fish).__

Follow the [installation instructions on its website](https://direnv.net#basic-installation).

#### Editor Plugins

Plugins to add editor support for direnv. Note that these won't automatically reload the environment after you change Nix flakes unlike direnv itself so you need to manually trigger a reload.

* Sublime Text
    * [Direnv](https://packagecontrol.io/packages/Direnv) ([caveat](https://github.com/misuzu/direnv-subl#limitations))
* JetBrains IDEs
    * [Direnv integration](https://plugins.jetbrains.com/plugin/15285-direnv-integration) ([JetBrains feature request](https://youtrack.jetbrains.com/issue/IDEA-320397))
* Visual Studio Code
    * [direnv](https://marketplace.visualstudio.com/items?itemName=mkhl.direnv) ([caveat](https://github.com/direnv/direnv-vscode/issues/109))
* Vim
    * [direnv.vim](https://github.com/direnv/direnv.vim)

## Developing

Common recipes are provided as Just recipes. To list them, run:

```shell
just
```

### Building the Package

To do a full release build (runs static analysis + unit tests), run:

```shell
just build
```

If you want to use a specific Python binary such as Python 3.9, run:

```shell
just python-binary=python3.9 build
```

### Running Tests

The project includes unit, integration, and end-to-end (E2E) tests. In most cases, you'll only run the unit and integration tests.

#### Unit Tests

Unit tests verify the functionality of individual components:

```shell
poetry run pytest tests/unit/
```

#### Integration Tests

Integration tests verify interactions between components and local storage services:

```shell
just start-storage-systems

just run-integration-tests

just stop-storage-systems
```

If you want to use a specific Python binary such as Python 3.9, run:

```shell
just python-binary=python3.9 run-integration-tests
```

## Notes

### Updating Flake Locks

The `flake.lock` file locks the inputs (e.g. the Nixpkgs revision) used to evaluate `flake.nix` files. To update the inputs (e.g. to get newer packages in a later Nixpkgs revision), you'll need to update your `flake.lock` file.

```shell
# Update flake.lock.
nix flake update
```

