Data Hub File Service - a service enabling file inspection and re-encryption at Data Hubs
The Data Hub File Service (`dhfs`) allows GHGA Data Hubs to inspect files and re-encrypt them in their object storage.
We recommend using the provided Docker container.
A pre-built version is available on Docker Hub:
```bash
docker pull ghga/datahub-file-service:0.1.0
```

Or you can build the container yourself from the `./Dockerfile`:
```bash
# Execute in the repo's root dir:
docker build -t ghga/datahub-file-service:0.1.0 .
```

For production-ready deployment, we recommend using Kubernetes.
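As a rough sketch, a minimal Kubernetes Deployment for this image might look like the following (resource names, labels, and replica count are illustrative assumptions; the required configuration would still have to be supplied, e.g. via environment variables or a mounted config file):

```bash
# Apply a minimal Deployment manifest (a sketch; names are placeholders):
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: datahub-file-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: datahub-file-service
  template:
    metadata:
      labels:
        app: datahub-file-service
    spec:
      containers:
        - name: dhfs
          image: ghga/datahub-file-service:0.1.0
          ports:
            - containerPort: 8080
EOF
```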
However, for simple use cases, you could run the service with Docker on a single server:

```bash
# The entrypoint is pre-configured:
docker run -p 8080:8080 ghga/datahub-file-service:0.1.0 --help
```

If you prefer not to use containers, you may install the service from source:
```bash
# Execute in the repo's root dir:
pip install .

# To run the service:
dhfs --help
```

The service requires the following configuration parameters:
- `object_storages` (object, required): Can contain additional properties.
  - Additional properties: Refer to `#/$defs/S3ObjectStorageNodeConfig`.
- `log_level` (string): The minimum log level to capture. Must be one of: `"CRITICAL"`, `"ERROR"`, `"WARNING"`, `"INFO"`, `"DEBUG"`, or `"TRACE"`. Default: `"INFO"`.
- `service_name` (string): Short name of this service. Default: `"dhfs"`.
- `service_instance_id` (string, required): A string that uniquely identifies this instance across all instances of this service. It is included in log messages. Examples: `"germany-bw-instance-001"`.
- `log_format`: If set, replaces JSON formatting with the specified string format. If not set, it has no effect. In addition to the standard attributes, the following can also be specified: `timestamp`, `service`, `instance`, `level`, `correlation_id`, and `details`. Default: `null`. Examples: `"%(timestamp)s - %(service)s - %(level)s - %(message)s"`, `"%(asctime)s - Severity: %(levelno)s - %(msg)s"`.
- `log_traceback` (boolean): Whether to include exception tracebacks in log messages. Default: `true`.
- `S3Config` (object): S3-specific config params. Inherit your config class from this class if you need to talk to an S3 service in the backend. Cannot contain additional properties.
  - `s3_endpoint_url` (string, required): URL to the S3 API. Examples: `"http://localhost:4566"`.
  - `s3_access_key_id` (string, required): Part of the credentials for logging in to the S3 service. See: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html. Examples: `"my-access-key-id"`.
  - `s3_secret_access_key` (string, format: password, required and write-only): Part of the credentials for logging in to the S3 service. See: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html. Examples: `"my-secret-access-key"`.
  - `s3_session_token`: Optional part of the credentials for logging in to the S3 service. See: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html. Default: `null`. Examples: `"my-session-token"`.
  - `aws_config_ini`: Path to a config file for specifying more advanced S3 parameters, following the format described here: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html#using-a-configuration-file. Default: `null`. Examples: `"~/.aws/config"`.
- `S3ObjectStorageNodeConfig` (object): Configuration for one specific object storage node and one bucket in it. The bucket is the main bucket that the service is responsible for. Cannot contain additional properties.
  - `bucket` (string, required)
  - `credentials` (required): Refer to `#/$defs/S3Config`.
A template YAML file for configuring the service can be found at `./example_config.yaml`.
Please adapt it, rename it to `.dhfs.yaml`, and place it in one of the following locations:

- the current working directory from which you execute the service (on Linux: `./.dhfs.yaml`)
- your home directory (on Linux: `~/.dhfs.yaml`)
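For orientation, a minimal `.dhfs.yaml` could look like the following (a sketch only; the node alias, bucket name, and credentials are placeholders drawn from the examples above — please derive your actual config from `./example_config.yaml`):

```bash
# Write a minimal .dhfs.yaml to the current working directory
# (all values are placeholders; adapt them to your setup):
cat <<'EOF' > .dhfs.yaml
service_instance_id: "germany-bw-instance-001"
log_level: "INFO"
object_storages:
  my-storage-node:  # alias for one object storage node (placeholder)
    bucket: "my-bucket"
    credentials:
      s3_endpoint_url: "http://localhost:4566"
      s3_access_key_id: "my-access-key-id"
      s3_secret_access_key: "my-secret-access-key"
EOF
```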
The config YAML file will be automatically parsed by the service.
Important: If you are using containers, the locations refer to paths within the container.
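For example, you could mount a local config file into the container (a sketch; the in-container target path `/service/.dhfs.yaml` is an assumption about the image's working directory and may need adjusting):

```bash
# Mount the local config into the container's working directory
# (the target path /service/.dhfs.yaml is an assumption for this image):
docker run -p 8080:8080 \
  -v "$PWD/.dhfs.yaml":/service/.dhfs.yaml \
  ghga/datahub-file-service:0.1.0
```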
All parameters mentioned in the `./example_config.yaml`
can also be set using environment variables or file secrets.
For naming the environment variables, just prefix the parameter name with `dhfs_`,
e.g. for the `host` parameter, set an environment variable named `dhfs_host`
(both upper and lower case are accepted; however, it is conventional to define
environment variables in upper case).
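For example, the following would be equivalent to setting `service_instance_id` and `log_level` in the config file:

```bash
# Upper-case names are conventional; lower-case works as well:
export DHFS_SERVICE_INSTANCE_ID="germany-bw-instance-001"
export DHFS_LOG_LEVEL="DEBUG"
```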
To use file secrets, please refer to the corresponding section of the pydantic documentation.
An OpenAPI specification for this service can be found here.
This is a Python-based service following the Triple Hexagonal Architecture pattern. It uses protocol/provider pairs and dependency injection mechanisms provided by the hexkit library.
For setting up the development environment, we rely on the devcontainer feature of VS Code in combination with Docker Compose.
To use it, you need Docker Compose as well as VS Code with the "Remote - Containers"
extension (`ms-vscode-remote.remote-containers`) installed.
Then open this repository in VS Code and run the command
`Remote-Containers: Reopen in Container` from the VS Code Command Palette.
This will give you a full-fledged, pre-configured development environment including:
- infrastructural dependencies of the service (databases, etc.)
- all relevant VS Code extensions pre-installed
- pre-configured linting and auto-formatting
- a pre-configured debugger
- automatic license-header insertion
Inside the devcontainer, a command `dev_install` is available for convenience.
It installs the service with all development dependencies and also installs pre-commit.
The installation is performed automatically when you build the devcontainer. However,
if you update dependencies in the `./pyproject.toml` or the
`lock/requirements-dev.txt`, you have to run it again.
This repository is free to use and modify according to the Apache 2.0 License.
This README file is auto-generated, please see .readme_generation/README.md for details.