# eccenca Corporate Memory docker orchestration

This repository contains our products in a Corporate Memory orchestration.
It may be used for starting Corporate Memory locally or for deployments utilizing `docker compose`.

<!-- vim-markdown-toc GitLab -->
* [eccenca Corporate Memory docker orchestration](#eccenca-corporate-memory-docker-orchestration)
  * [Quick start](#quick-start)
    * [Requirements](#requirements)
    * [Installation](#installation)
  * [Services Overview](#services-overview)
  * [User Management](#user-management)
  * [Advanced options](#advanced-options)
    * [Using the help](#using-the-help)
    * [Configuration](#configuration)
    * [Using a feature or bugfix branch of a component](#using-a-feature-or-bugfix-branch-of-a-component)
    * [Selecting your TripleStore](#selecting-your-triplestore)
      * [Add a new triple store backend](#add-a-new-triple-store-backend)
    * [Docker Compose Store Extension](#docker-compose-store-extension)
    * [Starting with a non-static exposed port exposed for the Apache server](#starting-with-a-non-static-exposed-port-exposed-for-the-apache-server)
    * [Starting with DataIntegration plugins](#starting-with-dataintegration-plugins)
    * [Starting with the Corporate Memory Tutorials](#starting-with-the-corporate-memory-tutorials)
    * [DataIntegration VIRTUAL\_JDBC\_ENDPOINT](#dataintegration-virtual_jdbc_endpoint)
    * [DataPlatform OpenAPI configuration](#dataplatform-openapi-configuration)
      * [Activation](#activation)
      * [Custom server URLs](#custom-server-urls)
    * [Update / Upgrade bootstrap data](#update--upgrade-bootstrap-data)
  * [Extensions](#extensions)
  * [SSL / TLS Deployments](#ssl--tls-deployments)
  * [Backup and restore](#backup-and-restore)
    * [Backup](#backup)
    * [Restore](#restore)
    * [Keycloak](#keycloak)
      * [Backup](#backup-1)
      * [Restore](#restore-1)

<!-- vim-markdown-toc -->

## Quick start

This quick start guide contains all the information needed to start the orchestration locally at [http://docker.localhost/](http://docker.localhost/).

### Requirements

* `memory`: dedicate at least 10 GB to your docker installation to run the system comfortably.

The following command line tools are required:

* `docker`, installed as described in [confluence](https://confluence.eccenca.com/display/ECCDEVOPS/Docker).
* `git`, the version control client
* `make`, the build tool used to control environments and execute tasks (deprecated)
* `jq`, the JSON processor used to check container outputs
* `Task` (optional), the build tool used to control environments and execute tasks
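
Before installing, it can help to confirm that the tools listed above are on your `PATH`. The following is a minimal sketch; the function name `missing_tools` is hypothetical and not part of this repository.

```sh
# Print the name of every given command that is not found on the PATH.
# Returns 0 if all commands are available, 1 otherwise.
missing_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      status=1
    fi
  done
  return "$status"
}

# Example call (prints nothing if everything is installed):
#   missing_tools docker git make jq
```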

### Installation

Follow these steps to install and start the orchestration, which contains only the datasets and vocabularies required for CMEM to run (called bootstrap data):

```sh
# clone the project
git clone --recursive ssh://git@gitlab.eccenca.com:8101/elds/cmem-orchestration.git && cd cmem-orchestration
# checkout the submodules
git submodule update --init --recursive
# login into https://docker-registry.eccenca.com docker registry
docker login -u YOUR_LDAP_USERNAME https://docker-registry.eccenca.com
# add docker.localhost to /etc/hosts (you need to be root to do this)
grep "127.0.0.1\s*docker\.localhost" /etc/hosts || sudo bash -c 'echo "127.0.0.1 docker.localhost" >> /etc/hosts'

# start the components with bootstrap data (no SSL, normal docker.localhost dev mode)
make clean-pull-start-bootstrap
```

After that, you have a basic blank installation available at:

* [http://docker.localhost/](http://docker.localhost/)

If you want, you may restore usage data as described in [Backup and restore](#backup-and-restore).

**DEFAULT USER CREDENTIALS**: `admin:admin`.

## Services Overview

After successful start, the following services will be available:

* DataManager at [http://docker.localhost](http://docker.localhost)
* DataIntegration at [http://docker.localhost/dataintegration/](http://docker.localhost/dataintegration/)
  * DataIntegration supports multiple workflow executors.
      By default, the `ExecuteSparkWorkflow` executor is enabled.
      If you want to use another executor, you can change it with the environment variable `DATAINTEGRATION_EXECUTOR`:

      `DATAINTEGRATION_EXECUTOR=ExecuteLocalWorkflow make start`
* DataPlatform at [http://docker.localhost/dataplatform/](http://docker.localhost/dataplatform/)
* Keycloak at [http://docker.localhost/auth/](http://docker.localhost/auth/)

Background services:

* `graphdb` as triple store
* `keycloak` as authentication and authorization server
* `postgres` as Keycloak's database
* `apache2` as reverse proxy to forward requests to the needed components
* `cmemc` as command line client used to handle bootstrap and backup/restore

## User Management

By default, there is a 'cmem' realm in Keycloak with an 'admin' user, which has admin rights. Keycloak can be reached at [http://docker.localhost/auth/](http://docker.localhost/auth/).
If you want to add users, you can either add them through the web interface or from the command line with cmemc. For the latter, please refer to the `cmemc admin user` command group: [https://documentation.eccenca.com/24.1/automate/cmemc-command-line-interface/command-reference/admin/user/](https://documentation.eccenca.com/24.1/automate/cmemc-command-line-interface/command-reference/admin/user/)

## Advanced options

### Using the help

The Makefile is documented in detail and generates a help text, which is displayed in your terminal if you type:

* `make help`
* or just `make`

### Configuration

The Corporate Memory docker orchestration is configured with environment files.

We suggest creating an environment file at `./environments/prod.env`.
In the `environments` folder you will find several configuration templates as a starting point for your configuration.
For now, we assume you use the `config.ssl-letsencrypt.env` template:

```sh
cd environments
cp config.ssl-letsencrypt.env prod.env

# change DEPLOYHOST and LETSENCRYPT_MAIL values
vi prod.env
mv config.env config.bkp.env
ln -s prod.env config.env
```

### Using a feature or bugfix branch of a component

Sometimes you want to test a new feature (if you have access to development builds).
In order to switch a component to another version, you can (temporarily) change the `environments/config.env` file.

Search for the following section ...

```sh
###############################
# Component versions          #
###############################

#EXPLORE_VERSION=develop

#DI_VERSION=develop
```

... and change one of the docker image version identifiers according to your needs, such as:

```sh
DI_VERSION=feature_myFeature
```

After that, you can do `make pull start` as usual.
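
If you switch versions often, the manual edit described above can be scripted. The following is a minimal sketch, not part of this repository: the helper name `set_env_var` is hypothetical, and it assumes that values contain no characters that are special in a `sed` replacement (`|`, `&`, `\`).

```sh
# set_env_var FILE KEY VALUE
# Sets KEY=VALUE in FILE, replacing an existing (possibly commented-out) line
# or appending a new line if KEY is not present yet.
set_env_var() {
  file=$1
  key=$2
  value=$3
  tmp=$(mktemp) || return 1
  if grep -q -E "^#?${key}=" "$file"; then
    sed -E "s|^#?${key}=.*|${key}=${value}|" "$file" > "$tmp"
  else
    cat "$file" > "$tmp"
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
  fi
  # Write back through the original path so a symlinked config.env stays a symlink.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example call:
#   set_env_var environments/config.env DI_VERSION feature_myFeature
```

Writing back with `cat` instead of `mv` is a deliberate choice here: if `config.env` is a symlink to another environment file, the link is kept and the target file is changed.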

### Selecting your TripleStore

Corporate Memory's default graph database is Ontotext GraphDB. To configure a specific triple store other than our default use the `EXPLORE_STORE` variable in your `.env`. See the `default.env` for potential options and some background on the status of integration and support level per store.

#### Add a new triple store backend

Triple stores are selected by their unique name slug. See `default.env` or the available `docker-compose.store.YOUR-STORE-NAME.yml` files for the existing options. In order to configure a new triple store option with the orchestration, you need to:

1. Add a new docker-compose store extension configuration
2. Add a new section into DataPlatform `application.yml`

### Docker Compose Store Extension

Start with an existing one as a template:

* To run the store as a local container, start with a copy of the GraphDB yml file, `cp docker-compose.store.graphdb.yml docker-compose.store.YOUR-STORE-NAME.yml`, and modify the file according to your needs.
* To use an _as a service_ store such as AWS Neptune, create a placeholder with `ln -s docker-compose.store.stub.yml docker-compose.store.YOUR-STORE-NAME.yml`. No further changes are needed in this file.
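
Because the store slug is taken from the file name, the slugs currently available in a checkout can be derived from the `docker-compose.store.*.yml` files. The following is a minimal sketch; the function name `list_store_slugs` is hypothetical and not part of this repository.

```sh
# list_store_slugs [DIR]
# Prints one store slug per docker-compose.store.<slug>.yml file found in DIR
# (default: current directory). The placeholder file "stub" is listed as well.
list_store_slugs() {
  dir=${1:-.}
  for f in "$dir"/docker-compose.store.*.yml; do
    # Skip the unexpanded pattern if no file matches.
    [ -e "$f" ] || continue
    name=${f##*/docker-compose.store.}
    printf '%s\n' "${name%.yml}"
  done
}

# Example call from the repository root:
#   list_store_slugs .
```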

### Starting with a non-static exposed port exposed for the Apache server

In some cases (e.g. running automated tests on the build server), it is useful not to expose the Apache server on a static port, since it might collide with other builds or deployments.
This can be achieved by changing the value of the following variable in the configuration:

* `APACHE_BASE_FILE=docker-compose.apache2-unexposed.yml` or
* `APACHE_BASE_FILE=docker-compose.apache2-exposed-random.yml`

### Starting with DataIntegration plugins

(TODO)
As DataIntegration is fully extensible, DataIntegration plugins can be activated by simply adding a plugin jar file to the plugin directory (`./conf/dataintegration/plugin`).
If the plugin affects the Apache Spark execution, it is advisable to include its absolute path in the `spark.jars` property, either under the spark section in `dataintegration.conf` or in the dedicated `spark-defaults.conf` file in the DataIntegration config directory.
Further information can be found in the DataIntegration manuals.

### Starting with the Corporate Memory Tutorials

To automatically import projects used in the [tutorials](https://documentation.eccenca.com/latest/tutorials) from the official documentation you need to run the following command after the Corporate Memory initialization:

```sh
make tutorials-import
```

### DataIntegration VIRTUAL_JDBC_ENDPOINT

If you want to use the Hive server (Virtual Datasets JDBC endpoint), you have to provide the environment variable `VIRTUAL_JDBC_ENDPOINT` with the target port. The internal port is `10005`.
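
As an illustration only, the setting could look like the following line in your environment file. Two things are assumptions here and not confirmed by this README: that the variable is read like the other configuration variables, and that the target port should be the same as the internal one.

```sh
# Assumption: use port 10005 as target port (same value as the internal port).
VIRTUAL_JDBC_ENDPOINT=10005
```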

### DataPlatform OpenAPI configuration

#### Activation

In order to enable Swagger UI and OpenAPI generation,
set the following environment variables in `config.env` to `true`:

```sh
EXPLORE_SWAGGER_UI_ENABLED=true
EXPLORE_API_DOCS_ENABLED=true
```

Those properties will activate the following endpoints:

* `/swagger-ui/` [http://docker.localhost/dataplatform/swagger-ui/](http://docker.localhost/dataplatform/swagger-ui/)
* `/v3/api-docs` [http://docker.localhost/dataplatform/v3/api-docs](http://docker.localhost/dataplatform/v3/api-docs) (JSON)
* `/v3/api-docs.yaml` [http://docker.localhost/dataplatform/v3/api-docs.yaml](http://docker.localhost/dataplatform/v3/api-docs.yaml) (YAML)

#### Custom server URLs

In the Swagger UI, the base URL for the endpoints is typically computed dynamically from the user requests.
In some scenarios, it may be convenient to define a custom base URL.
For that purpose, a new environment variable has been defined which takes a comma-separated list of URLs to be used as the base URL for the endpoint when executed inside the Swagger UI:

```sh
EXPLORE_OPENAPI_SERVER_URLS="https://aa.bb.c/d"
```

or

```sh
EXPLORE_OPENAPI_SERVER_URLS="https://aa.bb.c/d,http://aa.bb.c/d"
```

When more than one URL is defined, the user can pick one from a dropdown list in the Swagger UI.
If left empty or undefined, the default behavior (dynamically generated URL) will apply.
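
Since the value is a plain comma-separated string, a typo (for example a leading comma or a missing scheme) is easy to overlook. The following is a minimal sketch of a sanity check to run before starting the orchestration. The function name `validate_server_urls` is hypothetical, and the restriction to `http://` and `https://` entries is an assumption made for this example.

```sh
# validate_server_urls "URL[,URL...]"
# Returns 0 if the list is empty or every entry starts with http:// or https://,
# otherwise prints the offending entry to stderr and returns 1.
validate_server_urls() {
  rest=$1
  while [ -n "$rest" ]; do
    url=${rest%%,*}            # first entry of the remaining list
    case $url in
      http://?*|https://?*) ;;
      *) printf 'not an http(s) URL: "%s"\n' "$url" >&2; return 1 ;;
    esac
    if [ "$rest" = "$url" ]; then
      rest=                    # last entry consumed
    else
      rest=${rest#*,}          # drop the first entry and its comma
    fi
  done
  return 0
}

# Example call:
#   validate_server_urls "$EXPLORE_OPENAPI_SERVER_URLS"
```

Whitespace around the commas is not trimmed by this sketch, so `"a, b"` is reported as invalid.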

### Update / Upgrade bootstrap data

In case an existing deployment needs to be updated to a new version, you most likely want to update the bootstrap data as well.

<!--
Theory: All resources delivered as bootstrap data are annotated with `shui:isSystemResource true`. You can safely remove them and add the new data. There is a special case that you should not remove `owl:imports` statements.
-->

In practice, use the make target:

```sh
make bootstrap
```

You can also use cmemc for this:

```sh
cmemc admin store bootstrap --import
```

This fires a DELETE query which targets only resources tagged with `shui:isSystemResource`, and then adds all bootstrap graphs from the started DataPlatform.

## Extensions

In order to extend the default orchestration with additional services, you can add a second orchestration file to it.

To see the list of available extensions, run `make enable-extension`.
This will also print additional information, like host addresses and credentials.

To run an extended orchestration, execute the following commands:

```sh
# this starts a normal orchestration
make clean start
# this extends the orchestration with the mysql extension (mysql + phpMyAdmin)
make enable-extension EXTENSION=mysql
```

Note: When you run `make enable-extension` without a preceding `make start`, it starts the normal orchestration AND the extension with one command.

Note: `make stop` stops a normal and an extended orchestration.

Extension orchestration files are searched for in the `extensions` directory (see the README.md there for more information on creating extensions).

## SSL / TLS Deployments

A detailed description of TLS deployments can be found in [README.tls.md](README.tls.md).

## Backup and restore

### Backup

* `make backup`

To back up the currently running Corporate Memory instance, `make backup` uses cmemc to export all graphs and workspaces. In addition, a postgres dump is created to save the current Keycloak settings.
Finally, a zip is created within the backup folder, containing the keycloak-postgresql-dump, the GraphDB backup, the DataIntegration workspace backup and the volume containing DataIntegration-python-packages.

### Restore

You can restore a backup from the backup folder.
To list all available backups you can use the `make backup-list` target.
When a backup is restored, the zip file from the backup folder is extracted.
This restores all previously backed-up showcase data to the backup folder.

* `make restore`
  * restores the backup defined by the `BACKUP` parameter, if the file exists.
    * usage: `make restore BACKUP=2024-06-17_23-16`, where `2024-06-17_23-16` is a backup name.
  * otherwise restores the backup that is linked by `latest` inside the backup folder.

**EXAMPLE**

```sh
## custom.backup-and-restore.Makefile
# restore the backup named 2024-06-17_23-16
make restore BACKUP=2024-06-17_23-16
```
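
The selection rule described above (use the named backup if it exists, otherwise fall back to `latest`) can be sketched as a small shell function. This is an illustration of the rule only, not the code used by the Makefile. The function name `resolve_backup`, the `<name>.zip` file naming and the link name `latest` are assumptions made for this example.

```sh
# resolve_backup DIR [NAME]
# Prints the path of the backup to restore:
#   DIR/NAME.zip if NAME is given and that file exists, otherwise DIR/latest.
resolve_backup() {
  dir=$1
  name=$2
  if [ -n "$name" ] && [ -e "$dir/$name.zip" ]; then
    printf '%s\n' "$dir/$name.zip"
  else
    printf '%s\n' "$dir/latest"
  fi
}

# Example call (the directory name is hypothetical):
#   resolve_backup ./backups 2024-06-17_23-16
```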

### Keycloak

Since the authentication and authorization data is considered configuration rather than data, the backup of the PostgreSQL database is handled separately from the rest.

#### Backup

In order to save the state of the PostgreSQL Keycloak database, execute:

```sh
# custom.backup-and-restore.Makefile
make backup-keycloak
```

This will create a file `${BACKUPS_DIR}/keycloak/latest.sql`, which is linked to `${BACKUPS_DIR}/keycloak/${DATETIME}.sql`.

#### Restore

Contents of the file `conf/postgres/keycloak_db.sql` are loaded automatically on orchestration clean start-up.
If, however, you want to load this file after startup (e.g. after manually rewriting it), use the following command to force its upload:

```sh
# custom.backup-and-restore.Makefile
make keycloak-restore
```

With this command, the script copies `${BACKUPS_DIR}/keycloak/latest.sql` to `conf/postgres/keycloak_db.sql` and triggers a container recreation.
