<!-- markdownlint-disable MD012 MD013 MD024 MD033 -->
<!-- markdownlint-configure-file
{
  "ul-indent": {
    "indent": "4"
  },
}
-->
# eccenca Corporate Memory docker orchestration

This repository contains our products in a Corporate Memory orchestration.
It may be used for starting Corporate Memory locally or for deployments utilizing `docker compose`.

<!-- vim-markdown-toc GitLab -->

* [Quick start](#quick-start)
    * [Requirements](#requirements)
    * [Installation](#installation)
* [Services Overview](#services-overview)
* [User Management](#user-management)
* [Advanced options](#advanced-options)
    * [Using the help](#using-the-help)
    * [Configuration](#configuration)
    * [Using a feature or bug-fix branch of a component](#using-a-feature-or-bug-fix-branch-of-a-component)
    * [Selecting a different TripleStore](#selecting-a-different-triplestore)
        * [Add a new triple store backend](#add-a-new-triple-store-backend)
    * [Docker Compose Store Extension](#docker-compose-store-extension)
    * [Starting with a non-static exposed port exposed for the Apache server](#starting-with-a-non-static-exposed-port-exposed-for-the-apache-server)
    * [Starting with the Corporate Memory Tutorials](#starting-with-the-corporate-memory-tutorials)
    * [DataIntegration VIRTUAL_JDBC_ENDPOINT](#dataintegration-virtual_jdbc_endpoint)
    * [Upgrade bootstrap data](#upgrade-bootstrap-data)
* [Extensions](#extensions)
* [SSL Deployments](#ssl-deployments)
* [Backup and restore](#backup-and-restore)
    * [Backup](#backup)
    * [Restore](#restore)
    * [Keycloak](#keycloak)
        * [Backup](#backup-1)
        * [Restore](#restore-1)

<!-- vim-markdown-toc -->

## Quick start

This quick start guide contains everything you need to start the orchestration locally at [http://docker.localhost/](http://docker.localhost/).

### Requirements

* `memory`, dedicate a minimum of 10GB to your docker installation to run the system comfortably.

The following command line tools are required:

* `docker`, the docker CLI as well as a local docker for desktop or docker daemon installation
* `git`, the version control client
* `make`, the build tool used to control environments and execute tasks
* `jq`, JSON parser to check for container outputs
* `sed`, GNU sed tool (Note: macOS users need to install `gnu-sed` with brew and adapt the `PATH`)
* (optional) `Task`, the build tool used to control environments and execute tasks
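
Before the first start, it can help to confirm that the required tools are available on the `PATH`. The following is a minimal sketch, not part of this repository; the function name `missing_tools` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper (not part of this repository):
# prints the name of every given command that is not found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the tools listed above; an empty result means all were found.
missing=$(missing_tools docker git make jq sed)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose to print the names on one line
    echo "missing tools:" $missing
else
    echo "all required tools found"
fi
```

Note that this only checks for the presence of `sed`, not whether it is the GNU variant.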

### Installation

Follow these steps to install and start the orchestration with a minimal set of configuration graphs (called bootstrap data):

``` sh
# clone the project
git clone --recursive ssh://git@gitlab.eccenca.com:8101/elds/cmem-orchestration.git && cd cmem-orchestration

# checkout the submodules
git submodule update --init --recursive

# log in to the docker registry at https://docker-registry.eccenca.com
docker login -u YOUR_LDAP_USERNAME https://docker-registry.eccenca.com

# add docker.localhost to /etc/hosts (you need to be root to do this)
grep "127.0.0.1\s*docker\.localhost" /etc/hosts || sudo bash -c 'echo "127.0.0.1 docker.localhost" >> /etc/hosts'

# start the components with bootstrap data (no SSL, normal docker.localhost dev mode)
make clean-pull-start-bootstrap
```

After that, you have a basic blank installation available at:

* [http://docker.localhost/](http://docker.localhost/)

If you want, you may restore usage data as described in [Backup and restore](#backup-and-restore).

**DEFAULT USER CREDENTIALS**: `admin:admin`.

## Services Overview

After a successful start, the following services will be available:

* Explore at [http://docker.localhost](http://docker.localhost)
* Build (aka DataIntegration) at [http://docker.localhost/dataintegration/](http://docker.localhost/dataintegration/)
* Keycloak at [http://docker.localhost/auth/](http://docker.localhost/auth/)

Background services:

* `graphdb` as triple store
* `keycloak` as authentication and authorization server
* `postgres` as Keycloak's database
* `apache2` as reverse proxy to forward requests to the needed components
* `cmemc` as command line client used to handle bootstrap and backup/restore

## User Management

By default, there is a 'cmem' realm in Keycloak with an 'admin' user, which has administration rights.
Keycloak can be reached at [http://docker.localhost/auth/](http://docker.localhost/auth/).
If you want to add users, you can add them either through the web interface or from the command line with cmemc.
For the latter, please refer to the `cmemc admin user` command group: [https://documentation.eccenca.com/latest/automate/cmemc-command-line-interface/command-reference/admin/user/](https://documentation.eccenca.com/latest/automate/cmemc-command-line-interface/command-reference/admin/user/)

## Advanced options

### Using the help

The Makefile is documented in detail and generates a help text, which is displayed in your terminal if you type:

* `make help` or just `make`

### Configuration

The Corporate Memory docker orchestration is configured with environment files.

We suggest creating an environment file at `./environments/prod.env`.
In the `environments` folder you will find several configuration templates as a starting point for your configuration.
For now, we assume you use the `config.ssl-letsencrypt.env` template:

``` sh
cd environments
cp config.ssl-letsencrypt.env prod.env

# change DEPLOYHOST and LETSENCRYPT_MAIL values
vi prod.env
mv config.env config.bkp.env
ln -s prod.env config.env
```
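
The manual edit with `vi` can also be scripted, for example in automated deployments. The sketch below assumes that the variables are written as plain `KEY=value` lines in the env file; the helper `set_env_var` and the example values are hypothetical and not part of this repository.

```sh
#!/bin/sh
# Hypothetical helper: replace the value of KEY in an env file,
# or append KEY=VALUE if no such line exists yet.
# Assumes VALUE contains no '|', '&' or backslash characters.
set_env_var() {
    file=$1 key=$2 value=$3
    if grep -q "^${key}=" "$file"; then
        tmp=$(mktemp)
        # write through a temp file, then copy back to keep file permissions
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "$tmp" && cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Example usage (values are placeholders):
# set_env_var environments/prod.env DEPLOYHOST cmem.example.org
# set_env_var environments/prod.env LETSENCRYPT_MAIL admin@example.org
```

The same helper could be used for any other variable mentioned in this document, as long as the `KEY=value` assumption holds.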

#### LLM Provider Setup (Explore)

In Explore we use Spring AI to configure an LLM provider. For full reference see [https://docs.spring.io/spring-ai/reference/api/chat/comparison.html](https://docs.spring.io/spring-ai/reference/api/chat/comparison.html).
For the most useful models and providers we have example configurations in `conf/explore/application-llm-<configname>`.
These can be enabled using Explore's profile environment variable. By default it is set to `EXPLORE_LLM_PROFILE=llm-disabled`.
To enable, for example, the OpenAI gpt5mini model, set it to `EXPLORE_LLM_PROFILE=llm-openai-gpt5mini`.

Please be aware that you also need an API key for the provider and model of your choice. We reference those API keys in each configuration file as `EXPLORE_AI_APIKEY`.
So you also need to add your key to the `config.env` file.

A typical LLM configuration for the Explore companion inside your `config.env` file would look like this:

``` sh
EXPLORE_LLM_PROFILE=llm-openai-gpt5mini
## Loaded Explore Application Profiles
EXPLORE_SPRING_PROFILES=${EXPLORE_STORE},${EXPLORE_LLM_PROFILE}
## API Key used in LLM profiles of Explore
## openrouter
EXPLORE_AI_APIKEY=<your-key-here>
```
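
Since a missing key is easy to overlook, a small pre-flight check of the env file can be useful. This is only a sketch under the assumption that both variables are written as plain `KEY=value` lines and that later lines override earlier ones; `check_explore_llm` is a hypothetical helper, not part of this repository.

```sh
#!/bin/sh
# Hypothetical helper: succeed if the Explore LLM setup in the given env
# file is consistent, i.e. the profile is unset/disabled or a key is set.
check_explore_llm() {
    file=$1
    # use the last assignment of each variable (assumption: last one wins)
    profile=$(sed -n 's/^EXPLORE_LLM_PROFILE=//p' "$file" | tail -n 1)
    apikey=$(sed -n 's/^EXPLORE_AI_APIKEY=//p' "$file" | tail -n 1)
    if [ -z "$profile" ] || [ "$profile" = "llm-disabled" ]; then
        return 0
    fi
    # an empty value or the untouched placeholder counts as missing
    if [ -z "$apikey" ] || [ "$apikey" = "<your-key-here>" ]; then
        echo "EXPLORE_LLM_PROFILE=$profile requires EXPLORE_AI_APIKEY" >&2
        return 1
    fi
    return 0
}

# Example usage:
# check_explore_llm environments/config.env && make start
```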

#### LLM Provider Setup (DataIntegration)

For DataIntegration's Mapping Creator LLM support, you need this line:

``` sh
DATAINTEGRATION_AI_APIKEY=<your-key-here>
```

Further configuration options can be found in `default.env`; the variables are used in the `com.eccenca.di.assistant` section of `conf/dataintegration/dataintegration.conf`.
You can also overwrite the defaults in your `config.env` file.

``` sh
DATAINTEGRATION_AI_COREURL=https://openrouter.ai/api/v1
DATAINTEGRATION_AI_MODEL=openai/gpt-5-mini
DATAINTEGRATION_AI_ORGID=
DATAINTEGRATION_AI_REASONINGEFFORT=low
DATAINTEGRATION_AI_LOGQUERIES=false
```

### Using a feature or bug-fix branch of a component

Sometimes you want to test a new feature (if you have access to development builds).
In order to switch a component to another version, you can (temporarily) change the `environments/config.env` file.

Search for the following section ...

``` sh
###############################
# Component versions          #
###############################

#EXPLORE_VERSION=develop

#DI_VERSION=develop
```

... and change one of the docker image version identifiers according to your needs, such as:

``` sh
DI_VERSION=feature_myFeature
```

After that, you can do `make pull start` as usual.

### Selecting a different TripleStore

Corporate Memory's default graph database is Graphwise GraphDB.
To use a triple store other than the default, set the `EXPLORE_STORE` variable in your `.env`.
See the `default.env` for potential options and some background on the status of integration and support level per store.

#### Add a new triple store backend

Triple stores are selected by their unique name-slug.
See `default.env` or the available `docker-compose.store.YOUR-STORE-NAME.yml` files to find out which options exist.
In order to configure a new triple store option with the orchestration you need to:

1. Add a new docker-compose store extension configuration
2. Add a new section into DataPlatform `application.yml`
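
To see which name-slugs already have a store extension file, the file names can be inspected directly. The following sketch relies only on the `docker-compose.store.YOUR-STORE-NAME.yml` naming pattern mentioned above; `list_store_slugs` is a hypothetical helper, not part of this repository.

```sh
#!/bin/sh
# Hypothetical helper: print the store name-slugs for which a
# docker-compose.store.<slug>.yml file exists in the given directory.
list_store_slugs() {
    dir=${1:-.}
    for f in "$dir"/docker-compose.store.*.yml; do
        [ -e "$f" ] || continue          # no match: the pattern stays literal
        slug=${f##*/docker-compose.store.}   # strip directory and prefix
        printf '%s\n' "${slug%.yml}"         # strip the .yml suffix
    done
}

# List the slugs found in the current directory.
list_store_slugs .
```

The output also includes placeholder files such as the `stub` file, which is not a store on its own.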

### Docker Compose Store Extension

Start with an existing one as a template:

* If you want to run the store as a local container, start with a copy of the graphdb yml file: `cp docker-compose.store.graphdb.yml docker-compose.store.YOUR-STORE-NAME.yml`. Modify the file according to your needs.
* If you want to configure CMEM to use an _as a service_ store such as AWS Neptune, create a placeholder: `ln -s docker-compose.store.stub.yml docker-compose.store.YOUR-STORE-NAME.yml`. No further changes are needed in this file.

### Starting with a non-static exposed port exposed for the Apache server

In some cases (e.g. running automated tests in the build server), it is useful to disable the Apache server port, since it might collide with other builds or deployments.
This can be achieved by changing the value of the following variable in the configuration:

* `APACHE_BASE_FILE=docker-compose.apache2-unexposed.yml` or
* `APACHE_BASE_FILE=docker-compose.apache2-exposed-random.yml`
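
Judging by its name, the `exposed-random` variant lets docker choose the host port, so the port has to be looked up after start-up. One way to do this is to read the port mappings from `docker ps`. The sketch below assumes that the reverse proxy container has `apache` in its name; `host_port` is a hypothetical helper, not part of this repository.

```sh
#!/bin/sh
# Hypothetical helper: extract the host port from a docker port mapping
# such as "0.0.0.0:49153->80/tcp".
host_port() {
    mapping=${1%%->*}                 # drop everything from the first "->"
    printf '%s\n' "${mapping##*:}"    # drop the address part before the port
}

# Show the port mappings of the running reverse proxy container
# (assumption: its name contains "apache"):
# docker ps --format '{{.Names}}\t{{.Ports}}' | grep apache
```

The mapping string printed by `docker ps` can then be passed to `host_port` to obtain the port for test scripts.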

### Starting with the Corporate Memory Tutorials

To automatically import projects used in the [tutorials](https://documentation.eccenca.com/latest/tutorials) from the official documentation you need to run the following command after the Corporate Memory initialization:

``` sh
make tutorials-import
```

### DataIntegration VIRTUAL_JDBC_ENDPOINT

If you want to use the Hive server (Virtual Datasets JDBC endpoint), then you have to provide an environment variable `VIRTUAL_JDBC_ENDPOINT` with the target port. The internal port is `10005`.


### Upgrade bootstrap data

If an existing deployment needs to be updated to a new version, you most likely want to update the bootstrap data as well.

<!--
Theory: All resources delivered as bootstrap data are annotated with `shui:isSystemResource true`. You can safely remove them and add the new data. There is a special case that you should not remove `owl:imports` statements.
-->

Practice using make targets:

``` sh
make bootstrap
```

You can also use cmemc for this:

``` sh
cmemc admin store bootstrap --import
```

This fires a DELETE query which only targets resources tagged with `shui:isSystemResource`, and then adds all bootstrap graphs from the started DataPlatform.

In addition to that, there is the migration recipe command group `cmemc admin migration`, which updates bootstrap data and executes other migration recipes.

## Extensions

In order to extend the default orchestration with additional services, you can add a second orchestration file to it.

To see the list of available extensions, run `make enable-extension`.
This will also print additional information, like host addresses and credentials.

To run an extended orchestration, execute the following commands:

``` sh
# this starts a normal orchestration
make clean start
# this extends the orchestration with the mysql extension (mysql + phpMyAdmin)
make enable-extension EXTENSION=mysql
```

Note: If you run `make enable-extension` without a preceding `make start`, it starts the normal orchestration AND the extension with one command.

Note: `make stop` stops a normal and an extended orchestration.

Extension orchestration files are searched for in the `extensions` directory (see the README.md there for more information on creating extensions).

## SSL Deployments

A detailed description of TLS deployments can be found in [README.tls.md](README.tls.md).

## Backup and restore

### Backup

* `make backup`

To back up the currently running Corporate Memory instance, `make backup` uses cmemc to export all graphs and workspaces.
In addition, a postgres dump is created to save the current Keycloak settings.
Finally, a zip file is created within the backup folder, containing the Keycloak PostgreSQL dump, the triple store backup, the build workspace backup and the volume containing the DataIntegration python packages.

### Restore

You can restore a backup from the backup folder.
To list all available backups you can use the `make backup-list` target.
When a backup is restored, the zip file from the backup folder is extracted.
This restores all previously backed-up showcase data to the backup folder.

* `make restore`
    * Restores the backup defined by the `BACKUP` parameter, if the file exists.
        * Usage: `make restore BACKUP=2024-06-17_23-16`, where `2024-06-17_23-16` is a backup name.
    * Otherwise, restores the backup which is linked by `latest` inside the backup folder.

``` sh
## custom.backup-and-restore.Makefile
# list of all restore targets
make restore BACKUP=2024-06-17_23-16
```
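
In scripts, it may be worth validating the backup name before passing it to `make restore`. The sketch below assumes that all backup names follow the `YYYY-MM-DD_HH-MM` pattern of the example above, which is not confirmed for every setup; `is_backup_name` is a hypothetical helper, not part of this repository.

```sh
#!/bin/sh
# Hypothetical helper: succeed if the argument looks like a backup name
# in the (assumed) format YYYY-MM-DD_HH-MM, e.g. 2024-06-17_23-16.
is_backup_name() {
    case $1 in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]_[0-9][0-9]-[0-9][0-9])
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Example usage:
# is_backup_name "$BACKUP" && make restore BACKUP="$BACKUP"
```

This only checks the shape of the name. Whether the backup exists can be seen with `make backup-list`.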

### Keycloak

Since the authentication and authorization data is considered configuration rather than data, the backup of the PostgreSQL database is handled separately from the rest.

#### Backup

To save the state of the PostgreSQL Keycloak database, execute:

``` sh
#custom.backup-and-restore.Makefile
make backup-keycloak
```

This creates a file `${BACKUPS_DIR}/keycloak/latest.sql` and links it to `${BACKUPS_DIR}/keycloak/${DATETIME}.sql`.

#### Restore

The contents of the file `conf/postgres/keycloak_db.sql` are loaded automatically on a clean start-up of the orchestration.
If, however, you want to load this file after start-up (e.g. after manually rewriting it), use the following command to force its upload:

``` sh
#custom.backup-and-restore.Makefile
make keycloak-restore
```

This command copies `${BACKUPS_DIR}/keycloak/latest.sql` to `conf/postgres/keycloak_db.sql` and triggers a recreation of the container.

