=========
Configure
=========

configuration yaml
------------------

You need to provide a `config.yaml` file so that `kafkalo` knows how to connect to and authenticate with your Kafka infrastructure and related components (Schema registry, RBAC metadata server).

See the included sample config file.

Config yaml structure:

.. code-block:: YAML

   ---
   # Configs used to reach all API endpoints
   connections:
     kafka:
       # The bootstrap brokers to make initial connection to. A list
       bootstrapBrokers: ["localhost:9093"]
       ssl:
         # Enable or disable SSL
         enabled: false
         # Path to a CA file to add to the truststore.
         caPath: "/home/user/ca.crt"
       kerberos:
         # Enable/Disable Kerberos authentication
         enabled: false
         # Service name of Kafka (typically 'kafka')
         serviceName: "kafka"
         # Realm to use.
         realm: "EXAMPLE.COM"
         # Username to use (combined with the realm to form the principal)
         username: "trololol"
         # Password. Only if password authentication is used
         password: "password!"
         # Path to the keytab file to use for authentication.
         # AFAIK sarama does not support using a ticket from the credential cache
         keytab: "path_to.key" # Will disable user auth if specified
     schemaregistry:
       # The URL of the schema registry
       url: "http://localhost:8081"
       # Timeout for REST calls made to schema registry in seconds
       # Defaults to 5. If you have many subjects you may want to increase this
       timeout: 10
       # Username to use to authenticate
       username: "username"
       # Password to use to authenticate
       password: "password"
       # Path to a CA file to add to the trust store
       caPath: "/home/user/ca.crt"
       # When skipRegistryForReads is true, kafkalo reads the _schemas topic
       # directly and builds an in-memory representation of schemas, subjects
       # and configs. That cache then serves queries that would otherwise go
       # to the Schema registry REST API. Mutating requests still go to the
       # Schema registry REST endpoint.
       # This can provide a huge speed benefit when there are many subjects/schemas.
       skipRegistryForReads: false
     mds:
       # URL to Confluent Metadata Service (MDS)
       url: "http://localhost:8090"
       # Username to authenticate as
       username: "username"
       # Password to use for MDS
       password: "password"
       # Schema registry cluster-id
       schema-registry-cluster-id: "schemaregistry"
       # Connect Cluster ID (if Connect rolebindings are needed)
       connect-cluster-id: "connect-cluster"
       # KSQL cluster ID (if KSQL rolebindings are needed)
       ksql-cluster-id: "ksql-cluster"
       # Path to a CA file to add to the trust store
       caPath: "/home/user/ca.crt"

   # App specific configs
   kafkalo:
     input_dirs:
       - "data/*"
       - "data/team2.yaml"
     # This path will be prepended to Schema paths that are not absolute.
     # If a schema: "somedir/schema.json" is defined, it will be treated as:
     # "data/somedir/schema.json"
     schema_dir: "data/"


You can add input dirs with glob patterns to tell kafkalo where to find your YAML definitions.
Kafkalo will read all the input YAMLs, merge them into a single internal data structure and try to sync them.

Bypassing schema registry for speed.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you have many subjects registered, checking through the schema registry REST API whether each schema is already registered, and whether its compatibility matches what is requested, can result in thousands of queries and take a long time.

There is an option `skipRegistryForReads` that can be set to true, which will consume the `_schemas` topic directly.

It will then construct an in-memory cache of the Schema registry data (schemas, subjects, compatibility) and use that instead of asking the Schema registry.
Mutating operations will still go through the REST API.

Note that this can, in theory, result in discrepancies with how the schema registry handles things (especially canonicalizing and comparing schemas). Use it carefully and only if needed.
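
As a minimal sketch, the fragment below enables this option in `config.yaml`. The keys and their nesting are taken from the sample config above; the broker address and URL are placeholders. It assumes the `kafka` connection is also configured, since the `_schemas` topic is read from the brokers.

.. code-block:: YAML

   connections:
     kafka:
       # Brokers used to consume the _schemas topic (placeholder address)
       bootstrapBrokers: ["localhost:9093"]
     schemaregistry:
       url: "http://localhost:8081"
       # Serve read queries from an in-memory cache built from _schemas.
       # Mutating requests still go to the REST endpoint.
       skipRegistryForReads: true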

encryption
~~~~~~~~~~

`gafkalo` will automatically try to decrypt the config file with sops_. If there is no sops metadata in the YAML, it will read the file as plaintext; otherwise it will attempt to decrypt it.

Refer to sops_ for further configuration.

sops_ is bundled as a library and there is no need to have the sops binary in the path.
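
For illustration, a sops creation rule for the config file could look like the sketch below. This is a `.sops.yaml` file read by the sops command-line tool when encrypting, not part of the tool's own config. The file pattern and the PGP fingerprint are placeholders, and other key types supported by sops can be used instead; check the sops documentation for the exact options.

.. code-block:: YAML

   creation_rules:
     # Hypothetical rule: encrypt files named config.yaml with a PGP key
     - path_regex: config\.yaml$
       pgp: "REPLACE_WITH_YOUR_PGP_FINGERPRINT"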


.. _sops: https://github.com/mozilla/sops

input yaml
----------

Kafkalo reads the YAML input files and applies the definitions to the Kafka brokers, Schema registry and Metadata service (Confluent RBAC).

A sample YAML file is as follows:


.. code-block:: YAML

   topics:
     - name: SKATA.VROMIA.POLY
       partitions: 6
       replication_factor: 1
       # Any topic configs can be added to this key
       configs:
         cleanup.policy: delete
         min.insync.replicas: 1
         retention.ms: 10000000
       key:
         # Lookup is relative to file
         schema: "schema-key.json"
         compatibility: BACKWARD
       value:
         schema: "schema.json"
         compatibility: NONE
     - name: SKATA.VROMIA.LIGO
       partitions: 6
       replication_factor: 3
       configs:
         cleanup.policy: delete
         min.insync.replicas: 1
       key:
         schema: "schema-key.json"
     - name: SKATA1
       partitions: 1
       replication_factor: 1
     - name: SKATA2
       partitions: 1
       replication_factor: 1
     - name: SKATA3
       partitions: 1
       replication_factor: 1
     - name: SKATA4
       partitions: 1
       replication_factor: 1
     - name: SKATA5
       partitions: 1
       replication_factor: 1
     - name: SKATA6
       partitions: 1
       replication_factor: 1
     - name: SKATA7
       partitions: 1
       replication_factor: 1
   # Clients configure the RBAC (Confluent MDS)
   clients:
     # principals must be in the form User:name or Group:name
     # For each principal you can have a consumer_for, producer_for or resourceowner_for
     # and the topics for each of these categories
     - principal: User:poutanaola
       consumer_for:
         # By default we will use PREFIXED
         # set isLiteral: true to set it to LITERAL
         - topic: TOPIC1.
         - topic: TOPIC2.
           isLiteral: true
       producer_for:
         - topic: TOPIC1.
       resourceowner_for:
         - topic: TOPIC4.
     - principal: Group:malakes
       consumer_for:
         - topic: TOPIC1.
         - topic: TOPIC2.
       producer_for:
         - topic: TOPIC1.
     - principal: User:produser
       producer_for:
         - topic: TOPIC1.
           # Strict mode is meant for production.
           # It lets the producer write to the topics but gives it
           # read-only access to the schema registry
           strict: false
       # Allow this principal access to the following consumer groups.
       # Roles can be defined but default to DeveloperRead
       groups:
         - name: consumer-produser-
         - name: consumer-produser-owner-
           # if not specified, roles is [DeveloperRead]
           roles: ["ResourceOwner"]
           # prefixed is true by default but can be disabled like below
           prefixed: false
     

topics
~~~~~~
For each topic under the `topics:` key, define the name and the required parameters.
The `configs:` section is optional; if it is omitted, the cluster defaults are used.

clients
~~~~~~~~~

This tool is meant to make common tasks easy, not to make anything possible (at least, not yet).
For this reason we define rolebindings primarily by the client's function.
A client meant to be a consumer will have `consumer_for` defined, listing the topics it can consume from. This will automatically add the correct permissions for the schema registry. You will need to add a `groups:` field to grant the consumer group permission.

For producers, the `producer_for` section works the same way as `consumer_for` does for consumers.
You can set `strict: true` on a `producer_for` entry if you want to disable writing new schemas to the schema registry. This is useful for production systems.
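
As a sketch of the points above, the fragment below defines a consumer with a consumer group and a strict producer. The principal, topic and group names are hypothetical; the keys follow the sample input YAML.

.. code-block:: YAML

   clients:
     - principal: User:example-consumer
       consumer_for:
         # PREFIXED by default, so this covers topics starting with "ORDERS."
         - topic: ORDERS.
       groups:
         # Consumer group permission; roles default to [DeveloperRead]
         - name: example-consumer-
     - principal: User:example-producer
       producer_for:
         - topic: ORDERS.
           # Read-only access to the schema registry
           strict: true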