Configure

configuration yaml

You need to provide a config.yaml file so that kafkalo will know how to connect and authenticate with your Kafka infrastrucure and related components (Schema registry, RBAC metadata server).

Look at sample config file included.

Config yaml structure:

---
# Configs to reach out all API endpoints
connections:
  kafka:
    # The bootstrap brokers to make initial connection to. A list
    bootstrapBrokers: ["localhost:9093"]
    ssl:
      # Enable or disable SSL
      enabled: false
      # Path to a CA file to add to the truststore.
      caPath: "/home/user/ca.crt"
    kerberos:
      # Enable/Disable Kerberos authentication
      enabled: false
      # Service name of Kafka (typically 'kafka')
      serviceName: "kafka"
      # Realm to use.
      realm: "EXAMPLE.COM"
      # Username to use (username is appended to realm to form principal)
      username: "trololol"
      # Password. Only if password authentication is used
      password: "password!"
      # path to keytab file to use for authentication.
      # AFAIK sarama does not support using a ticket from the  credential cache
      keytab: "path_to.key" # Will disable user auth if specified
  schemaregistry:
    # The URL of the schema registry
    url: "http://localhost:8081"
    # Timeout for REST calls made to schema registry in seconds
    # Defaults to 5. If you have many subjects you may want to increase this
    timeout: 10
    # Username to use to authenticate
    username: "username"
    # Password to use to authenticate
    password: "password"
    # Path a CA file to add to trust store
    caPath: "/home/user/ca.crt"
  mds:
    # URL to Confluent Metadata Service (MDS)
    url: "http://localhost:8090"
    # Username to authenticate as
    username: "username"
    # Password to use for MDS
    password: "password"
    # Schema registry cluster-id
    schema-registry-cluster-id: "schemaregistry"
    # Connect Cluster ID (if Connect rolebindings are needed)
    connect-cluster-id: "connect-cluster"
    # KSQL cluste id (if ksql rolebindings are needed)
    ksql-cluster-id: "ksql-cluster"
    # Path a CA file to add to trust store
    caPath: "/home/user/ca.crt"

# App specific configs
kafkalo:
  input_dirs:
    - "data/*"
    - "data/team2.yaml"
  # This path will be prepended to Schema paths that are not absolute.
  # If a schema: "somedir/schema.json" is defined, it will be treated as:
  # "data/somedir/schema.json"
  schema_dir: "data/"

You can add input dirs with glob patterns to let kafkalo know where to find your YAML definitions. Kafkalo will read all the input YAMLs, merge then into a single internal data structure and try to sync them.

encryption

gafkalo will automatically try to decrypt the config file with sops. If there no sops metadata in the yaml it will read it as plaintext, otherwise it will attempt to decrypt.

Refer to sops for further configuration.

sops is bundled as a library and there is no need to have the sops binary in the path.

input yaml

Kafkalo will read YAML input file and apply the definitions to the Kafka brokers, Schema registry and Metadata service (Confluent RBAC).

A sample YAML file is as follows:

topics:
  - name: SKATA.VROMIA.POLY
    partitions: 6
    replication_factor: 1
    # Any topic configs can be added to this key
    configs:
      cleanup.policy: delete
      min.insync.replicas: 1
      retention.ms: 10000000
    key:
      # Lookup is relative to file
      schema: "schema-key.json"
      compatibility: BACKWARD
    value:
      schema: "schema.json"
      compatibility: NONE
  - name: SKATA.VROMIA.LIGO
    partitions: 6
    replication_factor: 3
    configs:
      cleanup.policy: delete
      min.insync.replicas: 1
    key:
      schema: "schema-key.json"
  - name: SKATA1
    partitions: 1
    replication_factor: 1
  - name: SKATA2
    partitions: 1
    replication_factor: 1
  - name: SKATA3
    partitions: 1
    replication_factor: 1
  - name: SKATA4
    partitions: 1
    replication_factor: 1
  - name: SKATA5
    partitions: 1
    replication_factor: 1
  - name: SKATA6
    partitions: 1
    replication_factor: 1
  - name: SKATA7
    partitions: 1
    replication_factor: 1
# Clients configures the RBAC (Confluent MDS)
clients:
  # principals must be in the form User:name or Group:name
  # For each principal you can have a consumer_for, producer_for or resourceowner_for
  # and the topics for each of these categories
  - principal: User:poutanaola
    consumer_for:
      # By default we will use PREFIXED.
      # set prefixed: false to set it to LITERAL
      - topic: TOPIC1.
      - topic: TOPIC2.
        prefixed: false
    producer_for:
      - topic: TOPIC1.
    resourceowner_for:
      - topic: TOPIC4.
  - principal: Group:malakes
    consumer_for:
      - topic: TOPIC1.
      - topic: TOPIC2.
    producer_for:
      - topic: TOPIC1.
  - principal: User:produser
    producer_for:
      - topic: TOPIC1.
        # Strict mode is mean for production.
        # It will make the producer able to write the topics but read-only
        # access to the schema registry
        strict: false
   # Alllow this principal access to the following consumer groups.
   # roles can be defined but defaults to DeveloperRead
    groups:
      - name: consumer-produser-
      - name: consumer-produser-owner-
        # if not specified, roles is [DeveloperRead]
        roles: ["ResourceOwner"]
        # prefixed is true by default but can be disabled like below
        prefixed: false

topics

For each topic under the topics: key define the name and the required parameters. The configs: sections is optional and defaults for the cluster will be used.

clients

This tools is not meant to make common tasks easy, not to make anything possible (at least, not yet) For this reason we define rolebindings primarily by the client’s function. A client meant to be a consumer will have consumer_for defined and the topics it can consume from. This will automatically add the correct permissions for the schema registry. You will need to add a group: field to add the consumer group permisssion

For producers the producer_for section works the same way as the consumer You can define a role as strict: true if you want to disable writing new schemas in the schema registry. Useful for production systems