Configure
configuration yaml
You need to provide a config.yaml file so that kafkalo knows how to connect to and authenticate with your Kafka infrastructure and related components (Schema registry, RBAC metadata server).
See the included sample config file.
Config yaml structure:
---
# Configs to reach all API endpoints
connections:
  kafka:
    # List of bootstrap brokers used for the initial connection
    bootstrapBrokers: ["localhost:9093"]
    ssl:
      # Enable or disable SSL
      enabled: false
      # Path to a CA file to add to the truststore.
      caPath: "/home/user/ca.crt"
    kerberos:
      # Enable/Disable Kerberos authentication
      enabled: false
      # Service name of Kafka (typically 'kafka')
      serviceName: "kafka"
      # Realm to use.
      realm: "EXAMPLE.COM"
      # Username to use (username and realm are combined to form the principal)
      username: "trololol"
      # Password. Only if password authentication is used
      password: "password!"
      # Path to the keytab file to use for authentication.
      # AFAIK sarama does not support using a ticket from the credential cache
      keytab: "path_to.key" # Will disable user auth if specified
  schemaregistry:
    # The URL of the schema registry
    url: "http://localhost:8081"
    # Timeout for REST calls made to schema registry in seconds
    # Defaults to 5. If you have many subjects you may want to increase this
    timeout: 10
    # Username to use to authenticate
    username: "username"
    # Password to use to authenticate
    password: "password"
    # Path to a CA file to add to the trust store
    caPath: "/home/user/ca.crt"
  mds:
    # URL to Confluent Metadata Service (MDS)
    url: "http://localhost:8090"
    # Username to authenticate as
    username: "username"
    # Password to use for MDS
    password: "password"
    # Schema registry cluster ID
    schema-registry-cluster-id: "schemaregistry"
    # Connect cluster ID (if Connect rolebindings are needed)
    connect-cluster-id: "connect-cluster"
    # KSQL cluster ID (if ksql rolebindings are needed)
    ksql-cluster-id: "ksql-cluster"
    # Path to a CA file to add to the trust store
    caPath: "/home/user/ca.crt"
# App specific configs
kafkalo:
  input_dirs:
    - "data/*"
    - "data/team2.yaml"
  # This path will be prepended to Schema paths that are not absolute.
  # If a schema: "somedir/schema.json" is defined, it will be treated as:
  # "data/somedir/schema.json"
  schema_dir: "data/"
You can add input dirs with glob patterns to tell kafkalo where to find your YAML definitions. Kafkalo reads all the input YAMLs, merges them into a single internal data structure, and tries to sync them.
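As an illustration of the merge step, the sketch below shows two hypothetical input files that would both be picked up by the input_dirs entries from the sample config. The file contents, topic names, and principal are made up for this example. How kafkalo treats the same topic defined in more than one file is not covered here, so the two files are kept disjoint.

```yaml
# data/team1.yaml (hypothetical file, matched by the "data/*" pattern)
topics:
  - name: TEAM1.ORDERS
    partitions: 6
    replication_factor: 3
---
# data/team2.yaml (hypothetical file, also listed explicitly in input_dirs)
topics:
  - name: TEAM2.EVENTS
    partitions: 1
    replication_factor: 1
clients:
  - principal: User:team2-app
    consumer_for:
      - topic: TEAM2.
```

The `---` line only separates the two files in this listing. After reading both, kafkalo would work from one structure holding both topics and the client.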
encryption
gafkalo automatically tries to decrypt the config file with sops. If there is no sops metadata in the YAML, the file is read as plaintext; otherwise gafkalo attempts to decrypt it.
Refer to the sops documentation for further configuration.
sops is bundled as a library and there is no need to have the sops binary in the path.
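For orientation, a sops-encrypted config looks roughly like the abridged sketch below: values are replaced by ENC[...] strings and a top-level sops key carries the metadata. All ciphertexts, the recipient, and the version are elided placeholders, and the exact metadata fields depend on the sops version and the key type in use (age is assumed here).

```yaml
connections:
  mds:
    url: ENC[AES256_GCM,data:...,iv:...,tag:...,type:str]
    password: ENC[AES256_GCM,data:...,iv:...,tag:...,type:str]
# Metadata block written by sops; a file without it is read as plaintext
sops:
  age:
    - recipient: age1...   # placeholder recipient
      enc: "..."           # placeholder for the wrapped data key
  lastmodified: "..."
  mac: ENC[AES256_GCM,data:...,iv:...,tag:...,type:str]
  version: "..."
```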
input yaml
Kafkalo reads the YAML input files and applies the definitions to the Kafka brokers, Schema registry and Metadata service (Confluent RBAC).
A sample YAML file is as follows:
topics:
  - name: SKATA.VROMIA.POLY
    partitions: 6
    replication_factor: 1
    # Any topic configs can be added to this key
    configs:
      cleanup.policy: delete
      min.insync.replicas: 1
      retention.ms: 10000000
    key:
      # Lookup is relative to file
      schema: "schema-key.json"
      compatibility: BACKWARD
    value:
      schema: "schema.json"
      compatibility: NONE
  - name: SKATA.VROMIA.LIGO
    partitions: 6
    replication_factor: 3
    configs:
      cleanup.policy: delete
      min.insync.replicas: 1
    key:
      schema: "schema-key.json"
  - name: SKATA1
    partitions: 1
    replication_factor: 1
  - name: SKATA2
    partitions: 1
    replication_factor: 1
  - name: SKATA3
    partitions: 1
    replication_factor: 1
  - name: SKATA4
    partitions: 1
    replication_factor: 1
  - name: SKATA5
    partitions: 1
    replication_factor: 1
  - name: SKATA6
    partitions: 1
    replication_factor: 1
  - name: SKATA7
    partitions: 1
    replication_factor: 1
# The clients section configures RBAC (Confluent MDS)
clients:
  # principals must be in the form User:name or Group:name
  # For each principal you can have a consumer_for, producer_for or resourceowner_for
  # and the topics for each of these categories
  - principal: User:poutanaola
    consumer_for:
      # By default we will use PREFIXED.
      # set prefixed: false to set it to LITERAL
      - topic: TOPIC1.
      - topic: TOPIC2.
        prefixed: false
    producer_for:
      - topic: TOPIC1.
    resourceowner_for:
      - topic: TOPIC4.
  - principal: Group:malakes
    consumer_for:
      - topic: TOPIC1.
      - topic: TOPIC2.
    producer_for:
      - topic: TOPIC1.
  - principal: User:produser
    producer_for:
      - topic: TOPIC1.
        # Strict mode is meant for production.
        # It will make the producer able to write to the topics but with read-only
        # access to the schema registry
        strict: false
    # Allow this principal access to the following consumer groups.
    # roles can be defined but defaults to DeveloperRead
    groups:
      - name: consumer-produser-
      - name: consumer-produser-owner-
        # if not specified, roles is [DeveloperRead]
        roles: ["ResourceOwner"]
        # prefixed is true by default but can be disabled like below
        prefixed: false
topics
For each topic under the topics: key, define the name and the required parameters. The configs: section is optional; if it is omitted, the cluster defaults are used.
clients
This tool is meant to make common tasks easy, not to make anything possible (at least, not yet). For this reason we define rolebindings primarily by the client’s function. A client meant to be a consumer will have consumer_for defined, listing the topics it can consume from. This automatically adds the correct permissions for the schema registry. You will need to add a groups: field to grant the consumer group permission.
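A minimal consumer definition might look like the sketch below. The principal, topic prefix, and group name are hypothetical. The keys are the ones used in the sample above, including groups for the consumer group permission.

```yaml
clients:
  # Hypothetical consumer principal
  - principal: User:orders-reader
    consumer_for:
      # PREFIXED by default, so this covers topics whose names start with "ORDERS."
      - topic: ORDERS.
    groups:
      # Consumer group prefix this principal may use; roles default to [DeveloperRead]
      - name: orders-reader-
```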
For producers, the producer_for section works the same way as consumer_for. You can mark a producer role as strict: true if you want to disable writing new schemas to the schema registry. This is useful for production systems.
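As a sketch, a production producer with strict mode enabled could be declared as follows. The principal and topic prefix are hypothetical, and placing strict next to the topic entry (the same way prefixed is set in the sample) is an assumption about the schema rather than something this page states outright.

```yaml
clients:
  # Hypothetical producer principal
  - principal: User:orders-writer
    producer_for:
      - topic: ORDERS.
        # Can write to the topics, but gets read-only access to the schema registry
        strict: true
```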