=========
Configure
=========

configuration yaml
------------------

You need to provide a `config.yaml` file so that `kafkalo` knows how to connect to and authenticate with your Kafka infrastructure and related components (Schema registry, RBAC metadata server).

See the included sample config file.

Config yaml structure:

.. code-block:: YAML

    ---
    # Configs to reach all API endpoints
    connections:
      kafka:
        # The bootstrap brokers to make initial connection to. A list
        bootstrapBrokers: ["localhost:9093"]
        ssl:
          # Enable or disable SSL
          enabled: false
          # Path to a CA file to add to the truststore.
          caPath: "/home/user/ca.crt"
        kerberos:
          # Enable/Disable Kerberos authentication
          enabled: false
          # Service name of Kafka (typically 'kafka')
          serviceName: "kafka"
          # Realm to use.
          realm: "EXAMPLE.COM"
          # Username to use (username is appended to realm to form principal)
          username: "trololol"
          # Password. Only if password authentication is used
          password: "password!"
          # Path to keytab file to use for authentication.
          # AFAIK sarama does not support using a ticket from the credential cache
          keytab: "path_to.key" # Will disable user auth if specified
      schemaregistry:
        # The URL of the schema registry
        url: "http://localhost:8081"
        # Timeout for REST calls made to schema registry in seconds
        # Defaults to 5. If you have many subjects you may want to increase this
        timeout: 10
        # Username to use to authenticate
        username: "username"
        # Password to use to authenticate
        password: "password"
        # Path to a CA file to add to trust store
        caPath: "/home/user/ca.crt"
      mds:
        # URL to Confluent Metadata Service (MDS)
        url: "http://localhost:8090"
        # Username to authenticate as
        username: "username"
        # Password to use for MDS
        password: "password"
        # Schema registry cluster-id
        schema-registry-cluster-id: "schemaregistry"
        # Connect Cluster ID (if Connect rolebindings are needed)
        connect-cluster-id: "connect-cluster"
        # KSQL cluster ID (if ksql rolebindings are needed)
        ksql-cluster-id: "ksql-cluster"
        # Path to a CA file to add to trust store
        caPath: "/home/user/ca.crt"
    # App specific configs
    kafkalo:
      input_dirs:
        - "data/*"
        - "data/team2.yaml"
      # This path will be prepended to Schema paths that are not absolute.
      # If a schema: "somedir/schema.json" is defined, it will be treated as:
      # "data/somedir/schema.json"
      schema_dir: "data/"

You can add input dirs with glob patterns to let kafkalo know where to find your YAML definitions.

Kafkalo will read all the input YAMLs, merge them into a single internal data structure and try to sync them.

encryption
~~~~~~~~~~

`gafkalo` will automatically try to decrypt the config file with sops_. If there is no sops metadata in the YAML, it is read as plaintext; otherwise decryption is attempted.

Refer to sops_ for further configuration. sops_ is bundled as a library, so there is no need to have the sops binary in the path.

.. _sops: https://github.com/mozilla/sops

input yaml
----------

Kafkalo reads the YAML input files and applies the definitions to the Kafka brokers, Schema registry and Metadata service (Confluent RBAC).

A sample YAML file is as follows:

.. code-block:: YAML

    topics:
      - name: SKATA.VROMIA.POLY
        partitions: 6
        replication_factor: 1
        # Any topic configs can be added to this key
        configs:
          cleanup.policy: delete
          min.insync.replicas: 1
          retention.ms: 10000000
        key:
          # Lookup is relative to file
          schema: "schema-key.json"
          compatibility: BACKWARD
        value:
          schema: "schema.json"
          compatibility: NONE
      - name: SKATA.VROMIA.LIGO
        partitions: 6
        replication_factor: 3
        configs:
          cleanup.policy: delete
          min.insync.replicas: 1
        key:
          schema: "schema-key.json"
      - name: SKATA1
        partitions: 1
        replication_factor: 1
      - name: SKATA2
        partitions: 1
        replication_factor: 1
      - name: SKATA3
        partitions: 1
        replication_factor: 1
      - name: SKATA4
        partitions: 1
        replication_factor: 1
      - name: SKATA5
        partitions: 1
        replication_factor: 1
      - name: SKATA6
        partitions: 1
        replication_factor: 1
      - name: SKATA7
        partitions: 1
        replication_factor: 1
    # Clients configures the RBAC (Confluent MDS)
    clients:
      # principals must be in the form User:name or Group:name
      # For each principal you can have a consumer_for, producer_for or resourceowner_for
      # and the topics for each of these categories
      - principal: User:poutanaola
        consumer_for:
          # By default we will use PREFIXED.
          # set prefixed: false to set it to LITERAL
          - topic: TOPIC1.
          - topic: TOPIC2.
            prefixed: false
        producer_for:
          - topic: TOPIC1.
        resourceowner_for:
          - topic: TOPIC4.
      - principal: Group:malakes
        consumer_for:
          - topic: TOPIC1.
          - topic: TOPIC2.
        producer_for:
          - topic: TOPIC1.
      - principal: User:produser
        producer_for:
          - topic: TOPIC1.
            # Strict mode is meant for production.
            # It will make the producer able to write the topics but read-only
            # access to the schema registry
            strict: false
        # Allow this principal access to the following consumer groups.
        # roles can be defined but defaults to DeveloperRead
        groups:
          - name: consumer-produser-
          - name: consumer-produser-owner-
            # if not specified, roles is [DeveloperRead]
            roles: ["ResourceOwner"]
            # prefixed is true by default but can be disabled like below
            prefixed: false

topics
~~~~~~

For each topic under the `topics:` key, define the name and the required parameters.

The `configs:` section is optional. If it is omitted, the cluster defaults are used.

clients
~~~~~~~~~

This tool is meant to make common tasks easy, not to make anything possible (at least, not yet).

For this reason we define rolebindings primarily by the client's function.

A client meant to be a consumer will have `consumer_for` defined, listing the topics it can consume from. This automatically adds the correct permissions for the schema registry. You will need to add a `groups:` field to grant the consumer group permission.

For producers, the `producer_for` section works the same way as `consumer_for` does for consumers.

You can set `strict: true` if you want to disable writing new schemas to the schema registry. This is useful for production systems.
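
The consumer case described above can be sketched as a minimal client entry. This is an illustration only: the principal `User:example-consumer`, the topic prefix `ORDERS.` and the group prefix `orders-consumer-` are hypothetical names, and the fragment uses only keys that appear in the sample input YAML.

.. code-block:: YAML

    clients:
      # Hypothetical principal, for illustration only
      - principal: User:example-consumer
        consumer_for:
          # PREFIXED by default, so this covers topics starting with "ORDERS."
          - topic: ORDERS.
        groups:
          # Consumer group prefix; roles defaults to [DeveloperRead]
          - name: orders-consumer-

Without the `groups:` entry, the principal would have the topic and schema registry permissions but none for its consumer group.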