=========
Configure
=========

configuration yaml
------------------

You need to provide a `config.yaml` file so that `kafkalo` knows how to connect to and authenticate with your Kafka infrastructure and related components (Schema registry, RBAC metadata server).

See the sample config file included.

Config YAML structure:

.. code-block:: YAML

   ---
   # Configs to reach all API endpoints
   connections:
     kafka:
       # The bootstrap brokers to make the initial connection to. A list
       bootstrapBrokers: ["localhost:9093"]
       ssl:
         # Enable or disable SSL
         enabled: false
         # Path to a CA file to add to the truststore.
         caPath: "/home/user/ca.crt"
       kerberos:
         # Enable/Disable Kerberos authentication
         enabled: false
         # Service name of Kafka (typically 'kafka')
         serviceName: "kafka"
         # Realm to use.
         realm: "EXAMPLE.COM"
         # Username to use (username is appended to realm to form principal)
         username: "trololol"
         # Password. Only if password authentication is used
         password: "password!"
         # Path to keytab file to use for authentication.
         # AFAIK sarama does not support using a ticket from the credential cache
         keytab: "path_to.key" # Will disable user auth if specified
     schemaregistry:
       # The URL of the schema registry
       url: "http://localhost:8081"
       # Timeout for REST calls made to schema registry in seconds
       # Defaults to 5. If you have many subjects you may want to increase this
       timeout: 10
       # Username to use to authenticate
       username: "username"
       # Password to use to authenticate
       password: "password"
       # Path to a CA file to add to the trust store
       caPath: "/home/user/ca.crt"
       # When you set skipRegistryForReads to true, it will read the _schemas topic
       # directly and build an internal representation of schemas/subjects and configs.
       # It will then use that in-memory cache for queries that would otherwise go to
       # the Schema registry REST API. Mutating requests still go to the Schema
       # registry REST endpoint.
       # This can provide a huge speed benefit when there are many subjects/schemas.
       skipRegistryForReads: false
     mds:
       # URL to Confluent Metadata Service (MDS)
       url: "http://localhost:8090"
       # Username to authenticate as
       username: "username"
       # Password to use for MDS
       password: "password"
       # Schema registry cluster-id
       schema-registry-cluster-id: "schemaregistry"
       # Connect Cluster ID (if Connect rolebindings are needed)
       connect-cluster-id: "connect-cluster"
       # KSQL cluster id (if ksql rolebindings are needed)
       ksql-cluster-id: "ksql-cluster"
       # Path to a CA file to add to the trust store
       caPath: "/home/user/ca.crt"
   # App specific configs
   kafkalo:
     input_dirs:
       - "data/*"
       - "data/team2.yaml"
     # This path will be prepended to Schema paths that are not absolute.
     # If a schema: "somedir/schema.json" is defined, it will be treated as:
     # "data/somedir/schema.json"
     schema_dir: "data/"

You can add input dirs with glob patterns to let kafkalo know where to find your YAML definitions.

Kafkalo will read all the input YAMLs, merge them into a single internal data structure and try to sync them.

Bypassing schema registry for speed.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you have many subjects registered, checking through the schema registry REST API whether each schema is already registered, and whether its compatibility matches what is requested, can result in thousands of queries and take a long time.

The `skipRegistryForReads` option can be set to true to make kafkalo consume the `_schemas` topic directly. It will then construct an in-memory cache of the Schema registry data (schemas, subjects, compatibility) and use that instead of querying the Schema registry. Mutating operations will still go through the REST API.

Note that this can, in theory, result in discrepancies with how the schema registry handles things (especially canonicalizing and comparing schemas). Use it carefully and only if needed.

encryption
~~~~~~~~~~

`gafkalo` will automatically try to decrypt the config file with sops_. If there is no sops metadata in the YAML it will read the file as plaintext; otherwise it will attempt to decrypt it.

Refer to sops_ for further configuration. sops_ is bundled as a library, so there is no need to have the sops binary in the path.

.. _sops: https://github.com/mozilla/sops

input yaml
----------

Kafkalo will read the YAML input files and apply the definitions to the Kafka brokers, Schema registry and Metadata service (Confluent RBAC).

A sample YAML file is as follows:

.. code-block:: YAML

   topics:
     - name: SKATA.VROMIA.POLY
       partitions: 6
       replication_factor: 1
       # Any topic configs can be added to this key
       configs:
         cleanup.policy: delete
         min.insync.replicas: 1
         retention.ms: 10000000
       key:
         # Lookup is relative to file
         schema: "schema-key.json"
         compatibility: BACKWARD
       value:
         schema: "schema.json"
         compatibility: NONE
     - name: SKATA.VROMIA.LIGO
       partitions: 6
       replication_factor: 3
       configs:
         cleanup.policy: delete
         min.insync.replicas: 1
       key:
         schema: "schema-key.json"
     - name: SKATA1
       partitions: 1
       replication_factor: 1
     - name: SKATA2
       partitions: 1
       replication_factor: 1
     - name: SKATA3
       partitions: 1
       replication_factor: 1
     - name: SKATA4
       partitions: 1
       replication_factor: 1
     - name: SKATA5
       partitions: 1
       replication_factor: 1
     - name: SKATA6
       partitions: 1
       replication_factor: 1
     - name: SKATA7
       partitions: 1
       replication_factor: 1
   # Clients configures the RBAC (Confluent MDS)
   clients:
     # principals must be in the form User:name or Group:name
     # For each principal you can have a consumer_for, producer_for or resourceowner_for
     # and the topics for each of these categories
     - principal: User:poutanaola
       consumer_for:
         # By default we will use PREFIXED
         # set isLiteral: true to set it to LITERAL
         - topic: TOPIC1.
         - topic: TOPIC2.
           isLiteral: true
       producer_for:
         - topic: TOPIC1.
       resourceowner_for:
         - topic: TOPIC4.
     - principal: Group:malakes
       consumer_for:
         - topic: TOPIC1.
         - topic: TOPIC2.
       producer_for:
         - topic: TOPIC1.
     - principal: User:produser
       producer_for:
         - topic: TOPIC1.
           # Strict mode is meant for production.
           # It will make the producer able to write to the topics but give it
           # read-only access to the schema registry
           strict: false
       # Allow this principal access to the following consumer groups.
       # roles can be defined but defaults to DeveloperRead
       groups:
         - name: consumer-produser-
         - name: consumer-produser-owner-
           # if not specified, roles is [DeveloperRead]
           roles: ["ResourceOwner"]
           # prefixed is true by default but can be disabled like below
           prefixed: false

topics
~~~~~~

For each topic under the `topics:` key, define the name and the required parameters.

The `configs:` section is optional; if it is omitted, the cluster defaults are used.

clients
~~~~~~~

This tool is meant to make common tasks easy, not to make anything possible (at least, not yet).

For this reason we define rolebindings primarily by the client's function.

A client meant to be a consumer will have `consumer_for` defined, listing the topics it can consume from. This will automatically add the correct permissions for the schema registry. You will need to add a `groups:` field to grant the consumer group permission.

For producers, the `producer_for` section works the same way as for consumers.

You can set `strict: true` if you want to disable writing new schemas to the schema registry. This is useful for production systems.
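
As an illustration of the function-based rolebindings described in this section, the following sketch defines one consumer and one strict producer. The principal names, the `ORDERS.` topic prefix and the group prefix are hypothetical. Placing `strict` on the topic entry and using the `groups` key are assumptions that mirror the sample input file, not a confirmed schema.

.. code-block:: YAML

   clients:
     # Hypothetical consumer: reads every topic prefixed with ORDERS.
     - principal: User:orders-reader
       consumer_for:
         - topic: ORDERS.
       # Consumer group access; roles defaults to DeveloperRead
       groups:
         - name: orders-reader-
     # Hypothetical production producer
     - principal: User:orders-writer
       producer_for:
         - topic: ORDERS.
           # Assumed placement: the producer can write to the topic
           # but cannot register new schemas
           strict: true

With definitions like these, the consumer would not need separate schema registry entries, since those permissions are added automatically for the topics listed under `consumer_for`.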