Kubernetes
Kubernetes is an open-source system for automating the deployment, operation, and scaling of containerized applications. For example, our parsing pipeline currently consists of several components, as described in NOMAD-Base-Layer and discussed in the section below. Using Kubernetes we can easily deploy and scale each component according to the requirements and the available resources.
The official user guide is a great place to start familiarizing yourself with Kubernetes-related terms. There is also a good talk by Brendan Burns giving a technical overview of Kubernetes. Assuming that you are familiar with Docker, the important concepts to get started with Kubernetes are pods, replication controllers, services, and namespaces.
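As a small illustration of one of these concepts, a namespace is declared with a short manifest like the one below. This is a hypothetical sketch (the name shown is made up); the namespace.yaml generated in a later step has this general shape:

```yaml
# Hypothetical namespace manifest; the generated namespace.yaml looks similar.
# The name follows the ${USER}-default convention used by the config generator.
apiVersion: v1
kind: Namespace
metadata:
  name: myuser-default
```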
Overview of the parsing pipeline components on the Kubernetes cluster
- RabbitMQ message broker: One of the most central components of the parsing pipeline. RabbitMQ is the message broker that allows the different components to communicate with each other. Number of instances needed: 1 (in most cases).
- Tree-parser-initializer: Initializes the parsing pipeline; creates a tree parser request and sends it via RabbitMQ to the tree parser. Currently we run the initializer outside the cluster.
- Tree-parser: Reads an archive/tree and finds the list of parsable files with their corresponding parsers. Creates and sends a request to the calculation parser for each main calculation file found in the archive. Number of instances needed: relatively low (1 is enough in many cases).
- Calculation parser: Responsible for the actual parsing. The most computationally heavy component. Number of instances needed: multiple if possible.
- Normalizer: (To be added) Normalizes the output of the calculation parser.
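The message flow between the tree parser and the calculation parser can be sketched with plain shell pipes. This is purely illustrative: the real components communicate via RabbitMQ queues, and the file names and message format below are made up.

```shell
# Illustrative sketch of the pipeline's message flow (not the real implementation):
# the tree parser emits one parse request per main calculation file it finds,
# and calculation-parser workers consume those requests.
tree_parser() {
  # pretend these are the main calculation files found in an archive
  for f in calc1.out calc2.out; do
    echo "parse-request:$f"
  done
}

calculation_parser() {
  while read -r req; do
    echo "parsed ${req#parse-request:}"
  done
}

tree_parser | calculation_parser
```

In the real pipeline the pipe is replaced by a RabbitMQ queue, which is what lets the calculation parser scale to multiple instances independently of the tree parser.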
Steps to start parsing using the Kubernetes cluster:

0. Start the Kubernetes cluster. Skip this step if the cluster is already up and running. On labdev-nomad the Kubernetes cluster is already set up. To set it up on a new machine, first install Docker and then follow the instructions of the official Kubernetes getting started guide (http://kubernetes.io/docs/getting-started-guides/docker/). We will soon publish scripts to start the Kubernetes cluster.
- Create the required Docker images of the tree parser and the calculation parser using the following commands in the nomad-lab-base directory (this assumes that you have an up-to-date clone of the nomad-lab-base project):

  $ sbt treeparser/docker
  $ sbt calculationparser/docker
- Generate the Kubernetes configuration from our templates:

  $ sbt kubernetes/run -A

  You can pass --config <config to use, eg. test> if you want to generate the files for a special configuration. The default is a configuration that uses ${USER}-default as namespace and thus will not clash with other users. Your outputs will also be in /parsed/${USER}-default. The generated Kubernetes configuration files can be found in the "kubeGen" directory.
- If there is no stats db, start it (this is typically not needed, as the parsing statistics db is shared between all users).
- Start the desired components. E.g., use the following commands in the kubeGen directory to start all components:

  $ kubectl create -f ./namespace.yaml (see below for more information on namespaces)
  $ kubectl create -f ./rabbitMq.yaml
  $ kubectl create -f ./treeParser.yaml
  $ kubectl create -f ./calculationParser.yaml
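To check that the components came up, you can look at the pod status. The snippet below parses a sample of `kubectl get pods` output rather than calling kubectl itself; the pod names and values shown are made up, so run the real command against your own namespace:

```shell
# Illustrative: parse (sample) `kubectl get pods --namespace=${USER}-default`
# output and count the pods that reached the Running state.
sample='NAME                     READY  STATUS   RESTARTS  AGE
rabbitmq-x1y2z            1/1    Running  0         1m
treeparser-a1b2c          1/1    Running  0         1m
calculationparser-d3e4f   1/1    Running  0         1m'
running=$(printf '%s\n' "$sample" | grep -c Running)
echo "$running running pods"
```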
- Initialize the queue using the tree-parser-initializer:
  a. First set the environment variables RABBITMQ_SERVICE_HOST and RABBITMQ_SERVICE_PORT:
     i) Run $ kubectl --namespace=default describe svc, get the NodePort, and store it in an environment variable. E.g., if the output contains "NodePort: main-port 31964/TCP", then set RABBITMQ_SERVICE_PORT=31964.
     ii) Set RABBITMQ_SERVICE_HOST=127.0.0.1.
  b. Run the initializer: $ sbt treeparserinitializer/re-start. You may append --- -Dnomad_lab.configToUse=<config to use, eg. test> to submit things to another configuration.
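Extracting the NodePort by hand can be scripted. The snippet below works on a sample line mimicking the `kubectl describe svc` output shown above (the port value is the one from the example; verify against your own cluster's output):

```shell
# Illustrative: pull the NodePort out of a line like the one shown above and
# export it as RABBITMQ_SERVICE_PORT. The sample line mimics kubectl output.
line="NodePort:  main-port  31964/TCP"
port=$(printf '%s\n' "$line" | sed -E 's|.*[^0-9]([0-9]+)/TCP.*|\1|')
export RABBITMQ_SERVICE_PORT="$port"
export RABBITMQ_SERVICE_HOST=127.0.0.1
echo "$RABBITMQ_SERVICE_PORT"
```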
Side notes: If a component with the same name already exists, kubectl will throw an error like: "Error from server: error when creating "./namespace.yaml": namespaces "default" already exists". In many cases it is desirable to re-use certain components such as the rabbitMQ message broker and the namespace, for example if you want to plug in and test a parser in the already running pipeline. But sometimes it is desirable to have separate components just for yourself. In that case,
- You can create a different config subtree in core/src/main/resources/reference.conf, change the configToUse variable to point to your configuration, and execute the above-mentioned steps for starting the cluster. Or,
- Change values directly in the generated .yaml files as per your requirements.
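Such a config subtree could look roughly as follows. This is a hypothetical sketch: only the configToUse key is taken from the steps above, the nested key names are made up for illustration, so check the existing entries in reference.conf for the actual keys your configuration needs.

```hocon
# Hypothetical fragment for core/src/main/resources/reference.conf;
# key names other than configToUse are made up for illustration.
nomad_lab {
  configToUse = "myconfig"
  myconfig {
    namespace = "myuser-pipeline"
    outputDir = "/parsed/myuser-pipeline"
  }
}
```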