Commit e3841fc5 authored by davidwaroquiers

Part 1 of the tuto.

parent c766ace1
%% Cell type:markdown id:northern-young tags:
# Five-minute quickstart
In this quickstart, you will:
* Add a simple workflow to the central database via the command line
* Run that workflow
* Monitor your job status with the FireWorks database
* Get a flavor of the Python API
## Start FireWorks
A MongoDB database (containing the FireWorks database) is running in your Docker container.
Reset/Initialize the FireWorks database (the LaunchPad) using the command line:
```lpad reset```
Note: All FireWorks commands come with built-in help. For example, type lpad -h or lpad reset -h. There often exist many different options for each command.
Note2: Resetting the FireWorks database removes all your workflows and jobs from it. During this tutorial you may reset often, but in production, when you are actually using FireWorks, you will almost never use this command.
## Add a Workflow
There are many ways to add Workflows to the database, including a Python API. Let’s start with an extremely simple example that can be added via the command line:
%% Cell type:code id:amended-jimmy tags:
``` python
!lpad add_scripts 'echo "hello"' 'echo "goodbye"' -n hello goodbye -w test_workflow
```
%% Cell type:markdown id:necessary-potato tags:
This added a two-job linear workflow. The first job prints hello to the command line, and the second job prints goodbye. We gave each step an (optional) name, “hello” and “goodbye”, and gave the overall workflow the (optional) name “test_workflow”.
Let’s look at our test workflow:
%% Cell type:code id:opponent-strip tags:
``` python
!lpad get_wflows -n test_workflow -d more
```
%% Cell type:markdown id:deadly-gravity tags:
We get back basic information on our workflows. The second step “goodbye” is waiting for the first one to complete; it is not ready to run because it depends on the first job.
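The way a dependency decides which step is ready can be illustrated with a small toy model. The sketch below is plain Python and does not use the FireWorks API; the function `job_states` and the `links` layout are assumptions made for illustration only, and FireWorks itself tracks more states than the three shown here.

```python
def job_states(links, completed):
    """Return a state for each job in a toy workflow.

    links maps a parent job name to the list of its child job names.
    completed is the set of job names that have already finished.
    """
    # Invert the links to find the parents of every job.
    parents = {}
    for parent, children in links.items():
        parents.setdefault(parent, set())
        for child in children:
            parents.setdefault(child, set()).add(parent)

    states = {}
    for job, job_parents in parents.items():
        if job in completed:
            states[job] = "COMPLETED"
        elif job_parents <= completed:
            states[job] = "READY"      # all parents are done
        else:
            states[job] = "WAITING"    # at least one parent still has to run
    return states


links = {"hello": ["goodbye"]}
print(job_states(links, completed=set()))
print(job_states(links, completed={"hello"}))
```

Before anything runs, “hello” is ready and “goodbye” is waiting; once “hello” is marked as completed, “goodbye” becomes ready.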
## Run all Workflows
You can run jobs one at a time (“singleshot”) or all at once (“rapidfire”). Let’s run all jobs:
%% Cell type:code id:destroyed-flooring tags:
``` python
!rlaunch rapidfire
```
%% Cell type:markdown id:pressing-spiritual tags:
Clearly, both steps of our workflow ran in the correct order.
Let’s again look at our workflows:
%% Cell type:code id:presidential-macintosh tags:
``` python
!lpad get_wflows -n test_workflow -d more
```
%% Cell type:markdown id:designing-bruce tags:
FireWorks automatically created launcher_ directories for each step in the Workflow and ran them. We see that both steps are complete. Note that there exist options to choose where to run jobs, as well as to tear down empty directories after running jobs.
## Look at the web GUI
If you have a web browser, you can launch the web GUI to see your workflows using ```lpad webgui```. In this tutorial, the web GUI is directly integrated into the Jupyter notebook.
## Python code
The following Python code achieves the same behavior:
%% Cell type:code id:black-avatar tags:
``` python
from fireworks import Firework, Workflow, LaunchPad, ScriptTask
from fireworks.core.rocket_launcher import rapidfire
# set up the LaunchPad and reset it
launchpad = LaunchPad.auto_load()
launchpad.reset('', require_password=False)
# create the individual FireWorks and Workflow
fw1 = Firework(ScriptTask.from_str('echo "hello"'), name="hello")
fw2 = Firework(ScriptTask.from_str('echo "goodbye"'), name="goodbye")
wf = Workflow([fw1, fw2], {fw1:fw2}, name="test workflow")
# store workflow and launch it locally
launchpad.add_wf(wf)
rapidfire(launchpad)
```
spec:
  _tasks:
  - _fw_name: ScriptTask
    script: echo "howdy, your job launched successfully!" >> howdy.txt
%% Cell type:markdown id:political-syria tags:
# Introductory Tutorial
In this tutorial, you will:
* Add a simple workflow to the central database via a file
* Run that workflow in a few modes
* Get a flavor of the Python API
The purpose of this tutorial is to get you set up as quickly as possible; it isn’t intended to demonstrate the features of FireWorks or explain things in great detail. This tutorial can be safely completed from the command line, and requires no programming.
First, reset the FireWorks database again with ```lpad reset```.
%% Cell type:markdown id:conceptual-subdivision tags:
## Add a Firework to the LaunchPad
A Firework contains a list of computing tasks (Firetasks) to be performed. For this tutorial, we will use a Firework that consists of only a single step. We’ll tackle more complex workflows in other tutorials. Our workflow consisting of one Firework and one Firetask thus looks like this:
![firetask](images/single_fw.png)
Let's add the firework to the database of jobs:
%% Cell type:code id:shared-printing tags:
``` python
!lpad add fw_test.yaml
```
%% Cell type:markdown id:conceptual-smart tags:
This command added a simple workflow to the database. The workflow was serialized in a file called fw_test.yaml and consists of a single step that prints some text to a file. Look inside fw_test.yaml to see how that workflow was defined:
%% Cell type:code id:elect-omega tags:
``` python
!cat fw_test.yaml
```
%% Cell type:markdown id:short-liberia tags:
You should have received confirmation that the Firework got added. You can query the database for this Firework as follows:
%% Cell type:code id:packed-sound tags:
``` python
!lpad get_fws -i 1 -d all
```
%% Cell type:markdown id:great-divide tags:
This prints, in JSON format, all details of the Firework with fw_id = 1 (the first Firework entered into the database).
Some parts of the output are straightforward, but a few sections deserve further explanation:
* The spec of the Firework contains all the information about what job to run and the parameters needed to run it.
* Within the spec, the _tasks section tells you what jobs will run. The ScriptTask is a particular type of task that runs commands through the shell. Other sections of the spec can also be defined, but for now we’ll stick to just _tasks. Later on, we’ll describe how to run multiple _tasks or customized _tasks.
* This Firework runs the script echo "howdy, your job launched successfully!" >> howdy.txt, which appends text to a file named howdy.txt.
* The state of READY means the Firework is ready to be run.
* The name is an optional field that we can set to help query for FireWorks later on. In this case, we did not specify one, so a default name was used.
You have now stored a Firework in the LaunchPad, and it’s ready to run!
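For readers more at home in Python than in YAML, the spec stored in fw_test.yaml corresponds to a nested dictionary. The sketch below only rebuilds that structure with the standard library; it does not talk to the database, and the variable names are illustrative.

```python
import json

# The same content as the spec in fw_test.yaml, written as a Python dictionary.
spec = {
    "_tasks": [
        {
            "_fw_name": "ScriptTask",
            "script": 'echo "howdy, your job launched successfully!" >> howdy.txt',
        }
    ]
}

# _tasks is a list, which is what allows a Firework to hold several tasks.
for position, task in enumerate(spec["_tasks"], start=1):
    print(position, task["_fw_name"], "->", task["script"])

# JSON view of the spec section.
print(json.dumps({"spec": spec}, indent=2))
```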
%% Cell type:markdown id:textile-chinese tags:
## Launch jobs
We can launch jobs using the ```rlaunch``` ("Rocket" launch) command:
%% Cell type:code id:committed-cardiff tags:
``` python
!rlaunch singleshot
```
%% Cell type:markdown id:united-banner tags:
This command fetches an available Firework from the FireWorks database and runs it.
Verify that the desired task ran:
%% Cell type:code id:hungry-correlation tags:
``` python
!cat howdy.txt
```
%% Cell type:markdown id:increasing-absence tags:
You should see the text: "howdy, your job launched successfully!"
In addition to howdy.txt, you should also see a file called FW.json. This contains a JSON representation of the Firework that the Rocket ran and can be useful later for tracking down a launch or debugging.
Check the status of your Firework:
%% Cell type:code id:interstate-glossary tags:
``` python
!lpad get_fws -i 1 -d all
```
%% Cell type:markdown id:excellent-moldova tags:
You will now see lots of information about your Rocket launch, such as the time and directory of the launch. A lot of it is probably unclear, but you should notice that the state of the Firework is now COMPLETED.
Try launching another rocket:
%% Cell type:code id:every-setting tags:
``` python
!rlaunch singleshot
```
%% Cell type:markdown id:distinguished-marble tags:
The message "No FireWorks are ready to run and match query!" indicates that it tried to fetch a Firework from the database, but none could be found. Indeed, we had previously run the only Firework that was in the database.
## Launch many Rockets (rapidfire mode)
If you just want to run many jobs on the central server itself, the simplest way is to run the Rocket Launcher in “rapidfire mode”. Let’s try this feature.
Let’s add the Firework three times (run the following cell three times):
%% Cell type:code id:limited-stewart tags:
``` python
!lpad add fw_test.yaml
```
%% Cell type:markdown id:breeding-delaware tags:
Confirm that the three Fireworks got added to the database, in addition to the one from before (4 total):
%% Cell type:code id:exceptional-bradley tags:
``` python
!lpad get_fws -d less
```
%% Cell type:markdown id:thousand-guitar tags:
We could also just get information for jobs that are ready to run (our 3 new FireWorks):
%% Cell type:code id:packed-worse tags:
``` python
!lpad get_fws -s READY -d less
```
%% Cell type:markdown id:small-murder tags:
Let’s launch jobs in “rapidfire” mode, which will keep repeating until we run out of Fireworks to run:
%% Cell type:code id:double-contrary tags:
``` python
!rlaunch rapidfire
```
%% Cell type:markdown id:stretch-knife tags:
You should see three directories starting with the tag launcher_. Inside each of these directories, you’ll find the results of one of your FireWorks (a file named howdy.txt):
%% Cell type:code id:documentary-acrobat tags:
``` python
!cat launch*/howdy.txt
```
%% Cell type:markdown id:received-immune tags:
## Running FireWorks automatically
We can set our Launcher to continuously look for new FireWorks to run. Let’s try this feature.
Start the Launcher in a terminal so that it looks for new FireWorks every 10 seconds:
```rlaunch rapidfire --nlaunches infinite --sleep 10```
Let’s insert two FireWorks (run the following cell twice):
%% Cell type:code id:generic-istanbul tags:
``` python
!lpad add fw_test.yaml
```
%% Cell type:markdown id:beneficial-methodology tags:
After a few seconds, the Rocket Launcher should have picked up the new jobs and run them. Confirm this is the case:
%% Cell type:code id:square-telling tags:
``` python
!cat launch*/howdy.txt
```
%% Cell type:markdown id:unnecessary-projector tags:
You should see the outputs for each Firework we inserted.
You can continue adding FireWorks as desired; the Launcher will run them automatically and create a new directory for each job.
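If you prefer to inspect the results from Python rather than with `cat`, a small helper can gather every howdy.txt found below the launcher_ directories. This is a sketch using only the standard library: `collect_outputs` is a hypothetical helper, not part of FireWorks, and it assumes the launcher_ directories sit directly under the given root. The demonstration builds a throwaway directory tree so that it runs anywhere.

```python
import tempfile
from pathlib import Path


def collect_outputs(root=".", filename="howdy.txt"):
    """Map each launcher_ directory name to the text of its output file."""
    outputs = {}
    for path in sorted(Path(root).glob(f"launcher_*/{filename}")):
        outputs[path.parent.name] = path.read_text().strip()
    return outputs


# Demonstration on a temporary tree that mimics two launches.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("launcher_a", "launcher_b"):
        launch_dir = Path(tmp) / name
        launch_dir.mkdir()
        (launch_dir / "howdy.txt").write_text("howdy, your job launched successfully!\n")
    demo = collect_outputs(tmp)

print(demo)
```

In the tutorial directory, calling `collect_outputs()` with no arguments would look at the real launcher_ directories instead.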
As with all FireWorks scripts, you can run the built-in help for more information:
```
rlaunch -h
rlaunch singleshot -h
rlaunch rapidfire -h
```
%% Cell type:markdown id:wanted-tourist tags:
## What just happened?
It’s important to understand that when you add a Firework to the LaunchPad using the lpad script, the job just sits in the database and waits. The LaunchPad does not submit jobs to a computing resource when a new Firework is added to the LaunchPad. Rather, a computing resource must request a computing task by running the Launcher.
By running the Launcher from different locations, you can have different computing resources run your jobs. Using rapidfire mode is a convenient way of requesting multiple jobs using a single command.
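The pull model described above can be sketched in a few lines of plain Python. Everything here is a toy: `ToyLaunchPad`, `singleshot` and `rapidfire` are stand-ins written for illustration and only borrow the names of the real commands; they are not the FireWorks implementation.

```python
from collections import deque


class ToyLaunchPad:
    """Stand-in for the database: it only stores jobs and hands them out on request."""

    def __init__(self):
        self._ready = deque()

    def add(self, name):
        # Adding a job never runs it; the job just waits.
        self._ready.append(name)

    def checkout(self):
        # Return the next ready job, or None if nothing is waiting.
        return self._ready.popleft() if self._ready else None


def singleshot(launchpad, worker):
    """Request one job and run it. Returns a record of what ran, or None."""
    job = launchpad.checkout()
    if job is None:
        return None
    return f"{worker} ran {job}"


def rapidfire(launchpad, worker):
    """Keep requesting jobs until the launchpad has none left."""
    ran = []
    while (record := singleshot(launchpad, worker)) is not None:
        ran.append(record)
    return ran


pad = ToyLaunchPad()
for name in ("fw1", "fw2", "fw3"):
    pad.add(name)

first = singleshot(pad, "laptop")      # one job, requested by one resource
rest = rapidfire(pad, "cluster")       # remaining jobs, requested by another
nothing = singleshot(pad, "laptop")    # nothing is left to run
print(first, rest, nothing)
```

The launchpad never pushes work anywhere: jobs only run when some resource asks for them, which is why the same database can serve several machines.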
## Python Examples
While it’s possible to operate FireWorks using YAML or JSON files, a much cleaner mode of operation is to use Python scripts. For example, here is a runnable script that creates our LaunchPad, defines our test Firework, and runs it:
%% Cell type:code id:sunrise-lecture tags:
``` python
from fireworks import Firework, LaunchPad, ScriptTask
from fireworks.core.rocket_launcher import launch_rocket
# set up the LaunchPad and reset it
launchpad = LaunchPad.auto_load()
launchpad.reset('', require_password=False)
# create the Firework consisting of a single task
firetask = ScriptTask.from_str('echo "howdy, your job launched successfully!"')
firework = Firework(firetask)
# store workflow and launch it locally
launchpad.add_wf(firework)
launch_rocket(launchpad)
```
%% Cell type:markdown id:temporal-venice tags:
## Summary
At this point, you’ve successfully stored a simple job in a database and run it later on command. You even executed multiple jobs with a single command: rlaunch rapidfire, and looked for new jobs automatically using the infinite mode. This should give a basic feeling of how you can automate many jobs using FireWorks.
However, we still haven’t covered many important topics. For example, we have not executed complex workflows (and in particular materials science workflows), run arbitrary Python code, or run jobs on different types of computing resources.
#!/bin/bash
#SBATCH --partition=debug
echo "I will sleep a bit"
echo "..."
sleep 10
echo "Now I am ready to start"
%% Cell type:markdown id:twelve-medication tags:
# Launch Rockets through a queue
## SLURM
For this tutorial, SLURM has been installed and configured in the Docker container. You can run standard SLURM commands, for example to list the jobs currently in the queue:
%% Cell type:code id:raised-ballet tags:
``` python
!squeue
```
%% Cell type:markdown id:bizarre-genealogy tags:
We can submit a job to the queue, e.g. the following job.sh sleep job:
%% Cell type:code id:chief-protein tags:
``` python
!cat job.sh
```
%% Cell type:code id:sixth-murray tags:
``` python
!sbatch job.sh
```
%% Cell type:markdown id:naughty-anchor tags:
## A few explanations
The simplest way to execute jobs through a queue would be to write a templated queue file and then submit it as a two-task Firework, as in the Firetask tutorial. However, FireWorks then considers your “job” to only be queue submission, and will consider the job completed after the queue submission is complete. FireWorks will not know when the actual payload starts running, or is finished, or if the job finishes successfully. Thus, many of the useful management and monitoring features of FireWorks will not be available to you.
A more powerful way to execute jobs through a queue is presented in this tutorial. In this method, the queue file runs rlaunch instead of running your desired program. This method is just like typing rlaunch into a Terminal window like in the core tutorials, except that now we are submitting a queue script that does the typing for us (it’s very low-tech!). In particular, FireWorks is completely unaware that you are running through a queue!
The jobs we will submit to the queue are basically placeholder jobs that are asleep until the job starts running. When the job is actually assigned computer resources and runs, the script “wakes” up and runs the Rocket Launcher, which then figures out what Firework to run.
The advantage of this low-tech system is that it is quite durable; if your queue system goes down or you delete a job from the queue, there are zero repercussions. You don’t have to tell FireWorks to run those jobs somewhere else, because FireWorks never knew about your queue in the first place. In addition, if you are running on multiple machines and the queue becomes backlogged on one of them, it does not matter at all. Your submission job stuck in the queue is not preventing high-priority jobs from running on other machines.
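The durability argument can be made concrete with a toy simulation. The sketch below is plain Python under simplifying assumptions: `run_queue` is a hypothetical function, queue jobs start one after another, and each placeholder asks for work only at the moment it starts, as described above.

```python
def run_queue(queue_jobs, ready_fireworks):
    """Toy model of placeholder queue jobs.

    queue_jobs is a list of (job_id, cancelled) pairs in start order.
    ready_fireworks is a list of Firework names waiting in the database.
    A placeholder is not tied to any Firework in advance.
    """
    remaining = list(ready_fireworks)
    history = []
    for job_id, cancelled in queue_jobs:
        if cancelled:
            # A deleted queue job never started, so it never claimed a Firework.
            continue
        if not remaining:
            history.append((job_id, None))   # woke up and found nothing to do
            continue
        history.append((job_id, remaining.pop(0)))
    return history, remaining


history, left_over = run_queue(
    queue_jobs=[("q1", False), ("q2", True), ("q3", False)],
    ready_fireworks=["fw_a", "fw_b"],
)
print(history)
print(left_over)
```

Deleting the queue job q2 loses nothing: the Firework it might have run is simply picked up by q3.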
## Launch jobs
This submission procedure is already configured in this tutorial and you just need to issue the ```qlaunch``` command.
Let’s reset our database and add a new Firework:
%% Cell type:code id:demographic-pitch tags:
``` python
!lpad add_scripts 'echo "hello" > hello_goodbye.txt; sleep 20; echo "goodbye" >> hello_goodbye.txt' -n hello
```
%% Cell type:markdown id:essential-blogger tags:
## Submit a job
Use the ```qlaunch``` command to submit a job:
%% Cell type:code id:double-block tags:
``` python
!qlaunch singleshot
```
%% Cell type:markdown id:modified-contractor tags:
This should have submitted a job to the queue in the current directory. You can read the log files in the logging directory, and/or check the status of your queue to ensure your job appeared.
After your queue manager runs your job, you should see the file hello_goodbye.txt in the current directory.
## Submitting many jobs using rapid-fire mode
While launching a single job to a queue is nice, a more powerful use case is to submit a large number of jobs at once, or to maintain a certain number of jobs in the queue. Like the Rocket Launcher, the Queue Launcher can be run in a “rapid-fire” mode that provides these features.
Let’s reset our database and add three new FireWorks (run the following cell three times):
%% Cell type:code id:noted-globe tags:
``` python
!lpad add_scripts 'echo "hello" > hello_goodbye.txt; sleep 5; echo "goodbye" >> hello_goodbye.txt' -n hello
```
%% Cell type:markdown id:suitable-concentration tags:
Submit several jobs with a single command:
%% Cell type:code id:assumed-vehicle tags:
``` python
!qlaunch rapidfire -m 3
```
%% Cell type:markdown id:verbal-ending tags:
Note: The Queue Launcher sleeps between each job submission to give time for the queue manager to ‘breathe’. It might take a few minutes to submit all the jobs.
Note2: The command above submits jobs until you have at most 3 jobs in the queue under your username. If you had some jobs existing in the queue before running this command, you might need to increase the -m parameter.
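The arithmetic behind the -m parameter is simple enough to write down. This is only a sketch of the cap described in Note2: `jobs_to_submit` is a hypothetical helper, and it ignores any other limit the Queue Launcher may apply.

```python
def jobs_to_submit(max_in_queue, already_queued):
    """Number of new submissions needed to reach, but not exceed, the cap."""
    return max(0, max_in_queue - already_queued)


# With -m 3: an empty queue, a queue holding 2 jobs, and a queue holding 5 jobs.
for already_queued in (0, 2, 5):
    print(already_queued, "->", jobs_to_submit(3, already_queued))
```

With 5 jobs already in the queue, a cap of 3 leaves no room for new submissions, which is why the -m value may need to be increased.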
The rapid-fire command should have created a directory beginning with the tag block_. Navigate inside this directory, and confirm that three directories starting with the tag launch were created. The launch directories contain your individual jobs.
There are other options for submitting jobs to a queue, such as running multiple Fireworks within the same queue job. See the FireWorks documentation.
#!/usr/bin/env python

from fireworks.core.firework import FWAction, FiretaskBase

__author__ = 'Anubhav Jain'
__copyright__ = 'Copyright 2013, The Materials Project'
__version__ = '0.1'
__maintainer__ = 'Anubhav Jain'
__email__ = 'ajain@lbl.gov'
__date__ = 'Feb 17, 2013'


class AdditionTask(FiretaskBase):
    _fw_name = "Addition Task"

    def run_task(self, fw_spec):
        input_array = fw_spec['input_array']
        m_sum = sum(input_array)
        print("The sum of {} is: {}".format(input_array, m_sum))
        return FWAction(stored_data={'sum': m_sum},
                        mod_spec=[{'_push': {'input_array': m_sum}}])
%% Cell type:markdown id:flexible-meditation tags:
# Defining Jobs using Firetasks
This tutorial shows you how to:
* Run multiple tasks within a single Firework
* Run tasks that are defined within a Python function, rather than a shell script
## Introduction to Firetasks
In the Introductory tutorial, we ran a simple script that performed ```echo "howdy, your job launched successfully!" >> howdy.txt```. Looking inside fw_test.yaml, recall that the command was defined within a task labeled ScriptTask:
```
spec:
  _tasks:
  - _fw_name: ScriptTask
    script: echo "howdy, your job launched successfully!" >> howdy.txt
```
The ScriptTask is one type of Firetask, which is a predefined job template written in Python. The ScriptTask in particular refers to Python code inside FireWorks that runs an arbitrary shell script that you give it. You can use the ScriptTask to run almost any job (without worrying that it’s all done within a Python layer). However, you might want to set up jobs that are more powerful than shell scripts by using Python programming. Later in this section, we’ll demonstrate how to accomplish this with custom Firetasks. First, though, we’ll demonstrate the simplest way to run multiple tasks one after another.
## Running multiple Firetasks
You can run multiple tasks within the same Firework. For example, the first step of your Firework might write an input file that the second step reads and processes. Finally, a third step might move the entire output directory somewhere else on your filesystem (or a remote server, or insert results in a database).
Let’s create a Firework that:
* Writes an input file based on a template with some substitutions applied. We’ll do this using a built-in TemplateWriterTask that can help create such files.
* Executes a script using ScriptTask that reads the input file and produces some output. In our test case, it will just count the number of words in that file. However, this code could be any program, for example a chemistry code.
* Copies all your outputs to your home directory using FileTransferTask.
The three-step Firework thus looks like this:
![firetask](images/templatetask.png)
Let's create our three-step Firework with Python:
%% Cell type:code id:processed-montreal tags:
``` python
from fireworks import Firework, FWorker, LaunchPad, ScriptTask, TemplateWriterTask, FileTransferTask
from fireworks.core.rocket_launcher import launch_rocket
# set up the LaunchPad and reset it
launchpad = LaunchPad.auto_load()
launchpad.reset('', require_password=False)
# create the Firework consisting of multiple tasks
firetask1 = TemplateWriterTask({'context': {'opt1': 5.0, 'opt2': 'fast method'}, 'template_file': 'simple_template.txt', 'output_file': 'inputs.txt'})
firetask2 = ScriptTask.from_str('wc -w < inputs.txt > words.txt')
firetask3 = FileTransferTask({'files': [{'src': 'words.txt', 'dest': '~/words.txt'}], 'mode': 'copy'})
fw = Firework([firetask1, firetask2, firetask3])
# store the workflow in the LaunchPad (it is launched in a later step)
launchpad.add_wf(fw)
```
%% Cell type:markdown id:published-accounting tags:
Let's play around in the terminal with ```lpad``` to look at what is in the database and then submit our job using ```qlaunch```.
After having run this firework, you should see two files written out to the system, inputs.txt and words.txt, confirming that you successfully ran the first two steps of your job! You can also navigate to your home directory and look for words.txt to make sure the third step also got completed correctly.
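To see what the three steps amount to, here is a plain-Python imitation that uses only the standard library and does not call FireWorks. Several things are assumptions made for illustration: the template text is hypothetical (the real simple_template.txt may differ), placeholders are assumed to use a `{{name}}` syntax, and temporary directories stand in for the launch directory and the home directory.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical template; the real simple_template.txt may differ.
TEMPLATE = "option1 = {{opt1}}\noption2 = {{opt2}}\n"


def render(template, context):
    """Step 1 stand-in: replace each {{key}} placeholder with its value."""
    for key, value in context.items():
        template = template.replace("{{" + key + "}}", str(value))
    return template


def run_three_steps(workdir, destination):
    workdir, destination = Path(workdir), Path(destination)
    # Step 1: write the input file from the template.
    inputs = workdir / "inputs.txt"
    inputs.write_text(render(TEMPLATE, {"opt1": 5.0, "opt2": "fast method"}))
    # Step 2: count the words, like `wc -w < inputs.txt > words.txt`.
    words = workdir / "words.txt"
    words.write_text(f"{len(inputs.read_text().split())}\n")
    # Step 3: copy the result somewhere else.
    return Path(shutil.copy(words, destination / "words.txt"))


with tempfile.TemporaryDirectory() as work, tempfile.TemporaryDirectory() as dest:
    copied = run_three_steps(work, dest)
    word_count = int(copied.read_text())

print(word_count)
```

Each step consumes what the previous one produced, which is why the tasks of a Firework run strictly in the order they are listed.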
%% Cell type:markdown id:modern-morgan tags:
## Creating a custom Firetask
The TemplateWriterTask, ScriptTask, and FileTransferTask are built into FireWorks and can be used to perform useful operations. In fact, they might be all you need! In particular, because the ScriptTask can run arbitrary shell scripts, it can in theory run any type of computation and is an ‘all-encompassing’ Firetask. ScriptTask also has many additional features (see the FireWorks documentation).
However, if you are comfortable with some basic Python, you can define your own custom Firetasks for the codes you run. A custom Firetask gives you more control over your jobs, clarifies the usage of your code, and guards against unintended behavior by restricting the commands that can be executed.
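As a rough illustration of what a custom task does, the sketch below mirrors the logic of the AdditionTask file included in this commit, but with plain dictionaries instead of FireWorks classes. The helper `apply_push` encodes an assumption about what `_push` means (append a value to a list in a later spec); it is not the FireWorks implementation, and both function names are hypothetical.

```python
def run_addition(fw_spec):
    """Toy version of a custom task: sum the inputs and describe what to do next."""
    input_array = fw_spec["input_array"]
    m_sum = sum(input_array)
    # The same two pieces of information that AdditionTask passes to FWAction.
    return {
        "stored_data": {"sum": m_sum},
        "mod_spec": [{"_push": {"input_array": m_sum}}],
    }


def apply_push(spec, mod_spec):
    """Assumed meaning of _push: append the value to the named list in a copy of spec."""
    new_spec = {key: list(value) if isinstance(value, list) else value
                for key, value in spec.items()}
    for modification in mod_spec:
        for key, value in modification.get("_push", {}).items():
            new_spec.setdefault(key, []).append(value)
    return new_spec


action = run_addition({"input_array": [1, 2]})
next_spec = apply_push({"input_array": [10]}, action["mod_spec"])
print(action["stored_data"], next_spec)
```

The point of the design is that a Python task can return structured information, here a stored result plus an instruction to modify another spec, which a shell script run through ScriptTask cannot easily do.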