Commit 56c78e3c authored by Niels Cautaerts

added files

parent e438e27e
tags
.ipynb*
data
# Basics of working with git
Git is a version control system.
By creating "commits" we can save specific versions of files and return to those different points in time whenever we want.
There's a LOT more to git but for our purposes now this is enough.
## Steps
### Creating a git repo
* In the jupyter environment open a Terminal from the New dropdown menu
* In the terminal create a new folder with `mkdir name-of-project`. This folder will be our git repository and we will keep track of the history of everything we put inside.
* Copy the analysis notebook, the binder folder and the docker file inside with the command `cp <what you want to copy> name-of-project`
* From the jupyter file browser, create a file called `README.md` in this folder and add some information about the project. This will be displayed when you upload to github.
* In the terminal change directories into the folder with `cd name-of-project`.
* Initialize a git repository with `git init`
* Initialize your credentials:
```
$ git config --global user.email "YOUR EMAIL"
$ git config --global user.name "YOUR NAME"
```
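Put together, the setup steps above amount to this terminal session (the folder name and copied files are placeholders for your own project):

```shell
# create the folder that will become the git repository
mkdir name-of-project
# copy your materials in, e.g.:
# cp analysis.ipynb name-of-project
cd name-of-project
# turn the folder into a git repository
git init
# identify yourself to git (recorded in every commit)
git config --global user.email "YOUR EMAIL"
git config --global user.name "YOUR NAME"
```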
### Committing files to history
**Important: you should clear all output of notebooks and never commit large data files to git history. Git is made for keeping track of text files (under the hood Jupyter notebooks are still text files)**
* Check the status of files with `git status`
* Add all files to the staging area to be committed to history with `git add .`. If you don't want to add all files, use file paths instead of `.`
* Create a commit with `git commit -m "a short message describing the change"`
* Your files in their current state are now committed to history. If you make changes you can always return to this state.
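The commit cycle as one terminal session; the first few lines only set up a throwaway repository so the sketch is self-contained, and the commit message is just an example:

```shell
# setup for this sketch: a fresh repository with one file
mkdir demo-repo && cd demo-repo
git init
git config user.email "you@example.com"
git config user.name "Your Name"
echo "# demo" > README.md

# the actual workflow: inspect, stage, commit
git status
git add .
git commit -m "add README with project description"
# the new commit shows up in the history
git log --oneline
```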
### Publishing to Github
* First you need to create access credentials. Create an ssh key with the command `ssh-keygen` and press Enter at each prompt to accept the defaults. Don't set a passphrase.
* Now you need to copy the public key to github. Log into your Github account, click on your avatar in the top right, then go to settings, then SSH and GPG keys. Click on new ssh key.
* Go back to the terminal and type `cat ~/.ssh/id_rsa.pub`. Copy the output to the key field on Github, give it a name and add the key.
* Create a new repository on github. You will see instructions there. Add the repository as a remote with `git remote add origin git@github.com...`.
* Push your repository up to github with `git push -u origin master`. Refresh the github page to see your repo is now on github.
* Whenever you create commits locally and push, all those versions will be available on github.
* Ensure you have copied the binder folder into your repository and that it is on Github.
* if not, copy it into the folder, add, commit and push.
* Go to <https://mybinder.org/> and add the link to your repository in the box. Copy the markdown link to the badge into your README.md file.
* add, commit and push. Now you should be able to click the link and build and run the environment.
* check out the docker file
* check out the github actions workflow
* make sure it is available on your github repo. If not, add, commit, push.
* create an account on dockerhub. Go to account settings and create a security access token. Copy it to a safe place, it will only be shown once.
* Add this under your Github repository secrets as `DOCKERHUB_TOKEN`. Also add another secret `DOCKERHUB_USERNAME` to enter your dockerhub username.
* Modify the file `.github/workflows/build_docker_image.yaml` as instructed, adding your username and repository name in the right place.
* add, commit, push. Then create a release on github. See what happens under actions. When the build process is done, you can check your images on dockerhub.
* if you have docker locally installed on your computer you can download and run your image with the following command: `docker run -p 7000:8888 YOURUSERNAME/example_TEM_analysis:latest`. You can then visit `http://localhost:7000` to view the notebook.
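The git side of the publishing steps above, sketched below. The remote URL is a placeholder (copy the real one from GitHub's instructions), and the push itself is commented out because it needs the real repository and your ssh key:

```shell
# setup for this sketch: a throwaway repository standing in for your project
mkdir push-demo && cd push-demo
git init
# register the github repository as a remote called "origin"
git remote add origin git@github.com:YOURUSERNAME/name-of-project.git
# check it was registered
git remote -v
# upload local history; -u makes origin/master the default for future pushes
# git push -u origin master
```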
%% Cell type:markdown id:8d76e8b8 tags:
# Keeping track of experimental workflows with electronic lab notebook eLabFTW
[eLabFTW](https://www.elabftw.net) is a popular open source electronic lab notebook application that you can use as a powerful journaling application to keep digital records of your lab activities and objects.
I personally use it to document all my sessions at the microscope and create records of samples.
The advantages are:
* Data about experiments available from everywhere on any device
* Create clickable links between different items, for example samples and experiments, or experiments and the data files
* Add pictures, tables, data, drawings, descriptions all in one place
* Send links to others, or put links to the digital records on sample boxes as QR codes
* Python API to programmatically query/update database
We have set up a temporary eLabFTW instance at
#### <https://elabftwdemo.esc.mpcdf.mpg.de/login.php>
Check out the site as an anonymous visitor; the database on this dummy instance is public.
With a real account you could use this interface to add/update database items and experiments.
%% Cell type:markdown id:d1565d7c tags:
### The Python REST API
The real power of eLabFTW is its Python API, which allows us to query information from the database and integrate it with other programs, or send information to the database from other applications.
That is what we will do in this notebook.
First we set up a connection to the API with a token.
I created this token for a dummy user and it has read + write access, but in principle should not be able to delete any of the items already in the database.
%% Cell type:code id:e7e29eb4 tags:
``` python
# setting up the connection to the elab server with the elabapy package
import elabapy
URL = "https://elabftwdemo.esc.mpcdf.mpg.de/"
TOKEN = "028910cbb11c2af9a592ecea958e061589990094a69ffffd0e3dd494e440c017beff7bedafd105e6074c" # this is a read and write token
ENDPOINT = URL+"api/v1/"
manager = elabapy.Manager(endpoint=ENDPOINT, token=TOKEN, verify=True)
```
%% Cell type:markdown id:2b06499a tags:
Here we query all the data from the database
%% Cell type:code id:33354fbd tags:
``` python
print("------------")
print("Experiments:")
print("------------")
for i in manager.get_all_experiments():
    print(i["id"], i["title"])
print("------------")
print("Items:")
print("------------")
for i in manager.get_all_items():
    print(i["id"], i["title"])
```
%% Cell type:markdown id:87fe0daf tags:
Here we get one specific item out.
The response is a JSON string which captures all the known information about that item in the database.
We are especially interested in the "body" which is the text information we can read, and any uploaded items.
%% Cell type:code id:6db700b2 tags:
``` python
john = manager.get_item(1)
john
```
%% Cell type:markdown id:e523b575 tags:
We can display the body of the item directly in the notebook using IPython's rendering functions. However, we first need to replace the relative links in the body with absolute links, using the following helper function.
%% Cell type:code id:e103c9a5 tags:
``` python
import re

def render_links(html, url):
    """
    Helper function to return a string with absolute links to images, database items
    and experiments for rendering the body of a database item in the notebook
    """
    pattern_link = r'(<a href=")(.*)&amp;(id=[0-9]+">)'
    conversion_link = lambda x: x.group(1) + url + x.group(2) + "&" + x.group(3)
    pattern_img = r'(<img src=")(.*)(" \/>)'
    conversion_img = lambda x: x.group(1) + url + x.group(2) + x.group(3)
    updated_links = re.sub(pattern_link, conversion_link, html)
    return re.sub(pattern_img, conversion_img, updated_links)
```
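%% Cell type:markdown id: tags:
As a quick offline check of what these substitutions do, here they are applied to a made-up body snippet (the link targets are invented for illustration):
%% Cell type:code id: tags:
``` python
import re

URL = "https://elabftwdemo.esc.mpcdf.mpg.de/"
body = '<a href="database.php?mode=view&amp;id=4">FIB sample</a> <img src="app/download.php?f=fib.png" />'

# the same two substitutions the helper performs, spelled out step by step
absolute = re.sub(r'(<a href=")(.*?)&amp;(id=[0-9]+">)',
                  lambda m: m.group(1) + URL + m.group(2) + "&" + m.group(3), body)
absolute = re.sub(r'(<img src=")(.*?)(" \/>)',
                  lambda m: m.group(1) + URL + m.group(2) + m.group(3), absolute)
print(absolute)
```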
%% Cell type:code id:83aebd03 tags:
``` python
import IPython
```
%% Cell type:code id:b3cdff86 tags:
``` python
IPython.display.HTML(render_links(john["body"], URL))
```
%% Cell type:markdown id:bf10986c tags:
Here is the item corresponding to an experimental sample. The links are clickable and refer us to the right page on the elab website (after we log in).
%% Cell type:code id:9d567371 tags:
``` python
FIB_sample = manager.get_item(4)
IPython.display.HTML(render_links(FIB_sample["body"], URL))
```
%% Cell type:markdown id:4478e7b3 tags:
We can take this a step further and instead of just displaying what is on the page in the notebook, we can try to parse it so a computer might be able to do something with the information.
The body of each item is plain HTML, so it can be parsed with web scraping packages like beautifulsoup.
Let's build a little tool that will extract all the hyperlinks from the text and separate them by section.
%% Cell type:code id:d5ac31af tags:
``` python
from bs4 import BeautifulSoup
```
%% Cell type:code id:5daecd1e tags:
``` python
def parse_body(html, divider="h1", children=("p", "ol", "ul", "h1")):
    """Returns a dictionary with information divided by heading, links extracted"""
    parsed = BeautifulSoup(html, "html.parser")
    dictionary = {}
    for i in parsed.find_all(divider, recursive=False):
        dictionary[i.string] = {}
        subdict = dictionary[i.string]
        subdict["Contents"] = []
        subdict["Links"] = {}
        subdict["Image links"] = []
        subdict["Linked database items"] = []
        subdict["Linked experiments"] = []
        k = i
        while True:
            # find_next_sibling takes no recursive argument; filter on the tag names only
            k = k.find_next_sibling(list(children))
            if k is None or k.name == divider:
                break
            lnks = k.find_all("a", recursive=True)
            for lnk in lnks:
                subdict["Links"][lnk["href"]] = lnk.string
                db_item = re.compile(r"database\.php\?.*id=([0-9]+)").search(lnk["href"])
                exp_item = re.compile(r"experiments\.php\?.*id=([0-9]+)").search(lnk["href"])
                if db_item:
                    subdict["Linked database items"].append(int(db_item.groups()[0]))
                if exp_item:
                    subdict["Linked experiments"].append(int(exp_item.groups()[0]))
            imlks = k.find_all("img", recursive=True)
            for im in imlks:
                subdict["Image links"].append(im["src"])
            subdict["Contents"].append(repr(k))
    return dictionary
```
%% Cell type:code id:903b4225 tags:
``` python
def pretty_print_parsed(parsed_dict):
    for key, subdict in parsed_dict.items():
        print(key)
        print("-" * len(key))
        care = ["Links", "Image links", "Linked database items", "Linked experiments"]
        for j in care:
            if subdict[j]:
                print("> ", j)
                for i in subdict[j]:
                    print("  > ", i)
```
%% Cell type:code id:e5dc3c38 tags:
``` python
parsed_fib = parse_body(render_links(FIB_sample["body"], URL))
```
%% Cell type:code id:bd58bcfc tags:
``` python
pretty_print_parsed(parsed_fib)
```
%% Cell type:markdown id:487d629c tags:
We can also query the info on the experimental session to get access to the links where the actual data is stored. We can do a manual download or parse the html to get the URL as a string.
%% Cell type:code id:07f9b08c tags:
``` python
experiment = manager.get_experiment(1)
IPython.display.HTML(render_links(experiment["body"], URL))
```
%% Cell type:code id:8c8de798 tags:
``` python
parsed_experiment = parse_body(render_links(experiment["body"], URL))
data_file_link = list(parsed_experiment["Data files"]["Links"].values())[0]
print(data_file_link)
```
%% Cell type:markdown id:8fcd1fbc tags:
Such tools would allow us to visualize all interrelationships between database items and to write programs that crawl the database.
Now we turn our attention to adding things to the database.
### Adding items to the database
Try it yourself, then try to query your item or experiment back.
To check what is possible, see the documentation <https://doc.elabftw.net/api/#api-Entity>
%% Cell type:code id:add9b29d tags:
``` python
response = manager.create_item(1)
print(f"Created item with id {response['id']}.")
```
%% Cell type:code id:9bf80bf8 tags:
``` python
params = { "title": "Database item", "date": "20200504", "body": "Created from the API", "category": "Sample" } # in the "body" entry you can add arbitrary HTML and CSS syntax for formatting
print(manager.post_item(5, params))
```
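%% Cell type:markdown id: tags:
The calls above need the live server, but the payload itself is an ordinary dictionary. As a small offline sanity check, the field names below are taken from the call above, and the date is given as a YYYYMMDD string, matching it:
%% Cell type:code id: tags:
``` python
from datetime import datetime

params = {
    "title": "Database item",
    "date": "20200504",  # YYYYMMDD, as in the post_item call above
    "body": "Created from the API",
    "category": "Sample",
}
# strptime raises ValueError if the date string is malformed
parsed_date = datetime.strptime(params["date"], "%Y%m%d")
print(parsed_date.date())  # → 2020-05-04
```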
%% Cell type:code id:4f1aaff1 tags:
``` python
```
# Example reproducible TEM data analysis
## BigMax summer school 2021
##### Niels Cautaerts
##### last updated: 9/9/2021
This example walks through some techniques whereby we can improve the reproducibility of our experimental workflow.
We look at the following tools and techniques:
### 1. Electronic lab notebook eLabFTW
Digitizing our experimental workflow and the links between experiments, samples, and other lab resources is crucial for being able to trace our steps back from the results.
In addition to the web interface, we interact with the eLabFTW through Python in a Jupyter notebook.
### 2. Jupyter notebook based analysis
Jupyter notebooks are interactive worksheets in which we can write code to perform analysis and visualization of data.
Here we demonstrate a short machine learning inspired analysis workflow of a high resolution STEM image.
### 3. Git version control
Jupyter notebooks are already much more reproducible than click based workflows in GUI programs.
However they are prone to frequent updates and changes; how do you ensure everyone is looking at the same notebook?
We use version control to create "save points" for our notebook that everyone can go back to.
We will use git to start version controlling our notebook, and publish on Github.
### 4. MyBinder
Even if everyone has the same version of the notebook, the results might not be reproducible because users have different versions of software packages installed on their system.
One simple solution to this problem is to use a service like mybinder, which builds a jupyter environment from a predefined configuration file.
We go over best practices for making such a configuration file.
### 5. Docker
Even if we pin versions of software with MyBinder it doesn't guarantee we will always get exactly the same environment.
For example, the dependencies of the packages you need may not be pinned, and if those get updated things may still break.
To ensure a completely reproducible environment you want to package EVERYTHING (data, jupyter notebook, software) together in a single image.
This can be achieved with Docker.
We write a Dockerfile, which instructs the `docker` software how to build this image.
Redoing the build process with the same dockerfile may produce slightly different images, but the image itself is static and will always work in the same way.
Here we show how we can build a docker image with Github's CI/CD service.
vim
nano
-name: tutorial-env
+name: example
 channels:
   - conda-forge
 dependencies:
-  - python
-  - scipy
-  - beautifulsoup4
-  - requests
-  - scikit-learn
-  - scikit-image
-  - hyperspy
-  - atomap
-  - pip
-  - pip:
-    - elabapy
+  - hyperspy=1.6.4
+  - scikit-learn=0.24.2
+  - atomap=0.3.1
+  - beautifulsoup4=4.10.0
+  - elabapy=0.8.2
tags
.ipynb_checkpoints
.git
name: Build and push docker images to docker hub when we create a new tag
on:
  push:
    tags:
      - '*'
jobs:
  buildandpush:
    runs-on: ubuntu-latest
    steps:
      # Change the USERNAME and REPONAME in the first step below!
      - name: Get the latest tag
        id: release
        run: |
          echo "::set-output name=releasetag::$(curl -s https://api.github.com/repos/USERNAME/REPONAME/releases/latest | jq '.tag_name' | sed 's/\"//g')"
      - name: Checkout the repo on this tag
        uses: actions/checkout@v2
        with:
          ref: ${{ steps.release.outputs.releasetag }}
      # hardware emulation for different CPU architectures
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v1
      # build system
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1
      - name: Login to DockerHub
        uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      # images are pushed to the latest as well as specific version tag
      - name: Build and push
        id: docker_build
        uses: docker/build-push-action@v2
        with:
          push: true
          tags: |
            ${{ secrets.DOCKERHUB_USERNAME }}/example_TEM_analysis:latest
            ${{ secrets.DOCKERHUB_USERNAME }}/example_TEM_analysis:${{ steps.release.outputs.releasetag }}
# start from a base image providing conda
FROM continuumio/miniconda3
# add Tini
ENV TINI_VERSION v0.19.0
ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini /tini
RUN chmod +x /tini
ENTRYPOINT ["/tini", "--"]
# install the necessary dependencies
RUN conda install -c conda-forge hyperspy=1.6.4 scikit-learn=0.24.2 atomap=0.3.1 beautifulsoup4=4.10.0 elabapy=0.8.2
# copy all the necessary files into the container
RUN mkdir notebook && mkdir notebook/data
WORKDIR notebook/
COPY analysis.ipynb .
RUN wget https://owncloud.gwdg.de/index.php/s/utJfj0388mp8W1S/download -O data/dataset.emd
# command that runs when we spin up the container
CMD ["jupyter", "notebook", "--port=8888", "--no-browser", "--ip=0.0.0.0", "--allow-root"]
name: example
channels:
  - conda-forge
dependencies:
  - hyperspy=1.6.4
  - scikit-learn=0.24.2
  - atomap=0.3.1
  - beautifulsoup4=4.10.0
  - elabapy=0.8.2
%% Cell type:code id: tags:
``` python
import sklearn
```
%% Cell type:code id: tags:
``` python
import matplotlib
```
%% Cell type:code id: tags:
``` python
import numpy
```
%% Cell type:code id: tags:
``` python
import scipy
```
%% Cell type:code id: tags:
``` python
import skimage
```
%% Cell type:code id: tags:
``` python
import hyperspy
```
%% Cell type:code id: tags:
``` python
import atomap
```
%% Cell type:code id: tags:
``` python
import elabapy
```
%% Cell type:code id: tags:
``` python
```