diff --git a/docs/develop/gitlab.md b/docs/develop/gitlab.md
new file mode 100644
index 0000000000000000000000000000000000000000..c2d904422c75c7b5b601f954dc5a26f0dfbda047
--- /dev/null
+++ b/docs/develop/gitlab.md
@@ -0,0 +1,157 @@
+---
+title: Using Git/GitLab
+---
+
+## Branches
+
+We use *protected* and *feature* branches. You must not (even if you have
+the rights) commit to a protected branch directly.
+
+- `master`, a *protected* branch and also the *default* branch. Represents the latest stable
+release (i.e. what the current official NOMAD runs on).
+- `develop`, a quasi *protected* branch. It contains the latest, potentially
+unreleased, features and fixes.
+- *feature branches*, this is where you work. Typically they are automatically
+named after issues: `<issue-number>-<issue-title>`.
+- `vX.X.X`, tags for releases.
+
+
+## Flow: From issue to merge
+
+### Create issues
+
+Everyone with an MPCDF GitLab account can create issues.
+
+- Use descriptive short titles. Specific words over general words. Try to describe
+the problem and do not assume causes.
+- You do not need to greet us or say thx and goodbye. Let's keep it purely technical.
+- For bugs: think about how to reproduce the problem and provide examples when applicable.
+- For features: in addition to the feature description, try to provide a use-case that
+might help us understand the feature and its scope.
+- You can label the issue if you know the label system.
+
+### Labeling issues
+
+There are three main categories for labels. Ideally each issue gets one of each category:
+
+- The *state* labels (grey). Those are used to manage when and how the issue is addressed.
+They should be assigned by the NOMAD team member who is currently responsible for moderating the
+development. If it's a simple fix and you want to do it yourself, assign yourself and use
+"bugfixes".
+
+- The *component* label (purple). This denotes the part of the NOMAD software that is
+most likely affected. Multiple purple labels are possible.
+
+- The *kind* label (key). Whether this is a bug, feature, refactoring, or documentation issue.
+
+Unlabeled issues will get labeled by the NOMAD team as soon as possible.
+
+
+### Working on an issue
+
+Within the development team and during development meetings we decide who is acting on
+an issue. The person is assigned to the issue on GitLab. The issue typically switches its grey *state* label
+from *backlog* to *current* or *bugfixes*. The assigned person is responsible for
+discussing the problem further with the issue author and involving more people if necessary.
+
+To contribute code changes, you should create a branch and merge request (MR) from the
+GitLab issue via the button offered by GitLab. This will create a branch off *develop* and
+create a merge request that targets *develop*.
+
+You can work on this branch and push as often as you like. Each push will cause a pipeline
+to run. The solution can be discussed in the merge request.
+
+
+### Code review and merge
+
+When you are satisfied with your solution and your CI/CD pipeline passes, you can mark your MR as *ready*.
+To review GUI changes, you should deploy your branch to the dev-cluster via CI/CD actions.
+Find someone on the NOMAD developer team to review your MR and request a review through
+GitLab. The review should be performed promptly and should not stall the MR longer than
+two full work days.
+
+The reviewer will open *threads* that need to be resolved by the MR author. If all
+threads are resolved, you can re-request a review. The reviewer should eventually merge
+the MR. Typically we squash MRs to keep the revision history short.
+This will typically auto-close the issue.
+
+## Clean version history
+
+It is often necessary to consider code history to reason about potential problems in
+our code. This can only be done if we keep a "clean" history.
+
+- Use descriptive commit messages. Use simple verbs (*added*, *removed*, *refactored*, etc.),
+and name features and changed components. [Include issue numbers](https://docs.gitlab.com/ee/user/project/issues/crosslinking_issues.html)
+to create links in GitLab.
+
+- Learn how to amend to avoid lists of small related commits.
+
+- Learn how to rebase. Only merging feature branches should create merge commits.
+
+- Squash commits when merging.
+
+- Some videos on more advanced git usage: https://youtu.be/Uszj_k0DGsg, https://youtu.be/qsTthZi23VE
+
+### amend
+While working on a feature, there are certain practices that will help us to create
+a clean history with coherent commits, where each commit stands on its own.
+
+```sh
+ git commit --amend
+```
+
+If you committed something to your own feature branch and then realize from the CI results that you have
+some tiny error in it that you need to fix, try to amend this fix to the last commit.
+This will avoid unnecessary tiny commits and foster more coherent single commits. With *amend*
+you are basically adding changes to the last commit, i.e. editing the last commit. If
+you already pushed that commit, you need to force the push: `git push origin feature-branch --force-with-lease`. So be careful, and
+only use this on your own branches.
+
+### rebase
+```sh
+ git rebase <version-branch>
+```
+
+Let's assume you work on a bigger feature that takes more time. You might want to merge
+the version branch into your feature branch from time to time to get the recent changes.
+In these cases, use rebase and not merge. Rebase replays your branch commits on top of the
+commits from the version branch instead of creating a new commit with two ancestors. It basically moves the
+point where you initially branched away from the version branch to the current position in
+the version branch. This will avoid merges, merge commits, and generally leave us with a
+more consistent history. You can also rebase before creating a merge request, which basically
+allows fast-forward (no-op) merges. Ideally the only real merges that we ever have are between
+version branches.
+
+
+### squash
+```sh
+ git merge --squash <other-branch>
+```
+
+When you need multiple branches to implement a feature and merge between them, try to
+use *squash*. Squashing basically puts all commits of the merged branch into a single commit.
+It basically allows you to have many commits and then squash them into one. This is useful
+if these commits were made just to synchronize between workstations, due to
+unexpected errors in CI/CD, because you needed a save point, etc. Again, the goal is to have
+coherent commits, where each commit makes sense on its own.
+
+
+## Submodules
+
+
+The main NOMAD GitLab-project (`nomad-fair`) uses Git-submodules to maintain its
+parsers and other dependencies. All these submodules are placed in the `/dependencies`
+directory. There are helper scripts to install (`./dependencies.sh`) and
+commit changes to all submodules (`./dependencies-git.sh`). After merging or checking out,
+you have to make sure that the submodules are updated, so that you do not accidentally commit old
+submodule commits again. Usually you do the following to check if you really have a
+clean working directory:
+
+```sh
+ git checkout something-with-changes
+ git submodule update --init --recursive
+ git status
+```
+
+We typically use the `master`/`main` branch on all dependencies. Of course, feature branches
+can be used on dependencies to manage work in progress.
diff --git a/docs/develop/guides.md b/docs/develop/guides.md
new file mode 100644
index 0000000000000000000000000000000000000000..b45e93e80e398767f71a6f5c80377303a01078e0
--- /dev/null
+++ b/docs/develop/guides.md
@@ -0,0 +1,194 @@
+---
+title: Code guidelines
+---
+
+NOMAD has a long history and many people are involved in its development. These
+guidelines are set out to keep the code quality high and consistent. Please read
+them carefully.
+
+## Principles and rules
+
+- simple first, complicated only when necessary
+- search and adopt generic established 3rd party solutions before implementing specific solutions
+- only unidirectional dependencies between components/modules, no cycles
+- only one language: Python (except the GUI, of course)
+
+There are some *rules* or better strong *guidelines* for writing code. The following
+applies to all Python code (and where applicable, also to JS and other code):
+
+- Use an IDE (e.g. [vscode](https://code.visualstudio.com/)) or otherwise automatically
+ enforce [code formatting and linting](https://code.visualstudio.com/docs/python/linting).
+ Use `nomad qa` before committing. This will run all tests, static type checks, linting, etc.
+
+- There is a style guide for Python. Write [PEP 8](https://www.python.org/dev/peps/pep-0008/)
+ compliant Python code. An exception is the line length limit of 79 characters, which may be exceeded, but keep lines to around 90 characters.
+
+- Test the public interface of each sub-module (i.e. Python file).
+
+- Be [pythonic](https://docs.python-guide.org/writing/style/) and watch
+ [this video](https://www.youtube.com/watch?v=wf-BqAjZb8M).
+
+- Add doc-strings to the *public* interface of each sub-module (i.e. Python file). Public means any API that
+ is exposed to other sub-modules (i.e. other Python files).
+
+- Use Google-style [docstrings](http://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html) and
+ use Markdown.
+
+- The project structure follows [this guide](https://docs.python-guide.org/writing/structure/).
+ Keep it that way!
+
+- Write tests for all contributions.
+
+- Adopt *Clean Code* practices. Here is a [good introduction](https://youtu.be/7EmboKQH8lM).
+
+
+## Enforcing Rules with CI/CD
+
+These *guidelines* are partially enforced by CI/CD. As part of CI, all tests are run on all
+branches; in addition, we run a *linter*, a *pep8* checker, and *mypy* (a static type checker). You can
+run `nomad qa` to run all these tests and checks before committing.
+
+See [git/gitlab](./gitlab.md) for more details on how to work with issues, branches, merge
+requests, and CI/CD.
+
+## Names and identifiers
+
+There is a certain terminology consistently used in this documentation and the source
+code. Use this terminology for identifiers.
+
+Do not use abbreviations. There are a few exceptions: `proc` (processing); `exc`, `e` (exception);
+`calc` (calculation), `repo` (repository), `utils` (utilities), and `aux` (auxiliary).
+Other exceptions are `f` for file-like streams and `i` for running index variables.
+By the way, the latter is almost never necessary in Python.
+
+Terms:
+
+- upload: A logical unit that comprises a collection of files uploaded by a user, organized
+ in a directory structure.
+- entry: An archive item, created by parsing a *mainfile*. Each entry belongs to an upload and
+ is associated with various metadata (an upload may have many entries).
+- child entry: Some parsers generate multiple entries - a *main* entry plus some number of
+ *child* entries. Child entries are identified by the mainfile plus a *mainfile_key*
+ (string value).
+- calculation: denotes the results of a theoretical computation, created by CMS code.
+ Note that entries do not have to be based on calculations; they can also be based on
+ experimental results.
+- raw file: A user-uploaded file, located somewhere in the upload's directory structure.
+- mainfile: A raw file identified as parseable, defining an entry of the upload in question.
+- aux file: Additional files the user uploaded within an upload.
+- entry metadata: Some quantities of an entry that are searchable in NOMAD.
+- archive data: The normalized data of an entry in nomad's meta-info-based format.
+
+Throughout nomad, we use different ids. If something
+is called *id*, it is usually a random UUID and has no semantic connection to the entity
+it identifies. If something is called a *hash*, then it is a hash generated based on the
+entity it identifies. The hash is based either on the whole entity or on just some of
+its properties.
+
+- The most common hash is the `entry_hash`, based on mainfile and auxfile contents.
+- The `upload_id` is a UUID assigned to the upload on creation. It never changes.
+- The `mainfile` is a path within an upload that points to a file identified as parseable.
+ This also uniquely identifies an entry within the upload.
+- The `entry_id` (previously called `calc_id`) uniquely identifies an entry. It is a hash
+ over the `mainfile` and respective `upload_id`. **NOTE:** For backward compatibility,
+ `calc_id` is also still supported in the API, but using it is strongly discouraged.
+- We often use pairs of `upload_id/entry_id`, which in many contexts allow us to resolve an entry-related
+ file on the filesystem without having to ask a database about it.
+- The `pid` (or `coe_calc_id`) is a legacy sequential integer id, previously used to identify
+ entries. We still store the `pid` on these older entries for historical purposes.
+- Calculation `handle` or `handle_id` values are created based on those `pid`s.
+ To create hashes we use `nomad.utils.hash`.
+
+
+## Logging
+
+There are three important things to understand about nomad-FAIRDI's logging:
+
+- All log entries are recorded in a central Elasticsearch database. To make this database
+ useful, log entries must be sensible in size, frequency, meaning, level, and logger name.
+ Therefore, we need to follow some rules when it comes to logging.
+- We use a *structured* logging approach. Instead of encoding all kinds of information
+ in log messages, we use key-value pairs that provide context to a log *event*. In the
+ end all entries are stored as JSON dictionaries with `@timestamp`, `level`,
+ `logger_name`, `event` plus custom context data. Keep events very short; most
+ information goes into the context.
+- We use logging to inform about the state of nomad-FAIRDI, not about user
+ behavior, input, or data. Keep this in mind when determining the log level for an event.
+ For example, a user providing an invalid upload file should never be logged as an error.
+
+Please follow these rules when logging:
+
+- If a logger is not already provided, only use
+ `nomad.utils.get_logger` to acquire a new logger. Never use the
+ built-in logging directly. These loggers work like the system loggers, but
+ allow you to pass keyword arguments with additional context data. See also
+ the [structlog docs](https://structlog.readthedocs.io/en/stable/).
+- In many contexts, a logger is already provided (e.g. api, processing, parser, normalizer).
+ This provided logger already has context information bound to it. So it is important to
+ use it instead of acquiring your own logger. Look for methods called
+ `get_logger` or attributes called `logger`.
+- Keep events (what is usually called the *message*) very short. Examples are: *file uploaded*,
+ *extraction failed*, etc.
+- Structure the keys for context information. When you analyse logs in ELK, you will
+ see that the set of all keys over all log entries can be quite large. Structure your
+ keys to make navigation easier. Use keys like `nomad.proc.parser_version` instead of
+ `parser_version`. Use module names as prefixes.
+- Don't log everything. Try to anticipate how you would use the logs in case of bugs,
+ error scenarios, etc.
+- Don't log sensitive data.
+- Think before logging data (especially dicts, lists, numpy arrays, etc.).
+- Logs should not be abused as a *printf*-style debugging tool.
+
+The following keys are used in the final logs that are piped to Logstash.
+Note that the key name is automatically formed by a separate formatter and
+may differ from the one used in the actual log call.
+
+Keys that are autogenerated for all logs:
+
+ - `@timestamp`: Timestamp for the log
+ - `@version`: Version of the logger
+ - `host`: The host name from which the log originated
+ - `path`: Path of the module from which the log was created
+ - `tags`: Tags for this log
+ - `type`: The *message_type* as set in the LogstashFormatter
+ - `level`: The log level: `DEBUG`, `INFO`, `WARNING`, `ERROR`
+ - `logger_name`: Name of the logger
+ - `nomad.service`: The service name as configured in `config.py`
+ - `nomad.release`: The release name as configured in `config.py`
+
+Keys that are present for events related to processing an entry:
+
+ - `nomad.upload_id`: The id of the currently processed upload
+ - `nomad.entry_id`: The id of the currently processed entry
+ - `nomad.mainfile`: The mainfile of the currently processed entry
+
+Keys that are present for events related to exceptions:
+
+ - `exc_info`: Stores the full Python exception that was encountered. All
+ uncaught exceptions will be stored automatically here.
+ - `digest`: If an exception was raised, the last 256 characters of the message
+ are stored automatically into this key. If you wish to search for exceptions
+ in Kibana, you will want to use this value, as it is indexed, unlike the
+ full exception object.
+
+
+## Copyright Notices
+
+We follow this [recommendation](https://www.linuxfoundation.org/blog/2020/01/copyright-notices-in-open-source-software-projects/)
+of the Linux Foundation for the copyright notice that is placed at the top of each source
+code file.
+
+It is intended to provide a broad generic statement that allows all authors/contributors
+of the NOMAD project to claim their copyright, independent of their organization or
+individual ownership.
+
+You can simply copy the notice from another file. From time to time we can use a tool
+like [licenseheaders](https://pypi.org/project/licenseheaders/) to ensure correct
+notices. In addition, we keep a purely informative AUTHORS file.
+
+
+## Git submodules and other "in-house" dependencies
+
+As the NOMAD ecosystem grows, you might develop libraries that are used by NOMAD instead
+of being part of its main codebase. The same guidelines should apply. You can
+use GitHub Actions if your library is hosted on GitHub to ensure automated linting and tests.
diff --git a/docs/normalizers.md b/docs/develop/normalizers.md
similarity index 100%
rename from docs/normalizers.md
rename to docs/develop/normalizers.md
diff --git a/docs/parser.md b/docs/develop/parser.md
similarity index 100%
rename from docs/parser.md
rename to docs/develop/parser.md
diff --git a/docs/search.md b/docs/develop/search.md
similarity index 100%
rename from docs/search.md
rename to docs/develop/search.md
diff --git a/docs/developers.md b/docs/develop/setup.md
similarity index 56%
rename from docs/developers.md
rename to docs/develop/setup.md
index c1ecfe47c60b76ea22069a1852f1b634d0fb67da..5e214a3a8f2448201c84c04c1972632df897a81f 100644
--- a/docs/developers.md
+++ b/docs/develop/setup.md
@@ -1,7 +1,15 @@
 ---
 title: Getting started
 ---
-# Getting started with developing NOMAD
+# Set up a dev environment, run the app, and run tests
+
+This is a step-by-step guide to get started with NOMAD development. You will clone
+all sources, set up a *Python* and *node* environment, install all necessary dependencies,
+run the infrastructure in development mode, learn to run our test suites, and set up
+*Visual Studio Code* for NOMAD development.
+
+This is not about working with the NOMAD Python package. You can find the `nomad-lab`
+documentation [here](../pythonlib.md).
 
 ## Clone the sources
 If not already done, you should clone nomad. If you have a gitlab@MPCDF account, you can clone with git URL:
@@ -521,298 +529,4 @@ this folder into vscode extensions folder `~/.vscode/extensions/` or create an i
 vsce package
 ```
 
-then install the extension by drag the file `nomad-0.0.x.vsix` and drop it into the extension panel of the vscode.
-
-## Code guidelines
-
-### Principles and rules
-
-- simple first, complicated only when necessary
-- adopting generic established 3rd party solutions before implementing specific solutions
-- only uni directional dependencies between components/modules, no circles
-- only one language: Python (except, GUI of course)
-
-The are some *rules* or better strong *guidelines* for writing code. The following
-applies to all python code (and were applicable, also to JS and other code):
-
-- Use an IDE (e.g. [vscode](https://code.visualstudio.com/) or otherwise automatically
- enforce code ([formatting and linting](https://code.visualstudio.com/docs/python/linting)).
- Use `nomad qa` before committing. This will run all tests, static type checks, linting, etc.
-
-- There is a style guide to python. Write [pep-8](https://www.python.org/dev/peps/pep-0008/)
- compliant python code. An exception is the line cap at 79, which can be broken but keep it 90-ish.
-
-- Test the public API of each sub-module (i.e. python file)
-
-- Be [pythonic](https://docs.python-guide.org/writing/style/) and watch
- [this](https://www.youtube.com/watch?v=wf-BqAjZb8M).
-
-- Document any *public* API of each sub-module (e.g. python file). Public meaning API that
- is exposed to other sub-modules (i.e. other python files).
-
-- Use google [docstrings](http://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html).
-
-- Add your doc-strings to the sphinx documentation in `docs`. Use .md, follow the example.
- Markdown in sphinx is supported via [recommonmark](https://recommonmark.readthedocs.io/en/latest/index.html#autostructify)
- and [AutoStructify](http://recommonmark.readthedocs.io/en/latest/auto_structify.html)
-
-- The project structure is according to [this guide](https://docs.python-guide.org/writing/structure/).
- Keep it!
-
-- Write tests for all contributions.
-
-
-### Enforcing Rules with CI/CD
-
-These *guidelines* are partially enforced by CI/CD. As part of CI all tests are run on all
-branches; further we run a *linter*, *pep8* checker, and *mypy* (static type checker). You can
-run `nomad qa` to run all these tests and checks before committing.
-
-The CI/CD will run on all refs that do not start with `dev-`. The CI/CD will
-not release or deploy anything automatically, but it can be manually triggered after the
-build and test stage completed successfully.
-
-### Names and identifiers
-
-There is a certain terminology consistently used in this documentation and the source
-code. Use this terminology for identifiers.
-
-Do not use abbreviations. There are (few) exceptions: `proc` (processing); `exc`, `e` (exception);
-`calc` (calculation), `repo` (repository), `utils` (utilities), and `aux` (auxiliary).
-Other exceptions are `f` for file-like streams and `i` for index running variables.
-Btw., the latter is almost never necessary in python.
-
-Terms:
-
-- upload: A logical unit that comprises a collection of files uploaded by a user, organized
- in a directory structure.
-- entry: An archive item, created by parsing a *mainfile*. Each entry belongs to an upload and
- is associated with various metadata (an upload may have many entries).
-- child entry: Some parsers generate multiple entries - a *main* entry plus some number of
- *child* entries. Child entries are identified by the mainfile plus a *mainfile_key*
- (string value).
-- calculation: denotes the results of a theoretical computation, created by CMS code.
- Note that entries do not have to be based on calculations; they can also be based on
- experimental results.
-- raw file: A user uploaded file, located somewhere in the upload's directory structure.
-- mainfile: A raw file identified as parseable, defining an entry of the upload in question.
-- aux file: Additional files the user uploaded within an upload.
-- entry metadata: Some quantities of an entry that are searchable in NOMAD.
-- archive data: The normalized data of an entry in nomad's meta-info-based format.
-
-Throughout nomad, we use different ids. If something
-is called *id*, it is usually a random uuid and has no semantic connection to the entity
-it identifies. If something is called a *hash* then it is a hash generated based on the
-entity it identifies. This means either the whole thing or just some properties of
-this entities.
-
-- The most common hashes is the `entry_hash` based on mainfile and auxfile contents.
-- The `upload_id` is a UUID assigned to the upload on creation. It never changes.
-- The `mainfile` is a path within an upload that points to a file identified as parseable.
- This also uniquely identifies an entry within the upload.
-- The `entry_id` (previously called `calc_id`) uniquely identifies an entry. It is a hash
- over the `mainfile` and respective `upload_id`. **NOTE:** For backward compatibility,
- `calc_id` is also still supported in the api, but using it is strongly discouraged.
-- We often use pairs of `upload_id/entry_id`, which in many contexts allow to resolve an entry-related
- file on the filesystem without having to ask a database about it.
-- The `pid` or (`coe_calc_id`) is a legacy sequential interger id, previously used to identify
- entries. We still store the `pid` on these older entries for historical purposes.
-- Calculation `handle` or `handle_id` are created based on those `pid`.
- To create hashes we use :py:func:`nomad.utils.hash`.
-
-
-### Logging
-
-There are three important prerequisites to understand about nomad-FAIRDI's logging:
-
-- All log entries are recorded in a central elastic search database. To make this database
- useful, log entries must be sensible in size, frequence, meaning, level, and logger name.
- Therefore, we need to follow some rules when it comes to logging.
-- We use an *structured* logging approach. Instead of encoding all kinds of information
- in log messages, we use key-value pairs that provide context to a log *event*. In the
- end all entries are stored as JSON dictionaries with `@timestamp`, `level`,
- `logger_name`, `event` plus custom context data. Keep events very short, most
- information goes into the context.
-- We use logging to inform about the state of nomad-FAIRDI, not about user
- behavior, input, or data. Do not confuse this when determining the log-level for an event.
- For example, a user providing an invalid upload file should never be an error.
-
-Please follow the following rules when logging:
-
-- If a logger is not already provided, only use
- :py:func:`nomad.utils.get_logger` to acquire a new logger. Never use the
- build-in logging directly. These logger work like the system loggers, but
- allow you to pass keyword arguments with additional context data. See also
- the [structlog docs](https://structlog.readthedocs.io/en/stable/).
-- In many context, a logger is already provided (e.g. api, processing, parser, normalizer).
- This provided logger has already context information bounded. So it is important to
- use those instead of acquiring your own loggers. Have a look for methods called
- `get_logger` or attributes called `logger`.
-- Keep events (what usually is called *message*) very short. Examples are: *file uploaded*,
- *extraction failed*, etc.
-- Structure the keys for context information. When you analyse logs in ELK, you will
- see that the set of all keys over all log entries can be quit large. Structure your
- keys to make navigation easier. Use keys like `nomad.proc.parser_version` instead of
- `parser_version`. Use module names as prefixes.
-- Don't log everything. Try to anticipate, how you would use the logs in case of bugs,
- error scenarios, etc.
-- Don't log sensitive data.
-- Think before logging data (especially dicts, list, numpy arrays, etc.).
-- Logs should not be abused as a *printf*-style debugging tool.
-
-The following keys are used in the final logs that are piped to Logstash.
-Notice that the key name is automatically formed by a separate formatter and
-may differ from the one used in the actual log call.
-
-Keys that are autogenerated for all logs:
-
- - `@timestamp`: Timestamp for the log
- - `@version`: Version of the logger
- - `host`: The host name from which the log originated
- - `path`: Path of the module from which the log was created
- - `tags`: Tags for this log
- - `type`: The *message_type* as set in the LogstashFormatter
- - `level`: The log level: `DEBUG`, `INFO`, `WARNING`, `ERROR`
- - `logger_name`: Name of the logger
- - `nomad.service`: The service name as configured in `config.py`
- - `nomad.release`: The release name as configured in `config.py`
-
-Keys that are present for events related to processing an entry:
-
- - `nomad.upload_id`: The id of the currently processed upload
- - `nomad.entry_id`: The id of the currently processed entry
- - `nomad.mainfile`: The mainfile of the currently processed entry
-
-Keys that are present for events related to exceptions:
-
- - `exc_info`: Stores the full python exception that was encountered. All
- uncaught exceptions will be stored automatically here.
- - `digest`: If an exception was raised, the last 256 characters of the message
- are stored automatically into this key. If you wish to search for exceptions
- in Kibana, you will want to use this value as it will be indexed unlike the
- full exception object.
-
-
-### Copyright Notices
-
-We follow this [recommendation](https://www.linuxfoundation.org/blog/2020/01/copyright-notices-in-open-source-software-projects/)
-of the Linux Foundation for the copyright notice that is placed on top of each source
-code file.
-
-It is intended to provide a broad generic statement that allows all authors/contributors
-of the NOMAD project to claim their copyright, independent of their organization or
-individual ownership.
-
-You can simply copy the notice from another file. From time to time we can use a tool
-like [licenseheaders](https://pypi.org/project/licenseheaders/) to ensure correct
-notices. In addition we keep an purely informative AUTHORS file.
-
-
-## Git/GitLab
-
-### Branches and clean version history
-
-The `master` branch of our repository is *protected*. You must not (even if you have
-the rights) commit to it directly. The `master` branch references the latest official
-release (i.e. what the current NOMAD runs on). The current development is represented by
-*version* branches, named `vx.x.x`. Usually there are two or more of these branched,
-representing the development on *minor/bugfix* versions and the next *major* version(s).
-Ideally these *version* branches are also not manually push to.
-
-Instead you develop
-on *feature* branches. These are branches that are dedicated to implement a single feature.
-They are short lived and only exist to implement a single feature.
-
-The lifecycle of a *feature* branch should look like this:
-
-- create the *feature* branch from the last commit on the respective *version* branch that passes CI
-
-- do your work and push until you are satisfied and the CI passes
-
-- create a merge request on GitLab
-
-- discuss the merge request on GitLab
-
-- continue to work (with the open merge request) until all issues from the discussion are resolved
-
-- the maintainer performs the merge and the *feature* branch gets deleted
-
-### Submodules
-
-We currently use git submodules to manage NOMAD internal dependencies (e.g. parsers).
-All dependencies are python packages and installed via pip to your python environement.
-
-This allows us to target (e.g. install) individual commits. More importantly, we can address commit
-hashes to identify exact parser/normalizer versions. On the downside, common functions
-for all dependencies (e.g. the python-common package, or nomad_meta_info) cannot be part
-of the nomad-FAIRDI project. In general, it is hard to simultaneously develop nomad-FAIRDI
-and NOMAD-coe dependencies.
-
-Another approach is to integrate the NOMAD-coe sources with nomad-FAIRDI. The lacking
-availability of individual commit hashes, could be replaces with hashes of source-code
-files.
-
-We use the `master` branch on all dependencies. Of course feature branches can be used on
-dependencies to manage work in progress.
-
-### Keep a clean history
-
-While working on a feature, there are certain practices that will help us to create
-a clean history with coherent commits, where each commit stands on its own.
-
-```sh
- git commit --amend
-```
-
-If you committed something to your own feature branch and then realize by CI that you have
-some tiny error in it that you need to fix, try to amend this fix to the last commit.
-This will avoid unnecessary tiny commits and foster more coherent single commits. With *amend*
-you are basically adding changes to the last commit, i.e. editing the last commit. If
-you push, you need to force it `git push origin feature-branch --force-with-lease`. So be careful, and
-only use this on your own branches.
-
-```sh
- git rebase <version-branch>
-```
-
-Lets assume you work on a bigger feature that takes more time. You might want to merge
-the version branch into your feature branch from time to time to get the recent changes.
-In these cases, use rebase and not merge. Rebase puts your branch commits in front of the
-merged commits instead of creating a new commit with two ancestors. It basically moves the
-point where you initially branched away from the version branch to the current position in
-the version branch. This will avoid merges, merge commits, and generally leave us with a
-more consistent history. You can also rebase before creating a merge request, which basically
-allows no-op merges. Ideally the only real merges that we ever have, are between
-version branches.
- -```sh - git merge --squash <other-branch> -``` - -When you need multiple branches to implement a feature and merge between them, try to -use *squash*. Squashing basically puts all commits of the merged branch into a single commit. -It basically allows you to have many commits and then squash them into one. This is useful -if these commits were made just to synchronize between workstations, due to -unexpected errors in CI/CD, because you needed a save point, etc. Again the goal is to have -coherent commits, where each commits makes sense on its own. - -Often a feature is also represented by an *issue* on GitLab. Please mention the respective -issues in your commits by adding the issue id at the end of the commit message: *My message. #123*. - -We tag releases with `vX.X.X` according to the regular semantic versioning practices. -After releasing and tagging the *version* branch is removed. Do not confuse tags with *version* branches. -Remember that tags and branches are both Git references and you can accidentally pull/push/checkout a tag. - -The main NOMAD GitLab-project (`nomad-fair`) uses Git-submodules to maintain its -parsers and other dependencies. All these submodules are places in the `/dependencies` -directory. There are helper scripts to install (`./dependencies.sh`) and -commit changes to all submodules (`./dependencies-git.sh`). After merging or checking out, -you have to make sure that the modules are updated to not accidentally commit old -submodule commits again. Usually you do the following to check if you really have a -clean working directory. - -```sh - git checkout something-with-changes - git submodule update - git status -``` +then install the extension by drag the file `nomad-0.0.x.vsix` and drop it into the extension panel of the vscode. 
\ No newline at end of file
diff --git a/docs/schema/python.md b/docs/schema/python.md
index b2a7809b5abd1507de73bd29e4dea456bcc7ba04..6472a3a7e4d07ad6acfae76504288e93ae65b856 100644
--- a/docs/schema/python.md
+++ b/docs/schema/python.md
@@ -249,7 +249,7 @@ followed?
 
 ### Schema super structure
 
-You should follow the basic [developer's getting started](../developers.md) to setup a development environment. This will give you all the necessary libraries and allows you
+You should follow the basic [developer's getting started](../develop/setup.md) to setup a development environment. This will give you all the necessary libraries and allows you
 to place your modules into the NOMAD code.
 
 The `EntryArchive` section definition sets the root of the archive for each entry in
@@ -257,7 +257,7 @@ NOMAD. It therefore defines the top level sections:
 
 - `metadata`, all "administrative" metadata (ids, permissions, publish state, uploads, user metadata, etc.)
 - `results`, a summary with copies and references to data from method specific sections. This also
-presents the [searchable metadata](../search.md).
+presents the [searchable metadata](../develop/search.md).
 - `workflows`, all workflow metadata
 - Method specific sub-sections, e.g. `run`. This is were all parsers are supposed to add the parsed data.
diff --git a/mkdocs.yml b/mkdocs.yml
index 70a691c73e07e54f2d3d1d47a1f03a8918d1df4c..95451992676f9e8747c605e62f45e313da65400f 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -17,10 +17,12 @@ nav:
     - schema/elns.md
   # - Using the AI Toolkit and other remote tools: aitoolkit.md
   - Developing NOMAD:
-    - developers.md
-    - search.md
-    - parser.md
-    - normalizers.md
+    - develop/setup.md
+    - develop/guides.md
+    - develop/gitlab.md
+    - develop/search.md
+    - develop/parser.md
+    - develop/normalizers.md
   - Operating NOMAD (Oasis): oasis.md
 theme:
   name: material