How to contribute to skimage

Developing Open Source is great fun! Join us on the scikit-image mailing list and tell us which of the following challenges you’d like to solve.

Development process

Here’s the long and short of it:

  1. If you are a first-time contributor:

    • Go to https://github.com/scikit-image/scikit-image and click the “fork” button to create your own copy of the project.

    • Clone the project to your local computer:

      git clone https://github.com/your-username/scikit-image.git
      
    • Change the directory:

      cd scikit-image
      
    • Add the upstream repository:

      git remote add upstream https://github.com/scikit-image/scikit-image.git
      
    • Now, you have remote repositories named:

      • upstream, which refers to the scikit-image repository

      • origin, which refers to your personal fork

  2. Develop your contribution:

    • Pull the latest changes from upstream:

      git checkout master
      git pull upstream master
      
    • Create a branch for the feature you want to work on. Since the branch name will appear in the merge message, use a sensible name such as ‘transform-speedups’:

      git checkout -b transform-speedups
      
    • Commit locally as you progress (git add and git commit)

  3. To submit your contribution:

    • Push your changes back to your fork on GitHub:

      git push origin transform-speedups
      
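    • If prompted, enter your GitHub username and a personal access token; GitHub no longer accepts account passwords for Git operations over HTTPS (repeat contributors or advanced users can remove this step by connecting to GitHub with SSH).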

    • Go to GitHub. The new branch will show up with a green Pull Request button - click it.

    • If you want, post on the mailing list to explain your changes or to ask for review.

For a more detailed discussion, read the documents on how to use Git with scikit-image (https://scikit-image.org/docs/dev/gitwash/index.html).

  4. Review process:

    • Reviewers (the other developers and interested community members) will write inline and/or general comments on your Pull Request (PR) to help you improve its implementation, documentation, and style. Every single developer working on the project has their code reviewed, and we’ve come to see it as a friendly conversation from which we all learn and the overall code quality benefits. Therefore, please don’t let the review discourage you from contributing: its only aim is to improve the quality of the project, not to criticize (we are, after all, very grateful for the time you’re donating!).

    • To update your pull request, make your changes on your local repository and commit. As soon as those changes are pushed up (to the same branch as before) the pull request will update automatically.

    • Travis-CI, a continuous integration service, is triggered after each Pull Request update to build the code, run unit tests, measure code coverage and check coding style (PEP 8) of your branch. The Travis tests must pass before your PR can be merged. If Travis fails, you can find out why by clicking on the “failed” icon (red cross) and inspecting the build and test log.

    • A pull request must be approved by two core team members before merging.

  5. Document changes

    If your change introduces any API modifications, please update doc/source/api_changes.txt.

    If your change introduces a deprecation, add a reminder to TODO.txt for the team to remove the deprecated functionality in the future.

Note

To reviewers: if it is not obvious from the PR description, add a short explanation of what a branch did to the merge message and, if closing a bug, also add “Closes #123” where 123 is the issue number.

Divergence between upstream master and your feature branch

If GitHub indicates that the branch of your Pull Request can no longer be merged automatically, merge the master branch into yours:

git fetch upstream master
git merge upstream/master

If any conflicts occur, they need to be fixed before continuing. See which files are in conflict using:

git status

This displays a message like:

Unmerged paths:
  (use "git add <file>..." to mark resolution)

  both modified:   file_with_conflict.txt

Inside the conflicted file, you’ll find sections like these:

<<<<<<< HEAD
The way the text looks in your branch
=======
The way the text looks in the master branch
>>>>>>> master

Choose one version of the text that should be kept, and delete the rest:

The way the text looks in your branch

Now, add the fixed file:

git add file_with_conflict.txt

Once you’ve fixed all merge conflicts, do:

git commit
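
Before committing, it can help to confirm that no conflict markers were left behind. The following is a minimal sketch in Python, not part of scikit-image or Git; the function name find_conflict_markers is hypothetical, and the check only looks for lines shaped like the markers shown above:

```python
def find_conflict_markers(text):
    """Return the 1-based numbers of lines that look like Git conflict markers."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if (line.startswith('<<<<<<< ')
                or line.startswith('>>>>>>> ')
                or line.rstrip() == '======='):
            hits.append(number)
    return hits


# The conflicted and resolved examples from this section.
conflicted = """<<<<<<< HEAD
The way the text looks in your branch
=======
The way the text looks in the master branch
>>>>>>> master
"""
resolved = "The way the text looks in your branch\n"

print(find_conflict_markers(conflicted))  # lines 1, 3 and 5 are markers
print(find_conflict_markers(resolved))    # no markers left
```

A line consisting only of ======= can also occur legitimately (for example as a heading underline in a text file), so treat a hit as a prompt to look at the file, not as proof of an unresolved conflict.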

Note

Advanced Git users are encouraged to rebase instead of merge, but we squash and merge most PRs either way.

Build environment setup

Once you’ve cloned your fork of the scikit-image repository, you should set up a Python development environment tailored for scikit-image. You may use the environment manager of your choice. Here we provide instructions for two popular environment managers: venv (pip based) and conda (Anaconda or Miniconda).

venv

When using venv, you may find the following bash commands useful:

# Create a virtualenv named ``skimage-dev`` that lives in the directory of
# the same name
python -m venv skimage-dev
# Activate it
source skimage-dev/bin/activate
# Install all development and runtime dependencies of scikit-image
pip install -r <(cat requirements/*.txt)
# Build and install scikit-image from source
pip install -e .
# Test your installation
pytest skimage
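
If the install step seems to have touched the wrong Python installation, a quick check from Python can confirm whether the venv is active. This is a small sketch using only the standard library; the function name in_venv is hypothetical, and the check applies to venv environments only (it does not detect conda environments):

```python
import sys


def in_venv():
    """Return True if the interpreter is running inside a venv environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == '__main__':
    print('venv active:', in_venv())
    print('interpreter:', sys.executable)
```

With skimage-dev activated, the interpreter path printed should sit inside the skimage-dev directory.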

conda

When using conda, you may find the following bash commands useful:

# Create a conda environment named ``skimage-dev``
conda create --name skimage-dev
# Activate it
conda activate skimage-dev
# Install major development and runtime dependencies of scikit-image
# (the rest can be installed from conda-forge or pip, if needed)
conda install `for i in requirements/{default,build}.txt; do echo -n " --file $i "; done`
# Install minimal testing dependencies
conda install pytest
# Install scikit-image from source
pip install -e . --no-deps
# Test your installation
pytest skimage

Guidelines

  • All code should have tests (see test coverage below for more details).

  • All code should be documented, to the same standard as NumPy and SciPy.

  • For new functionality, always add an example to the gallery.

  • No changes are ever committed without review and approval by two core team members. Ask on the mailing list if you get no response to your pull request. Never merge your own pull request.

  • Examples in the gallery should have a maximum figure width of 8 inches.

Stylistic Guidelines

  • Set up your editor to remove trailing whitespace. Follow PEP 8. Check code with pyflakes / flake8.

  • Use numpy data types instead of strings (np.uint8 instead of "uint8").

  • Use the following import conventions:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import ndimage as ndi
    
    cimport numpy as cnp  # in Cython code
    
  • When documenting array parameters, use image : (M, N) ndarray and then refer to M and N in the docstring, if necessary.

  • Refer to array dimensions as (plane), row, column, not as x, y, z. See Coordinate conventions in the user guide for more information.

  • Functions should support all input image dtypes. Use utility functions such as img_as_float to help convert to an appropriate type. The output format can be whatever is most efficient. This allows us to string together several functions into a pipeline, e.g.:

    hough(canny(my_image))
    
  • Use Py_ssize_t as data type for all indexing, shape and size variables in C/C++ and Cython code.

  • Use relative module imports, i.e. from .._shared import xyz rather than from skimage._shared import xyz.

  • Wrap Cython code in a pure Python function, which defines the API. This improves compatibility with code introspection tools, which are often not aware of Cython code.

  • For Cython functions, release the GIL whenever possible, using with nogil:.
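
Several of the conventions above can be seen together in a short function. This is an illustrative sketch only: crop_center is a hypothetical helper, not part of scikit-image, and it assumes NumPy is installed. It shows the import alias, the (M, N) ndarray docstring style, row/column naming, and a NumPy data type object in place of a string:

```python
import numpy as np


def crop_center(image, size):
    """Return the central square of a 2D image.

    Hypothetical helper, not part of scikit-image.

    Parameters
    ----------
    image : (M, N) ndarray
        Input image.
    size : int
        Side length of the crop; must not exceed ``M`` or ``N``.

    Returns
    -------
    cropped : (size, size) ndarray
        Central region of ``image``, with the same dtype.
    """
    # Dimensions are named row and column, not x and y.
    n_rows, n_cols = image.shape
    row0 = (n_rows - size) // 2
    col0 = (n_cols - size) // 2
    return image[row0:row0 + size, col0:col0 + size]


# Use a NumPy data type object rather than the string "uint8".
image = np.arange(36, dtype=np.uint8).reshape(6, 6)
cropped = crop_center(image, 2)
print(cropped.shape, cropped.dtype)
```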

Testing

scikit-image has an extensive test suite that ensures correct execution on your system. The test suite has to pass before a pull request can be merged, and tests should be added to cover any modifications to the code base.

We make use of the pytest testing framework, with tests located in the various skimage/submodule/tests folders.

To use pytest, ensure that Cython extensions are built and that the library is installed in development mode:

$ pip install -e .

Now, run all tests using:

$ PYTHONPATH=. pytest skimage

Or the tests for a specific submodule:

$ PYTHONPATH=. pytest skimage/morphology

Or tests from a specific file:

$ PYTHONPATH=. pytest skimage/morphology/tests/test_grey.py

Or a single test within that file:

$ PYTHONPATH=. pytest skimage/morphology/tests/test_grey.py::test_3d_fallback_black_tophat

Use --doctest-modules to run doctests. For example, run all tests and all doctests using:

$ PYTHONPATH=. pytest --doctest-modules skimage
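
For reference, a doctest is an interactive example embedded in a docstring. The sketch below uses a hypothetical function, clip_to_unit, and runs its examples with the standard library doctest module; it is meant to show the kind of example that --doctest-modules collects, not how pytest runs it internally:

```python
import doctest


def clip_to_unit(value):
    """Clip a number to the range [0, 1].

    Hypothetical helper, used only to illustrate a doctest.

    Examples
    --------
    >>> clip_to_unit(1.5)
    1.0
    >>> clip_to_unit(-2)
    0.0
    """
    return float(min(max(value, 0), 1))


def run_doctests(obj):
    """Run the doctests attached to obj and return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # Make the object available by name to the examples in its docstring.
    for test in finder.find(obj, name=obj.__name__,
                            globs={obj.__name__: obj}):
        runner.run(test)
    results = runner.summarize(verbose=False)
    return results.failed, results.attempted


failed, attempted = run_doctests(clip_to_unit)
print(failed, attempted)
```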

Test coverage

Tests for a module should ideally cover all code in that module, i.e., statement coverage should be at 100%.

To measure the test coverage, install pytest-cov (using pip install pytest-cov) and then run:

$ make coverage

This will print a report with one line for each file in skimage, detailing the test coverage:

Name                                             Stmts   Exec  Cover   Missing
------------------------------------------------------------------------------
skimage/color/colorconv                             77     77   100%
skimage/filter/__init__                              1      1   100%
...

Activate Travis-CI for your fork (optional)

Travis-CI checks all unit tests in the project to prevent breakage.

Before sending a pull request, you may want to check that Travis-CI successfully passes all tests. To do so,

  • Go to Travis-CI and follow the Sign In link at the top

  • Go to your profile page and switch on your scikit-image fork

These correspond to steps one and two in the Travis-CI documentation (step three is already done in scikit-image).

Thus, as soon as you push your code to your fork, it will trigger Travis-CI, and you will receive an email notification when the process is done.

Every time Travis is triggered, it also calls on Codecov to inspect the current test coverage.

Building docs

To build docs, run make from the doc directory. make help lists all targets. For example, to build the HTML documentation, you can run:

make html

Then, all the HTML files will be generated in scikit-image/doc/build/html/. To rebuild the documentation from a clean state, run:

make clean
make html

Requirements

Sphinx and LaTeX are needed to build the documentation.

Sphinx:

Sphinx and the other Python packages needed to build the documentation can be installed using the scikit-image/requirements/docs.txt file:

pip install -r requirements/docs.txt

LaTeX Ubuntu:

sudo apt-get install -qq texlive texlive-latex-extra dvipng

LaTeX Mac:

Install the full MacTeX distribution, or install the smaller BasicTeX and add the ucs and dvipng packages:

sudo tlmgr install ucs dvipng

Fixing Warnings

  • “citation not found: R###” There is probably an underscore after a reference in the first line of a docstring (e.g. [1]_). Use this method to find the source file: $ cd doc/build; grep -rin R####

  • “Duplicate citation R###, other instance in…” There is probably a [2] without a [1] in one of the docstrings

  • Make sure to use pre-sphinxification paths to images (not the _images directory)

Auto-generating dev docs

This set of instructions was used to create scikit-image/tools/deploy-docs.sh

  • Go to GitHub account settings -> personal access tokens

  • Create a new token with access rights public_repo and user:email only

  • Install the travis command line tool: gem install travis. On macOS, you can get gem via brew install ruby.

  • Take the token generated by GitHub and run travis encrypt GH_TOKEN=<token> from inside a scikit-image repo

  • Paste the output into the secure: field of .travis.yml.

  • The decrypted GH_TOKEN env var will be available for travis scripts

https://help.github.com/articles/creating-an-access-token-for-command-line-use/
https://docs.travis-ci.com/user/encryption-keys/

Deprecation cycle

If the behavior of the library has to be changed, a deprecation cycle must be followed to warn users.

  • a deprecation cycle is not necessary when:

    • adding a new function, or

    • adding a new keyword argument to the end of a function signature, or

    • fixing what was buggy behavior

  • a deprecation cycle is necessary for any breaking API change, meaning a change where the function, invoked with the same arguments, would return a different result after the change. This includes:

    • changing the order of arguments or keyword arguments, or

    • adding arguments or keyword arguments to a function, or

    • changing a function’s name or submodule, or

    • changing the default value of a function’s arguments.

Usually, our policy is to put in place a deprecation cycle over two releases.

For the sake of illustration, we consider the modification of a default value in a function signature. In version N (therefore, next release will be N+1), we have

def a_function(image, rescale=True):
    out = do_something(image, rescale=rescale)
    return out

that has to be changed to

from warnings import warn

def a_function(image, rescale=None):
    if rescale is None:
        # Warn only when the caller relies on the default value
        warn('The default value of rescale will change '
             'to `False` in version N+3.', stacklevel=2)
        rescale = True
    out = do_something(image, rescale=rescale)
    return out

and in version N+3

def a_function(image, rescale=False):
    out = do_something(image, rescale=rescale)
    return out

Here is the process for a 2-release deprecation cycle:

  • In the signature, set default to None, and modify the docstring to specify that it’s True.

  • In the function, if rescale is None, set it to True and warn that the default will change to False in version N+3.

  • In doc/release/release_dev.rst, under deprecations, add “In a_function, the rescale argument will default to False in N+3.”

  • In TODO.txt, create an item in the section related to version N+3 and write “change rescale default to False in a_function”.

Note that the 2-release deprecation cycle is not a strict rule and in some cases, the developers can agree on a different procedure upon justification (like when we can’t detect the change, or it involves moving or deleting an entire function for example).
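
A deprecation like the one illustrated above can be checked with a short test. The sketch below uses only the standard library; do_something is a hypothetical stand-in for the real processing step, and collect_warnings is a hypothetical helper, neither being part of scikit-image. It checks that the warning fires when the caller relies on the default and stays silent when rescale is passed explicitly:

```python
import warnings


def do_something(image, rescale):
    # Hypothetical stand-in for the real processing step.
    return [2 * value for value in image] if rescale else list(image)


def a_function(image, rescale=None):
    if rescale is None:
        warnings.warn('The default value of rescale will change '
                      'to `False` in version N+3.', stacklevel=2)
        rescale = True
    return do_something(image, rescale=rescale)


def collect_warnings(func, *args, **kwargs):
    """Call func and return (result, list of recorded warnings)."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones already shown once.
        warnings.simplefilter('always')
        result = func(*args, **kwargs)
    return result, caught


# Relying on the default triggers the warning; the old behavior is kept.
default_result, default_caught = collect_warnings(a_function, [1, 2])
# Passing the argument explicitly is silent.
explicit_result, explicit_caught = collect_warnings(
    a_function, [1, 2], rescale=False)
print(len(default_caught), len(explicit_caught))
```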

Scikit-image uses warnings to highlight changes in its API so that users may update their code accordingly. The stacklevel argument sets the location in the call stack where the warnings will point. In most cases, it is appropriate to set the stacklevel to 2. When warnings originate from helper routines internal to the scikit-image library, it may be more appropriate to set the stacklevel to 3. For more information, see the documentation of the warn function in the Python standard library.

To test if your warning is being emitted correctly, try calling the function from an IPython console. It should point you to the console input itself instead of being emitted by the files in the scikit-image library.

  • Good: ipython:1: UserWarning: ...

  • Bad: scikit-image/skimage/measure/_structural_similarity.py:155: UserWarning:
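
The same check can be made outside IPython by recording the warning and looking at the line it is attributed to. This is a sketch under stated assumptions: _helper, public_api and user_code are hypothetical names standing in for an internal routine, a public function and the user's own code. Because the warning is raised in a helper called through the public function, stacklevel=3 is what reaches the user's frame:

```python
import warnings


def _helper():
    # Called through public_api, so stacklevel=3 reaches the user's frame.
    warnings.warn('this behavior is deprecated', stacklevel=3)


def public_api():
    # Hypothetical public function that delegates to the helper.
    _helper()


def user_code():
    public_api()


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    user_code()

# The public_api() call sits one line below the def of user_code.
call_line = user_code.__code__.co_firstlineno + 1
print(caught[0].lineno == call_line)
```

With stacklevel=2 in _helper, the recorded line would instead be the _helper() call inside public_api, which is the "Bad" case above.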

Benchmarks

While not mandatory for most pull requests, we ask that performance-related PRs include a benchmark in order to clearly depict the use case that is being optimized. A historical view of our snapshots can be found at the following website.

In this section we will review how to set up the benchmarks and how to use three commands: asv dev, asv run, and asv continuous.

Prerequisites

Begin by installing airspeed velocity in your development environment. Be sure to activate the environment before installing. If you are using venv, you may install the requirement with:

source skimage-dev/bin/activate
pip install asv

If you are using conda, then the command:

conda activate skimage-dev
conda install asv

is more appropriate. Once installed, it is useful to run the command:

asv machine

This gives airspeed velocity more information about your machine.

Writing a benchmark

To write a benchmark, add a file in the benchmarks directory which contains a class with one setup method and at least one method prefixed with time_.

The time_ method should only contain code you wish to benchmark. Therefore it is useful to move everything that prepares the benchmark scenario into the setup method. The setup method is called before a time_ method is run, and its execution time is not factored into the benchmarks.

Take for example the TransformSuite benchmark:

import numpy as np
from skimage import transform

class TransformSuite:
    """Benchmark for transform routines in scikit-image."""

    def setup(self):
        self.image = np.zeros((2000, 2000))
        idx = np.arange(500, 1500)
        self.image[idx[::-1], idx] = 255
        self.image[idx, idx] = 255

    def time_hough_line(self):
        result1, result2, result3 = transform.hough_line(self.image)

Here, the creation of the image is completed in the setup method, and not included in the reported time of the benchmark.
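
To make the split between setup and time_ concrete, here is a simplified stand-in for a benchmark runner, written with the standard library only. It is not asv's implementation: SortSuite and run_suite are hypothetical names, and real runners repeat each measurement many times. The point is only that setup runs outside the timed region:

```python
import time


class SortSuite:
    """Hypothetical benchmark following the setup/time_ naming convention."""

    def setup(self):
        # Preparation only: build the input, do not measure this.
        self.data = list(range(10000, 0, -1))

    def time_sorted(self):
        # Only the code under test belongs here.
        sorted(self.data)


def run_suite(suite_cls):
    """Time each time_ method once, excluding the time spent in setup.

    Returns a dict mapping method names to elapsed seconds.
    """
    timings = {}
    for name in sorted(dir(suite_cls)):
        if not name.startswith('time_'):
            continue
        suite = suite_cls()
        suite.setup()                      # not timed
        start = time.perf_counter()
        getattr(suite, name)()             # timed
        timings[name] = time.perf_counter() - start
    return timings


timings = run_suite(SortSuite)
print(timings)
```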

It is also possible to benchmark features such as peak memory usage. To learn more about the features of asv, please refer to the official airspeed velocity documentation.
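
As a sketch of what a peak memory benchmark might look like: asv treats methods prefixed with peakmem_ as peak memory benchmarks, in the same way it treats time_ methods as timing benchmarks. AllocationSuite and traced_peak_bytes below are hypothetical names. The tracemalloc helper is only a rough local sanity check of the method and is not how asv measures memory:

```python
import tracemalloc


class AllocationSuite:
    """Hypothetical suite with a peak memory benchmark."""

    def setup(self):
        self.n = 100_000

    def peakmem_build_list(self):
        # Code whose peak memory use should be reported.
        list(range(self.n))


def traced_peak_bytes(suite_cls, method_name):
    """Rough local check: peak bytes traced by tracemalloc for one call."""
    suite = suite_cls()
    suite.setup()
    tracemalloc.start()
    try:
        getattr(suite, method_name)()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


peak = traced_peak_bytes(AllocationSuite, 'peakmem_build_list')
print(peak > 0)
```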

Testing the benchmarks locally

Prior to running the true benchmark, it is often worthwhile to test that the code is free of typos. To do so, you may use the command:

asv dev -b TransformSuite

This runs the TransformSuite benchmark above once in your current environment to test that everything is in order.

Running your benchmark

The command above is fast, but doesn’t test the performance of the code adequately. To do that, you may want to run the benchmark in your current environment to see the performance of your change as you are developing new features. The command asv run -E existing specifies that you wish to run the benchmark in your existing environment. This saves a significant amount of time, since building scikit-image can be a time-consuming task:

asv run -E existing -b TransformSuite

Comparing results to master

Often, the goal of a PR is to compare the speed of the modified code against a snapshot of the code in the master branch of the scikit-image repository. The command asv continuous is of help here:

asv continuous master -b TransformSuite

This call will build out the environments specified in the asv.conf.json file and compare the performance of the benchmark between your current commit and the code in the master branch.

The output may look something like:

$ asv continuous master -b TransformSuite
· Creating environments
· Discovering benchmarks
·· Uninstalling from conda-py3.7-cython-numpy1.15-scipy
·· Installing 544c0fe3 <benchmark_docs> into conda-py3.7-cython-numpy1.15-scipy.
· Running 4 total benchmarks (2 commits * 2 environments * 1 benchmarks)
[  0.00%] · For scikit-image commit 37c764cb <benchmark_docs~1> (round 1/2):
[...]
[100.00%] ··· ...ansform.TransformSuite.time_hough_line           33.2±2ms

BENCHMARKS NOT SIGNIFICANTLY CHANGED.

In this case, the differences between HEAD and master are not significant enough for airspeed velocity to report.