Commit 80132a00 authored by Jonathan Juhl

Merge branch 'master' of gitlab.au.dk:au482896/sortem

parents 6b7f7326 d476e071
To install the package, download the repository, cd into the folder, and run
python3 setup.py install --user
The following steps for installing sortem can be done through conda or pip. The easiest way is to install the requirements through conda, as the GPU libraries and other requirements are downloaded with the Python packages. If you install the packages through pip, the NVIDIA modules will not be installed with them. In addition, installing into a conda environment avoids breaking paths.
SortEM is an algorithm to sort cryo-EM 2D projections.
**1) Create conda environment**: conda create --name sortem
The algorithm is run in a three-step approach.
1) Step 1: the computer computes transformation-invariant keypoints, meaning the feature vector contains the same information for all projections from the same molecule (not implemented in master branch, coming soon). Based on the article: https://arxiv.org/pdf/1806.06778.pdf
**2) Activate the environment**: conda activate sortem
**3) Download the repository**: pip install --extra-index-url https://gitlab.au.dk/au482896/sortem
**4) Install the requirements**: conda install --file requirements.txt
If you wish to install through pip it can be done by:
pip3 install -r requirements.txt
Be warned, however, that this can break global paths and the NVIDIA modules will not be included.
The algorithm works in the following way:
1) The algorithm computes transformation-invariant keypoints represented as a binary vector of [-1,1], meaning the feature vector contains the same information for all projections from the same molecule (https://ieeexplore.ieee.org/abstract/document/9169844).
2) The vector expresses key characteristics of the protein; each pixel of the computed 16 x 16 image is weighted by a value in [0,max]. Four areas of the protein are extracted and used for training the neural network, improving the classification. ()
3) Each protein component is represented as a binarized vector, which is concatenated with the other part, partial, and full image vectors, improving the overall accuracy (https://arxiv.org/pdf/1902.09941.pdf).
2) An autoencoder is produced to project the keypoint vector into a low-dimensional space suitable for constructing a graph (implemented). The nodes are used as anchor points and are sampled randomly from the dataset.
3) A graph is constructed and optimized; the clusters are found as the connected components of the graph (in branch).
https://arxiv.org/pdf/1803.01449.pdf
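The last step, reading clusters off as connected components of a graph, can be illustrated with a minimal sketch. This is not the SortEM implementation: the function name `connected_component_labels`, the fixed-radius edge rule, and the toy 2D points are all assumptions made for illustration, and the graph optimization described above is omitted.

```python
import math
from itertools import combinations


def connected_component_labels(points, radius):
    """Label each point with the id of its connected component.

    Assumption for this sketch: two points share an edge when their
    Euclidean distance is at most `radius`.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the components of every pair of points joined by an edge.
    for a, b in combinations(range(len(points)), 2):
        if math.dist(points[a], points[b]) <= radius:
            parent[find(a)] = find(b)

    # Relabel roots as consecutive cluster ids in order of first appearance.
    cluster_of_root = {}
    labels = []
    for i in range(len(points)):
        root = find(i)
        labels.append(cluster_of_root.setdefault(root, len(cluster_of_root)))
    return labels


# Toy low-dimensional embeddings standing in for autoencoder outputs.
embedded = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 5.0)]
labels = connected_component_labels(embedded, radius=0.5)
print(labels)
```

The all-pairs loop is quadratic in the number of points, so a real dataset would need a sparser neighbour search; the union-find part stays the same.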
Running SortEM
--num_gpus How many GPUs to use (only tested on a single GPU, can run on multiple GPUs)
@@ -20,54 +32,20 @@ Running SortEM
--float16 Write True to use half precision; works well on the Volta series and higher, and increases training speed up to 2.5 times.
--star list of star files, can contain wild cards
--ab The batch size to train with on a single gpu.
--pb the batch size of the prediction, can usually be larger than the training batch size
--o The output directory (defaults to ./results)
--mp The max particles to use per training epoch.
--pca (not implemented) The number of nodes to use in the graph
--epochs The number of epochs, such that the total number of training images is epochs*mp
--tr (not implemented) Use a pretrained model. This skips steps 1 and 2 and the optimization procedure in step 3, so everything is just predicted. This can predict image data within 10 min for a huge dataset.
--tr Use a pretrained model. This skips steps 1 and 2 and the optimization procedure in step 3, so everything is just predicted. This can predict image data within 10 min for a huge dataset.
--log If the star file contains classes, you can track the training against actual human classification from Relion / cryosparc (to test whether it's worth it)
--num_classes How many classes to use when comparing pretraining (step 1) with the number of classes in the star file.
--num_classes How many parts of the protein to refine when comparing pretraining (step 1) with the number of classes in the star file.
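As a rough illustration of how the flags above fit together, here is a minimal `argparse` sketch. It is not the parser used by `main.py`: the helper names `build_parser` and `total_training_images` are hypothetical, and every default except `--o` (documented as `./results`) is an assumption chosen only so the example runs.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags (not main.py's own)."""
    parser = argparse.ArgumentParser(description="Sketch of the SortEM flags")
    parser.add_argument("--star", nargs="+", required=True,
                        help="list of star files, can contain wildcards")
    parser.add_argument("--num_gpus", type=int, default=1,      # assumed default
                        help="how many GPUs to use")
    parser.add_argument("--float16", default="False",           # assumed default
                        help="write True to use half precision")
    parser.add_argument("--ab", type=int, default=64,           # assumed default
                        help="training batch size on a single GPU")
    parser.add_argument("--pb", type=int, default=128,          # assumed default
                        help="prediction batch size")
    parser.add_argument("--o", default="./results",
                        help="output directory")
    parser.add_argument("--mp", type=int, default=10000,        # assumed default
                        help="max particles per training epoch")
    parser.add_argument("--epochs", type=int, default=1,        # assumed default
                        help="number of epochs")
    parser.add_argument("--num_classes", type=int, default=2,   # assumed default
                        help="see the flag description above")
    return parser


def total_training_images(args):
    # The README states that the total number of training images is epochs*mp.
    return args.epochs * args.mp


args = build_parser().parse_args(
    ["--star", "a.star", "b.star", "--mp", "1000", "--epochs", "5"]
)
print(total_training_images(args))
```

With `--mp 1000 --epochs 5` the run would see 5 * 1000 = 5000 training images in total.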
To Do List:
- The algorithm experiences an increase and then a degradation in accuracy over training time. This is because the learnt features do not represent distinct features of the molecule; it can be improved a lot by adding **Keypoint features** (this is step 1).
- Finalize multi gpu support (shou)
- Finalize multi gpu support
- Finalize transfer learning support
- Finalize Float16 support (can run now but not fully optimized)
Example of a typical run (what is required)
python3 main.py --star /u/misser11/Sortinator/p28/*.star --ab 64 --num_classes 2
Command line instructions
You can also upload existing files from your computer using the instructions below.
Example of a typical run: the star file is required, and the --ab argument is the batch size. If the batch size is too big
Git global setup
git config --global user.name "Jonathan Juhl"
git config --global user.email "joju@inano.au.dk"
python3 main.py --star /u/misser11/Sortinator/p28/*.star --ab 64
Create a new repository
git clone git@gitlab.au.dk:au482896/sortem.git
cd sortem
touch README.md
git add README.md
git commit -m "add README"
git push -u origin master
Push an existing folder
cd existing_folder
git init
git remote add origin git@gitlab.au.dk:au482896/sortem.git
git add .
git commit -m "Initial commit"
git push -u origin master
Push an existing Git repository
cd existing_repo
git remote rename origin old-origin
git remote add origin git@gitlab.au.dk:au482896/sortem.git
git push -u origin --all
git push -u origin --tags
\ No newline at end of file