To install the package, download the repository, cd into the folder, and run:
python3 setup.py install --user



SortEM is an algorithm for sorting cryo-EM 2D projections.

The algorithm runs in three steps.
1) Transformation-invariant keypoints are computed, meaning the feature vector contains the same information for all projections of the same molecule. (Not implemented in the master branch; coming soon.) Based on the article: https://arxiv.org/pdf/1806.06778.pdf

2) An autoencoder is built to project the keypoint vector into a low-dimensional space suitable for constructing a graph. (Implemented.) The nodes are used as anchor points and are sampled randomly from the dataset.
3) A graph is constructed and optimized; the clusters are the connected components of the graph. (In branch.)

https://arxiv.org/pdf/1803.01449.pdf
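
To illustrate the idea behind step 3, here is a minimal sketch of finding clusters as connected components of a graph. It is not SortEM's implementation: the graph here simply links points that are closer than a chosen radius (an assumption made for the example; the graph construction and optimization used by SortEM are not shown), and the 2D points stand in for the low-dimensional autoencoder output. All function names are hypothetical.

```python
import math
import random


def radius_edges(points, radius):
    """Link every pair of points closer than `radius` (illustrative graph rule)."""
    edges = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < radius:
                edges.append((i, j))
    return edges


def connected_components(n_nodes, edges):
    """Label each node with the index of its connected component (union-find)."""
    parent = list(range(n_nodes))

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Relabel roots as consecutive integers in order of first appearance.
    label_of_root = {}
    labels = []
    for node in range(n_nodes):
        root = find(node)
        labels.append(label_of_root.setdefault(root, len(label_of_root)))
    return labels


# Two well-separated synthetic blobs standing in for embedded projections.
rng = random.Random(0)
blob_a = [(rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1)) for _ in range(20)]
blob_b = [(rng.gauss(5.0, 0.1), rng.gauss(5.0, 0.1)) for _ in range(20)]
embedding = blob_a + blob_b

labels = connected_components(len(embedding), radius_edges(embedding, radius=2.0))
```

Each distinct value in `labels` corresponds to one cluster. With a real embedding the choice of graph rule matters a great deal, since a single spurious edge merges two components.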
Running SortEM
    --num_gpus How many GPUs to use (only tested on a single GPU; can run on multiple GPUs).
    --gpu_list List of strings naming specific GPUs to use when not using a Slurm queue, written as: gpu:0 gpu:1 gpu:2 (does not work for multi-GPU yet).
    --num_cpus Integer; how many CPUs to use to preprocess the images obtained from the MRC files.
    --float16 Write True to use half precision. Works well on Volta-series GPUs and newer, and increases training speed by up to 2.5 times.
    --star List of star files; can contain wildcards.
    --ab The batch size to train with on a single GPU.
    --pb The batch size for prediction; can usually be larger than the training batch size.
    --o The output directory (defaults to ./results).
    --mp The maximum number of particles to use per training epoch.
    --pca (not implemented) The number of nodes to use in the graph.
    --epochs The number of epochs, such that the total number of training images is epochs*mp.
    --tr (not implemented) Use a pretrained model. This skips steps 1 and 2 and the optimization procedure in step 3, so everything is only predicted. This can predict image data within 10 minutes for a huge dataset.
    --log If the star file contains classes, you can track the training against actual human classification from Relion / cryoSPARC (to test whether it is worth it).
    --num_classes How many classes to use when comparing pretraining (step 1) with the number of classes in the star file.
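
As a rough illustration of how a few of the options above fit together, the sketch below parses a subset of them with `argparse`, expands wildcards in `--star`, and computes the total number of training images as epochs*mp. This is not the parser in `main.py`: the argument types, the choice to make `--star` required, the `None` defaults, and the helper names are assumptions for the example. Only the `./results` default for `--o` comes from the list above.

```python
import argparse
import glob


def build_parser():
    """Hypothetical parser covering a subset of the documented options."""
    parser = argparse.ArgumentParser(description="Illustrative subset of SortEM options")
    # Assumption: --star is mandatory, as it appears in the typical run.
    parser.add_argument("--star", nargs="+", required=True,
                        help="star files; patterns may contain wildcards")
    parser.add_argument("--ab", type=int, default=None,
                        help="training batch size on a single GPU")
    parser.add_argument("--pb", type=int, default=None,
                        help="prediction batch size")
    parser.add_argument("--o", default="./results",
                        help="output directory")
    parser.add_argument("--mp", type=int, default=None,
                        help="maximum particles per training epoch")
    parser.add_argument("--epochs", type=int, default=None,
                        help="number of training epochs")
    parser.add_argument("--num_classes", type=int, default=None,
                        help="number of classes to compare against the star file")
    return parser


def expand_star_patterns(patterns):
    """Expand each wildcard pattern into a sorted list of matching paths."""
    files = []
    for pattern in patterns:
        files.extend(sorted(glob.glob(pattern)))
    return files


def total_training_images(epochs, max_particles):
    """Total images seen during training, as described for --epochs."""
    return epochs * max_particles


# The path and numbers below are placeholders, not recommended settings.
args = build_parser().parse_args(
    ["--star", "data/*.star", "--ab", "64", "--num_classes", "2",
     "--epochs", "10", "--mp", "50000"]
)
star_files = expand_star_patterns(args.star)
n_images = total_training_images(args.epochs, args.mp)
```

When the shell does not expand a quoted pattern itself, expanding it inside the program as above keeps `--star "dir/*.star"` and `--star dir/*.star` equivalent.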

To Do List:

    - The algorithm's accuracy first increases and then degrades over training time. This is because the learned features do not represent distinct features of the molecule; it can be improved a lot by adding **Keypoint features** (this is step 1).
    - Finalize multi-GPU support (shou)
    - Finalize transfer learning support
    - Finalize float16 support (runs now but is not fully optimized)
    
Example of a typical run (what is required)

 python3 main.py --star /u/misser11/Sortinator/p28/*.star --ab 64 --num_classes 2   


Command line instructions
You can also upload existing files from your computer using the instructions below.


Git global setup
git config --global user.name "Jonathan Juhl"
git config --global user.email "joju@inano.au.dk"

Create a new repository
git clone git@gitlab.au.dk:au482896/sortem.git
cd sortem
touch README.md
git add README.md
git commit -m "add README"
git push -u origin master

Push an existing folder
cd existing_folder
git init
git remote add origin git@gitlab.au.dk:au482896/sortem.git
git add .
git commit -m "Initial commit"
git push -u origin master

Push an existing Git repository
cd existing_repo
git remote rename origin old-origin
git remote add origin git@gitlab.au.dk:au482896/sortem.git
git push -u origin --all
git push -u origin --tags