# SortEM

The following installation steps can be done through conda or pip. The easiest way is to install the requirements through conda, since the GPU libraries and other requirements are downloaded together with the Python packages; if you install the packages through pip, the NVIDIA modules will not be installed with them. In addition, installing into a conda environment avoids breaking paths. A short check that the GPU setup works is sketched after the installation steps.


>>>
**1) Create the conda environment:**

_`conda create -n sortem python=3.7 tensorflow-gpu==2.2 matplotlib scikit-learn numpy==1.19.5`_

**2) Activate the environment:**

_`conda activate sortem`_

**3) Install the GUI:**

_`pip install tensorflow-addons==0.11.2 appjar`_

**4) Go into the Anaconda environment directory:**

_`cd ${CONDA_PREFIX}`_

**5) Download the repository:**

_`git clone https://gitlab.au.dk/au482896/sortem`_

**6) Add the sortem executable to the conda environment's PATH:**

_`ln -s ../sortem/sortem bin/.`_
>>>
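As an optional sanity check after installation (a minimal sketch that assumes nothing beyond the packages installed above), you can confirm from inside the activated environment that TensorFlow can see the GPU:

```python
# Optional check (not part of SortEM): verify that the conda environment
# provides a GPU-enabled TensorFlow build.
import tensorflow as tf

print("TensorFlow version:", tf.__version__)                    # expected: 2.2.x
print("Visible GPUs:", tf.config.list_physical_devices("GPU"))
```

If the GPU list is empty, the CUDA libraries were most likely not pulled in; as noted above, this typically happens when the requirements are installed through pip instead of conda.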


The algorithm works in the following way (an illustrative sketch follows the list):
1) The algorithm computes transformation-invariant keypoints represented as a binary vector with values in [-1, 1], meaning the feature vector contains the same information for all projections from the same molecule (https://ieeexplore.ieee.org/abstract/document/9169844).

2) The vector expresses key characteristics of the protein. Each pixel of the computed 16 x 16 image is weighted by a value between [0, max], and four areas of the protein are extracted and used for training the neural network, improving the classification.

3) Each protein component is represented as a binarized vector, which is concatenated with the other part, partial and full image vectors, improving the overall accuracy (https://arxiv.org/pdf/1902.09941.pdf).
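The snippet below is only an illustrative sketch of the vector handling described in steps 1 and 3 (binarizing feature vectors to {-1, 1} and concatenating the part, partial and full-image vectors); it is not taken from the SortEM source, and the vector sizes are chosen arbitrarily for the example.

```python
# Illustrative sketch only -- not SortEM code.
import numpy as np

def binarize(features):
    """Map a real-valued feature vector to a binary {-1, +1} vector."""
    return np.where(features >= 0.0, 1.0, -1.0)

# Hypothetical feature vectors (sizes are arbitrary for this example).
part_vec    = binarize(np.random.randn(256))   # one extracted protein area
partial_vec = binarize(np.random.randn(256))   # partial image
full_vec    = binarize(np.random.randn(256))   # full image

# Concatenate the binarized vectors into a single descriptor for classification.
descriptor = np.concatenate([part_vec, partial_vec, full_vec])
print(descriptor.shape)   # (768,)
```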


## Running SortEM

SortEM is run from the command line through `main.py`; the available options, as defined in the argument parser, are listed below. A usage example and a small argument-parsing sketch follow at the end of this section.

    parser.add_argument('--gpu_id',type=int, default= 0, help='GPU ID. The ID of the GPU to execute the operations on. ')
    parser.add_argument('--num_cpus',type=int,default = 8,help='The maximum allowed cpus to use for preprocessing data (image resize and normalization)')
   
    parser.add_argument('--star', type=str, nargs='+',
                        help='list of paths to the star files; wildcards are accepted. The star files must refer to the .mrc files')
    parser.add_argument('--batch_size', type=int,default=[100,75,50,20,10], nargs='+',
                        help='deep learning model training batch size for each image scale')     
    
    parser.add_argument('--o', type=str,default='./results',
                        help='output directory')   
   
    parser.add_argument('--f16', dest='f16',action='store_true',
                        help='Apply tensor core acceleration to training and inference; requires a GPU with tensor cores (compute capability 7.0 or higher).')

    parser.add_argument('--save_model', type=int,default=5,help='validation interval at which full-size models are written out.')
        
    parser.add_argument('--lr_g',type=float,default=[10**(-5),0.5*10**(-5),10**(-6),0.5*10**(-6),10**(-7),0.5*10**(-7)], nargs='+',help='The staircase learning rates of the generator')

    parser.add_argument('--lr_d',type=float,default=[10**(-4),0.5*10**(-4),10**(-5),0.5*10**(-5),10**(-6),0.5*10**(-6)],  nargs='+',help='The staircase learning rates of the discriminator')

    parser.add_argument('--lr_e',type=float,default=[10**(-4),0.5*10**(-4),10**(-5),0.5*10**(-5),10**(-6),0.5*10**(-6)],  nargs='+',help='The staircase learning rates of the encoder')
       
    parser.add_argument('--ctf', dest='ctf',action='store_true',default=False,help='Use CTF parameters for model.')

    parser.add_argument('--noise', dest='noise',action='store_true',default=False ,help='Use the noise generator to generate and scale the noise')

    parser.add_argument('--steps',type=int,default=[10000,10000,10000,10000,10000], nargs='+',help='how many epochs (runs through the dataset) before termination')

    parser.add_argument('--l_reg',type=float,default=0.01,help='the lambda regularization of the diversity-score loss if the noise generator is active')

    parser.add_argument('--frames',type=int,default=4,help='number of models to generate from each cluster')

    parser.add_argument('--umap_p_size',type=int,default=100,help='The number of feature vectors to use for training UMAP')

    parser.add_argument('--umap_t_size',type=int,default=100,help='The number of feature vectors to use for intermediate evaluation of clusters in the umap algorithm')

    parser.add_argument('--neighbours',type=int,default=30,help='number of neighbours in the graph creation algorithm')

    parser.add_argument('--t_res',type=int,default=None,choices=[32,64,128,256,512],help='The maximum resolution to train the model on')

    parser.add_argument('--minimum_size',type=int,default=500,help='the minimum size before it is considered an actual cluster; anything smaller is considered noise')

Example of a typical run (the star file is required; use --ctf and --noise to run on real data):

    python3 main.py --star /u/misser11/Sortinator/p28/*.star --ctf --noise
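The sketch below is a standalone illustration (not SortEM's `main.py`) that mirrors a few of the documented options, so you can preview how a command line resolves before launching a run; the file name `particles.star` is made up for the example.

```python
# preview_args.py -- standalone illustration (not part of SortEM): mirrors a subset
# of the documented options to show how argparse resolves a command line.
import argparse

parser = argparse.ArgumentParser(description='Preview of selected SortEM options')
parser.add_argument('--star', type=str, nargs='+',
                    help='list of paths to the star files; wildcards are accepted')
parser.add_argument('--gpu_id', type=int, default=0, help='GPU ID to run on')
parser.add_argument('--batch_size', type=int, default=[100, 75, 50, 20, 10], nargs='+',
                    help='training batch size for each image scale')
parser.add_argument('--ctf', action='store_true', help='use CTF parameters for the model')
parser.add_argument('--noise', action='store_true', help='use the noise generator')

# Simulate the example command line shown above:
args = parser.parse_args(['--star', 'particles.star', '--ctf', '--noise'])
print(args)
# Namespace(batch_size=[100, 75, 50, 20, 10], ctf=True, gpu_id=0, noise=True, star=['particles.star'])
```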