pylipid.func.calculate_scores

pylipid.func.calculate_scores(dist_matrix, kde_bw=0.15, pca_component=0.9, score_weights=None)

Calculate scores based on probability density.

This function first lowers the dimension of dist_matrix using PCA. The distribution of the distance vectors for each atom is then estimated using KDEMultivariate.

The score of a lipid pose is calculated based on the probability density function of the atom positions in the binding site and weights given to the atoms:

\[\text { score }=\sum_{i} W_{i} \cdot \hat{f}_{i, H}(D)\]

where \(W_{i}\) is the weight given to atom i of the lipid molecule, H is the bandwidth, and \(\hat{f}_{i, H}(D)\) is a multivariate kernel density estimate of the position of atom i in the specified binding site. \(\hat{f}_{i, H}(D)\) is calculated from all the bound lipid poses in that binding site.
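The score formula above can be sketched in plain NumPy. This is a minimal illustration and not the library's implementation: it replaces PCA and KDEMultivariate with hand-written equivalents (an SVD-based projection and a product-Gaussian kernel with one fixed bandwidth), and the names pca_reduce, gaussian_kde_at_samples and sketch_scores are hypothetical. It also assumes that atoms missing from score_weights get a weight of 1 and that each density is evaluated at the same poses it was fitted on.

```python
import numpy as np


def pca_reduce(X, variance_fraction=0.9):
    """Project X (n_poses, n_features) onto its leading principal components.

    Keeps the smallest number of components whose cumulative explained
    variance exceeds variance_fraction.
    """
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    n_keep = int(np.searchsorted(np.cumsum(explained), variance_fraction, side="right")) + 1
    n_keep = min(n_keep, len(S))
    return Xc @ Vt[:n_keep].T


def gaussian_kde_at_samples(Y, bw=0.15):
    """Product-Gaussian KDE of Y (n_poses, n_dims), evaluated at each sample."""
    _, d = Y.shape
    diff = (Y[:, None, :] - Y[None, :, :]) / bw  # pairwise differences, (n, n, d)
    kernel = np.exp(-0.5 * np.sum(diff**2, axis=2)) / ((np.sqrt(2 * np.pi) * bw) ** d)
    return kernel.mean(axis=1)


def sketch_scores(dist_matrix, kde_bw=0.15, pca_component=0.9, score_weights=None):
    """score = sum_i W_i * f_i(D), with one density estimate per lipid atom."""
    n_atoms, n_poses, _ = dist_matrix.shape
    weights = np.ones(n_atoms)  # assumption: unlisted atoms weigh 1
    if score_weights:
        for idx, w in score_weights.items():
            weights[idx] = w
    scores = np.zeros(n_poses)
    for i in range(n_atoms):
        reduced = pca_reduce(dist_matrix[i], pca_component)
        scores += weights[i] * gaussian_kde_at_samples(reduced, kde_bw)
    return scores


# Synthetic input: 3 lipid atoms, 50 poses, 8 binding site residues.
rng = np.random.default_rng(0)
dist_matrix = rng.normal(loc=0.6, scale=0.1, size=(3, 50, 8))
scores = sketch_scores(dist_matrix, score_weights={0: 2.0})
```

The result holds one value per pose, so poses that sit in densely populated regions of the reduced distance space receive higher scores.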

Parameters

dist_matrix (numpy.ndarray, shape=(n_lipid_atoms, n_poses, n_binding_site_residues)) – The distance vectors describing the position of bound poses in the binding site. This dist_matrix can be generated by vectorize_poses().
kde_bw (scalar, default=0.15) –
The bandwidth for kernel density estimation. Used by KDEMultivariate. By default, the bandwidth is set to 0.15 nm, which roughly corresponds to the van der Waals radius of MARTINI 2 beads.
pca_component (scalar, default=0.9) –
The number of components to keep. If 0 < pca_component < 1, the number of components is chosen so that the fraction of variance explained is greater than the value given by pca_component. It is used by PCA.
score_weights (None or dict) – A dictionary that maps lipid atom indices to their weights, {idx_atom: weight}.

Returns

scores – Scores for bound poses.

Return type

numpy.ndarray, shape=(n_poses,)
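Because the returned array holds one score per bound pose, it can be ranked directly. The sketch below assumes that a higher score marks a more representative pose; the score values are made-up placeholders, not output from calculate_scores, and n_top is a hypothetical name.

```python
import numpy as np

# Hypothetical scores for six bound poses (placeholder values, not real output).
scores = np.array([0.8, 2.4, 1.1, 3.0, 0.2, 2.4])

n_top = 3
# argsort is ascending, so sort the negated scores; a stable sort keeps
# tied poses in their original order.
top_idx = np.argsort(-scores, kind="stable")[:n_top]
top_scores = scores[top_idx]
```

The indices in top_idx refer to the pose axis of dist_matrix, so they can be used to look up the matching poses.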