pylipid.func.analyze_pose_wrapper¶

pylipid.func.analyze_pose_wrapper(bs_id, bound_poses, binding_nodes, pose_info, pose_dir=None, n_top_poses=3, protein_atom_indices=None, lipid_atom_indices=None, atom_weights=None, kde_bw=0.15, pca_component=0.9, pose_format='gro', n_clusters='auto', eps=None, min_samples=None, metric='euclidean', trajfile_list=None)[source]¶

A wrapper function that ranks poses, clusters poses and calculates pose RMSD.

A wrapper function is to assist the calculation using p_tqdm, which is a multiprocessing library incorporated with progress bars. So some of the setting or parameters are used to assist the use of multiprocessing.

This function uses vectorize_poses() to convert bound poses into distance vectors, which measure the relative position of lipid atoms in their binding site. Then it uses calculate_scores() to calculate the density distributions of lipid atoms and score the bound poses based on the probability density function of the position of lipid atoms. Based on the scores, a couple of top-ranked poses (determined by n_top_poses) are written out in a coordinate format determined by pose_format. The top-ranked poses for a binding site are saved in the directory of BSidP{bs_id}_rank.

This function also clusters the bound poses. The bound poses are clustered using their distance vectoris, i.e. their relative positions in the binding site. If n_clusters is set to auto, this function will cluster the bound poses using pylipid.func.cluster_DBSCAN(). DBSCAN finds clusters of core samples of high density. If n_clusters is given an integer larger than 0, this function will cluster the lipid bound poses using pylipid.func.cluster_KMeans(). The KMeans cluster separates the samples into n clusters of equal variances, via minimizing the inertia.

For writing out the cluster poses, this function will randomly select one pose from each cluster in the case of using KMeans or one from the core samples of each cluster when DBSCAN is used, and writes the selected protein-lipid coordinates in the provided pose format pose_format. The clustered poses are saved in the directory BSid{bs_id}_clusters.

The root mean square deviation (RMSD) of a lipid bound pose in a binding site is calculated from the relative position of the pose in the binding site compared to the average position of the bound poses. Thus, the pose RMSD is defined as:

\[RMSD=\sqrt{\frac{\sum_{i=0}^{N} \sum_{j=0}^{M}\left(D_{i j}-\bar{D}_{j}\right)^{2}}{N}}\]

where \(D_{i j}\) is the distance of atom i to the residue j in the binding site; \(\bar{D}_{j}\) is the average distance of atom i from all bound poses in the binding site to residue j; N is the number of atoms in the lipid molecule and M is the number of residues in the binding site.

Parameters

bs_id (int) – Binding Site ID. Used in creating directory for stoing poses.
bound_poses (mdtraj.Trajectory) – A mdtraj.Trajectory object that contains protein-lipid complex coordinates.
binding_nodes (list) – A list of residue indices of binding site residues.
pose_info (list) – A list of tuples of information of the bound poses in bound_poses. A info tuple contains four figures, which correspond to traj_idx, protein_idx, lipid_residue_index, frame_id*traj.timestep. These pose_info tuples and the bound poses are generated by collect_bound_poses().
pose_dir (str or None) – The directory to save representative bound poses and clustered poses. If None, save at the current working directory.
n_top_poses (int, default=3) – Number of representative bound poses selected to write to disc.
protein_atom_indices (list) – A list of residue atom indices in the protein-lipid complex structure, i.e. [[0,1,2],[3,4,5],[6,7]] means the first residue contains atom 0,1,2, and the second residue contains atom 3,4,5. Used when vectorizing the bound poses by vectorize_poses().
lipid_atom_indcies (list) – The atom indices of the lipid molecule in the protein-lipid complex structure. Used when vectorizing the bound poses by vectorize_poses().
atom_weights (None or dict) – A dictionary that contains the weight for n_lipid_atoms, {idx_atom: weight}. Used when scoring the bound poses by calculate_scores().
kde_bw (calar, default=0.15) – The bandwidth for kernel density estimation. Used when estimating the kernel density of lipid atom positions by calculate_scores(). By default, the bandwidth is set to 0.15nm which roughly corresponds to the vdw radius of MARTINI 2 beads.
pca_component (scalar, default=0.9) – The number of components to keep. if 0 < pca_component<1, select the number of components such that the amount of variance that needs to be explained is greater than the percentage specified by n_components. It is used to decrease the dimentions of distance vectors when estimating the kernel density, by calculate_scores().
pose_format (str, optional, default="gro") – The format of the coordinate files. Support formats included by mdtraj.Trajectory.save(). Used by write_bound_poses().
n_clusters (int or "auto", default="auto") – Number of clusters to find in the bound poses. When set to auto, pylipid.func.cluster_DBSCAN() will be used for clustering. When a integer is given, pylipid.func.cluster_KMeans() will be used.
eps (float or None, default=None) – The maximum distance between two samples for one to be considered as in the neighborhood of the other. Used by pylipid.func.cluster_DBSCAN().
min_samples (int or None, default=None) – The number of samples (or total weight) in a neighborhood for a point to be considered as a core point. Used by pylipid.func.cluster_DBSCAN().
metric (string, or callable, default=’euclidean’) – The metric to use when calculating distance between instances in a feature array. Used by pylipid.func.cluster_DBSCAN().
trajfile_list (str or None, default=None) – Used to write pose information, i.e. what trajectory, which timestep and which lipid molecule the written pose was taken from.

Returns

pose_rmsds – A list of bound pose RMSDs.

Return type

list