Notes on the Serpentines

Are Dynamic Bayesian Networks Apt to Predict Receptor Conformational States?

Part I. Introduction.

The motivation here is to predict the specific conformational states of receptors in all possible multimeric complexes, and to approximate how the components may influence one another's properties within a complex.

Algorithms that predict the probable protomeric states in multimers would help to define the intracellular signalling cascades that occur individually or simultaneously through specific receptor activation or co-activation. By relating this information to findings on gene regulatory networks and intracellular protein interactions, a detailed picture of the intracellular cascades in the cell could be assembled, provided that the same or a very similar expression profile was obtained by microarray or RT-PCR in the studies being compared.

Since the information involved in the entire process of cell signalling is vast, modelling the whole would still be an over-ambitious project. A model representing a local region around the receptors could be made, ideally containing all relevant components of the environment; however, missing or excess components are inevitable, partly because few experimental procedures are completely free from false negatives or false positives, though some may disagree with this notion. Model systems that fit the empirical evidence and hypotheses would provide theoretical evidence that could resolve inconsistencies observed in experimental findings, as well as allowing predictions of unobserved behaviours.

Statistical inference enables models to be constructed that incorporate a degree of uncertainty. A brief description of Bayesian inference is given below.

Background

Bayesian Probability

Bayesian inference is simply expressed as

P(H|O) = P(O|H)P(H)/P(O)

where P(H) is the prior probability of the hypothesis, P(O) is the marginal probability of the observation, P(O|H) is the conditional probability of the observation given the hypothesis, and P(H|O) is the posterior probability of the hypothesis given the observation.
NB: The O above should not be confused with the odds ratio. In expressions of Bayes' rule, E often denotes the observed evidence; here E denotes edges in the next section, so O is used instead to avoid giving one symbol two meanings.

For more on Bayes' theorem, see the Stanford Encyclopedia of Philosophy entry on Bayes' theorem.
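As a small worked illustration of the rule above, the sketch below computes P(H|O) for a binary hypothesis. The function name `posterior` and all the numbers are hypothetical, chosen only to show the arithmetic; P(O) is obtained from the law of total probability.

```python
def posterior(prior_h, p_o_given_h, p_o_given_not_h):
    """Return P(H|O) from Bayes' rule for a binary hypothesis H."""
    # Marginal P(O) = P(O|H)P(H) + P(O|not H)P(not H).
    p_o = p_o_given_h * prior_h + p_o_given_not_h * (1.0 - prior_h)
    return p_o_given_h * prior_h / p_o


# Hypothetical numbers: H = "the receptor is in an active state",
# O = "a positive readout is seen in an assay".
p = posterior(prior_h=0.2, p_o_given_h=0.9, p_o_given_not_h=0.1)
print(round(p, 4))  # 0.18 / 0.26 -> 0.6923
```

With these assumed values, a prior belief of 0.2 rises to roughly 0.69 after one positive observation.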

Bayesian Network

A Bayesian network is a graphical model in which the joint probability distribution of a set of random variables and the conditional dependencies among them are represented by a directed acyclic graph (DAG).

A DAG is formed by a collection of vertices (i.e. nodes) and directed edges (i.e. arrows) joined in such a way that no sequence of edges starting at a vertex ever loops back to that vertex; hence it is an acyclic graph of directed sequences. Vertices at which edges start are called parents, and the vertices they point to are called children.

A DAG is expressed as G = (V, E)

where V signifies a set of vertices representing the random variables {X1,X2,...,Xp}, and E denotes a set of directed edges between the variables (see FIG 1).

FIG 1. A simple Bayesian network

V = {1,2,3,4,5}, E = {(1,2), (1,3), (2,4), (3,4), (4,5)}, and
factorized joint probability: P(1,2,3,4,5) = P(1)P(2|1)P(3|1)P(4|3,2)P(5|4).
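The link between the edge set and the factorised joint probability can be sketched in a few lines of Python. This is only an illustration: the helper names (`parents_of`, `topological_order`, `factorisation`) are made up for this post, and Kahn's algorithm is used here simply as one common way of checking that a graph is acyclic.

```python
from collections import defaultdict, deque

V = [1, 2, 3, 4, 5]
E = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]  # (parent, child) pairs from FIG 1


def parents_of(vertices, edges):
    """Map each vertex to the list of its parents."""
    parents = {v: [] for v in vertices}
    for parent, child in edges:
        parents[child].append(parent)
    return parents


def topological_order(vertices, edges):
    """Kahn's algorithm; raises ValueError if the graph has a directed cycle."""
    indegree = {v: 0 for v in vertices}
    children = defaultdict(list)
    for parent, child in edges:
        indegree[child] += 1
        children[parent].append(child)
    queue = deque(v for v in vertices if indegree[v] == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for c in children[v]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    if len(order) != len(vertices):
        raise ValueError("graph contains a directed cycle")
    return order


def factorisation(vertices, edges):
    """Write the joint probability as a product of P(vertex | parents)."""
    parents = parents_of(vertices, edges)
    terms = []
    for v in topological_order(vertices, edges):
        if parents[v]:
            given = ",".join(str(p) for p in sorted(parents[v]))
            terms.append(f"P({v}|{given})")
        else:
            terms.append(f"P({v})")
    return "".join(terms)


print(factorisation(V, E))  # P(1)P(2|1)P(3|1)P(4|2,3)P(5|4)
```

The printed product matches the factorisation given for FIG 1, apart from the order in which the two parents of vertex 4 are listed. Adding an edge such as (5, 1) would close a loop, and the function would raise an error instead.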

A Bayesian network can answer probabilistic queries about its variables and their interrelations. The network forecasts the states of a subset of variables when other variables are observed as evidence.
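To make the idea of a probabilistic query concrete, here is a minimal sketch that answers a query on the network of FIG 1 by brute-force enumeration of the joint distribution. Every variable is assumed to be binary, and all the conditional probability values in `CPT` are invented for illustration; enumeration is only practical for very small networks.

```python
from itertools import product

# Parents of each vertex, as in FIG 1.
PARENTS = {1: (), 2: (1,), 3: (1,), 4: (2, 3), 5: (4,)}

# Hypothetical values of P(X = 1 | parent states); keys follow the PARENTS order.
CPT = {
    1: {(): 0.3},
    2: {(0,): 0.2, (1,): 0.8},
    3: {(0,): 0.1, (1,): 0.6},
    4: {(0, 0): 0.05, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.95},
    5: {(0,): 0.1, (1,): 0.9},
}


def joint(assignment):
    """Joint probability of a full assignment, using the factorised form."""
    p = 1.0
    for var, pars in PARENTS.items():
        p_one = CPT[var][tuple(assignment[q] for q in pars)]
        p *= p_one if assignment[var] == 1 else 1.0 - p_one
    return p


def query(target, evidence):
    """P(target = 1 | evidence), summing the joint over all consistent states."""
    numerator = denominator = 0.0
    for values in product((0, 1), repeat=5):
        a = dict(zip((1, 2, 3, 4, 5), values))
        if any(a[k] != v for k, v in evidence.items()):
            continue
        pr = joint(a)
        denominator += pr
        if a[target] == 1:
            numerator += pr
    return numerator / denominator


print(query(1, {}))      # prior belief about vertex 1
print(query(1, {5: 1}))  # belief about vertex 1 after observing vertex 5
```

With these assumed tables, observing the downstream vertex 5 in state 1 raises the probability that the upstream vertex 1 is also in state 1, which is the kind of evidence propagation described above.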

Machine Learning and Parameter Estimation

In statistical inference, conditional probabilities often include unknown parameters, which can be estimated from data.

In the frequentist approach to statistical inference, maximum-likelihood estimation (MLE) determines the value of an unknown parameter, such as the mean or variance of the distribution from which a sample was drawn, so that the observed outcome is most probable. When the model depends on unobserved latent variables, the parameters can be estimated with the expectation-maximisation (EM) algorithm, which iterates by alternating two steps: 1) the E step, which evaluates the expected log-likelihood with respect to the latent variables under the current parameter estimates; and 2) the M step, which updates the parameters by maximising the expected log-likelihood found in the E step (Dempster, Laird & Rubin 1977; Wu 1983).
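The two alternating steps can be illustrated with a standard textbook case: a mixture of two one-dimensional Gaussians, where the unobserved variable is which component produced each data point. The sketch below uses only the Python standard library; the function names, the synthetic data and the crude starting values are all assumptions made for this example.

```python
import math
import random


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def em_two_gaussians(data, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture (illustrative only)."""
    # Crude starting values: the extremes of the data as the two means.
    w, m1, m2 = 0.5, min(data), max(data)
    s1 = s2 = (max(data) - min(data)) / 4.0
    log_liks = []
    for _ in range(n_iter):
        # E step: probability that each point came from component 1,
        # given the current parameters.
        resp, ll = [], 0.0
        for x in data:
            a = w * normal_pdf(x, m1, s1)
            b = (1.0 - w) * normal_pdf(x, m2, s2)
            resp.append(a / (a + b))
            ll += math.log(a + b)
        log_liks.append(ll)  # log-likelihood before this update
        # M step: weighted maximum-likelihood updates of the parameters.
        n1 = sum(resp)
        n2 = len(data) - n1
        m1 = sum(r * x for r, x in zip(resp, data)) / n1
        m2 = sum((1.0 - r) * x for r, x in zip(resp, data)) / n2
        v1 = sum(r * (x - m1) ** 2 for r, x in zip(resp, data)) / n1
        v2 = sum((1.0 - r) * (x - m2) ** 2 for r, x in zip(resp, data)) / n2
        s1 = max(math.sqrt(v1), 1e-6)  # floor guards against a collapsed component
        s2 = max(math.sqrt(v2), 1e-6)
        w = n1 / len(data)
    return (w, m1, s1, m2, s2), log_liks


# Synthetic data: 200 points around 0 and 200 points around 5.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]
data += [rng.gauss(5.0, 1.0) for _ in range(200)]
params, log_liks = em_two_gaussians(data)
print(params)
```

The recorded log-likelihood should not decrease from one iteration to the next, which is the convergence property analysed by Wu (1983). When every variable is observed, no iteration is needed: the MLE of a Gaussian mean is simply the sample mean.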

In the Bayesian approach, a parameter is estimated by specifying a loss function and selecting the value that minimises the expected loss under the posterior probability distribution, so as to limit the cost of a wrong estimate (reviewed by Brooks 2003).
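A minimal sketch of this idea, under assumed conditions: a success probability is estimated from 7 successes in 10 trials with a uniform prior, the posterior is approximated on a grid, and the estimate is the grid value with the smallest expected loss. The data, the grid size and the function names are hypothetical.

```python
def grid_posterior(k, n, size=501):
    """Posterior over a success probability on a grid, uniform prior assumed."""
    grid = [i / (size - 1) for i in range(size)]
    weights = [t ** k * (1.0 - t) ** (n - k) for t in grid]  # binomial likelihood
    total = sum(weights)
    return grid, [wt / total for wt in weights]


def bayes_estimate(grid, post, loss):
    """Grid value that minimises the posterior expected loss."""
    def expected_loss(a):
        return sum(p * loss(a, t) for t, p in zip(grid, post))
    return min(grid, key=expected_loss)


grid, post = grid_posterior(k=7, n=10)
est_sq = bayes_estimate(grid, post, lambda a, t: (a - t) ** 2)  # squared loss
est_abs = bayes_estimate(grid, post, lambda a, t: abs(a - t))   # absolute loss
print(est_sq, est_abs)
```

Squared loss leads to the posterior mean, which is (7 + 1)/(10 + 2) for this posterior, and absolute loss leads to the posterior median, so the choice of loss function changes the estimate even though the posterior is the same.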

The Bayesian approach often employs Markov chain Monte Carlo (MCMC) methods to approximate complex posterior distributions that cannot be computed directly. A Markov chain is a sequence of values in which each value depends on the value immediately before it, but not on the earlier history. The algorithm behaves like an explorer who decides randomly where to go next from his current position; his slight preference for higher ground allows the peaks of an uncharted landscape to emerge. The random-walking MCMC explorer gathers a global view quickly, in contrast to an MLE explorer who concentrates on one high peak for better resolution and tends to get stuck there (explained further with the same analogy by Brooks 2003).
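The explorer analogy corresponds closely to the random-walk Metropolis algorithm, one of the simplest MCMC methods. The sketch below is an illustration under assumptions: the two-peaked target density, the step size and the function names are made up for this post, and only the Python standard library is used.

```python
import math
import random


def log_target(x):
    """Log of an unnormalised two-peaked density with peaks near -2 and +2."""
    a = -0.5 * (x + 2.0) ** 2
    b = -0.5 * (x - 2.0) ** 2
    m = max(a, b)  # log-sum-exp form avoids numerical underflow
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def metropolis(log_density, n_samples, start=0.0, step=1.0, seed=0):
    """Random-walk Metropolis sampler (illustrative only)."""
    rng = random.Random(seed)
    x = start
    samples = []
    for _ in range(n_samples):
        # Propose a random move from the current position only.
        proposal = x + rng.gauss(0.0, step)
        log_ratio = log_density(proposal) - log_density(x)
        # Uphill moves are always accepted; downhill moves are accepted
        # with probability equal to the density ratio.
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            x = proposal
        samples.append(x)  # the chain's next value depends only on x
    return samples


samples = metropolis(log_target, n_samples=20000, step=1.5)
share_right = sum(1 for s in samples if s > 0) / len(samples)
print(share_right)
```

Because downhill moves are sometimes accepted, the walker can leave one peak and find the other, so both peaks should be represented in the samples; a pure hill-climber started near one peak would stay there.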

A hidden Markov model (HMM) is a dynamic model that infers unobserved (hidden) states from observed outcomes whose probability distribution depends on those states. An HMM can even allow an infinite number of unknown states by using a Dirichlet process, a stochastic process, as the prior distribution.

wikipedia.org/wiki/Hidden_Markov_model
JASSS Markov Chain Analysis
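How hidden states are inferred from outcomes can be sketched with a two-state HMM. The states, the observation symbols and every probability below are hypothetical; `forward` computes the likelihood of an observation sequence and the filtered belief about the final hidden state, and `viterbi` finds the most probable hidden path.

```python
STATES = ("inactive", "active")

# All probabilities below are invented for illustration.
START = {"inactive": 0.8, "active": 0.2}
TRANS = {
    "inactive": {"inactive": 0.9, "active": 0.1},
    "active": {"inactive": 0.3, "active": 0.7},
}
EMIT = {
    "inactive": {"low": 0.8, "high": 0.2},
    "active": {"low": 0.25, "high": 0.75},
}


def forward(observations):
    """Return P(observations) and P(final hidden state | observations)."""
    alpha = {s: START[s] * EMIT[s][observations[0]] for s in STATES}
    for obs in observations[1:]:
        alpha = {
            s: EMIT[s][obs] * sum(alpha[r] * TRANS[r][s] for r in STATES)
            for s in STATES
        }
    likelihood = sum(alpha.values())
    return likelihood, {s: alpha[s] / likelihood for s in STATES}


def viterbi(observations):
    """Most probable sequence of hidden states for the observations."""
    best = {s: (START[s] * EMIT[s][observations[0]], [s]) for s in STATES}
    for obs in observations[1:]:
        new = {}
        for s in STATES:
            # Best predecessor for state s at this position.
            prob, path = max(
                (best[r][0] * TRANS[r][s], best[r][1]) for r in STATES
            )
            new[s] = (prob * EMIT[s][obs], path + [s])
        best = new
    return max(best.values())[1]


readout = ["low", "low", "high", "high", "high"]
likelihood, belief = forward(readout)
print(belief)
print(viterbi(readout))
```

Profile HMMs used for sequence analysis rest on the same recursions, with a much larger state space built from a sequence alignment.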

HMMs have been implemented in programs for sequence analysis, including motif and homology searches. An example is HMMER, open-source software for sequence analysis written by SR Eddy at the Howard Hughes Medical Institute.
HMMER.janelia.org

Bayesian Networks in Biology

Bayesian networks have been employed extensively in biology, notably in analysing gene regulatory networks (Perrin et al 2003; Husmeier 2003), protein interactions (Chen et al 2010; Xu et al 2011), signalling pathways (Bender et al 2010), phylogenetic tree analysis (Rasmussen & Kellis 2011), natural selection (Geisler & Diehl 2002), and predicting secondary structures of proteins (Aydin et al 2011). These are just a few of the numerous studies that have applied Bayesian inference to questions in biology.

The Challenge

Analysing heteromeric receptor association complexes and receptor-effector interactions in a given cell-type with a dynamic Bayesian network

Once the expression profile of the membrane proteins and intracellular proteins that associate with GPCRs (G-protein subtypes, kinases, phosphatases, β-arrestins and so on) has been obtained for a cell-type under a specified condition, their interaction network could be modelled for that cell-type and condition.

If relevant experimental observations have already been made in that cell-type under the same condition, for instance ligand-binding profiles of the receptors of interest, affinities of effectors for the receptors, phosphorylation profiles, rates of internalisation, known oligomerisation tendencies, and known synergistic activity with other receptors, this information could either be used in building the model or be withheld and treated as hidden in order to assess the validity of the model. Such models would likely include several unknowns, including dynamic oscillatory changes in intracellular calcium concentration upon receptor activation.

The model may indicate how each receptor's function relates to that of the other receptors, as well as mapping probable interactions with other relevant components in the local environment.
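As a purely illustrative toy, and not a model of any real receptor pair, the sketch below shows the basic mechanics of a dynamic Bayesian network with two binary variables, one per protomer. Each protomer's state in a time slice is assumed to depend on its own state and its partner's state in the previous slice; every number in `P_ACTIVE` is invented.

```python
from itertools import product

# Hypothetical P(protomer active at t | own state, partner state at t-1).
P_ACTIVE = {
    (0, 0): 0.05,
    (0, 1): 0.30,  # an active partner is assumed to favour activation
    (1, 0): 0.60,
    (1, 1): 0.90,
}


def step(joint):
    """Advance the joint distribution over (r1, r2) by one time slice."""
    new = {state: 0.0 for state in product((0, 1), repeat=2)}
    for (r1, r2), p in joint.items():
        a1 = P_ACTIVE[(r1, r2)]  # protomer 1: own state first, partner second
        a2 = P_ACTIVE[(r2, r1)]  # protomer 2: same table, roles swapped
        for n1, n2 in new:
            p1 = a1 if n1 else 1.0 - a1
            p2 = a2 if n2 else 1.0 - a2
            new[(n1, n2)] += p * p1 * p2
    return new


def unroll(initial, n_slices):
    """Joint distributions for the initial slice and n_slices later slices."""
    history = [initial]
    for _ in range(n_slices):
        history.append(step(history[-1]))
    return history


# Start with protomer 1 active and protomer 2 inactive.
initial = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 1.0, (1, 1): 0.0}
history = unroll(initial, n_slices=5)
for t, dist in enumerate(history):
    p2_active = dist[(0, 1)] + dist[(1, 1)]
    print(t, round(p2_active, 4))
```

In a real application the table would be learned from data rather than assumed, and the network would contain many more variables, such as effectors and ligand occupancy.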

A comparable set of models for particular receptors could be generated from expression profiles of the same cell-type under different conditions, or from datasets of different cell-types that express the receptors of interest with slightly different sets of components. These comparisons may give a more detailed picture of how the receptors tend to operate and how they may adapt to changes in the local environment.

Protein interactions have already been predicted with dynamic Bayesian networks by many research groups. Some of this work will be discussed in Part II, which follows this introduction.

Estimating conformational status of protomers in putative oligomeric complexes

A question here is how plausible it would be to predict the conformational state of each protomer in a putative multimeric complex under each specific condition: could that be estimated solely on the basis of experimental observations?

Although unknown variables can be generated by models, there has to be a certain consistency in the patterns related to the receptor pharmacology.

It would be beneficial to determine, case by case, whether a synergistic effect observed experimentally is due to intracellular signalling cross-talk through components shared between two distinct pathways of independent receptor action, or results from co-activation of two receptors that physically associate and directly influence one another's function.

The beauty of models that implement machine learning is that once a system is reasonably established, it can be developed continuously towards a realistic representation, with probable variations at hand in accordance with empirical findings obtained under different conditions. This venture should be worth some effort.

References

Aydin Z et al 2011. BMC Bioinformatics. 12:154. doi:10.1186/1471-2105-12-154

Bender C et al 2010. Bioinformatics 26: i596–i602. doi:10.1093/bioinformatics/btq385

Brooks SP. 2003. Phil Trans R Soc Lond A. 361: 2681–2697. doi:10.1098/rsta.2003.1263.

Chen X et al 2010. Bioinformatics. 26: i334–i342. doi:10.1093/bioinformatics/btq175

Dempster AP, Laird NM & Rubin DB. 1977. J Roy Statist Soc Ser B. 39: 1–38.

Geisler WS & Diehl RL. 2002. Phil Trans R Soc Lond B. 357: 419–448. doi:10.1098/rstb.2001.1055

Husmeier D 2003. Bioinformatics 19: 2271–2282. doi: 10.1093/bioinformatics/btg313

Perrin BE et al 2003. Bioinformatics 19: ii138–ii148. doi: 10.1093/bioinformatics/btg1071.

Rasmussen MD & Kellis M. 2011. Mol Biol Evol. 28: 273–290. doi:10.1093/molbev/msq189

Wu CFJ. 1983. Ann Statist. 11: 95–103.

Xu Y et al 2011. J R Soc Interface. 8: 555–567. doi:10.1098/rsif.2010.0384

29 May 2011
