GPCR Prediction Ensemble | Background Page
G Protein-Coupled Receptors
G protein-coupled receptors (GPCRs) are the largest and most diverse group of membrane receptors and are involved in many physiological processes, such as vision, smell, and inflammation. They are the targets of many prescribed drugs [Gusach et al., 2020] and offer many opportunities for new drug discoveries [Hauser et al., 2017; Yang et al., 2021]. A GPCR molecule consists of seven hydrophobic transmembrane helices that connect an extracellular ligand-binding site to a heterotrimeric G protein binding site on the intracellular surface; ligand binding induces conformational changes that are relayed to the G protein binding site [Manglik and Kruse, 2017], as illustrated in Figure 1 below, published by Li et al. (2002).

Downstream signaling targets of GPCRs can be associated with cancer growth and development [Bar-Shavit et al., 2016], a wide range of cardiovascular [Wang et al., 2018] and liver diseases [Chiang, 2013], and several psychiatric disorders [Komatsu et al., 2019].
GPCR Classification
There are approximately 400,000 GPCR sequences available in the UniProt database [Consortium, 2018], of which only about 3,000 GPCRs from 300 distinct organisms or viral strains are confirmed [Begum et al., 2020]. The following figure shows the distribution of confirmed GPCRs among species groups; the majority belong to mammals.
Groups with more than 40 sequences are shown as separate bars. The remaining ones are grouped as “Others” [Begum et al, 2020].
Based on their amino acid sequences and functions, GPCRs can be classified into several distinct classes or families [Basith et al., 2018]. The GRAFS system places the vertebrate GPCRs into five distinct classes: glutamate, rhodopsin-like, adhesion, frizzled/Taste2, and secretin [Schiöth and Fredriksson, 2005]. An alternative classification scheme that applies to both vertebrate and invertebrate GPCRs is the IUPHAR system [Armstrong et al., 2019]. Figure 3 shows the distribution of GPCR classes as percentages.

GPCR classes are further divided into sub-family, sub-sub-family, and sub-type levels based on the endogenous ligand type, the receptor family, and the receptor subtype, respectively [Pándy-Szekeres et al., 2018]. The number of groups at each level for the different families is shown in Table I.

GPCR-Prediction Ensemble (GPCR-PEn)
GPCRs are the largest group of membrane proteins in eukaryotes and also the largest class of drug targets [Venkatakrishnan et al., 2019]. There are approximately 350 non-olfactory receptors in the human genome, and only ~200 of these have known ligand binding and signal transduction pathway information [Morri et al., 2018]. The remaining ~150 receptors, known as orphan GPCRs, have unknown functions, and identifying those functions could be very beneficial for drug development [Morri et al., 2018]. Predicting GPCRs and classifying them to understand their function is therefore highly significant for modern medicine.
The available prediction and classification tools use different sets of protein features (e.g. amino acid or dipeptide counts, transmembrane structure) to train and test their statistical and machine learning algorithms. A given combination of features and algorithm may identify one class of proteins well but not necessarily the others. An ensemble approach that incorporates several robust tools can therefore increase the accuracy of GPCR prediction and classification. The following tools are currently available in our prediction ensemble.
- BLAST [Altschul et al, 1997] search determines if a given sequence bears close sequence similarity to known sequences (from the GPCR-PEnDB [Begum et al, 2020] database), operating under the assumption that sequence similarity may indicate sequence homology.
- PFAM utilizes protein domain profiles in order to identify families and functional domains [Finn et al, 2016]. PFAM can also help to determine class as there are several class-specific 7-transmembrane profiles available [Finn et al, 2016]. 7tm_1 denotes the Rhodopsin-like class, 7tm_2 denotes Secretin-like, 7tm_3 denotes the Glutamate class and several other profiles denote other classes [Finn et al, 2016].
- GPCR-Tm TMHMM predicts the number of transmembrane helices and locates them in the ORFs [Krogh et al., 2001]. The output is parsed by length in amino acids and by number of predicted helices: ORFs between 100 and 234 amino acids long that contain at least 3 predicted transmembrane helices are placed in the dataset “Not Full Length GPCR Candidate ORF” [Guerrero et al., 2016]. ORFs at least 235 amino acids long that contain at least 6 predicted helices are treated as candidate full-length GPCRs and designated for stop codon analysis [Guerrero et al., 2016].
- GPCRpred uses a Support Vector Machine (SVM) with the dipeptide composition of a protein sequence to predict whether it is a GPCR and to classify it at the family level [Bhasin and Raghava, 2004].
- PCA-ISA (principal component analysis and intimate sorting) is an improved version of the existing tool by Peng et al. (2010) and can predict all the lower levels of classification.
- Log-reg uses penalized multinomial logistic regression with 1360 sequence features to predict and classify all the lower levels.
- MLP-NN is a multi-layer perceptron neural network that also uses 1360 sequence features to predict and classify all the lower levels.
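The length and helix-count thresholds quoted in the GPCR-Tm TMHMM entry above can be written as a small binning rule. This is only a sketch of those thresholds, not the code used by GPCR-PEn or by Guerrero et al.: the `OrfPrediction` record, the `bin_orf` function, and the labels `FULL_LENGTH` and `UNASSIGNED` are hypothetical names introduced for illustration, and the helix count is assumed to have already been extracted from the TMHMM output.

```python
from dataclasses import dataclass

# Bin name quoted in the text above.
NOT_FULL_LENGTH = "Not Full Length GPCR Candidate ORF"
# Hypothetical labels: the text does not name these bins.
FULL_LENGTH = "Candidate Full Length GPCR"
UNASSIGNED = "Unassigned"


@dataclass
class OrfPrediction:
    """One ORF with its TMHMM summary (hypothetical record type)."""
    orf_id: str
    length_aa: int          # ORF length in amino acids
    predicted_helices: int  # number of transmembrane helices predicted by TMHMM


def bin_orf(orf: OrfPrediction) -> str:
    """Apply the thresholds from the text: >=235 aa with >=6 helices,
    or 100-234 aa with >=3 helices."""
    if orf.length_aa >= 235 and orf.predicted_helices >= 6:
        return FULL_LENGTH
    if 100 <= orf.length_aa <= 234 and orf.predicted_helices >= 3:
        return NOT_FULL_LENGTH
    # Assumption: ORFs matching neither rule are simply left out of both bins.
    return UNASSIGNED


if __name__ == "__main__":
    examples = [
        OrfPrediction("orf_a", 350, 7),
        OrfPrediction("orf_b", 180, 4),
        OrfPrediction("orf_c", 90, 3),
    ]
    for orf in examples:
        print(orf.orf_id, "->", bin_orf(orf))
```

Note that under these rules a long ORF with only 3 to 5 predicted helices falls into neither bin; whether the original pipeline handles such cases differently is not stated in the text.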
By combining these various tools and approaches we can make a more accurate prediction of a query protein as a GPCR and also provide lower-level classifications based on the ligand and receptor binding.
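How the individual tool outputs are combined is not specified on this page. As one possible illustration, the sketch below uses a plain majority vote over per-tool GPCR/non-GPCR calls. The voting rule, the tie handling, and the function name `majority_vote` are assumptions made for this example and should not be read as the GPCR-PEn method.

```python
from typing import Dict, Optional, Tuple


def majority_vote(calls: Dict[str, Optional[bool]]) -> Tuple[bool, float]:
    """Combine per-tool calls (True = GPCR, False = not GPCR, None = no result).

    Returns the consensus call and the fraction of voting tools that said GPCR.
    Assumption: a strict majority is required, so a tie counts as "not GPCR".
    """
    votes = [call for call in calls.values() if call is not None]
    if not votes:
        raise ValueError("no tool produced a call")
    yes = sum(votes)  # True counts as 1, False as 0
    support = yes / len(votes)
    return yes * 2 > len(votes), support


if __name__ == "__main__":
    # Made-up calls for a single query sequence, keyed by the tool names listed above.
    example_calls = {
        "BLAST": True,
        "PFAM": True,
        "GPCR-Tm": True,
        "GPCRpred": False,
        "PCA-ISA": True,
        "Log-reg": None,   # tool returned no result for this query
        "MLP-NN": False,
    }
    is_gpcr, support = majority_vote(example_calls)
    print(f"GPCR: {is_gpcr} (support {support:.2f})")
```

A weighted vote, in which each tool's call is scaled by its measured accuracy, would be a natural refinement, but the weights would have to come from benchmark data that this page does not provide.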
GPCR Structure
The rapid growth of available three-dimensional (3D) structural data for the biologically and pharmaceutically important G protein-coupled receptors has revealed useful information for studying GPCR ligand binding modes and G protein binding mechanisms. These structures (Figure 4) shed light on the structural similarity and diversity of GPCR ligand recognition, on GPCR functional states, and on the characteristics of a receptor structure that is competent for G protein binding [Zhang et al., 2015]. Comparative analyses of GPCR sequences and structures play a central role in understanding GPCRs and their functions [Hasegawa & Holm, 2009]. Assuming an evolutionary continuity of structure and function, describing the structural similarity relationships between protein structures allows scientists to infer the functions of newly discovered proteins [Holm & Laakso, 2016]. In drug discovery, receptor structure-based virtual screening assumes knowledge of the receptor structure, with molecular docking being one of the most prominent structure-based virtual screening techniques [Raschka & Kaufman, 2020].
Studies have shown that structural patterns defined by transmembrane intramolecular interactions are suitable for comparing GPCR 3D structures and for unsupervised distinction of receptor states [Koensgen et al., 2019]. As of February 4, 2021, there were 817 atomic-level 3D GPCR structures on PDB related to 161 distinct GPCRs, of which 107, 27, 16, 2, 0, 9, and 0 are in Class A, B, C, D, E, F, and T2R, respectively [Figure 5]. No 3D structures were available for Class E or T2R, whereas Class A GPCRs had the most 3D structures on PDB.
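As a quick arithmetic check on the per-class counts quoted above, the short sketch below tallies them and computes each class's share of the 161 distinct GPCRs. The variable names are illustrative only, and the numbers are copied from the text rather than fetched from PDB.

```python
# Distinct GPCRs with at least one 3D structure, per class, as quoted in the text.
receptors_by_class = {"A": 107, "B": 27, "C": 16, "D": 2, "E": 0, "F": 9, "T2R": 0}

total = sum(receptors_by_class.values())
shares = {cls: 100 * n / total for cls, n in receptors_by_class.items()}

if __name__ == "__main__":
    print("total distinct GPCRs:", total)
    for cls, pct in shares.items():
        print(f"Class {cls}: {receptors_by_class[cls]} receptors ({pct:.1f}%)")
```

The counts sum to the 161 distinct GPCRs stated in the text, and Class A accounts for roughly two thirds of them.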

GPCR Ligand Binding
The binding of ligands to GPCRs plays an important role in activating signal transduction pathways and cellular responses [Seo et al., 2018]. Various molecules, such as hormones, lipids, peptides, and neurotransmitters, exert their biological effects by binding to GPCRs coupled to heterotrimeric G proteins, which are highly specialized transducers able to modulate diverse signaling pathways [Lappano & Maggiolini, 2012]. The detection of ligand-binding sites is often the starting point for protein function identification and drug discovery [Brylinski & Skolnick, 2008]. Given the physiological and pharmacological relevance of GPCRs, unraveling GPCR ligand binding can be extremely useful both for understanding receptor function and for designing new drugs [Alfonso-Prieto et al., 2019]. Because GPCR-ligand interactions are generally specific in terms of the ligand involved and the location where binding takes place, many efforts have been made to develop computational methods that can predict protein-ligand binding sites [Kukol, 2014]. Various bioinformatics approaches to identifying drug-receptor binding have been proposed; they focus on calculating protein-ligand binding affinity and rely largely on 3D structures of proteins and ligands [Seo et al., 2018]. The rapidly growing body of public data on 3D structures of GPCRs and on GPCR-ligand interactions has made computational prediction of GPCR ligand binding a convincing alternative to high-throughput screening and other experimental approaches during the early phases of ligand discovery. Such predictions are cost-efficient and can be important aids for planning in vitro experiments that help elucidate signaling pathways and expedite drug discovery.
References
- Alfonso-Prieto, M., Navarini, L., & Carloni, P. (2019). Understanding ligand binding to G-protein coupled receptors using multiscale simulations. Frontiers in Molecular Biosciences, 6, 29. PMID: 31131282
- Altschul, S. F., et al. (1997). Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Research, 25(17), 3389–3402. PMID: 9254694
- Armstrong, J. F., et al. (2019). The IUPHAR/BPS Guide to PHARMACOLOGY in 2020: extending immunopharmacology content and introducing the IUPHAR/MMV Guide to MALARIA PHARMACOLOGY. Nucleic Acids Research 48(D1): D1006–D1021. PMID: 31691834
- Bar-Shavit, R., et al. (2016). G protein-coupled receptors in cancer. International Journal of Molecular Sciences. MDPI AG. PMID: 27529230
- Basith, S., et al. (2018). Exploring G protein-coupled receptors (GPCRs) ligand space via cheminformatics approaches: Impact on rational drug design. Frontiers in Pharmacology 9(MAR): 1–26. PMID: 29593527
- Begum K., et al. (2020). GPCR-PEnDB: A database of protein sequences and derived features to facilitate prediction and classification of G protein-coupled receptors. Database (2020). PMID: 33216895
- Berman, H. M., et al. (2000). The protein data bank. Nucleic acids research 28(1): 235–242. PMID: 10592235
- Bhasin, M. and Raghava, G. P. S. (2004). GPCRpred: An SVM-based method for prediction of families and subfamilies of G-protein coupled receptors. Nucleic Acids Research 32(WEB SERVER ISS.): 383–389. PMID: 15215416
- Brylinski, M., & Skolnick, J. (2008). A threading-based method (FINDSITE) for ligand-binding site prediction and functional annotation. Proceedings of the National Academy of Sciences of the United States of America, 105(1), 129–134. https://doi.org/10.1073/pnas.0707684105. PMID: 18165317
- Chiang, J. Y. L. (2013). Bile acid metabolism and signaling. Comprehensive Physiology. Compr Physiol 3(3): 1191–1212. PMID: 23897684
- Christopher, J. A., et al. (2018). Structure-based optimization strategies for G protein-coupled receptor (GPCR) allosteric modulators: a case study from analyses of new metabotropic glutamate receptor 5 (mGlu5) X-ray structures. Journal of medicinal chemistry, 62(1), 207-222. PMID: 29455526
- Consortium, T. U. (2018). UniProt: a worldwide hub of protein knowledge. Nucleic Acids Research 47(D1): D506–D515. PMID: 30395287
- Finn, R. D., et al. (2016). The Pfam protein families database: towards a more sustainable future. Nucleic Acids Research, Oxford University Press 44(D1): D279–D285. PMID: 26673716
- Guerrero, F. D., et al. (2016). Prediction of G protein-coupled receptor encoding sequences from the synganglion transcriptome of the cattle tick, Rhipicephalus microplus. Ticks and Tick-borne Diseases, 7, 670–677. doi: 10.1016/j.ttbdis.2016.02.014. PMID: 26922323
- Gusach, A., et al. (2020). Beyond structure: emerging approaches to study GPCR dynamics. Current Opinion in Structural Biology. Elsevier Ltd 63: 18–25. PMID: 32305785
- Hasegawa, H., and Holm, L. (2009). Advances and pitfalls of protein structural alignment. Current Opinion in Structural Biology, 19(3), 341–348. PMID: 19481444
- Holm, L., and Laakso, L. M. (2016). Dali server update. Nucleic Acids Research, 44(W1), W351–W355. PMID: 27131377
- Koensgen, F., et al. (2019). Unsupervised classification of G-protein coupled receptors and their conformational states using IChem intramolecular interaction patterns. Journal of chemical information and modeling, 59(9), 3611-3618. PMID: 31408338
- Komatsu, H., et al. (2019). Potential utility of biased GPCR signaling for treatment of psychiatric disorders. International Journal of Molecular Sciences. MDPI AG. PMID: 31261897
- Krogh, A., et al. (2001). Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. Journal of molecular biology, 305(3), 567-580. PMID: 11152613
- Kukol, A. (2014). Molecular modeling of proteins: Second edition. Molecular Modeling of Proteins: Second Edition, 1215, 1–474. DOI: 978-1-4939-1465-4
- Lappano, R., & Maggiolini, M. (2012). GPCRs and cancer. Acta Pharmacologica Sinica, 33(3), 351–362. https://doi.org/10.1038/aps.2011.183 PMID: 22266725
- Li, J., et al. (2002). The Molecule Pages database. Nature. Nature Publishing Group 420(6916): 716–717. PMID: 17965093
- Manglik, A. and Kruse, A. C. (2017). Structural Basis for G Protein-Coupled Receptor Activation. Biochemistry. American Chemical Society, 5628–5634. PMID: 28967738
- Morri, M., et al. (2018). Optical functionalization of human Class A orphan G-protein-coupled receptors. Nature Communications, 9(1), 1950. PMID: 29769519
- Pándy-Szekeres, G. et al. (2018). GPCRdb in 2018: Adding GPCR structure models and ligands. Nucleic Acids Research 46(D1): D440–D446. PMID: 29155946
- Peng, Z-L., et al. (2010). An improved classification of G-protein-coupled receptors using sequence-derived features. BMC bioinformatics 11(1): 420. PMID: 20696050
- Raschka, S., and Kaufman, B. (2020). Machine learning and AI-based approaches for bioactive ligand discovery and GPCR-ligand recognition. Methods, 180(January), 89–110. PMID: 32645448
- Schiöth, H. B. and Fredriksson, R. (2005). The GRAFS classification system of G-protein coupled receptors in comparative perspective. General and Comparative Endocrinology 142(1-2 SPEC. ISS.): 94–101. PMID: 15862553
- Seo, S., Choi, J., Ahn, S. K., Kim, K. W., Kim, J., Choi, J., Kim, J., & Ahn, J. (2018). Prediction of GPCR-Ligand Binding Using Machine Learning Algorithms. Computational and Mathematical Methods in Medicine, 2018. https://doi.org/10.1155/2018/6565241 PMID: 29666662
- Sehnal, D., Rose, A. S., Koča, J., Burley, S. K., & Velankar, S. (2018). Mol*: Towards a common library and tools for web molecular graphics. MolVA/EuroVis Proceedings. doi:10.2312/molva.20181103
- Venkatakrishnan, A. J., et al. (2019). Diverse GPCRs exhibit conserved water networks for stabilization and activation. 116(8). PMID: 30728297
- Wang, J., et al. (2018). G-protein-coupled receptors in heart disease. Circulation Research. Lippincott Williams and Wilkins, 716–735. PMID: 30355236
- Zhang, D., et al. (2015). Structural studies of G protein-coupled receptors. Molecules and cells, 38(10), 836. PMID: 26467290