
Pharmacogenetics of Membrane Transporters

Kathleen M. Giacomini¹, Deanna L. Kroetz¹, Conrad C. Huang², Susan Jean Johns², Michiko Kawamoto², Doug Stryke², and Thomas E. Ferrin²

¹ Department of Biopharmaceutical Sciences
University of California, San Francisco

² Resource for Biocomputing, Visualization, and Informatics
University of California, San Francisco


This multi-disciplinary research program, begun in 2000, focuses on the pharmacogenetics of membrane transport proteins that play a role in drug response pathways. The project seeks to understand the genetic basis for variation in response to drugs that interact with membrane transport proteins. This class of proteins is of great pharmacological importance, as it provides the target for about 30% of the most commonly used prescription drugs and is a major determinant of the absorption, distribution and elimination of many others. The project tests the hypothesis that variations in membrane transporter gene sequences underlie inter-individual differences in response to such drugs. Most human sequence variation is in the form of single-nucleotide polymorphisms (SNPs). The project evaluates the prevalence and distribution of both common and rare (<1% of the population) alleles by genotyping an ethnically diverse reference sample that is larger than those used in previous studies. Our program is structured around six interacting cores: genomics, cellular phenotyping, clinical phenotyping, biostatistics, shared resources/collaborations, and bioinformatics. The project also facilitates studies by contributing data to the publicly available knowledge base, PharmGKB (http://www.pharmgkb.org/). The project is funded by the National Institute of General Medical Sciences; it underwent a successful competitive renewal last year (priority score 117) and was recommended for five additional years of funding. The seventh year of the project begins on July 1st, 2006.

The specific aims of the PMT project are to:

  1. Identify sequence variants in genes encoding selected membrane transport proteins;
  2. Determine cellular phenotypes for transporter variants through studies in heterologous expression systems;
  3. Determine whether particular clinical pharmacogenetic phenotypes are associated with specific transporter variants;
  4. Deposit the data in the Pharmacogenetics Network Database, the PharmGKB.

The Resource for Biocomputing, Visualization, and Informatics (RBVI) plays a major role in the overall PMT project, acting as the bioinformatics core. The RBVI has developed several new methods for large-scale data collection, storage, analysis, presentation of experimental results (Figure 1), and uploading of these results to the PharmGKB database. The data processing pipeline begins with obtaining the raw sequence data and chromatogram (trace) files from sequencing instruments located in the UCSF Genomics Core Facility. The data are then stored and analyzed for statistically significant mutant alleles, and the results are formatted for presentation on the PMT project's web site, which the RBVI has created and maintains. Formatting the SNP data for analysis of inferred haplotypes is also part of the bioinformatics core function. The data analysis includes prediction and display of secondary structure, indicating which amino acids are affected by the experimentally determined SNPs and where they lie within the predicted secondary structure (Figure 2). Thus, this work deals with all aspects of the sequence-structure-function triad and meshes well with the goals of the RBVI.

Figure 1: Screen dump from the PMT intranet web site showing results of data analysis for the NAS1 transporter. Links provide access to other data associated with NAS1, and a schematic representation of the exons comprising NAS1 indicates where SNPs have been located. The number in each box indicates the number of SNPs, color-coded as follows: red - non-synonymous, green - synonymous, white - non-coding region (for these studies non-coding extends ~150 bp upstream and 50 bp downstream of each exon).

Because of the large number of experimental samples (currently chosen from a population of 272 ethnically diverse individuals) and the potentially large number of exons involved in membrane transporters (for example, 28 exons in MDR1 [ATP-binding cassette gene]), facile data navigation and visualization are key. To date, the project has sequenced 875 amplicons, representing 40 proteins and generating over 50 Gbytes of data. In addition to handling such a large volume of data, we have designed and implemented a secure, password-protected "intranet" web site for use only by PMT investigators, because some results have not yet been published and the data are therefore considered proprietary.

As experimental results are published in various journals, and there have been many publications to date (see project references), the data associated with the experiments are simultaneously made publicly available via the PharmGKB and on the open-access portion of the project site so that other researchers have direct access to our data. However, it is on the password-protected intranet site that we initially display the identified nucleotide variants, their locations, frequencies, and resulting amino acid changes. We calculate and display Hardy-Weinberg equilibrium values to aid in assessing data quality and judging statistical significance. We have recently added Grantham difference calculations to the existing analyses, which include population genetics statistics, consensus with mammalian species, and haplotype estimation. We also recently developed a web-based interface for interrogating and graphing microarray data and applied it to several data sets, and we have begun the preliminary steps necessary to examine syntenic regions, starting with the most highly clustered genes. Our PMT intranet web site also serves to focus the efforts of the various core groups participating in the PMT project by acting as a centralized knowledge repository. In this role, we rely on various e-mail mailing lists and web-based forms and tools, all used by the PMT for maintaining and disseminating work scheduling and status information. The PMT intranet web site maintained by the RBVI has been of great value to the many researchers who are part of the overall project, and discussions are underway with our NIH scientific project officer to make this web site publicly accessible.

Recent progress on this project also includes the development of a new data transfer protocol for moving experimental data and results from the UCSF Genomics Core Facility. Included are defined-format text files, chromatogram files generated by the sequencing instruments, Standard Chromatogram Format (SCF) files, and Common Assembly Format (CAF) multiple alignment files. A relational database and software infrastructure have been developed to support the new protocol and the subsequent transfer of data to the PharmGKB. This automated, modular software infrastructure uses XML-based (eXtensible Markup Language) "LoadMap" files to map data files to the database schema, an approach that allows us to adapt efficiently to changing project requirements. This software was designed to take advantage of the automated pipeline infrastructure being developed by the RBVI (FlexPipe).
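The Hardy-Weinberg equilibrium check mentioned above can be illustrated with a short sketch. This is not the PMT pipeline's code: the function name and the choice of a one-degree-of-freedom chi-square test are assumptions made for illustration, and for rare alleles an exact test would usually be preferred over the chi-square approximation.

```python
from math import erfc, sqrt
from typing import Tuple


def hardy_weinberg_chi2(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Return (chi-square statistic, p-value) for one biallelic site.

    n_aa, n_ab, n_bb are observed genotype counts: homozygous for allele A,
    heterozygous, and homozygous for allele B. (Hypothetical helper.)
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")

    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B

    # Genotype counts expected under Hardy-Weinberg proportions.
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)

    # Skip genotype classes with zero expectation (monomorphic sites).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value
```

A large statistic (small p-value) at a site flags a departure from equilibrium, which in a resequencing project is often a hint of genotyping error rather than biology, and so is useful as a data-quality indicator.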

Last year, Michiko Kawamoto in the RBVI undertook the evaluation of a new version of the PHASE haplotype estimation program from the University of Washington, which was proposed to replace the version we have been using since early in the PMT project. Our evaluation uncovered unexpected results. In correspondence with Matthew Stephens at UW, we identified the source of the errors and a resolution. We then incorporated version 2 of PHASE into our data processing pipeline for haplotype estimation of PMT data. As part of this effort, we incorporated "goldenpath" mapping data from the human genome into PHASE analysis. We created a simple mechanism so that PMT researchers can download the input files required for running the Network (http://www.fluxus-technology.com/fluxe02.htm) and DNAsp (http://www.ub.es/dnasp/) applications on their desktop computers, and then upload the output files from these programs to generate corresponding linkage disequilibrium and Four-Gametes test triangle plots on the PMT intranet site (Figure 3). In this way, we provide a flexible approach for researchers to download and interact with their data, yet still keep the data stored in a form suitable for easy transmission to the PharmGKB.

Figure 3: The plot shows pairwise linkage disequilibrium indicated by a shaded square. The statistical significance of linkage disequilibrium was assessed by a Fisher's exact test, without correction for multiple comparisons. Light gray represents 0.05 > P > 0.01, medium gray represents 0.01 > P > 0.001, and dark gray represents P < 0.001. The diagonal line, with exons 1, 5, 12, 14, 20 and 26 labeled, indicates the location of each varying site along the gene. Site pairs that lack the power to detect a significant association are indicated by a dot in the centre of the square. These plots can be created automatically by uploading the output from the DNAsp application to the PMT intranet site and then clicking on the "Triangle Plot for LD" button.
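The shading rule in the Figure 3 caption and the Four-Gametes test can be sketched as follows. In the project these values come from the DNAsp output, so the code below is only an independent illustration. It assumes phased haplotypes encoded as strings of "0" and "1" (one character per biallelic site); all function names are hypothetical, and the handling of P values that fall exactly on a threshold is an assumption, since the caption leaves it open.

```python
from collections import Counter
from math import comb
from typing import Sequence, Tuple


def pair_table(haplotypes: Sequence[str], i: int, j: int) -> Tuple[int, int, int, int]:
    """Count the haplotypes 00, 01, 10 and 11 at sites i and j."""
    counts = Counter((h[i], h[j]) for h in haplotypes)
    return (counts[("0", "0")], counts[("0", "1")],
            counts[("1", "0")], counts[("1", "1")])


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of a table with x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables no more probable than the observed one
    # (small tolerance guards against floating-point ties).
    p = sum(q for q in (prob(x) for x in range(lo, hi + 1))
            if q <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def ld_shade(p: float) -> str:
    """Map a p-value to the gray level used in the triangle plot."""
    if p < 0.001:
        return "dark gray"
    if p < 0.01:
        return "medium gray"
    if p < 0.05:
        return "light gray"
    return "none"


def all_four_gametes(haplotypes: Sequence[str], i: int, j: int) -> bool:
    """Four-Gametes test: True if all four two-site haplotypes are present."""
    return len({(h[i], h[j]) for h in haplotypes}) == 4
```

For example, `ld_shade(fisher_exact_two_sided(*pair_table(haps, 0, 1)))` gives the shade for the first pair of sites in a list `haps`. No correction for multiple comparisons is applied, matching the caption.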

Project plans over the next four years call for an even higher rate of data throughput, with the goal of detecting rare SNPs in 300 transmembrane proteins. To accomplish this goal, it will be necessary to increase the level of automation used for data analysis and presentation. We plan to make use of the "FlexPipe" automated pipeline infrastructure to monitor file systems and database tables for changes, and to trigger analysis protocols as needed, which in turn will generate additional data changes downstream. This automated, data-driven event model will not only increase processing throughput but also facilitate a finer-grained approach to the analysis. This should provide greater opportunity for quality assessment and data mining. For example, we plan to develop additional automated procedures for the quality assessment of sequence data and to identify the location and frequency of SNPs within the sample population. It is also desirable to automatically correlate the SNPs we have detected in humans to other organisms such as mouse, which can then serve as models for phenotyping experiments. We plan to extend our automated methods to include not only genomic data, but also data from cellular phenotyping studies and ultimately clinical studies. We also plan to make our analysis tools available on the PMT web site so that other researchers involved in similar projects can easily utilize the software tools we have developed.
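The data-driven event model described above, in which a detected change triggers an analysis step, can be sketched in a few lines. This is a generic illustration and does not reflect the actual FlexPipe design or interface, which is not described here; every name below is hypothetical, and only file-system monitoring (not database tables) is shown.

```python
import os
from typing import Callable, Dict, List

Snapshot = Dict[str, float]  # file path -> last-modified time


def snapshot(directory: str) -> Snapshot:
    """Record the modification time of every regular file in a directory."""
    with os.scandir(directory) as entries:
        return {e.path: e.stat().st_mtime for e in entries if e.is_file()}


def changed_paths(previous: Snapshot, current: Snapshot) -> List[str]:
    """Return paths that are new or whose modification time differs."""
    return sorted(path for path, mtime in current.items()
                  if previous.get(path) != mtime)


def dispatch(previous: Snapshot, current: Snapshot,
             handlers: Dict[str, Callable[[str], None]]) -> List[str]:
    """Run the handler registered for each changed file's extension.

    Returns the paths for which a handler was triggered.
    """
    triggered = []
    for path in changed_paths(previous, current):
        extension = os.path.splitext(path)[1].lower()
        handler = handlers.get(extension)
        if handler is not None:
            handler(path)
            triggered.append(path)
    return triggered
```

A polling loop would call `snapshot` at intervals and pass consecutive snapshots to `dispatch`. Because a handler may itself write new files, those outputs are picked up on the next pass, which is how one analysis step can trigger the next one downstream.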

Our most recent work has focused on re-sequencing promoter regions of 50 selected transporters. The predicted promoter regions were identified using methods developed by the Myers laboratory (Trinklein, 2003). Briefly, the mRNA sequences derived from two different full-length cDNA libraries (the Japanese (AK) and the MGC (BC)) were aligned to the human genome assembly (hg17) to infer transcription start sites. Additionally, when two or more full-length cDNAs from independent sources mapped the transcription start to additional sites, these regions were also identified. The promoter regions are defined as -650 to +50 relative to the mapped transcription start sites. For some genes, the aligned cDNAs specify one or more alternative first exons, with the remainder of the transcript identical to the longest transcript (long). For these genes, this alternative promoter region (alt) is also defined. Susan Johns, staff member in the RBVI, has created a new method for visually representing the wealth of comparative sequence data now available for this project, including the promoter regions (Figure 4). This type of visualization, together with the other visual representations described previously, is an example of the facile data navigation made possible through our collaboration with the RBVI.
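The promoter window defined above (-650 to +50 relative to a mapped transcription start site) depends on the strand of the gene. The sketch below shows one way to convert it to genomic coordinates. The function name and the coordinate convention (plain integer positions with start <= end) are assumptions for illustration, not the project's actual implementation.

```python
from typing import Tuple


def promoter_region(tss: int, strand: str,
                    upstream: int = 650, downstream: int = 50) -> Tuple[int, int]:
    """Return (start, end) genomic coordinates of the promoter window.

    tss is the genomic position of the transcription start site and strand
    is "+" or "-". On the minus strand, "upstream" of the gene means higher
    genomic coordinates, so the window is mirrored around the TSS.
    """
    if strand == "+":
        return tss - upstream, tss + downstream
    if strand == "-":
        return tss - downstream, tss + upstream
    raise ValueError("strand must be '+' or '-'")
```

For a gene with alternative first exons, the same function would be applied once per mapped start site, giving the "long" and "alt" promoter regions.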

Figure 4: An image representing the genomic environment of VACHT (SLC18A3), the vesicular acetylcholine transporter. The range shown is approximately 10 kb before and after the single-exon gene. Ticks are at 10 kb intervals. Rows 1 and 2 give promoter location information, with the green block in row 2 being the latest specific VACHT promoter prediction from the Myers lab. Row 3 gives possible PXR Nuclear Receptor Response Elements. The next two rows display conserved genomic sequence regions between human/mouse (4) and human/rat (5) in red, as determined by VISTA. Repeats found by RepeatMasker are shown as cyan blocks in row 6. The UTR regions are given in orange in row 7, exons in magenta in row 8, and the sequenced amplicon under study in blue in row 9. SNPs determined by the PMT project are given in row 10 and those from dbSNP in row 11. Predicted transmembrane segment locations are given in row 12 using alternating medium blue and red blocks.

References:

