The cDNA microarray, a new biomedical research tool, has fundamentally changed the way medical research looks at disease. It is now possible to analyze the expression patterns of thousands of genes in a single experiment and to use the resulting expression profile to classify diseases and stratify patients. To apply this technology to the study of gene expression in multiple myeloma, we have been using our 4.3K Myeloma Chip to analyze cell lines and patient samples and to identify genes that are differentially expressed between multiple myeloma and non-myeloma cell lines, and between malignant and non-malignant plasma cells. We have also been using in silico data mining to increase the diversity of genes to be printed on the second generation Myeloma Chip. Since funding support was granted, the following activities have been initiated:
- Myeloma Gene Index
The Myeloma Gene Index, our intended repository of microarray data, has been updated. To make the site more interactive, we have added a search engine. Any gene of interest can be queried from the Myeloma Gene Index using a choice of three databases: (1) genes identified by sequencing, (2) genes identified by microarray hybridizations, and (3) genes present on our 4.3K Myeloma Array. In the future, a search for the expression patterns of a particular gene based on microarray data will be incorporated into the website’s search function. Currently, our microarray hierarchical cluster analysis results (2D heat maps) are available only from the Myeloma Gene Index’s Supplement page, in a non-searchable format. To access the Myeloma Gene Index, go to www.uhnres.utoronto.ca/akstewart_lab/mgi.html.
- Relational Database for microarray images
Because our experiments generate a large number of image data files, we have set up our own relational database system for storing acquired microarray images. Our system of choice is the GeneTraffic Microarray Database and Analysis System (Iobion Informatics, La Jolla, California), which supports complete annotation of data based on the current Minimum Information About a Microarray Experiment (MIAME) standards for microarray research (www.mged.org).
- In search of a highly discriminatory gene dataset
We have generated a molecular portrait of 18 myeloma cell lines and 6 hematopoietic non-myeloma cell lines using a total of 5,460 quality-controlled spots, corresponding to 152,880 data points. Statistical analysis of our microarray data from myeloma and non-myeloma cell lines identified 34 genes that are significantly up-regulated (after immunoglobulin lambda, kappa and J chain genes were filtered out) and 18 genes that are down-regulated in the myeloma cell lines. Among the significantly up-regulated genes in this analysis is heat shock 70 kD protein 5 (also called immunoglobulin heavy chain binding protein), a gene known to be important in the folding and oxidation of antibodies in vitro. Interferon regulatory factor 4 (MUM1/IRF4) is also significantly, but not uniquely, associated with the myeloma cell lines. MUM1/IRF4 gene expression has been suggested to relate to the stage of differentiation of malignant B plasma cells, and the gene has been identified as an oncogene transcriptionally activated by the t(6;14)(p25;q32) chromosomal translocation in multiple myeloma. Additional genes include those involved in B cell biology, such as syndecan, BCMA, PIM2 and XBP1. A number of the genes that appear to be differentially expressed between myeloma and non-myeloma cell lines are novel, uncharacterized genes matching sequences found only in the draft sequence of the human genome. The most significantly up-regulated gene in myeloma cell lines and patient samples was hypothetical protein MGC3178. Further sequence analysis showed that this gene encodes a protein containing thioredoxin domains, a sequence motif present in protein disulfide isomerases (PDI). Hypothetical protein MGC3178 may therefore be involved in the rearrangement of both intrachain and interchain disulfide bonds in proteins, but it may also act as a cysteine-type endopeptidase, a phospholipase, or a combination of these functions (SOURCE Database).
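As an illustration of how a two-group comparison of this kind can be scored, the following is a minimal sketch that ranks genes by Welch's t-statistic between myeloma and non-myeloma cell lines. It is an assumption made for illustration and not necessarily the statistical test used in our analysis; the function names, gene names and expression values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t-statistic for two lists of log ratios (unequal variances)."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def rank_genes(expression, n_myeloma):
    """Return (gene, t) pairs sorted from most up- to most down-regulated.

    expression maps a gene name to a list of log2 ratios; the first
    n_myeloma values are myeloma cell lines, the rest are non-myeloma.
    """
    scores = []
    for gene, values in expression.items():
        t = welch_t(values[:n_myeloma], values[n_myeloma:])
        scores.append((gene, t))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy data: hypothetical values, not taken from our experiments.
toy = {
    "GENE_UP": [2.1, 1.8, 2.4, 2.0, -0.1, 0.2, 0.0],
    "GENE_FLAT": [0.1, -0.2, 0.0, 0.3, 0.1, -0.1, 0.2],
    "GENE_DOWN": [-1.9, -2.2, -1.7, -2.0, 0.1, 0.0, -0.2],
}
ranked = rank_genes(toy, n_myeloma=4)
```

In practice, a ranking like this would be followed by a significance threshold and a correction for multiple testing, since thousands of spots are tested at once.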
- In silico data mining of published microarray data to identify genes useful in the molecular classification of myeloma
We have also been mining publicly available microarray datasets in silico in order to identify genes that may be included on the second generation Myeloma Chip. Using the Significance Analysis of Microarrays (SAM) software, which is freely available from Stanford University, we are currently mining raw data published by J. Shaughnessy's laboratory to identify gene signatures in patients with either cyclin D1 or FGFR3 translocations.
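The core idea behind a SAM-style two-class comparison can be sketched as follows: each gene receives a relative difference d, the difference in group means divided by a pooled standard error plus a small constant s0, and the observed ordered d values are compared with those expected under random permutations of the sample labels. This is a simplified sketch, not the SAM software itself: s0 is fixed by hand here (the software estimates it from the data), and the function names and toy values are hypothetical.

```python
import random
from math import sqrt


def sam_d(values, labels, s0):
    """Relative difference d = (mean1 - mean0) / (s + s0) for one gene."""
    g1 = [v for v, lab in zip(values, labels) if lab == 1]
    g0 = [v for v, lab in zip(values, labels) if lab == 0]
    n1, n0 = len(g1), len(g0)
    m1, m0 = sum(g1) / n1, sum(g0) / n0
    ss = sum((v - m1) ** 2 for v in g1) + sum((v - m0) ** 2 for v in g0)
    s = sqrt((1 / n1 + 1 / n0) * ss / (n1 + n0 - 2))  # pooled standard error
    return (m1 - m0) / (s + s0)


def expected_order_stats(data, labels, s0, n_perm=200, seed=0):
    """Average the sorted d values over random permutations of the labels."""
    rng = random.Random(seed)
    totals = [0.0] * len(data)
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        d_perm = sorted(sam_d(row, shuffled, s0) for row in data)
        for i, d in enumerate(d_perm):
            totals[i] += d
    return [t / n_perm for t in totals]


# Toy data: one row per gene, hypothetical values. Label 1 and label 0
# stand for two patient groups (for example, two translocation classes).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
data = [
    [2.0, 2.3, 1.9, 2.2, 0.1, -0.1, 0.0, 0.2],   # shifted up in group 1
    [0.1, -0.2, 0.3, 0.0, 0.2, -0.1, 0.1, 0.0],  # no clear difference
    [-0.3, 0.2, 0.0, 0.1, 0.1, -0.2, 0.3, 0.0],  # no clear difference
]
observed = sorted(sam_d(row, labels, s0=0.1) for row in data)
expected = expected_order_stats(data, labels, s0=0.1)
```

Genes whose observed d exceeds the corresponding expected value by more than a chosen threshold are then called significant; the threshold controls the estimated false discovery rate.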
- Microarray hybridizations using patient samples
Using patient samples, we have optimized our methodology for use with limited amounts of RNA. To date, we have analyzed 29 patient samples using this technique on our 4.3K Myeloma Chip. In the coming months, we will analyze more patient samples as they become available.
- Gene set for Myeloma Chip 2
To date, we have identified close to 250 candidate genes that we plan to print on our second generation Myeloma Chip. In the coming months, we will refine this gene set based on additional information from the analysis of our hybridizations using patient samples. The question is how many genes are needed to capture the expression portrait of myeloma reasonably well. In a report on breast cancer classification using microarrays, a carefully selected set of only 486 variably expressed genes was able to generate clusters that were used to classify the tumors (Perou CM et al., (2000) Nature; 406:747-52). This study suggests that careful scrutiny of the dataset and selection of as few as several hundred variably expressed genes can produce a meaningful classification. Therefore, printing a limited but well-chosen set of genes on our second generation Myeloma Array is not only feasible, but may also lead to the development of a lower cost expression array that can be used to study large sets of clinical samples.
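The approach of selecting variably expressed genes and then clustering samples on them can be sketched as follows. This is a minimal illustration under our own assumptions (variance as the selection criterion, average-linkage clustering with 1 minus Pearson correlation as the distance); it is not the exact procedure of Perou et al., and all names and values are hypothetical.

```python
from math import sqrt
from statistics import mean, pvariance


def top_variable_genes(expression, k):
    """Return the k gene names with the largest variance across samples."""
    ranked = sorted(expression, key=lambda g: pvariance(expression[g]), reverse=True)
    return ranked[:k]


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def cluster_samples(expression, genes, n_clusters):
    """Average-linkage agglomerative clustering of samples, using
    1 - Pearson correlation over the selected genes as the distance."""
    n_samples = len(next(iter(expression.values())))
    profiles = [[expression[g][s] for g in genes] for s in range(n_samples)]
    clusters = [[s] for s in range(n_samples)]

    def distance(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(1 - pearson(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Merge the two closest clusters.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Toy data: gene -> log ratios across four samples (hypothetical values).
toy = {
    "G1": [2.0, 1.8, -2.0, -1.9],
    "G2": [-1.5, -1.7, 1.6, 1.4],
    "G3": [1.0, 1.2, -1.1, -0.9],
    "G4": [0.1, 0.0, 0.1, 0.0],
    "G5": [0.0, 0.1, 0.0, 0.1],
}
selected = top_variable_genes(toy, k=3)
groups = cluster_samples(toy, selected, n_clusters=2)
```

In this toy case the two nearly constant genes are dropped, and the remaining three are enough to separate the samples into two groups, which is the same reasoning that motivates a smaller, well-chosen gene set for the second generation array.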
A manuscript related to this work has been accepted by a peer-reviewed journal and is currently available online as a Blood First Edition publication.