|Rajarshi trained as a computational chemist and over the last 10 years has worked in a variety of areas related to computational drug discovery including the development of novel algorithms to characterize various aspects of structure-activity relationships, building predictive models of bioactivity & physical properties and implementing software tools, databases and APIs that make these methods and models available to fellow chemists and biologists. He has been using R for the last 10 years and has created a number of packages that enable cheminformatics within R. He currently works at NCATS (National Center for Advancing Translational Sciences) and prior to this was a visiting Assistant Professor in the School of Informatics, Indiana University.||Robots, Small Molecules & R - Ingredients for Exploring and Predicting Biological Effects |
High throughput screening (HTS) employs robotic platforms to screen thousands to hundreds of thousands of molecules in one or more biological assays. Naturally, this leads to large amounts of numeric data, possibly multidimensional. But on its own such data is of limited value. To enable actionable results, these datasets must be linked to chemical structure, target information & structure and even textual data (academic literature, patents). A key output of such an activity is the ability to predict whether a molecule will exert a biological effect and, ideally, how it exerts such an effect. Supporting this requires the development of infrastructure & pipelines for the storage, search, visualization and analysis of these heterogeneous data types.
In this talk I will discuss how R plays a wide-ranging role in the analysis of small molecule HTS data, from database access to predictive modeling. I'll start off with a high-level overview of the data types & challenges that are faced at each step of the pipeline and how R is used to address them. Given the central role of chemical structure in these problems, I'll discuss R tools to query and manipulate chemical structures and show how they and related data structures are used for predictive modeling. Finally, I will provide a high-level overview of how a variety of general techniques (network analysis, spatial correlations, etc.) can be fruitfully applied to these data and the R packages that are used.
|Big Data & R