2014 Schedule


The schedule is organized primarily by programming language and theme. Traditionally, we’ve held separate R, Python, and Hadoop events; this year, however, we are combining them into one event. Please note that some talks cover software beyond R, Python, and Hadoop, such as MATLAB, SQL, D3, and others.


Non-Tech: Overview of Data Science-Based Initiatives, Program, Company, etc.
Big Data
Multiple Languages



Here is a condensed schedule that mirrors the registration for the event, i.e., Fri PM (6-9), Sat AM (9-12), Sat PM (1-4), Sun AM (9-12), and Sun PM (1-4). Since registration is by day and time, you can attend talks in either session running at that time. Sessions such as DataViz or Text Analytics include talks covering different software and approaches, so we believe it will be beneficial for people with different programming-language backgrounds to see what is possible on other platforms. That said, there is more R on Saturday and more Python on Sunday, since we realize many people are interested mainly in one or the other.

Fri PM | R Beginner Workshop
Sat AM | Data Science & Engineering | Text Analytics and Big Data
Sat PM | Data Visualization | Data Science & Engineering
Sun AM | Beginner Python | Big Data, R, and Parallel Computing
Sun PM | PyData + New Python Libraries! | Python, Machine Learning, & Big Data
Sun PM2 | R Beginner Workshop | More PyData



Below is an expanded schedule with the sessions and talks. Most talks link to a speaker bio page that includes a talk description. A couple of talks focus more on design principles than on coding; these concentrate on thinking through different design options, whether for a data science project or a data visualization.

1st Floor
10th / 11th Floor
FRIDAY | R Beginner Workshop
6:00 - 8:45 | R Beginner Bootcamp - Joe Kambourakis and John Verostek

SATURDAY | Data Science & Engineering | Twitter, Text Analytics and Big Data
9:00 - 10:20 | Introduction to the Data Science Method - David Weisman | Using Twitter to Analyze Switching Across Cellphone Carriers - Tanya Cashorali

Topic Modelling Using R - Herb Susmann
10:20 - 10:30 | Break | Break
10:30 - 11:00

11:00 - 12:00
Combining R Libraries into Automated Workflows - Dag Holmboe

Generalized Linear Mixed Models (GLMMs) in R - Julia Pilowsky
Optimizing Multilingual Search Using Solr - David Troiano

Mining of Massive Datasets Using Locality Sensitive Hashing (LSH) - J Singh and Teresa Brooks

12:00 - 1:00



SATURDAY | Data Visualization | Data Science & Engineering
1:00 - 1:50 | DataViz Design Principles - Angela Bassa | Massive Feature Selection Using Supercomputing in R - Jean-Loup Loyer
1:50 - 2:40 | Python DataViz Tour - Ian Stokes-Rees | Robots, Small Molecules & R - Ingredients for Exploring and Predicting Biological Effects - Rajarshi Guha
2:40 - 3:00 | Break | Break
3:00 - 3:50 | Interactive DataViz with R: ggvis, rCharts, Shiny - Abhinav Sarapure | High Dimensionality in Large Datasets - Sri Krishnamurthy
3:50 - 4:40 | A Case Study Visualizing Boston's Subway System Using D3 and other Open Source tools - Mike Barry and Brian Card | Principles of Data Engineering - Edmund Jorgenson and Matt Papi

Baseball and Data Engineering using Statistics, R & Python - Dan Milstein

SUNDAY | Beginner Python | Big Data Tools and Parallel Computing
9:00 - 9:50 | iPython Tutorial - Imran Malek
Creating Custom Big Data Tools including Models, Hadoop Clusters, and DataViz - DigitasLBI

Introduction to Massively Parallel Databases - Wes Reing
10:00 - 10:50 | Regression Analysis with Python, Pandas, and StatsModels - Allen Downey | R for Analyzing Big Expression Data Parallel Computing - Yuefeng Lu

Scaling R with ScaleR Packages - Steve Belcher
11:00 - 11:50 | More Pandas! - Mali Akmanalp

Orange Canvas: Python Data Mining - Justin Sun

Open-Source Data-Analysis for Bio-tech - Will Sutton

Introduction to Hive with Case Study on Storing and Querying Protobuf Logs in Hive - Muralikumar Venkat

Gamification and Big Data - Nick Lim

12:00 - 1:00

Lunchtime Talk: Visualizations for Exploring Data - Patrik Lundblad


SUNDAY | More PyData! + New Python Libraries | Python, Machine Learning, & Big Data
1:00 - 1:50 | Glue: a hackable user interface for multidimensional data exploration - Chris Beaumont

Data Science, YouTube, & Media Disruption - Pete Martin of Pixability

Building Predictive Models in Cloud using Microsoft Azure Machine Learning - Roope Astala
1:50 - 2:40 | Statistical inference in Python the NIFTY way - Mike Bell

DrivenData.org: Python-based Site for Data Science Competitions - Greg Lipstein & Peter Bull
Using Python's Machine Learning and Dynamic Control Libraries for Online Advertisement Analysis - Michael Els

2:40 - 3:00 | Break | Break
SUNDAY | R Bootcamp | More PyData!
3:00 - 3:50 | R Beginner Bootcamp - Joe Kambourakis and John Verostek | IP-Reputation Scoring System in Python and Hadoop - Stuart Layton
3:50 - 4:40 | R Bootcamp | Web Scraping Using Python's Beautiful Soup and Selenium - Laurie Skelly
4:40 - 5:30 | R Bootcamp
