|The 50th Anniversary issue of Communications of the ACM in 2008 cited two pieces of "Breakthrough Research." One was MapReduce; the other was clustering based on Locality Sensitive Hashing (LSH).||Twitter & Text Analytics
|J Singh has been the CTO of various startups and early-stage companies in the Boston area, architecting cloud-based platforms and helping bring them to market. He has been an invited speaker at many technology seminars and venture forums, generally speaking on cloud computing and Big Data. J is the organizer of the Boston Cloud Services meetup group. In his day job, he is a Principal at DataThinks and also teaches part-time at WPI.||Teresa Nicole Brooks is a software engineer with a passion for all things data. She has both professional and academic experience in natural language processing, information retrieval, big data processing, and search. Her interest in web-scale data mining and search drives her interest in near-duplicate document detection and locality sensitive hashing. Her technical interests include network and software security, artificial intelligence, knowledge extraction, and recommendation systems.|
Teresa received a master's degree in Computer Science from Pace University in 2010. While at Pace, she published a graduate thesis on MyFido, an intelligent RSS feed aggregator. MyFido uses natural language processing and other artificial intelligence techniques to make suggestions based on passively and non-passively observed user interests.
Teresa currently works for Xero, Inc.'s “Fringe” team in NYC. She lives with her cat, Molly, and her dog, Rondo.
|Locality Sensitive Hashing is designed for large data sets. Want to know if a piece of writing was plagiarized from the web and modified slightly so as not to be an exact match? Want to see if you have pictures of a suspect in your archives? Curious about where a fragment of fruit fly DNA might occur in humans? LSH will get you there faster than most other techniques.|
In January, we started an open-source project called OpenLSH, and we'll be introducing it at Data-Con.