|Murali Venkat has 15 years of IT experience and currently works as a DW architect at Nuance Communications. He has worked extensively on data and database architecture for search and retrieval with SOLR/Lucene and other search engines, on online advertising, and on DW architecture.||Introduction to Hive with Case Study on Storing and Querying Protobuf Logs in Hive|
Call logs are stored in Google Protocol Buffer format, and there is a need for an easy-to-use interface to parse and query their contents.
The solution was to do the following:
1. Package individual call logs into Hadoop Sequence Files
2. Load them into Hive
3. Write a Hive SerDe (Serializer/Deserializer)
4. Write a Hive UDTF which uses the Engineering parser to parse the protobuf and outputs flat (TSV-like) rows.
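
The flattening in step 4 can be illustrated with a minimal Python sketch. It is not the actual UDTF, which would be written in Java against Hive's UDTF interface and would call the Engineering parser. Here a plain dictionary stands in for an already-parsed call log, and every field name (`call_id`, `start_time`, `events`, `name`, `duration_ms`) is a hypothetical placeholder, since the abstract does not describe the protobuf schema. The sketch only shows the core idea of a table-generating function: one nested input record becomes several flat rows, with the call-level fields repeated on each row.

```python
from typing import Any, Dict, Iterable, Iterator, List

# Hypothetical parsed call log. The real structure would come from the
# protobuf schema, which is not described in the abstract.
SAMPLE_CALLLOG: Dict[str, Any] = {
    "call_id": "c-001",
    "start_time": "2012-01-01T00:00:00Z",
    "events": [
        {"name": "recognition", "duration_ms": 120},
        {"name": "playback", "duration_ms": 340},
    ],
}


def flatten_calllog(calllog: Dict[str, Any]) -> Iterator[List[str]]:
    """Yield one flat row per nested event, repeating the call-level fields."""
    call_id = str(calllog.get("call_id", ""))
    start_time = str(calllog.get("start_time", ""))
    for event in calllog.get("events", []):
        yield [
            call_id,
            start_time,
            str(event.get("name", "")),
            str(event.get("duration_ms", "")),
        ]


def to_tsv(rows: Iterable[List[str]]) -> str:
    """Join rows into TSV text. Assumes fields contain no tabs or newlines."""
    return "\n".join("\t".join(row) for row in rows)


if __name__ == "__main__":
    print(to_tsv(flatten_calllog(SAMPLE_CALLLOG)))
```

Repeating the call-level fields on every row trades storage for simplicity: each output row is self-describing, so queries over events need no join back to a separate call table.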