Apsalar’s Scalable and Distributed User Store

04/05/17

By Raj Kandasamy
VP – Engineering

At Apsalar, we receive and process billions of data points each month from mobile devices, ad partners and server-to-server calls. We handle this volume with high-throughput, low-latency systems. Our Mobile Marketing Cloud and its DMP platform receive billions of events a day, process hundreds of user segments and distribute audiences and data in real time across multiple destinations.

In a previous post, we discussed batch processing in our massive data pipeline with Apache Spark and C++ Parquet-Writer. Though the data flow and data sources are similar, the Apsalar DMP platform is built on a pipeline that must process data in real time at all times.

We ingest billions of data points every day, including ad events, SDK events and API events, into our user store. From there, our platform accesses records in the user store to build, evaluate and distribute user segments in real time.

To achieve these objectives, the Apsalar user store is a highly scalable, distributed and fault-tolerant key-value store that performs hundreds of thousands of transactions per second. The user store keeps a user's recent activities and historical data together for faster and easier access. The following diagram outlines the Apsalar real-time DMP platform and the role of the user store in that system.

Why Apsalar Leverages a User Store

Mobile data has a defined key and set of values. Looking up device information quickly is critical in an environment where real-time insights drive business value. To best serve the DMP and our clients, our user store must meet all of the following criteria:

  •     Process thousands of transactions per second
  •     Accommodate partial reads and writes
  •     Handle data contention when updating the same user record from multiple processes
  •     Accommodate data expiry and data isolation
  •     Evaluate expressions in real-time
  •     Enable secondary indexes for faster data scans
  •     Support a Golang API

We are big on Golang at Apsalar. We have implemented most of our infrastructure and DMP platform, including the data pipeline and data processing, in Golang. Having a centralized, high-performing user store and services built on Golang enables the Apsalar DMP and Attribution platforms to scale to support ever-growing data volumes.
