Introduction: Embracing Real-Time Market Data in Financial Services

In the fast-paced world of financial services, the ability to access and process real-time market data is crucial. Traditionally, this required significant investment in infrastructure and co-location with data providers. With cloud technologies, firms can now build scalable, low-latency streaming pipelines without that up-front overhead.


Simplifying Access to Real-Time Market Data

Historically, accessing real-time market data involved complex setups, including physical hardware and dedicated teams to manage connectivity. This approach was often limited to large institutions with substantial resources.

Cloud-based solutions have democratized access to real-time data. By leveraging services like Google Cloud's Pub/Sub, firms can stream market data directly to various applications, including:

  • Retail trading platforms
  • Risk exposure monitoring tools
  • Index publishing systems
  • Predictive analytics models

This shift reduces infrastructure costs and accelerates time-to-insight for a broader range of financial entities.


Architectural Overview: Streaming Market Data with Google Cloud

A practical implementation uses CME Group's Smart Stream service, which provides real-time data from the CME Globex trading platform. The data, originally delivered over UDP multicast, is forwarded to Google Cloud Pub/Sub topics, with each topic representing a specific financial instrument.
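The one-topic-per-instrument layout can be illustrated with a small helper that builds a topic path from an instrument symbol. This is a minimal sketch: the function name, the "md-" prefix, and the sanitising rule are assumptions made for illustration and are not taken from CME Smart Stream. Only the general `projects/<project>/topics/<topic>` path shape comes from Pub/Sub itself.

```javascript
// Hypothetical helper: maps an instrument symbol to a Pub/Sub topic path.
// The "md-" prefix and the character whitelist are illustrative assumptions.
function topicPathForInstrument(projectId, symbol) {
  // Replace anything outside a conservative character set with a dash.
  const safeSymbol = String(symbol).trim().replace(/[^A-Za-z0-9._~-]/g, "-");
  if (safeSymbol.length === 0) {
    throw new Error("symbol must not be empty");
  }
  // A letter prefix keeps the topic ID valid even if the symbol starts with a digit.
  const topicId = "md-" + safeSymbol;
  return "projects/" + projectId + "/topics/" + topicId;
}

console.log(topicPathForInstrument("my-project", "ESZ4"));
// prints "projects/my-project/topics/md-ESZ4"
```

With a naming scheme like this, a consumer interested in a single instrument only needs to subscribe to that instrument's topic rather than filter a combined feed.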

To distribute this data to web-based front-ends, the architecture uses an open-source tool named Autosocket. Autosocket acts as a bridge, delivering Pub/Sub messages over WebSocket connections to client applications. This setup provides low-latency, real-time data visualization suitable for various use cases.
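The bridging idea can be sketched as a simple fan-out: every message received from a topic is relayed to each connected client. The class below is not Autosocket's code; it is a minimal illustration with hypothetical names (`Broadcaster`, `broadcast`) that works with any client object exposing `readyState` and `send`, as browser-style WebSocket objects do.

```javascript
// readyState value of an open connection in the WebSocket API.
const OPEN = 1;

// Hypothetical fan-out hub: tracks connected clients and relays messages to them.
class Broadcaster {
  constructor() {
    this.clients = new Set();
  }

  add(client) {
    this.clients.add(client);
  }

  remove(client) {
    this.clients.delete(client);
  }

  // Sends one payload to every open client and returns how many received it.
  broadcast(payload) {
    const text = typeof payload === "string" ? payload : JSON.stringify(payload);
    let delivered = 0;
    for (const client of this.clients) {
      if (client.readyState !== OPEN) {
        // Drop clients that are no longer open instead of sending to them.
        this.clients.delete(client);
        continue;
      }
      client.send(text);
      delivered++;
    }
    return delivered;
  }
}

// Demonstration with stand-in client objects instead of real sockets.
const received = [];
const hub = new Broadcaster();
hub.add({ readyState: OPEN, send: (text) => received.push(text) });
hub.add({ readyState: 3, send: () => { throw new Error("socket is closed"); } });
console.log(hub.broadcast({ symbol: "ESZ4", price: 5000.25 })); // prints 1
```

In a real bridge, `broadcast` would be called from the Pub/Sub subscriber's message handler, and `add` and `remove` from the WebSocket server's connection and close events.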

Figure 1: Multicast adaptation to Pub/Sub

Implementing the Solution: Steps to Build Your Pipeline

To replicate this architecture:

  1. Deploy Autosocket: Set up a Cloud Run service configured to subscribe to a specific Pub/Sub topic and relay its messages via WebSockets.
  2. Develop Front-End Application: Create a web application that opens a WebSocket connection to that service and updates its visualizations as messages arrive.
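For the second step, the front end typically needs somewhere to hold the most recent data points before drawing them. The following is a minimal sketch of a per-symbol rolling window; the function name `createSeriesStore` and the message fields `symbol`, `price`, and `timestamp` are assumptions, since the actual message schema depends on the feed.

```javascript
// Hypothetical store that keeps the latest `maxPoints` ticks per symbol for charting.
function createSeriesStore(maxPoints) {
  const series = new Map();
  return {
    // Appends a tick and trims the window; returns the current points for that symbol.
    addTick(tick) {
      if (!series.has(tick.symbol)) {
        series.set(tick.symbol, []);
      }
      const points = series.get(tick.symbol);
      points.push({ time: tick.timestamp, price: tick.price });
      if (points.length > maxPoints) {
        points.shift(); // discard the oldest point
      }
      return points;
    },
    // Returns the stored points for a symbol, or an empty array if none exist.
    get(symbol) {
      return series.get(symbol) || [];
    },
  };
}

const store = createSeriesStore(2);
store.addTick({ symbol: "ESZ4", price: 5000.0, timestamp: 1 });
store.addTick({ symbol: "ESZ4", price: 5000.5, timestamp: 2 });
store.addTick({ symbol: "ESZ4", price: 5001.0, timestamp: 3 });
console.log(store.get("ESZ4").map((p) => p.price)); // prints [ 5000.5, 5001 ]
```

A WebSocket message handler would call `addTick` for each parsed message and pass the returned array to whatever charting library the application uses.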

This approach is particularly beneficial for applications requiring real-time data feeds, such as trading dashboards or market monitoring tools.


Enhancing the Pipeline: Integrating with Dataflow and BigQuery

For advanced analytics and machine learning applications, integrating the streaming pipeline with Google Cloud's Dataflow and BigQuery services is advantageous.

  • Dataflow: A serverless data processing service that can ingest data from Pub/Sub, perform transformations, and output to various destinations.
  • BigQuery: A scalable data warehouse solution ideal for storing and analyzing large volumes of streaming data.

By connecting Pub/Sub to Dataflow, and subsequently to BigQuery, firms can perform real-time analytics, generate insights, and feed data into machine learning models for predictive analysis.
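The transformation stage between Pub/Sub and BigQuery can be as small as a function that reshapes each message into a table row. The sketch below follows the JSON-string-in, JSON-string-out shape that Google's Pub/Sub-to-BigQuery Dataflow template is understood to accept for JavaScript user-defined functions; treat that contract as an assumption to verify against the template documentation. The function name `transformQuote`, the input fields (`symbol`, `bidPrice`, `askPrice`, `eventTime`), and the output column names are all hypothetical.

```javascript
/**
 * Hypothetical transform: turns one quote message (a JSON string) into a
 * JSON string whose keys match the columns of an assumed BigQuery table.
 */
function transformQuote(inJson) {
  var quote = JSON.parse(inJson);
  var row = {
    symbol: quote.symbol,
    bid_price: quote.bidPrice,
    ask_price: quote.askPrice,
    // Derived columns computed once at ingest time.
    mid_price: (quote.bidPrice + quote.askPrice) / 2,
    spread: quote.askPrice - quote.bidPrice,
    event_time: quote.eventTime
  };
  return JSON.stringify(row);
}

var sample = JSON.stringify({
  symbol: "ESZ4",
  bidPrice: 100,
  askPrice: 101,
  eventTime: "2024-01-01T00:00:00Z"
});
console.log(transformQuote(sample));
```

Computing derived fields such as the mid price during ingestion keeps downstream BigQuery queries simpler, at the cost of storing values that could be recomputed.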

Figure 2: WebSocket endpoint client connectivity


Case Study: PayPal's Migration to Google Cloud for Streaming Analytics

PayPal transitioned its streaming analytics infrastructure to Google Cloud's Dataflow service to address challenges related to scalability, cost, and integration. The migration involved:

  • Shifting from Apache Pulsar to Apache Kafka for data ingestion.
  • Optimizing data pipelines for performance and efficiency.

This move enabled PayPal to achieve real-time data processing capabilities, reduce operational overhead, and enhance the scalability of its analytics platform.

Returning to the front-end application described earlier: the following SocketManager class establishes and manages a WebSocket connection for streaming real-time market data to the browser.

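// Manages a single WebSocket connection to a streaming endpoint and
// counts the messages received since the last (re)connect.
function SocketManager(endpoint) {
  this.ws = undefined;
  this.lastReload = undefined;
  this.messagesReceived = 0;
  this.endpoint = endpoint;

  if (this.endpoint) {
    this.connect();
  }
}

// Hook for callers: override to handle each parsed message.
SocketManager.prototype.processMessage = function(msg) {};

SocketManager.prototype.disconnect = function() {
  if (this.ws) {
    this.ws.close();
    this.ws = undefined;
  }
};

// Returns messages per second since the last connect, as a string with two decimals.
SocketManager.prototype.getMessageRate = function() {
  if (this.lastReload === undefined) {
    return (0).toFixed(2);
  }
  const runningTime = Date.now() - this.lastReload;
  if (runningTime <= 0) {
    // Avoid dividing by zero immediately after connecting.
    return (0).toFixed(2);
  }
  return (this.messagesReceived / (runningTime / 1000)).toFixed(2);
};

SocketManager.prototype.connect = function() {
  if (this.ws) {
    this.ws.close();
  }

  const ws = new WebSocket(this.endpoint);
  this.ws = ws;

  // Reset the counter together with the timestamp so the rate stays consistent.
  this.messagesReceived = 0;
  this.lastReload = Date.now();

  ws.onmessage = (msg) => {
    this.messagesReceived++;
    try {
      this.processMessage(JSON.parse(msg.data));
    } catch (error) {
      // A malformed payload should not break the stream.
      console.error("could not process message:", error);
    }
  };

  ws.onclose = () => {
    console.log("closing", this.endpoint);
    // Only clear the reference if a newer connection has not replaced this socket.
    if (this.ws === ws) {
      this.ws = undefined;
    }
  };

  ws.onerror = (error) => {
    console.error("error:", error);
  };
};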
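The SocketManager class does not reconnect on its own after the connection closes. One common way to add that is to retry with an exponentially growing, randomized delay. The helper below is a sketch of that idea only; its name, default values, and the injectable `random` option are assumptions and are not part of the original class.

```javascript
// Hypothetical helper: delay in milliseconds before reconnect attempt number
// `attempt` (0-based), using exponential backoff with full jitter.
function reconnectDelayMs(attempt, options) {
  const opts = options || {};
  const baseMs = opts.baseMs !== undefined ? opts.baseMs : 500;
  const maxMs = opts.maxMs !== undefined ? opts.maxMs : 30000;
  // Allow a custom random source so the behavior can be tested deterministically.
  const random = opts.random || Math.random;

  // The ceiling doubles with every attempt, up to maxMs.
  const ceiling = Math.min(maxMs, baseMs * Math.pow(2, attempt));
  // Full jitter: pick a value between 0 and the ceiling.
  return Math.floor(random() * ceiling);
}

const fixed = { random: () => 0.5 };
console.log(reconnectDelayMs(0, fixed));  // prints 250
console.log(reconnectDelayMs(3, fixed));  // prints 2000
console.log(reconnectDelayMs(20, fixed)); // prints 15000
```

A close handler could then schedule a call to `connect()` with `setTimeout`, passing the computed delay and increasing the attempt counter until a connection succeeds.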

Figure 3: PayPal's Dataflow execution details


Conclusion: Leveraging Cloud Technologies for Real-Time Market Data

The integration of services like Pub/Sub, Dataflow, and BigQuery within Google Cloud provides financial institutions with a robust framework for real-time market data processing. This architecture supports various applications, from trading platforms to risk management systems, enabling firms to respond swiftly to market changes and make informed decisions.

By adopting these cloud-native solutions, financial services can enhance their agility, reduce infrastructure costs, and unlock new opportunities for innovation in data analytics and machine learning.