Our Fortune 500 client ran several Big Data systems that generated logs about incoming traffic from many countries. These systems produced hundreds of terabytes of logs each day, and the total log volume was on the order of petabytes.
They needed real-time search for some of these systems, along with the ability to query logs from the configured sources in near real time. This would let them correlate user and IP activity across systems and investigate suspicious activity immediately.
We delivered a solution for building custom Spark pipelines that ingest near-real-time data into Graylog, with Elasticsearch as the backend. The pipelines could join and filter streams and enrich the data before it was sent for indexing. We also provided a Search Portal with faceted filters for searching data across different historical ranges. Search criteria and results could be saved and retrieved for later reference. The portal could also correlate user and IP activity across datasets and search results, and provide output charts and data downloads.
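As a rough illustration of the join, filter, and enrich steps described above, the following minimal sketch uses plain Python generators rather than Spark. It is not the client's implementation: the field names (`user_id`, `ip`, `country`, `name`, `department`), both function names, and the idea of filtering by country are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Set


def enrich_stream(
    events: Iterable[dict],
    users_by_id: Dict[str, dict],
    ignored_countries: Set[str],
) -> Iterator[dict]:
    """Join each log event with user reference data, filter, and enrich.

    All field names here are hypothetical and chosen only for illustration.
    """
    for event in events:
        # Join: look up the user record for this event (inner join,
        # so events with no matching user are dropped).
        user = users_by_id.get(event["user_id"])
        if user is None:
            continue
        # Filter: skip traffic from countries we are not interested in.
        if event["country"] in ignored_countries:
            continue
        # Enrich: add user attributes so the indexed record is self-contained.
        yield {**event, "user_name": user["name"], "department": user["department"]}


def correlate_users_by_ip(records: Iterable[dict]) -> Dict[str, Set[str]]:
    """Group user IDs by source IP across any number of enriched records."""
    users_per_ip: Dict[str, Set[str]] = defaultdict(set)
    for record in records:
        users_per_ip[record["ip"]].add(record["user_id"])
    return dict(users_per_ip)
```

In a real deployment the same three steps would typically be expressed as stream transformations in Spark, and the enriched records would be sent on for indexing rather than collected in memory. An IP address that maps to several user IDs in the output of `correlate_users_by_ip` is the kind of signal an analyst might want to investigate.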