Architecting, designing and building a robust and reliable analytics platform that satisfies business and operational needs consists of several steps that should be carefully planned and analysed before the platform is put into use. The following section gives a high-level overview of the processes and storage systems that should be considered in on-premise or cloud environments when evaluating different analytics technologies and solutions.
Data sources and ingestion strategies form the foundation of an analytics platform. Some data changes infrequently while other data needs to be processed at much higher rates; some of it is structured while other data arrives in less structured forms. The platform's ingestion layer should support most of these cases and, in addition, provide fault tolerance, data availability and horizontal scalability.
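Two of the ingestion properties mentioned above, horizontal scaling and fault tolerance, are often achieved by partitioning incoming records by key and making writes idempotent so that retried deliveries do not create duplicates. The following is a minimal in-memory sketch of that idea, not a production design; the `Record`, `Partition` and `IngestionLayer` names and the record fields are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Record:
    key: str        # partitioning key, e.g. a device or customer id (hypothetical)
    record_id: str  # unique id used to make retried writes idempotent
    payload: dict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition using a stable hash, so the same key always
    lands on the same partition, even across process restarts."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


@dataclass
class Partition:
    records: list = field(default_factory=list)
    seen_ids: set = field(default_factory=set)

    def append(self, record: Record) -> bool:
        """Store a record unless it was already written; return True if stored."""
        if record.record_id in self.seen_ids:
            return False  # duplicate delivery caused by a retry: ignore it
        self.seen_ids.add(record.record_id)
        self.records.append(record)
        return True


class IngestionLayer:
    """Routes records to partitions; adding partitions spreads the load."""

    def __init__(self, num_partitions: int):
        self.partitions = [Partition() for _ in range(num_partitions)]

    def ingest(self, record: Record) -> bool:
        idx = partition_for(record.key, len(self.partitions))
        return self.partitions[idx].append(record)
```

A SHA-256 digest is used instead of Python's built-in `hash()` because string hashing is randomized per process, which would route the same key to different partitions after a restart. Real ingestion systems also persist the deduplication state and replicate partitions for availability; the sketch keeps both in memory for brevity.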
Unified stream and batch processing
Fewer, unified processing environments can result in greater developer productivity and less operational complexity. In the last few years, a number of new processing frameworks suited to different use cases have emerged. When choosing the appropriate framework, we often have to make trade-offs among performance, streaming model, delivery semantics, state management and latency requirements. Another important factor to consider is the framework's maturity and its ecosystem, which enables integration with other operational systems.
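The productivity gain of a unified environment comes from writing the processing logic once and running it over both bounded (batch) and unbounded (stream) inputs. The sketch below illustrates this with a tumbling-window count that consumes any iterable and emits results incrementally, so the same function handles a finite list and an endless generator. It assumes events arrive in event-time order and ignores late data; the `(timestamp, key)` event schema is hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

Event = Tuple[int, str]  # (event time in seconds, key): hypothetical schema


def tumbling_window_counts(events: Iterable[Event],
                           window_size: int) -> Iterator[Tuple[int, dict]]:
    """Count events per key in fixed, non-overlapping time windows.

    Works on a bounded list (batch) or an unbounded generator (stream),
    because each window's result is emitted as soon as the window closes.
    Assumes in-order events; no watermarks or late-data handling.
    """
    current_window = None
    counts = Counter()
    for ts, key in events:
        window_start = ts - ts % window_size
        if current_window is not None and window_start != current_window:
            yield current_window, dict(counts)  # previous window is complete
            counts = Counter()
        current_window = window_start
        counts[key] += 1
    if current_window is not None:
        # Bounded input only: flush the last window once the source is exhausted.
        yield current_window, dict(counts)


# Batch: a finite list is fully processed.
batch = [(0, "a"), (3, "b"), (5, "a"), (12, "a")]
print(list(tumbling_window_counts(batch, window_size=10)))


# Stream: an infinite generator; results are pulled one window at a time.
def sensor_stream():
    ts = 0
    while True:
        yield ts, "sensor-1"
        ts += 4


print(next(tumbling_window_counts(sensor_stream(), window_size=10)))
```

Frameworks that unify the two models add what this sketch leaves out, such as out-of-order event handling, fault-tolerant state and delivery guarantees, which is exactly where the trade-offs listed above come into play.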
Operational and analytics storage layer
One of the first steps in designing a storage layer is defining the technical requirements of the storage workload, which in turn drive the choice of the underlying storage approach (OLAP, OLTP, HTAP, etc.). Depending on scale and desired performance, this layer often combines several storage and caching technologies.
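One common way of combining a caching technology with a slower storage system is the cache-aside pattern: reads check the cache first and fall through to the backing store on a miss, with a time-to-live bounding how stale cached results can become. The sketch below is a minimal illustration under those assumptions; `load` stands in for a query against whatever storage engine is chosen, and the `CacheAside` class itself is hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class CacheAside:
    """Minimal cache-aside layer in front of a slower backing store.

    `load` is a placeholder for a query against the underlying storage
    (hypothetical); entries expire after `ttl_seconds` to bound staleness.
    """

    def __init__(self, load: Callable[[str], object], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._load = load
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested deterministically
        self._entries: Dict[str, Tuple[float, object]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]  # fresh cached value
        self.misses += 1
        value = self._load(key)  # fall through to the backing store
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop a cached entry, e.g. after the underlying record is updated."""
        self._entries.pop(key, None)
```

The TTL value expresses the same trade-off the storage requirements drive: a longer TTL offloads more reads from the backing store at the cost of serving older data.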
Archive storage layer
Despite falling hardware costs, large amounts of data held in specialized systems such as data warehouses still represent a significant cost to the owner. One way to reduce this cost is to keep rarely used data on less computationally intensive but higher-density storage. Such policies often enable owners to optimize their hardware costs while preserving the ability to analyze archival data or meet regulatory compliance requirements.
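An archiving policy of this kind is often expressed as an age-based rule: partitions that have not been accessed for a given number of days are moved from the warehouse to denser archive storage rather than deleted. The sketch below only computes such a plan; the `PartitionInfo` structure, the partition names and the availability of last-access dates (e.g. from query logs) are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List


@dataclass
class PartitionInfo:
    name: str            # e.g. a monthly table partition (hypothetical naming)
    last_accessed: date  # assumed to be available from query logs or metadata


def plan_tiering(partitions: List[PartitionInfo], today: date,
                 archive_after_days: int) -> Dict[str, List[str]]:
    """Split partitions into those that stay in the warehouse ("hot") and
    those that move to cheaper, higher-density archive storage ("archive").

    Archived data is moved, not deleted, so it remains available for
    later analysis or regulatory audits.
    """
    cutoff = today - timedelta(days=archive_after_days)
    plan: Dict[str, List[str]] = {"hot": [], "archive": []}
    for p in partitions:
        tier = "archive" if p.last_accessed < cutoff else "hot"
        plan[tier].append(p.name)
    return plan


parts = [
    PartitionInfo("sales_2023_01", date(2023, 1, 31)),
    PartitionInfo("sales_2024_06", date(2024, 6, 30)),
]
print(plan_tiering(parts, today=date(2024, 7, 1), archive_after_days=180))
```

Keeping the planning step separate from the actual data movement makes it easy to review a plan before executing it, and to apply different thresholds per dataset where compliance rules require longer retention in a queryable tier.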
The long-term business value of data lies in the surrounding ecosystem of data services and in how efficiently they extract insights and deliver them to end users and surrounding systems. The market offers a large number of BI and other analytics solutions that try to address the needs of data engineers, data scientists and data analysts; however, few of them cover all of the required functionality. Enterprise report visualization, data discovery, advanced analytics and custom-tailored solutions are just some of the options that have to be considered before establishing the right analytics environment.