In recent years, NoSQL databases have been developed to address the need for high performance, reliability and horizontal scalability.
NoSQL time series databases (TSDBs) have emerged to combine valuable NoSQL properties with the characteristics of time series data, which arises in many use cases. These solutions efficiently handle the high volume and frequency of time series data.
Developers and decision makers struggle to choose a TSDB from a large variety of solutions. To date, no comparison exists that focuses on the specific features and qualities of these heterogeneous systems.
This paper aims to deliver two frameworks for comparing TSDBs, the first focusing on features and the second on quality. Furthermore, we apply and evaluate the frameworks on up to seven open-source TSDBs, including InfluxDB and OpenTSDB.
We find that the investigated TSDBs differ mainly in support- and extension-related aspects. They share performance-enhancing techniques, time-related query capabilities and data schemas optimized for handling time series data.
Table of Contents
1. Introduction
2. Background
2.1 Distributed Systems and NoSQL Databases
2.2 Time Series and Time Series Data
2.3 Time Series Databases (TSDBs)
2.3.1 Time Series Data Implications
2.3.2 Drawbacks of Relational Databases
2.3.3 Benefits of NoSQL Databases
2.3.4 Architecture of Time Series Databases
3. Related Work
4. Comparison Frameworks
4.1 Feature-oriented
4.2 Quality-oriented
4.2.1 NoSQL-specific characteristics
4.2.2 TSDB-specific characteristics
5. Application
5.1 Feature-oriented
5.1.1 Applied to open-source TSDBs
5.2 Quality-oriented
5.2.1 Applied to InfluxDB
5.2.2 Applied to OpenTSDB
5.2.3 Comparison Overview
6. Results
7. Conclusions
8. References
Objectives & Core Topics
This paper aims to provide a structured comparison of NoSQL-based time series databases (TSDBs) by developing two distinct evaluation frameworks — one focused on general features and one on quality attributes — to assist developers and decision-makers in navigating the diverse landscape of available solutions.
- Development of a feature-oriented comparison framework for TSDBs.
- Development of a quality-oriented comparison framework focusing on system architecture.
- Application and evaluation of these frameworks on selected open-source TSDBs (e.g., InfluxDB, OpenTSDB).
- Comparative analysis of architectural trade-offs in distributed time-series data storage.
- Investigation of scalability, extensibility, and performance in modern NoSQL time series solutions.
Excerpt from the Book
Architecture of InfluxDB
Fig. 2 shows the high-level architecture of InfluxDB. The presentation layer consists of API endpoints for customized communication via protocols such as HTTP, Graphite or UDP. Furthermore, an admin user interface is used for executing queries and for administration tasks such as user management and table management [29].
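To make the Graphite endpoint concrete: Graphite's plaintext protocol sends one data point per line in the form `<metric path> <value> <timestamp>`. The following is a minimal Python sketch that formats and parses such lines. The function names are hypothetical, and the host and port (2003 is Graphite's customary plaintext port) are assumptions; an InfluxDB Graphite listener may be configured differently.

```python
import socket
import time
from typing import List, Optional, Tuple


def format_graphite_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Build one newline-terminated plaintext line: '<metric path> <value> <timestamp>'."""
    if not path or any(c.isspace() for c in path):
        raise ValueError("metric path must be non-empty and contain no whitespace")
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


def parse_graphite_line(line: str) -> Tuple[str, float, int]:
    """Split one plaintext line back into (path, value, timestamp)."""
    path, value, ts = line.split()
    return path, float(value), int(ts)


def send_graphite_lines(lines: List[str], host: str = "localhost", port: int = 2003) -> None:
    """Send lines over TCP to a Graphite-compatible listener (host/port are assumptions)."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall("".join(lines).encode("utf-8"))


line = format_graphite_line("servers.web01.cpu.load", 0.42, 1420070400)
print(repr(line))                 # 'servers.web01.cpu.load 0.42 1420070400\n'
print(parse_graphite_line(line))  # ('servers.web01.cpu.load', 0.42, 1420070400)
```

The dotted metric path is how Graphite encodes hierarchy, in contrast to the tag-based meta-data that many TSDBs use.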
The application layer consists of a coordinator element that uses the Raft consensus protocol for cluster coordination, together with protocol buffers (a message serialization format) for the messages exchanged to reach consensus. InfluxDB has a self-built query engine whose parser uses Flex as the lexical analyzer generator and Bison as the parser generator. Shards are used for data organization and replication. In InfluxDB, a shard is a block of time: it contains data of all series, but only for a certain time interval.
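Time-based sharding of this kind can be modeled in a few lines. The Python sketch below is hypothetical and not InfluxDB's code: the one-week shard duration and the names `ShardedStore` and `shard_for` are assumptions chosen for illustration. Each point is routed to the shard whose time block covers its timestamp, so a single shard holds points of all series, and a range query only scans shards that overlap the requested interval.

```python
from dataclasses import dataclass, field

SHARD_DURATION = 7 * 24 * 3600  # assumed shard length in seconds (one week)


@dataclass
class Shard:
    start: int  # inclusive, seconds since epoch
    end: int    # exclusive
    points: list = field(default_factory=list)  # (series, timestamp, value) for ALL series


class ShardedStore:
    """Hypothetical store that routes each point to the shard covering its timestamp."""

    def __init__(self, duration: int = SHARD_DURATION):
        self.duration = duration
        self.shards: dict = {}  # shard start time -> Shard

    def shard_for(self, ts: int) -> Shard:
        start = ts - ts % self.duration  # align timestamp to the start of its time block
        if start not in self.shards:
            self.shards[start] = Shard(start, start + self.duration)
        return self.shards[start]

    def write(self, series: str, ts: int, value: float) -> None:
        self.shard_for(ts).points.append((series, ts, value))

    def query(self, series: str, t0: int, t1: int) -> list:
        """Return the points of one series in [t0, t1), skipping non-overlapping shards."""
        result = []
        for start in sorted(self.shards):
            shard = self.shards[start]
            if shard.end <= t0 or shard.start >= t1:
                continue  # the whole shard lies outside the requested time range
            result.extend(p for p in shard.points if p[0] == series and t0 <= p[1] < t1)
        return sorted(result, key=lambda p: p[1])


store = ShardedStore()
store.write("cpu", 100, 1.0)
store.write("mem", 200, 2.0)                 # same time block as the first cpu point
store.write("cpu", SHARD_DURATION + 5, 3.0)  # falls into the next shard
print(len(store.shards))                            # 2
print(store.query("cpu", 0, 2 * SHARD_DURATION))    # [('cpu', 100, 1.0), ('cpu', 604805, 3.0)]
```

With this layout, expiring old data can be done by dropping whole shards rather than deleting individual points, which is one common motivation for partitioning time series by time.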
In the database layer, InfluxDB uses LevelDB as deep storage. LevelDB records modifications in a write-ahead log (WAL) before they are applied to the deep storage. Additionally, the data is structured as a log-structured merge-tree (LSM-tree) [44] with ordered keys and a value hash function [18].
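The write path described above (log first, then apply, with data kept in key order) can be illustrated with a toy log-structured store. This is a deliberately simplified Python sketch, not LevelDB: the class name `TinyLSM`, the JSON file layout and the flush threshold are assumptions, and real LevelDB additionally organizes its sorted tables into levels and compacts them in the background. Every `put` is appended to the WAL and fsynced before it reaches the in-memory memtable; a full memtable is written out as an immutable, key-sorted run, after which the WAL is truncated. Reads check the memtable first and then the runs from newest to oldest.

```python
import bisect
import json
import os
import tempfile


class TinyLSM:
    """Toy LSM store: WAL append -> memtable -> immutable key-sorted run files."""

    def __init__(self, directory: str, memtable_limit: int = 4):
        self.dir = directory
        self.memtable_limit = memtable_limit
        os.makedirs(directory, exist_ok=True)
        self.wal_path = os.path.join(directory, "wal.log")
        self.memtable: dict = {}
        self.runs: list = []  # sorted lists of (key, value); index 0 = newest run
        for name in sorted(os.listdir(directory), reverse=True):  # newest file first
            if name.startswith("run-"):
                with open(os.path.join(directory, name)) as f:
                    self.runs.append([tuple(kv) for kv in json.load(f)])
        self._replay_wal()

    def _replay_wal(self) -> None:
        """Rebuild the memtable from the WAL, e.g. after a crash or restart."""
        if os.path.exists(self.wal_path):
            with open(self.wal_path) as f:
                for entry in f:
                    key, value = json.loads(entry)
                    self.memtable[key] = value

    def put(self, key: str, value: str) -> None:
        # 1. Log the modification durably before it is applied.
        with open(self.wal_path, "a") as f:
            f.write(json.dumps([key, value]) + "\n")
            f.flush()
            os.fsync(f.fileno())
        # 2. Apply it to the in-memory memtable.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self) -> None:
        """Persist the memtable as a new sorted run, then truncate the WAL."""
        run = sorted(self.memtable.items())
        with open(os.path.join(self.dir, f"run-{len(self.runs):06d}.json"), "w") as f:
            json.dump(run, f)
        self.runs.insert(0, run)
        self.memtable.clear()
        open(self.wal_path, "w").close()

    def get(self, key: str):
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:  # newer runs shadow older ones
            i = bisect.bisect_left(run, (key,))  # binary search on ordered keys
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None


with tempfile.TemporaryDirectory() as d:
    db = TinyLSM(d, memtable_limit=2)
    db.put("cpu,1420070400", "0.5")
    db.put("cpu,1420070410", "0.7")  # memtable full: flushed to run-000000.json
    db.put("cpu,1420070400", "0.9")  # newer value, only in WAL + memtable
    before = db.get("cpu,1420070400")
    older = db.get("cpu,1420070410")
    after_restart = TinyLSM(d, memtable_limit=2).get("cpu,1420070400")
print(before, older, after_restart)  # 0.9 0.7 0.9
```

Because the WAL is replayed on startup, a value written after the last flush survives a restart, which is exactly what logging modifications before applying them is for. Without compaction the number of runs keeps growing, which is why production LSM engines merge them.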
Summary of Chapters
Introduction: Provides the context of time series data evolution and motivates the need for NoSQL-based TSDBs in modern monitoring and analytics applications.
Background: Defines the essential concepts of distributed systems, the CAP theorem, and the specific data model requirements for time series information management.
Related Work: Reviews existing literature on time series database theory, traditional implementations, and prior comparison studies of NoSQL and TSDB solutions.
Comparison Frameworks: Establishes the methodology for comparing TSDBs, introducing categorical fields for feature-based and architecture-based quality assessments.
Application: Executes the established frameworks by analyzing seven open-source databases and providing an in-depth quality analysis of InfluxDB and OpenTSDB.
Results: Synthesizes the findings from the application phase, highlighting the maturity differences and architectural trade-offs between the investigated systems.
Conclusions: Summarizes the effectiveness of the proposed frameworks and suggests future research directions, particularly regarding quantitative benchmarking.
Keywords
NoSQL, Time Series Database, TSDB, Comparison Framework, InfluxDB, OpenTSDB, Distributed System, Data Architecture, Scalability, Consistency, Availability, Performance, Meta-data Tagging, Query Functionality, System Monitoring
Frequently Asked Questions
What is the core focus of this research?
This paper focuses on the comparison of NoSQL-based time series databases (TSDBs) because there has been a lack of comprehensive, up-to-date literature evaluating these systems based on their specific features and quality attributes.
What are the primary themes discussed?
The paper covers distributed system architectures, the unique challenges of handling time series data, the limitations of traditional relational databases, and the benefits of applying NoSQL storage engines to these workloads.
What is the primary objective of the work?
The main goal is to deliver two comparison frameworks—one feature-oriented and one quality-oriented—and to apply these frameworks to seven open-source TSDBs to evaluate their suitability for different application needs.
Which scientific methods are utilized?
The author employs a framework-based comparative analysis, where specific qualitative metrics and architectural characteristics are defined and then applied as a test scenario against several existing open-source database solutions.
What topics are covered in the main body of the paper?
The main body revisits distributed system theory (the CAP theorem), defines TSDB-specific characteristics like data tagging and query modalities, and presents a detailed architectural investigation of InfluxDB and OpenTSDB.
Which keywords characterize this work?
The research is characterized by terms such as NoSQL, Time Series Database (TSDB), Comparison Framework, OpenTSDB, InfluxDB, System Architecture, and Distributed Systems.
How does the author evaluate the architectural approach of InfluxDB?
The author highlights InfluxDB’s use of the Raft consensus algorithm for cluster coordination and LevelDB for storage, noting its simplicity in deployment compared to more complex stacks.
What is the main finding regarding OpenTSDB’s architecture?
The paper identifies OpenTSDB as a more mature and complex system, built on the Hadoop ecosystem (HBase and HDFS), which provides high scalability but comes with a more demanding deployment and configuration structure.
- Kevin Rudolph (Author), 2015, A Comparison of NoSQL Time Series Databases, Munich, GRIN Verlag, https://www.hausarbeiten.de/document/299975