Introduction
This short ELK Stack guide for beginners explains the basic terms used with Elasticsearch. Businesses today need to be up and running at all times. Whether the cause is a network anomaly or a cyberattack, an organization cannot afford downtime due to performance issues. Analysis, visualization, and reporting of Key Performance Indicators, metrics, and goals are also fundamental to assessing the performance of the company.
The applications and hardware that support your company's operations produce a huge number of log files every day. Log files are one of the most reliable ways to assess the performance of your infrastructure. With proper log aggregation, processing, storage, analysis, and visualization, you can monitor your systems effectively and identify potential issues as soon as they occur.
Centralized log management has become a necessity. IT architecture has fundamentally changed and now comprises hybrid approaches, cloud solutions, containers, and IoT devices. This reality can be managed effectively only by a log management solution that handles the huge amount of data produced by these diverse devices and applications.
The ELK Stack is an open-source solution to this problem. It can manage large amounts of data, provides the visibility you need across the whole infrastructure, and facilitates the early detection of performance issues.
Why not use the good, old database concept?
Relational databases are very popular and have clear advantages, but there are use cases where they are not effective enough. For workloads such as full-text search over large volumes of data, they tend to be slow and hard to scale horizontally. Security is also often an issue when a database is exposed to the internet, and extra layers must be added to protect it. Log analysis is another use case where relational databases are not ideal, for two reasons: processing speed, and the difficulty of storing and searching very large amounts of data. Relevance-based searching is also a field where databases are not well suited. For example, when we search a database for a specific term like “presentation”, a plain query only tells us whether the term exists or not. Elasticsearch, by contrast, returns a relevance score (for example 0.65) for each matching document, and with the default text analysis it also matches variants such as the capitalized “Presentation”. This feature is very useful for specific use cases.
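The difference between a yes/no match and a relevance score can be illustrated with a small Python sketch. The scoring function below is a deliberately simple toy (term frequency dampened by document length) and is not the algorithm Elasticsearch uses; the sample sentences and function names are invented for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def exact_match(term, text):
    """Naive, case-sensitive substring test: the answer is only yes or no."""
    return term in text

def toy_score(term, text):
    """Toy relevance score: term frequency dampened by document length."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    term_frequency = Counter(tokens)[term.lower()]
    return term_frequency / math.sqrt(len(tokens))

docs = [
    "Presentation slides for the quarterly review",
    "Notes about the presentation and the presentation rehearsal",
    "Unrelated log entry",
]

for doc in docs:
    print(exact_match("presentation", doc), round(toy_score("presentation", doc), 3), doc)
```

The first sentence fails the case-sensitive test but still receives a score above zero, and the second sentence scores higher because the term appears twice.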
What is Elasticsearch, ELK Stack and Elastic Stack?
Elasticsearch is a distributed, open-source engine, built on the Apache Lucene library, that can search and analyze all types of data (textual, numerical, geospatial, structured, and unstructured). Elasticsearch is a NoSQL database, meaning that it is non-relational, and it is easier to scale than a relational database. “ELK” is the acronym for three open source projects: Elasticsearch, Logstash, and Kibana.
- Elasticsearch is a search and analytics engine.
- Logstash is a server‑side data processing pipeline that ingests data from multiple sources simultaneously, transforms it, and then sends it to Elasticsearch.
- Kibana lets users visualize Elasticsearch data with charts and graphs.
The Elastic Stack is the combination of these three open-source projects plus Beats, a family of lightweight, single-purpose data shippers that was later added to the ELK Stack equation.
Where is Elastic Stack used?
With well over 100 million downloads, the Elastic Stack is suitable for many use cases, such as:
- Security analytics
- Logging and log analytics
- Business analytics
- Infrastructure metrics and container monitoring
- Application performance monitoring
- Application search
Why is it so popular?
First of all, the Elastic Stack is fast: the latency from the time a document is indexed until it becomes searchable is very short. That’s why it is well suited for time-sensitive use cases such as security analytics and infrastructure monitoring. The Elastic Stack is also scalable and distributed. Related data is often stored in the same index, which is divided into one or more primary shards (explained below), each of which can have zero or more replica shards. Rebalancing and routing are done automatically. Replica shards provide redundancy, which protects against issues like hardware failures. This form of scalability allows the stack to handle huge amounts of data. In conclusion, the Elastic Stack makes log management a much simpler task.
How does Elasticsearch work?
Raw data flows into Elasticsearch from a variety of sources, including log files, system metrics, and web applications. Data ingestion is the process by which this raw data is parsed, normalized, and enriched before it is indexed in Elasticsearch. Once indexed in Elasticsearch, users can run complex queries against their data and use aggregations to retrieve complex summaries of their data. From Kibana, users can create powerful visualizations of their data, share dashboards, and manage the Elastic Stack.
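To make "queries" and "aggregations" more concrete, here is a minimal Python sketch that imitates both on a handful of in-memory events. The events, the field names and the two helper functions are invented for illustration; they only mimic the idea and do not reproduce how Elasticsearch executes a search. The `search_body` dictionary at the end shows what a comparable request in Elasticsearch's query DSL might look like, assuming the data was mapped with those field names.

```python
from collections import Counter

# Hypothetical, already-ingested events standing in for indexed documents.
events = [
    {"host": "web-01", "message": "connection timed out"},
    {"host": "web-01", "message": "request timed out"},
    {"host": "web-02", "message": "connection timed out"},
    {"host": "web-02", "message": "user logged in"},
]

def match(events, field, word):
    """Crude stand-in for a match query: keep events whose field contains the word."""
    return [e for e in events if word.lower() in e[field].lower().split()]

def terms_aggregation(events, field):
    """Count events per distinct field value, most frequent first."""
    return Counter(e[field] for e in events).most_common()

timed_out = match(events, "message", "timed")
print(len(timed_out))
print(terms_aggregation(timed_out, "host"))

# A comparable search body in the Elasticsearch query DSL.
# The field names are assumptions about how the data was mapped.
search_body = {
    "query": {"match": {"message": "timed"}},
    "aggs": {"per_host": {"terms": {"field": "host.keyword"}}},
}
```

The query narrows the data down to the matching events, and the aggregation summarizes them per host, which is the kind of summary Kibana visualizations are built on.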
What is an Elasticsearch index?
An Elasticsearch index is a collection of documents that are related to each other. Elasticsearch stores data as JSON documents. Each document correlates a set of keys (names of fields or properties) with their corresponding values (strings, numbers, Booleans, dates, arrays of values, geolocations, or other types of data). Elasticsearch uses a data structure called an inverted index, which is designed to allow very fast full-text searches. An inverted index lists every unique word that appears in any document and identifies all of the documents each word occurs in. During the indexing process, Elasticsearch stores documents and builds an inverted index to make the document data searchable in near real-time. Indexing is initiated with the index API, through which you can add or update a JSON document in a specific index. In an analogy with relational databases, the index corresponds to the database.
OK, but Elasticsearch uses an inverted index!
The inverted index is the mechanism by which the search engine works. It maps the things that can be searched for (the terms) to the documents in which those terms occur.
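The idea can be sketched in a few lines of Python. This is a simplified illustration, not Lucene's actual data structure: it ignores positions, frequencies and text analysis beyond lowercasing, and the sample documents and function names are made up.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each unique term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search(index, query):
    """Return the ids of documents that contain every term of the query."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)

docs = {
    1: "Disk usage is high on node A",
    2: "Node B restarted after high memory usage",
    3: "Login failed for user admin",
}
index = build_inverted_index(docs)
print(index["usage"])                    # ids of the documents containing "usage"
print(search(index, "high usage node"))  # ids of the documents containing all three terms
```

Because the lookup starts from the term rather than from the documents, a search never has to scan the text of every document.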
What is an Elasticsearch document?
A document is a JSON object which is stored in Elasticsearch. It is like a row in a table in a relational database. Each document is stored in an index, has an id, and contains zero or more fields, or key-value pairs. Earlier versions also gave each document a type that described what sort of thing it was; mapping types are deprecated in the 7.x series.
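As an illustration, a log event could be represented by a document like the one in this Python sketch. Every field name and value here is hypothetical; the point is only that a document is a set of key-value pairs serialized as JSON.

```python
import json

# Hypothetical log document; every field name and value is invented for illustration.
document = {
    "timestamp": "2020-01-15T10:32:00Z",
    "host": "web-01",
    "level": "error",
    "message": "Connection timed out",
    "response_time_ms": 5021,
    "tags": ["nginx", "production"],
}

# Elasticsearch receives and stores the document as JSON text.
body = json.dumps(document)
print(body)

# Decoding the JSON gives back the same key-value pairs.
restored = json.loads(body)
print(restored == document)
```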
What is a cluster in Elasticsearch?
A cluster consists of one or more nodes that share the same cluster name. Each cluster has a single master node which can be replaced if the current master node fails.
What is a node in Elasticsearch?
A node is a running instance of Elasticsearch which belongs to a cluster. Multiple nodes can be started on a single server. At startup, a node will use unicast to discover an existing cluster with the same cluster name and will try to join that cluster.
What is a shard?
Because Elasticsearch is a distributed search engine, an index is usually split into elements known as shards that are distributed across multiple nodes. Elasticsearch automatically manages the arrangement of these shards. It also rebalances the shards as necessary, so users need not worry about the details. Each document is stored in a single primary shard. When you index a document, it is indexed first on the primary shard, then on all replicas of the primary shard. Since version 7.0, an index has 1 primary shard by default (earlier versions defaulted to 5). Each primary shard can have zero or more replicas. A replica is a copy of the primary shard. By default, there is one replica for each primary shard.
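The following Python sketch shows the general idea of routing a document to a primary shard and keeping replicas on other nodes. It rests on stated assumptions: `zlib.crc32` stands in for the hash function Elasticsearch actually uses, the modulo formula is a simplified form of the routing calculation, and the round-robin placement is a toy that does not reflect the real allocation logic. Node names and document ids are made up.

```python
import zlib

def pick_shard(routing_value, num_primary_shards):
    """Choose a primary shard from a stable hash of the routing value.

    zlib.crc32 is a stand-in; Elasticsearch uses its own hash function.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards

def place_copies(num_primary_shards, num_replicas, nodes):
    """Toy allocation: put each shard's copies on distinct nodes, round-robin."""
    if num_replicas + 1 > len(nodes):
        raise ValueError("not enough nodes to keep every copy on a different node")
    layout = {}
    for shard in range(num_primary_shards):
        copies = [nodes[(shard + i) % len(nodes)] for i in range(num_replicas + 1)]
        layout[shard] = {"primary": copies[0], "replicas": copies[1:]}
    return layout

nodes = ["node-1", "node-2", "node-3"]
layout = place_copies(num_primary_shards=3, num_replicas=1, nodes=nodes)
for doc_id in ["a1", "b2", "c3"]:
    shard = pick_shard(doc_id, 3)
    print(doc_id, "-> shard", shard, layout[shard])
```

Because the shard is derived from a hash of the routing value, the same document id always lands on the same primary shard, and a replica on a different node keeps the data available if the node holding the primary fails.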
What is mapping in Elasticsearch?
Mapping is the process of defining how a document, and the fields it contains, are stored and indexed. For example, mappings are used to define which string fields should be treated as full-text fields, which fields contain numbers, dates, or geolocations, and the format of date values. Mapping is essentially the equivalent of a schema in relational databases. Unlike a relational database, however, Elasticsearch does not require the schema to be defined up front: with dynamic mapping, it infers field types automatically when a document is indexed.
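As a rough illustration, an explicit mapping for a log index could look like the body below, written here as a Python dictionary that would be serialized to JSON when creating the index. The field names are hypothetical; `text`, `keyword`, `date`, `integer` and `geo_point` are field types that Elasticsearch provides. The small helper is only there to inspect the structure.

```python
import json

# Hypothetical field names; the types are field types that Elasticsearch provides.
index_body = {
    "mappings": {
        "properties": {
            "message": {"type": "text"},                       # analyzed for full-text search
            "level": {"type": "keyword"},                      # exact values, good for filtering
            "created": {"type": "date", "format": "yyyy-MM-dd"},
            "response_time_ms": {"type": "integer"},
            "location": {"type": "geo_point"},
        }
    }
}

def fields_of_type(body, wanted):
    """Return the names of mapped fields that have the given type, sorted."""
    props = body["mappings"]["properties"]
    return sorted(name for name, spec in props.items() if spec["type"] == wanted)

print(json.dumps(index_body, indent=2))
print(fields_of_type(index_body, "text"))
```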
Elasticsearch REST APIs
REST can be thought of as the language of the internet. An API (Application Programming Interface) is an interface that allows two software programs to communicate with each other. With cloud use on the rise, APIs are increasingly used to expose web services. Elasticsearch exposes its features through REST APIs, which are used by the UI components and can also be called directly. It offers a wide range of REST APIs for integrating with, managing, and querying the indexed data in many different ways. Examples are the cat, Cluster, Cross-cluster replication, Document, Enrich, Graph explore, Index, and Index lifecycle management APIs, among others.
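To give a feel for what calling these APIs looks like, the Python sketch below builds three HTTP requests with the standard library: one for the cat API, one for the Document API and one for the Search API. It only constructs and prints the requests, so it runs without a cluster. The base URL assumes a local node without security enabled, and the index name `demo-logs` and the document fields are invented for illustration.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:9200"  # assumption: a local node without security enabled

def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for an Elasticsearch REST endpoint."""
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)

requests_to_send = [
    # cat API: human-readable cluster health
    build_request("GET", "/_cat/health?v"),
    # Document API: add or update document 1 in the hypothetical index "demo-logs"
    build_request("PUT", "/demo-logs/_doc/1", {"level": "error", "message": "disk full"}),
    # Search API: full-text match query on the "message" field
    build_request("GET", "/demo-logs/_search", {"query": {"match": {"message": "disk"}}}),
]

for req in requests_to_send:
    print(req.get_method(), req.full_url)
    # To actually send a request: urllib.request.urlopen(req, timeout=5)
```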
What about plugins in Elasticsearch?
Plugins enhance the core Elasticsearch functionality in a custom manner. They can add custom mapping types, custom analyzers, native scripts, custom discovery mechanisms, and more. Plugins contain JAR files, but may also contain scripts and config files, and must be installed on every node in the cluster. After installation, each node must be restarted before the plugin becomes visible. Plugins fall into two categories: core plugins and community-contributed plugins. Core plugins are part of the Elasticsearch project and maintained by it; they are delivered at the same time as Elasticsearch, and their version number always matches the version number of Elasticsearch. Community-contributed plugins are external to the Elasticsearch project. They are provided by individual developers or private companies and have their own licenses as well as their own versioning system. Site plugins (plugins containing HTML, CSS and JavaScript) are no longer supported. The main plugin categories are API Extension, Alerting, Analysis, Discovery, Ingest, Management, Mapper, Security, Snapshot/Restore Repository, and Store.
What is Logstash used for?
Logstash is used to aggregate and process data and send it to Elasticsearch. It is an open-source, server-side data processing pipeline that enables you to ingest data from multiple sources simultaneously and to enrich and transform it before it is indexed into Elasticsearch. It offers a large selection of plugins to help you parse, enrich, transform, and buffer data from a variety of sources. If your data requires additional processing that is not available in Beats, then you need to add Logstash to your deployment.
The Logstash event processing pipeline has three stages: inputs → filters → outputs. Inputs generate events, filters modify them, and outputs ship them elsewhere. Inputs and outputs support codecs that enable you to encode or decode the data as it enters or exits the pipeline without having to use a separate filter.
- Inputs are used to get data into Logstash. Some of the more commonly used inputs are file (which works like tail -f), syslog, redis, and beats.
- Filters are intermediary processing devices in the Logstash pipeline. They can be combined with conditionals to perform an action on an event if it meets certain criteria. Commonly used filters are grok, mutate, drop, clone, and geoip.
- Outputs are the final phase of the Logstash pipeline. An event can pass through multiple outputs, but once all output processing is complete, the event has finished its execution. Commonly used outputs are elasticsearch, file, graphite, and statsd.
- Codecs are stream filters that can operate as part of an input or output. They enable you to easily separate the transport of your messages from the serialization process. Commonly used codecs are json and multiline.
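The inputs → filters → outputs flow can be imitated with a few Python generators. This is a conceptual sketch only: it is not Logstash configuration and does not use Logstash itself. The log line format, the regular expression, the function names and the `_parse_failure` tag are all invented for this example.

```python
import json
import re

# Assumed line format for this sketch: "<LEVEL> <host> <free text>"
LINE_PATTERN = re.compile(r"(?P<level>[A-Z]+) (?P<host>\S+) (?P<message>.*)")

def input_stage(lines):
    """Input: turn each raw line into an event."""
    for line in lines:
        yield {"message": line.rstrip("\n")}

def grok_like_filter(events):
    """Filter: extract structured fields from the raw message, in the spirit of grok."""
    for event in events:
        found = LINE_PATTERN.fullmatch(event["message"])
        if found:
            event.update(found.groupdict())
        else:
            event["tags"] = ["_parse_failure"]  # tag name invented for this sketch
        yield event

def drop_debug_filter(events):
    """Filter with a conditional: discard events whose level is DEBUG."""
    for event in events:
        if event.get("level") != "DEBUG":
            yield event

def output_stage(events):
    """Output with a JSON codec: serialize each event as one JSON line."""
    return [json.dumps(event, sort_keys=True) for event in events]

raw_lines = [
    "ERROR web-01 disk almost full",
    "DEBUG web-01 cache refreshed",
    "not a structured line",
]
shipped = output_stage(drop_debug_filter(grok_like_filter(input_stage(raw_lines))))
for line in shipped:
    print(line)
```

Each stage only consumes events from the previous one, which mirrors how filters can be added, removed, or reordered in a pipeline without touching the inputs and outputs.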
What is Kibana used for?
Kibana is a data analytics, visualization, and management tool for Elasticsearch that provides real-time histograms, line graphs, pie charts, and maps. You can use Kibana to search, view, and interact with data stored in Elasticsearch indices, and to perform advanced data analysis and visualization.
Since version 7.0, Kibana uses the Kibana Query Language (KQL) by default, which includes scripted field support and a simplified, easier-to-use syntax. If you have a Basic license or above, autocomplete functionality is also enabled. The basics of the language syntax have stayed the same, but some things have been refined to make the query language easier to use. The most common search types are free-text searches for quickly finding a specific string, field-level searches for a string within a specific field, logical statements that combine several searches, and proximity searches for terms within a specific character proximity.
The Visualize feature of Kibana enables you to create visualizations of the data from your Elasticsearch indices, which you can then add to dashboards for analysis. Kibana visualizations are based on Elasticsearch queries. By using a series of Elasticsearch aggregations to extract and process your data, you can create charts that show you the trends, spikes, and dips you need to know about. Visualization types are Basic Charts (pie chart, data table, metric, goal and gauge, heat map, and tag cloud), Time Series Optimized (TSVB, Timelion), Maps (Elastic Maps, coordinate map, region map), and For Developers (Vega).
When a collection of visualizations is ready, they can be added to one comprehensive view called a dashboard. Dashboards provide at-a-glance insights into your data and enable you to drill down into details. They give you the ability to monitor a system or environment for easier event correlation or trend analysis.
What are Spaces in Kibana?
Spaces enable you to organize your dashboards and other saved objects into meaningful categories. Once inside a space, you see only the dashboards and saved objects that belong to that space.
Kibana creates a default space for you. After you create your own spaces, you’re asked to choose a space when you log in to Kibana. You can change your current space at any time by using the menu in the upper left.
What is Beats used for?
Beats are open-source data shippers that you install as agents on your servers to send operational data to Elasticsearch. Beats can send data directly to Elasticsearch or via Logstash, where you can further process and enhance the data. They sit on your servers, run alongside your containers, or deploy as functions, and then centralize data in Elasticsearch. The Beats family consists of Filebeat, Metricbeat, Packetbeat, Winlogbeat, Auditbeat, Journalbeat, Heartbeat, and Functionbeat. Notably, Filebeat and Metricbeat support modules, which are built-in configurations and Kibana objects for specific platforms and systems. These modules are easy to use because they come with pre-configured settings, which can later be adjusted to the organization’s needs. Filebeat modules include Apache, Auditd, AWS, Azure, Cisco, Elasticsearch, Google Cloud, HAProxy, IIS, Iptables, Kafka, and more. Metricbeat modules include Aerospike, AWS, Azure, Docker, Elasticsearch, HAProxy, Kafka, Kibana, Kubernetes, Logstash, MSSQL, MySQL, PostgreSQL, RabbitMQ, and more.
What are the business-wise features of the Elastic Stack?
From enterprise-grade security and developer-friendly Application Programming Interfaces (APIs) to machine learning and graph analytics, the Elastic Stack ships with features to help you ingest, analyze, search, and visualize all types of data at scale. It provides features for Management and Operations, Ingest and Enrich, Data Storage, Search and Analyze, and Explore and Visualize. Use cases include Application Search, Site Search, Business Analytics, Enterprise Search, and Metrics Analytics, and the benefits are leveraged by a wide range of industries such as Automotive & Manufacturing, Education/Non-Profit, Financial Services, Food & Beverage/Hospitality, and Government. Actual implementations of the Elastic Stack have enabled companies to increase search speed, relevance, and data volume for the IT system and log data used by IT operations. With the capabilities of Kibana, a company gains greater insight into the performance and usage of its products, which helps prevent customer churn and drive continued growth.
The ELK Stack as a Service
When you want to get the benefits of the Elastic Stack, you have a couple of choices. You can download the Elastic Stack and install it on your own hardware or in the cloud, but you keep the administrative load. Logstail.com provides the ELK Stack as a cloud-hosted, easy-to-use, and fully managed service. With Logstail.com, you can deploy the ELK Stack rapidly, with 360 security, without having to worry about maintenance and capacity issues. Focus on your business and pay as you grow at affordable prices.
Installation (Example for Debian 10)
No ELK Stack for beginners guide can be complete without a demo installation!
We will install the ELK Stack in a Debian 10 virtual machine, together with Metricbeat, in order to collect metrics from our own operating system and visualize them. This tutorial is the simplest way for you to see the power of the ELK Stack. Let’s go:
Step 1. Install Elasticsearch
To install Elasticsearch open a terminal window and use the following commands.
demo@debian:~$ curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.5.0-amd64.deb
demo@debian:~$ sudo dpkg -i elasticsearch-7.5.0-amd64.deb
demo@debian:~$ sudo /etc/init.d/elasticsearch start
To test that the Elasticsearch daemon is up and running, try sending an HTTP GET request on port 9200:
demo@debian:~$ curl http://127.0.0.1:9200
A correct response is a JSON document that includes the node name, the cluster name, and version information.
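For scripted checks, the same request can be sent from Python using only the standard library. This is a minimal sketch that assumes a local node listening on port 9200 without security enabled; the function name is made up, and it simply returns None when the node cannot be reached.

```python
import json
import urllib.error
import urllib.request

def check_elasticsearch(url="http://127.0.0.1:9200", timeout=2):
    """Return the parsed JSON from the root endpoint, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a body that is not JSON.
        return None

info = check_elasticsearch()
if info is None:
    print("Elasticsearch is not reachable on port 9200")
else:
    print("Elasticsearch version:", info.get("version", {}).get("number"))
```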
Step 2. Install Kibana
To install Kibana open a terminal window and use the following commands:
demo@debian:~$ curl -L -O https://artifacts.elastic.co/downloads/kibana/kibana-7.5.0-linux-x86_64.tar.gz
demo@debian:~$ tar xzvf kibana-7.5.0-linux-x86_64.tar.gz
demo@debian:~$ cd kibana-7.5.0-linux-x86_64/
demo@debian:~$ ./bin/kibana
To launch the Kibana web interface open your browser and type:
http://127.0.0.1:5601
Step 3. Install Metricbeat
To install Metricbeat open a terminal window and use the following commands:
demo@debian:~$ curl -L -O https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-7.5.0-amd64.deb
demo@debian:~$ sudo dpkg -i metricbeat-7.5.0-amd64.deb
To set up the system module and start collecting system metrics, enable the system module from the Metricbeat install directory:
demo@debian:~$ sudo metricbeat modules enable system
Set up the initial environment:
demo@debian:~$ sudo metricbeat setup -e
Start Metricbeat:
demo@debian:~$ sudo service metricbeat start
To visualize system metrics, open your browser and navigate to the Metricbeat system overview dashboard:
http://localhost:5601/app/kibana#/dashboard/Metricbeat-system-overview-ecs
Step 4. Install Logstash (Optional)
Logstash is optional in this installation because we have already installed Metricbeat, which can send its data directly to Elasticsearch. To install Logstash, open a terminal window and use the following commands (Java is required):
demo@debian:~$ curl -L -O https://artifacts.elastic.co/downloads/logstash/logstash-7.5.0.deb
demo@debian:~$ sudo dpkg -i logstash-7.5.0.deb
Start Logstash:
demo@debian:~$ sudo /etc/init.d/logstash start
Congratulations, you have successfully performed an installation of the Elastic Stack!
Conclusion
This guide is intended to inform and help users with little or no experience to utilize the power and flexibility of ELK Stack. If you combine this platform with the complete services that Logstail.com is offering, you will transform the analysis of your data from a tedious to an exciting task!