Java 8 Stream API - distinct(), count() & sorted() Example

Tech Lead & Architect | 13+ Years in Cloud, Backend, and AI - Experienced software engineer with expertise in Java, Spring Boot, Microservices, Angular, React, Kafka, DevOps, Python, PySpark, Databricks, and Generative AI. Certified in TOGAF, AWS, and Google Cloud. Passionate about building scalable, secure, and high-performance systems. Enthusiast in Data Engineering & Agentic AI. Author of 1,200+ technical articles sharing insights across diverse tech stacks.
Date: 2021-09-29
Understanding Java 8's Stream API: Sorted, Count, and Distinct Methods
Java 8 introduced a significant enhancement to the language with the Stream API, a powerful tool for processing collections of data efficiently. This article explores three core methods within the Stream API: sorted(), count(), and distinct(). These methods provide concise and elegant ways to manipulate and analyze data streams, leading to cleaner and more readable code. We will delve into each method individually, explaining their functionality and demonstrating their practical application.
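The foundation of the Stream API is the concept of a stream, which represents a sequence of elements. Think of a stream as a conveyor belt carrying data. Unlike collections, streams do not store data; they pull elements from a source such as a collection or an array and process them on demand. Intermediate operations are lazy: nothing is processed until a terminal operation is invoked, so a pipeline can avoid building intermediate collections, which can help with large datasets. Of the three methods covered here, sorted() and distinct() are intermediate operations that return a new stream, while count() is a terminal operation that produces a result and ends the pipeline. Note that sorted() and distinct() are stateful: sorted() in particular has to see every element before it can emit the first one.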
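To make the lazy evaluation concrete, here is a minimal sketch. The class name LazyStreamDemo and the logging approach are illustrative choices, not part of the Stream API. It records when the filter lambda actually runs relative to when the pipeline is assembled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative class name; only the java.util.stream calls are real API.
class LazyStreamDemo {

    // Returns a log of events so the evaluation order can be inspected.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        List<String> source = Arrays.asList("b", "a", "c");

        // filter() is an intermediate operation: this line only describes the work.
        Stream<String> pipeline = source.stream()
                .filter(s -> {
                    log.add("filter " + s);
                    return !s.equals("c");
                });
        log.add("pipeline built");

        // collect() is a terminal operation: it triggers the traversal.
        List<String> result = pipeline.collect(Collectors.toList());
        log.add("result " + result);
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The "pipeline built" entry appears in the log before any "filter" entry, because the lambda does not run until collect() is called.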
Let's start with the sorted() method. This method, as its name implies, arranges the elements of a stream in a specific order. It comes in two forms. The no-argument sorted() uses the natural ordering of the elements, so they must implement Comparable (as numbers and strings do); for these types the result is in ascending order. If the elements are not Comparable, a ClassCastException may be thrown when the terminal operation runs. The second form, sorted(Comparator), accepts a Comparator that defines the sorting criteria for more complex objects. For example, you might sort a list of employees by salary, age, or job title. The result is a new stream containing the same elements arranged according to the chosen ordering. The source collection is not modified. Because sorted() returns a stream, it can be chained with other stream operations in a declarative style.
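The following sketch shows both forms of sorted(). The Employee class and the sample data are hypothetical and exist only for this illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class SortedExample {

    // Hypothetical Employee type, introduced only for this example.
    static final class Employee {
        final String name;
        final double salary;

        Employee(String name, double salary) {
            this.name = name;
            this.salary = salary;
        }

        String getName() { return name; }
        double getSalary() { return salary; }
    }

    // Natural ordering: Integer implements Comparable, so no Comparator is needed.
    static List<Integer> ascending(List<Integer> numbers) {
        return numbers.stream().sorted().collect(Collectors.toList());
    }

    // Explicit Comparator: reverse of the natural ordering.
    static List<Integer> descending(List<Integer> numbers) {
        return numbers.stream()
                .sorted(Comparator.reverseOrder())
                .collect(Collectors.toList());
    }

    // Custom criterion: order employees by salary, lowest first, and keep the names.
    static List<String> namesBySalary(List<Employee> employees) {
        return employees.stream()
                .sorted(Comparator.comparingDouble(Employee::getSalary))
                .map(Employee::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(5, 3, 8, 1);
        System.out.println(ascending(numbers));   // [1, 3, 5, 8]
        System.out.println(descending(numbers));  // [8, 5, 3, 1]
        System.out.println(numbers);              // [5, 3, 8, 1] (source list untouched)

        List<Employee> staff = Arrays.asList(
                new Employee("Ann", 72000.0),
                new Employee("Bob", 58000.0),
                new Employee("Cy", 64000.0));
        System.out.println(namesBySalary(staff)); // [Bob, Cy, Ann]
    }
}
```

Comparator.comparingDouble is one way to build the comparator; Comparator.comparing with a method reference, optionally followed by thenComparing, works similarly when sorting by several fields.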
The count() method provides a simple yet essential function: it returns the number of elements in a stream as a long. It is a terminal operation, so it ends the pipeline and the stream cannot be used afterwards. This seemingly basic operation is useful in many contexts, for example to determine the size of a dataset, to feed aggregate statistics, or to check whether any data is present. It is particularly helpful in combination with filtering: one might filter a stream based on a specific criterion, then call count() to determine how many elements satisfied the condition.
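As a small illustration of filter-then-count, consider the sketch below. The class name, method name, and word list are made up for the example.

```java
import java.util.Arrays;
import java.util.List;

class CountExample {

    // Counts the words whose length exceeds minLength. Note the long return type.
    static long countLongerThan(List<String> words, int minLength) {
        return words.stream()
                .filter(w -> w.length() > minLength)
                .count();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("stream", "api", "java", "lambda");
        System.out.println(countLongerThan(words, 4)); // 2 ("stream" and "lambda")
    }
}
```

When the only question is whether at least one element matches, anyMatch() is usually the better fit, since it can stop at the first match instead of counting every element.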
The distinct() method removes duplicate elements from a stream, so that each element in the resulting stream is unique. It takes no arguments: uniqueness is determined by the elements' own equals() method, and because implementations typically track already-seen elements in a hash-based set, hashCode() must be consistent with equals(). If a class does not override these methods, it inherits identity-based equality from Object, and two instances with identical field values are treated as different elements. Ensuring a proper implementation of both methods is therefore vital for accurate duplicate removal. For ordered streams, the first occurrence of each element is the one that is kept. This method is particularly useful for datasets that may contain redundant entries, since removing repeated values simplifies analysis and manipulation downstream.
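The effect of equals() and hashCode() on distinct() can be seen in the sketch below. Point and RawPoint are hypothetical types defined only for this comparison: the first overrides both methods, the second does not.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

class DistinctExample {

    // Hypothetical value type with value-based equals() and hashCode().
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point other = (Point) o;
            return x == other.x && y == other.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    // Hypothetical type that inherits identity-based equals() from Object.
    static final class RawPoint {
        final int x;
        final int y;

        RawPoint(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Removes duplicates, keeping the first occurrence of each element.
    static <T> List<T> distinctValues(List<T> items) {
        return items.stream().distinct().collect(Collectors.toList());
    }

    // Counts the elements that remain after duplicate removal.
    static <T> long countDistinct(List<T> items) {
        return items.stream().distinct().count();
    }

    public static void main(String[] args) {
        System.out.println(distinctValues(Arrays.asList(1, 2, 2, 3, 3, 3))); // [1, 2, 3]

        // Two Points with the same coordinates are equal, so one is dropped.
        System.out.println(countDistinct(Arrays.asList(
                new Point(1, 2), new Point(1, 2), new Point(3, 4)))); // 2

        // RawPoint instances are only equal to themselves, so nothing is dropped.
        System.out.println(countDistinct(Arrays.asList(
                new RawPoint(1, 2), new RawPoint(1, 2), new RawPoint(3, 4)))); // 3
    }
}
```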
In a practical scenario, these three methods often work together. Imagine you have a large list of employee records, each containing an employee ID, name, department, and salary, and you want to determine the number of unique departments within the company. Because distinct() takes no arguments and compares whole elements, you cannot hand it a comparison function based on the "department" field. Instead, you would create a stream from the employee list, use map() to extract the department name from each record, and then apply distinct() to keep only the unique names. Following this, count() gives the total number of unique departments. Sorting does not change the count, so sorted() adds nothing there; it becomes useful when you want the names themselves, for example collecting the unique department names into an alphabetically ordered list.
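The department scenario might be sketched as follows. The Employee class, its fields, and the sample records are assumptions made for the illustration; a real codebase would have its own model.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class DepartmentReport {

    // Hypothetical employee record with the four fields described in the text.
    static final class Employee {
        final int id;
        final String name;
        final String department;
        final double salary;

        Employee(int id, String name, String department, double salary) {
            this.id = id;
            this.name = name;
            this.department = department;
            this.salary = salary;
        }

        String getDepartment() { return department; }
    }

    // map() extracts the department, distinct() drops repeats, count() totals them.
    static long countUniqueDepartments(List<Employee> employees) {
        return employees.stream()
                .map(Employee::getDepartment)
                .distinct()
                .count();
    }

    // Same pipeline, but sorted alphabetically and collected instead of counted.
    static List<String> sortedUniqueDepartments(List<Employee> employees) {
        return employees.stream()
                .map(Employee::getDepartment)
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Employee> employees = Arrays.asList(
                new Employee(1, "Ann", "Engineering", 72000.0),
                new Employee(2, "Bob", "Sales", 58000.0),
                new Employee(3, "Cy", "Engineering", 64000.0),
                new Employee(4, "Dee", "Finance", 61000.0));

        System.out.println(countUniqueDepartments(employees));  // 3
        System.out.println(sortedUniqueDepartments(employees)); // [Engineering, Finance, Sales]
    }
}
```

Mapping to the department string first means distinct() relies on String's own equals() and hashCode(), so the Employee class does not need to override either method for this query.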
This combination of stream operations demonstrates the flexibility and power of the Java 8 Stream API. It enables developers to express complex data processing logic in a declarative, concise, and efficient manner. By chaining multiple operations together, one can transform and analyze data streams in a highly readable and maintainable fashion. This contrasts sharply with the more verbose and procedural approaches required in older Java versions. The improvement in code readability and maintainability is often a significant benefit of utilizing the Stream API, particularly for larger projects.
In conclusion, the sorted(), count(), and distinct() methods are three fundamental components of the Java 8 Stream API. Understanding and utilizing these methods effectively contributes to writing cleaner, more efficient, and easier-to-maintain Java code, especially when dealing with collection-based operations. The ability to chain these methods together for sophisticated data manipulation reinforces their value and highlights the significant improvement offered by the Stream API over traditional collection processing methods. Mastering these core functions is crucial for any Java developer seeking to build robust and efficient applications.