EP10 – Why you need to start using Streams in Java

First and foremost, let’s talk about why you should learn how to use streams.

Streams in Java 8 are essentially the solution to aggregating data easily.

What the heck do I mean by aggregating data?

Consider a database query that uses aggregates.

In SQL, the aggregates we have access to are operations that are performed on groups of data like:

  • counting rows
  • summing up values
  • finding a min or a max
  • getting an average

All of these operations can be performed on data that is grouped into buckets.

If you’re not at all familiar with the concept of aggregate functions and grouping, I’d highly suggest reading my articles / listening to my podcasts on the topics: SQL Aggregate Functions and SQL Group By

Data Aggregation without Streams

Okay, so if streams are great for data aggregation, then let’s see some examples.

Let’s say we have a game. In this game there are Players, and Players have high scores. They also have location information (i.e. city and state).

Now, if we wanted to get the top high scores for players in all 50 states in the USA, how would we do this with our Java 7 level coding skills (read: not using streams)?

Let’s see how we would get aggregate information without using streams:

(brace yourself, code is coming!)

Player.class

package com.coderscampus;

public class Player
{
  private Long id;
  private Integer highScore;
  private String state;
  private String name;
  
  public Long getId()
  {
    return id;
  }
  public void setId(Long id)
  {
    this.id = id;
  }
  public Integer getHighScore()
  {
    return highScore;
  }
  public void setHighScore(Integer highScore)
  {
    this.highScore = highScore;
  }
  public String getState()
  {
    return state;
  }
  public void setState(String state)
  {
    this.state = state;
  }
  public String getName()
  {
    return name;
  }
  public void setName(String name)
  {
    this.name = name;
  }
  @Override
  public String toString()
  {
    return "\nname=" + name + ", state=" + state+ ", highScore=" + highScore;
  }
  
}

WorkingWithStreams.class

package com.coderscampus;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WorkingWithStreams
{
  static List<Player> players = new ArrayList<>();
  
  public static void main (String[] args)
  {
    System.out.println(getHighScoresWithoutStreams());
  }
  
  private static List<Player> getHighScoresWithoutStreams()
  {
    // without the use of streams, we'll need to break down each step
    //  of the process for finding the high scores in each state
    
    // Step 0: make sure our players List is populated with data... normally 
    //  we would have a database, but for our purposes, we'll just hard code
    //  some example data from the populatePlayerData method.
    populatePlayerData();
    
    // Step 1: Group the Player data by state
    Map<String, List<Player>> groupedPlayerDataByState = new HashMap<>();
    
    for (Player player : players)
    {
      if (groupedPlayerDataByState.containsKey(player.getState()))
      {
        groupedPlayerDataByState.get(player.getState()).add(player);
      }
      else
      {
        groupedPlayerDataByState.put(player.getState(), new ArrayList<Player>(Arrays.asList(player)));
      }
    }
    
    // Step 2 & 3: Sort grouped Player data by high score and return the highest score
    List<Player> highScores = new ArrayList<>();
    
    for (Map.Entry<String, List<Player>> entry : groupedPlayerDataByState.entrySet())
    {
      Collections.sort(entry.getValue(), new Comparator<Player> () {
        @Override
        public int compare(Player player1, Player player2)
        {
          return player2.getHighScore().compareTo(player1.getHighScore());
        }});
      highScores.add(entry.getValue().get(0));
    }
    return highScores;
  }

  private static void populatePlayerData ()
  {
    players.add(createPlayer(1L, "John Doe", 5048, "Arizona"));
    players.add(createPlayer(2L, "Jane Doe", 2400, "Arizona"));
    players.add(createPlayer(3L, "Super Man", 1450, "Washington"));
    players.add(createPlayer(4L, "Bat Man", 3205, "Washington"));
    players.add(createPlayer(5L, "Frodo Baggins", 100, "Washington"));
    players.add(createPlayer(6L, "Daenerys Targaryen", 10000, "Colorado"));
    players.add(createPlayer(7L, "John Snow", 9800, "Colorado"));
    players.add(createPlayer(8L, "Arya Stark", 6050, "Colorado"));
    players.add(createPlayer(9L, "Sansa Stark", 7220, "California"));
    players.add(createPlayer(10L, "Tyrion Lannister", 4680, "California"));
  }
  
  private static Player createPlayer (Long id, String name, Integer highScore, String state)
  {
    Player aPlayer = new Player();
    
    aPlayer.setHighScore(highScore);
    aPlayer.setId(id);
    aPlayer.setName(name);
    aPlayer.setState(state);
    
    return aPlayer;
  }
}

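The output of all that code above will be this list of Player high scores by state (the order of the states may differ when you run it, because a HashMap does not guarantee any particular iteration order):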

[name=Daenerys Targaryen, state=Colorado, highScore=10000,
name=Sansa Stark, state=California, highScore=7220,
name=John Doe, state=Arizona, highScore=5048,
name=Bat Man, state=Washington, highScore=3205]

Now that’s a CRAP load of code just to get this output… but without the use of Streams and Lambdas in Java 8, this is more or less what you would have to do to get the correct output.

Data Aggregation with Streams

Thankfully the alternative is stupidly simple.

First let’s have a look at what the code would be to get our desired output using Streams, then I’ll explain it.

What I’m going to show you is the one method that will do the work we need to return a list of Players. This method can be called from the main method in our code example above.

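// Needs these imports in addition to the ones already in WorkingWithStreams:
//   java.util.Collection, java.util.Optional, java.util.stream.Collectors
private static Collection<Optional<Player>> getHighScoresWithStreams()
{
  // Group the players by state, then keep only the highest scorer in each group
  return players.stream()
                .collect(Collectors.groupingBy(Player::getState, 
                    Collectors.maxBy(Comparator.comparing(Player::getHighScore))))
                .values();
}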

The Optional Class

Obviously, there’s a LOT going on inside of this code, but that’s why I’m here to try my best to explain what’s going on.

Starting from the top, the first thing you may see and scratch your head over is Optional. Optional is not a keyword but a class (java.util.Optional), and it was also introduced in Java 8. I’ll cover that topic in-depth in the next post.

For now, you just need to know that each Optional in the returned Collection is a container object that may or may not hold a Player.
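Until that post arrives, here’s a minimal sketch of what “may or may not hold a value” looks like in practice. It uses Optional<String> instead of Optional<Player> to stay short, and the class name, the describe method and the "nobody" fallback are all made up for illustration.

```java
import java.util.Optional;

class OptionalSketch
{
  // Returns the wrapped name, or a fallback when the Optional is empty.
  static String describe(Optional<String> maybeName)
  {
    return maybeName.orElse("nobody");
  }

  public static void main(String[] args)
  {
    Optional<String> present = Optional.of("John Doe");
    Optional<String> empty = Optional.empty();

    System.out.println(present.isPresent()); // true
    System.out.println(empty.isPresent());   // false
    System.out.println(describe(present));   // John Doe
    System.out.println(describe(empty));     // nobody

    // ifPresent only runs the lambda when there is a value inside.
    present.ifPresent(name -> System.out.println("Top player: " + name));
  }
}
```

The point is that you never have to null-check: you either ask the Optional whether it holds something, or you hand it a fallback.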

Stream() Method

Next up is the actual stream method. This is the method you call when you want to start the process of creating a stream from your iterable Collection.

In this case, we have a collection of Player objects, and this is the collection that we want to start streaming.

Once we’ve started our stream, there are a few things that we can do to our data. Sometimes you’d want to filter your data so as to eliminate data points that you don’t care about, but that’s not what we’re doing in this example (as we want to consider all data points). I’ll show an example with filtering next.

What we do want to do is group our players together by some criteria. In our example, we want to group by the Player’s state.

This is accomplished by first executing the collect method. The collect method is usually the last operation that you perform on a stream (though that isn’t evident from our example here, since we call .values() on the result afterwards). Again, I’ll show more examples after this.

In order to collect our data into the result we’re after, we need to tell Java how we’d like to collect it. So we tell Java that we’d like to group our data together using the Collectors.groupingBy method.

Collectors is a utility class. It’s got a whole bunch of static methods that can be used to help us tell Java how we want to collect our data.

So when we invoke the Collectors.groupingBy method, we need to tell it how we’d like to group our data… so we say groupingBy(Player::getState).

Now, we’re not done with the grouping function just yet. If all we wanted to do was to group our data by the Player’s state, then sure, we’d be done… but we want to get the MAX value for each state.

Luckily the Collectors.groupingBy method can also take a second parameter. This second parameter is the “downstream” parameter which performs a reduction on the data. All this means is that it will reduce the number of returned data points in some way. For our purposes this reduction is going to reduce the data set to just the maximum value in each bucket.
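To see the difference that second parameter makes, here’s a minimal sketch. It swaps the Player objects for plain "state:name" Strings (made up for this illustration, as are the class and method names), and it uses Collectors.counting() as the downstream collector simply because a count is easy to eyeball.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupingSketch
{
  // Hypothetical sample data in "state:name" form, just for this sketch.
  static final List<String> ENTRIES =
      Arrays.asList("Arizona:John Doe", "Arizona:Jane Doe", "Washington:Bat Man");

  // Pulls the state out of a "state:name" entry.
  static String stateOf(String entry)
  {
    return entry.substring(0, entry.indexOf(':'));
  }

  // One parameter: every bucket is a List of the elements that landed in it.
  static Map<String, List<String>> groupByState()
  {
    return ENTRIES.stream()
                  .collect(Collectors.groupingBy(GroupingSketch::stateOf));
  }

  // Two parameters: the downstream collector reduces each bucket, here to a count.
  static Map<String, Long> countByState()
  {
    return ENTRIES.stream()
                  .collect(Collectors.groupingBy(GroupingSketch::stateOf, Collectors.counting()));
  }

  public static void main(String[] args)
  {
    System.out.println(groupByState().get("Arizona"));    // [Arizona:John Doe, Arizona:Jane Doe]
    System.out.println(countByState().get("Arizona"));    // 2
    System.out.println(countByState().get("Washington")); // 1
  }
}
```

Notice how the type of the Map’s values changes with the downstream collector: a List per bucket without one, a single Long per bucket with counting().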

So how do we tell Java that we’d like to reduce our data set to just the maximum values? Well lucky for us, we have a Collectors.maxBy utility method.

The Collectors.maxBy method takes one parameter, a Comparator.

Now from our last lesson, we learned about Lambdas and how to use them with a Comparator. You could just pass in a lambda expression like we already learned about… or, you could make use of yet another utility method: Comparator.comparing.

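With the Comparator.comparing method, you just pass in the method of the class you’d like to compare on. So for our example we pass in Player::getHighScore. The resulting Comparator uses the natural ordering of the values, which for Integers means lowest to highest, and Collectors.maxBy then picks the element that this Comparator ranks as the greatest (i.e. the Player with the highest score). If you wanted to get the reverse order, then you would just append .reversed() to the Comparator.comparing() method call.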
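If you’d like to convince yourself which way Comparator.comparing sorts, here’s a small sketch you can run. It compares Strings by length rather than Players by score, and the class name, method names and sample names are made up for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class ComparatorSketch
{
  static final List<String> NAMES = Arrays.asList("Frodo", "Al", "Daenerys");

  // Comparator.comparing uses natural ordering of the key: smallest length first.
  static List<String> shortestFirst()
  {
    return NAMES.stream()
                .sorted(Comparator.comparing(String::length))
                .collect(Collectors.toList());
  }

  // Appending reversed() flips the order: largest length first.
  static List<String> longestFirst()
  {
    return NAMES.stream()
                .sorted(Comparator.comparing(String::length).reversed())
                .collect(Collectors.toList());
  }

  // maxBy keeps the single element that the Comparator ranks greatest.
  static Optional<String> longest()
  {
    return NAMES.stream()
                .collect(Collectors.maxBy(Comparator.comparing(String::length)));
  }

  public static void main(String[] args)
  {
    System.out.println(shortestFirst()); // [Al, Frodo, Daenerys]
    System.out.println(longestFirst());  // [Daenerys, Frodo, Al]
    System.out.println(longest().get()); // Daenerys
  }
}
```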

And finally, now that we’ve told our collect method call how we’d like to collect our data, we’re left with a Map with Strings as the keys (the Player’s state) and Optional<Player> objects as the values. Remember… without the maxBy downstream collector, each value would be a List of all the Players in that state, but our reduction boils each of those Lists down to the single highest-scoring Player 😉

What we truly want to end up with is not a Map, but just the winning Players. So how do we get the values from a Map? We invoke the .values() method on our Map, which hands us back a Collection (not strictly a List) of those values.

And voila! We have our result set.
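If the Optional wrappers bother you, one possible alternative (not the approach used above, just a sketch) is Collectors.toMap with a merge function built from BinaryOperator.maxBy. That hands back a plain Player per state. The trimmed-down Player class, the sample data and the method names below are assumptions made so the example runs on its own.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

class TopScoreSketch
{
  // Trimmed-down, hypothetical stand-in for the Player class shown earlier.
  static class Player
  {
    private final String name;
    private final String state;
    private final Integer highScore;

    Player(String name, String state, Integer highScore)
    {
      this.name = name;
      this.state = state;
      this.highScore = highScore;
    }

    String getName() { return name; }
    String getState() { return state; }
    Integer getHighScore() { return highScore; }
  }

  static List<Player> samplePlayers()
  {
    return Arrays.asList(
        new Player("John Doe", "Arizona", 5048),
        new Player("Jane Doe", "Arizona", 2400),
        new Player("Super Man", "Washington", 1450),
        new Player("Bat Man", "Washington", 3205));
  }

  // Keeps one Player per state: when two Players share a state, the merge
  // function retains whichever one has the higher score.
  static Map<String, Player> bestByState(List<Player> players)
  {
    Comparator<Player> byScore = Comparator.comparing(Player::getHighScore);
    return players.stream()
                  .collect(Collectors.toMap(Player::getState,
                                            Function.identity(),
                                            BinaryOperator.maxBy(byScore)));
  }

  public static void main(String[] args)
  {
    Map<String, Player> best = bestByState(samplePlayers());
    System.out.println(best.get("Arizona").getName());    // John Doe
    System.out.println(best.get("Washington").getName()); // Bat Man
  }
}
```

Calling best.values() on that Map gives you a Collection of Players with no unwrapping needed.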

So, with four lines of code, it took me about 40 lines of text to explain what was going on. That’s both the upside AND the downside of Java streams. They’re wonderfully concise, but there’s a lot going on in the background that you need to understand.

Another Java Streams Example

As promised, I’d like to talk about another example.

This time, let’s just say that we want to get a list of Players who are over the age of 20. Maybe the reason for this is that you want to throw an adults only party?

First, let’s assume that we have the appropriate age property coded into our Player object. I’m too lazy to re-post the entire Player object with this new property, but we should all be familiar with how to add a property to an object at this point, right?

Now, how do we get a list of Players who are older than 20 with streams? Let’s have a look:

players.stream()
       .filter(p -> p.getAge() > 20)
       .collect(Collectors.toList());

Here we can see the use of the filter method.

Filtering is pretty straightforward. The filter method takes one parameter, a Predicate object.

A Predicate is a special interface provided by Java as of version 8. Recall that we talked about another one of these special interfaces in the last lesson… it was called the Consumer interface.

The difference between a Consumer interface and a Predicate interface is that the Consumer doesn’t return anything, but the Predicate returns a boolean.

This is important, because if we were to code a method for filtering, we would need to return a boolean value to say whether we should or shouldn’t include a particular data point.

So, this is why the filter method takes a Predicate.

Here’s a peek at what the Predicate looks like:

@FunctionalInterface
public interface Predicate<T> {
  boolean test (T t);
}

So using Lambda expressions, we can concisely write out an appropriate filter like we did above with: filter(p -> p.getAge() > 20).

Then finally, we fire off our collect method which will allow us to return all of our filtered data. In this case we use the Collectors.toList() utility method to convert our data points into a List of Players (very handy).
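Because a Predicate is just an object, you can also store one in a variable and combine it with others. Here’s a minimal sketch that filters plain Integer ages instead of Players. The class name, the keep method, the OVER_20 and UNDER_65 predicates and the sample ages are all made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class FilterSketch
{
  // Each lambda supplies the body of Predicate's single test method.
  static final Predicate<Integer> OVER_20 = age -> age > 20;
  static final Predicate<Integer> UNDER_65 = age -> age < 65;

  // Keeps only the ages for which the rule returns true.
  static List<Integer> keep(List<Integer> ages, Predicate<Integer> rule)
  {
    return ages.stream()
               .filter(rule)
               .collect(Collectors.toList());
  }

  public static void main(String[] args)
  {
    List<Integer> ages = Arrays.asList(18, 20, 21, 42, 70);

    System.out.println(OVER_20.test(20));                  // false
    System.out.println(keep(ages, OVER_20));               // [21, 42, 70]
    System.out.println(keep(ages, OVER_20.and(UNDER_65))); // [21, 42]
    System.out.println(keep(ages, OVER_20.negate()));      // [18, 20]
  }
}
```

The and and negate methods are default methods on the Predicate interface, so you get them for free with any lambda you assign to a Predicate.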

Example Using Map Functions

So, we’ve seen how to use some neat utility methods with the Collectors object, and we’ve seen how to filter results. Now let’s have a look at how we can use the mapping functionality with streams.

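With the mapping functions, you can transform each element of your stream into something else (for example, a whole Player object into just one of its properties) and return the result. The number of elements stays the same; it’s each individual element that changes.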

A great example can be shown with the getPlayerIds() method that we’ll build.

Let’s assume that we want to get a List of the Player IDs. We can stream the Players and return just their IDs with the map function:

public List<Long> getPlayerIds()
{
  return players.stream()
                .map(Player::getId) // instead of returning all the players, just return the player IDs
                .collect(Collectors.toList());
}

Hopefully this one is fairly straightforward to understand. The key here is that we want to return a Player’s ID, so we use the map function to facilitate the translation from Player to Long.
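Here’s map in isolation as a runnable sketch: every element that goes in produces exactly one element coming out, and only the type changes (String to Integer here). The class name, the lengths method and the sample words are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class MapSketch
{
  // Translates each word into its length: three Strings in, three Integers out.
  static List<Integer> lengths(List<String> words)
  {
    return words.stream()
                .map(String::length)
                .collect(Collectors.toList());
  }

  public static void main(String[] args)
  {
    System.out.println(lengths(Arrays.asList("John", "Daenerys", "Al"))); // [4, 8, 2]
  }
}
```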

Another Random Mapping Example

There are other mapping functions that we can make use of, like the mapToInt method. Let’s have a look at a simple example:

Stream.of("a1", "a2", "a3")
    .map(s -> s.substring(1)) // returns a stream of Strings: ["1", "2", "3"]
    .mapToInt(Integer::parseInt) // returns an IntStream of primitive ints: [1, 2, 3]
    .max() // returns an OptionalInt: 3
    .ifPresent(System.out::println);  // prints 3
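
An IntStream can do more than max(). As a sketch that ties back to the SQL aggregates from the start of this post, the summaryStatistics() method gathers the count, sum, min, max and average in a single pass. The class and method names here are made up for illustration; the input is the same three Strings as above.

```java
import java.util.IntSummaryStatistics;
import java.util.stream.Stream;

class SummarySketch
{
  // Parses "a1", "a2", "a3" into the ints 1, 2, 3 and aggregates them in one pass.
  static IntSummaryStatistics summarize()
  {
    return Stream.of("a1", "a2", "a3")
                 .map(s -> s.substring(1))
                 .mapToInt(Integer::parseInt)
                 .summaryStatistics();
  }

  public static void main(String[] args)
  {
    IntSummaryStatistics stats = summarize();

    System.out.println(stats.getCount());   // 3
    System.out.println(stats.getSum());     // 6
    System.out.println(stats.getMin());     // 1
    System.out.println(stats.getMax());     // 3
    System.out.println(stats.getAverage()); // 2.0
  }
}
```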