How to Fix Duplicate Data from Hibernate Queries

Java-PodcastWe’ve covered a lot of topics on Hibernate already on this blog in a nice sequential format. If you haven’t had the pleasure of going through all the tutorials, I’d suggest going to this page and having a flip through all the topics.

How to Fix Duplicate Data from Hibernate Queries

This problem was the bane of my existence when I first started using Hibernate because I had no idea where the problem was coming from.

If you’re executing a query and finding that you have a bunch of duplicate records and haven’t a clue why, then you’re in the right place.

You see the problem is typically caused by having left joins (or optional joins) in your objects. When you have a base object, like say User and it joins to another table/object in an optional One-to-Many or optional Many-to-Many format, then you may get duplicates.

Consider this scenario… A User objects joins to the LoginHistory object, which keeps track of all the times a particular User has logged into the system. And let’s say our user has logged in many times. You’ll have a situation where you have many records in the LoginHistory table.

So what happens when you run a query that joins to the LoginHistory table? Well it will return as many rows as there are entries for that User in the LoginHistory table.

So because of this, Hibernate doesn’t massage the data for you, it just returns exactly what it got from the database. The ball is in your court to tell Hibernate what to do with records it has retrieved.

There are two solutions to this problem:

  1. Declare your joining object as a Set
  2. Make use of Distinct Root Entity Results Transformer

The Problem at a Glance

So here’s an example of the problem in action. Below you’ll see an outline of the optional one-to-many join between the User class and the LoginHistory class.

User.java

import java.util.List;

import javax.persistence.CascadeType;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.OneToMany;
import javax.persistence.Table;

@Entity
@Table(name="users")
public class User
{
  private Long userId;
  private String username;
  private String password;
  private List<LoginHistory> loginHistory;

  @Id
  @GeneratedValue(strategy=GenerationType.AUTO)
  @Column(name="user_id")
  public Long getUserId()
	{
		return userId;
	}
	public void setUserId(Long userId)
	{
		this.userId = userId;
	}
	public String getUsername()
	{
		return username;
	}
	public void setUsername(String username)
	{
		this.username = username;
	}
	public String getPassword()
	{
		return password;
	}
	public void setPassword(String password)
	{
		this.password = password;
	}
	@OneToMany(cascade=CascadeType.ALL, mappedBy="user", fetch=FetchType.EAGER)
	public List<LoginHistory> getLoginHistory()
	{
		return loginHistory;
	}
	public void setLoginHistory(List<LoginHistory> loginHistory)
	{
		this.loginHistory = loginHistory;
	}


}

LoginHistory.java

package com.howtoprogramwithjava.example.persistence;

import java.util.Date;

import javax.persistence.CascadeType;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.JoinColumn;
import javax.persistence.ManyToOne;
import javax.persistence.Table;

@Entity
@Table(name="login_history")
public class LoginHistory
{
  private Long loginHistoryId;
  private Date loggedIn;
  private User user;

  @Id
  @GeneratedValue(strategy=GenerationType.AUTO)
  @Column(name="login_history_id")
	public Long getLoginHistoryId()
	{
		return loginHistoryId;
	}
	public void setLoginHistoryId(Long loginHistoryId)
	{
		this.loginHistoryId = loginHistoryId;
	}

	@Column(name="logged_in")
	public Date getLoggedIn()
	{
		return loggedIn;
	}
	public void setLoggedIn(Date loggedIn)
	{
		this.loggedIn = loggedIn;
	}

	@ManyToOne(cascade=CascadeType.ALL, optional=true, fetch=FetchType.EAGER)
	@JoinColumn(name="user_id")
	public User getUser()
	{
		return user;
	}
	public void setUser(User user)
	{
		this.user = user;
	}
}

And then we have the entry in a DAO class that will query for Users by passing in a username.

  public List<User> getUserByUsername (String username)
  {
  	Session session = sessionFactory.getCurrentSession();
  	return session.createCriteria(User.class).add(Restrictions.eq("username", username)).list();
  }

So in the code directly above, we are querying the database for Users that have a username matching the String that’s being passed in.

The problem with this is that once it runs this query, it will return multiple rows if the User it finds has logged into the system more than once.

Again, it does this because it’s an optional (left) join. If you were to take a look at the query built by Hibernate, it could look something like this (assuming the username being passed in is tpage):

select * from users
left join login_history on login_history.user_id = users.user_id
where users.username = 'tpage';

This query returns multiple results (three results to be exact, in my database) as I’ve inserted three separate rows into the Login_history table that point back to the tpage user.

Alright, so hopefully you fully understand the problem, now let’s talk about some solutions to this problem.

Declaring the Join as a Set

On the parent side of the relationship, you’ll be declaring a collection of objects that embodies your *-to-Many relationship. What you’ll need to do here is to use a Set as the backing collection as opposed to something like a List (which is what we were using in the User class above).

Here’s an example of how to set it up correctly:

import java.util.Set;

import javax.persistence.CascadeType;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.OneToMany;
import javax.persistence.Table;

@Entity
@Table(name="users")
public class User
{
  private Long userId;
  private String username;
  private String password;
  private Set<LoginHistory> loginHistory;

  @Id
  @GeneratedValue(strategy=GenerationType.AUTO)
  @Column(name="user_id")
  public Long getUserId()
	{
		return userId;
	}
	public void setUserId(Long userId)
	{
		this.userId = userId;
	}
	public String getUsername()
	{
		return username;
	}
	public void setUsername(String username)
	{
		this.username = username;
	}
	public String getPassword()
	{
		return password;
	}
	public void setPassword(String password)
	{
		this.password = password;
	}
	@OneToMany(cascade=CascadeType.ALL, mappedBy="user")
	public Set<LoginHistory> getLoginHistory()
	{
		return loginHistory;
	}
	public void setLoginHistory(Set<LoginHistory> loginHistory)
	{
		this.loginHistory = loginHistory;
	}
}

Again, the key is to use a Set as the collection of child entities. So in our example above we used Set<LoginHistory>

Distinct Root Entity Results Transformer

This sounds like pure gibberish, but thankfully it’s not hard to implement. All you need to do is make sure that you set a ResultsTransformer when you’re building your query.

Remember that DAO query we talked about near the beginning of this article? I’ll show you that query again for the sake of completion:

  public List<User> getUserByUsername (String username)
  {
  	Session session = sessionFactory.getCurrentSession();
  	return session.createCriteria(User.class)
        .add(Restrictions.eq("username", username))
        .list();
  }

This query is alright, but as we’ve seen, it doesn’t work well when we have an option join and we’re dealing with a List of entities.

In the event that we NEED to use a List of entities, we can alter this query to use the ResultsTransformer. Let’s see what the corrected code looks like:

  public List<User> getUserByUsername (String username)
  {
  	Session session = sessionFactory.getCurrentSession();
  	return session.createCriteria(User.class)
        .add(Restrictions.eq("username", username))
        .setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY)
        .list();
  }

You see the difference? We just added one additional line to the query: setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY) and voila! You will no longer have duplicates in your returned result set.