Hibernate.org Community Documentation

Chapter 4. Batch Processing

Table of Contents

4.1. Batch inserts
4.2. Batch updates
4.3. StatelessSession
4.4. Hibernate Query Language for DML
4.4.1. HQL for UPDATE and DELETE
4.4.2. HQL syntax for INSERT
4.4.3. More information on HQL

The following example shows an antipattern for batch inserts.

Example 4.1. Naive way to insert 100000 lines with Hibernate

Session session = sessionFactory.openSession();

Transaction tx = session.beginTransaction();
for ( int i=0; i<100000; i++ ) {
    Customer customer = new Customer(.....);
    session.save(customer);
}
tx.commit();
session.close();

This fails with a java.lang.OutOfMemoryError at some point, around the 50000th row on many systems. The reason is that Hibernate caches all the newly inserted Customer instances in the session-level cache, so the session grows with every save(). There are several ways to avoid this problem.


Before batch processing, enable JDBC batching by setting the property hibernate.jdbc.batch_size to an integer, typically between 10 and 50.

Note

Hibernate disables insert batching at the JDBC level transparently if you use an identity identifier generator.

If the above approach is not appropriate, you can disable the second-level cache by setting hibernate.cache.use_second_level_cache to false.
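As a minimal sketch, the two settings named above can be collected in a plain java.util.Properties object. The class and method names here are hypothetical, and how the properties reach Hibernate (for example a hibernate.properties file or the bootstrap API of your Hibernate version) is left as an assumption.

```java
import java.util.Properties;

// Hypothetical helper; only the two property keys come from the text above.
class BatchSettingsSketch {

    /** Builds the batch-related settings as plain properties. */
    static Properties batchSettings(int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        Properties props = new Properties();
        // Enables JDBC batching; the text suggests a value between 10 and 50.
        props.setProperty("hibernate.jdbc.batch_size", Integer.toString(batchSize));
        // Optional: switches off interaction with the second-level cache.
        props.setProperty("hibernate.cache.use_second_level_cache", "false");
        return props;
    }

    public static void main(String[] args) {
        Properties props = batchSettings(20);
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The printed key=value lines have the same shape as entries in a properties file, so they can be copied into one if that is how your application is configured.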

When you make new objects persistent, call flush() and then clear() on the session at regular intervals to control the size of the first-level cache.
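The flush-and-clear cadence can be sketched without a database. In the sketch below, SessionLike is a hypothetical stand-in that mirrors only the save(), flush(), and clear() methods of org.hibernate.Session, and CountingSession is an in-memory fake, so the example shows the loop structure rather than real Hibernate behavior. Choosing the interval equal to the JDBC batch size is an assumption that keeps each flush aligned with one JDBC batch.

```java
import java.util.ArrayList;
import java.util.List;

class FlushClearSketch {

    /** Hypothetical stand-in for the three Session methods used below. */
    interface SessionLike {
        void save(Object entity);
        void flush();
        void clear();
    }

    /** In-memory fake that records cache growth and "written" rows. */
    static final class CountingSession implements SessionLike {
        private final List<Object> cache = new ArrayList<>();
        private int pending = 0;
        int written = 0;
        int maxCached = 0;

        public void save(Object entity) {
            cache.add(entity);
            pending++;
            maxCached = Math.max(maxCached, cache.size());
        }

        public void flush() {
            written += pending; // pretend the pending inserts were sent
            pending = 0;
        }

        public void clear() {
            cache.clear();      // evict everything from the fake cache
            pending = 0;        // anything not yet flushed is discarded
        }
    }

    /** Saves total entities, flushing and clearing every batchSize saves. */
    static void insertInBatches(SessionLike session, int total, int batchSize) {
        for (int i = 1; i <= total; i++) {
            session.save("customer-" + i); // placeholder for a Customer instance
            if (i % batchSize == 0) {
                session.flush(); // write the current batch
                session.clear(); // release the memory held by the cache
            }
        }
        session.flush(); // stands in for the flush a transaction commit performs
        session.clear();
    }

    public static void main(String[] args) {
        CountingSession session = new CountingSession();
        insertInBatches(session, 100_000, 20);
        System.out.println("written=" + session.written
                + " maxCached=" + session.maxCached);
    }
}
```

With a real Session the loop body has the same shape: the cache never holds more than one batch of entities, which is what prevents the memory growth seen in the naive example.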


When you retrieve and update data, flush() and clear() the session regularly. In addition, use the scroll() method to take advantage of server-side cursors for queries that return many rows of data.


StatelessSession is a command-oriented API provided by Hibernate. Use it to stream data to and from the database in the form of detached objects. A StatelessSession has no persistence context associated with it and does not provide many of the higher-level life cycle semantics.

Features and behaviors not provided by StatelessSession

  • a first-level cache

  • interaction with any second-level or query cache

  • transactional write-behind or automatic dirty checking

Limitations of StatelessSession

  • Operations performed using a stateless session never cascade to associated instances.

  • Collections are ignored by a stateless session.

  • Operations performed via a stateless session bypass Hibernate's event model and interceptors.

  • Due to the lack of a first-level cache, stateless sessions are vulnerable to data aliasing effects.

  • A stateless session is a lower-level abstraction that is much closer to the underlying JDBC.


The insert(), update(), and delete() operations defined by the StatelessSession interface operate directly on database rows. They cause the corresponding SQL operations to be executed immediately. They have different semantics from the save(), saveOrUpdate(), and delete() operations defined by the Session interface.

DML, or Data Manipulation Language, refers to SQL statements such as INSERT, UPDATE, and DELETE. Hibernate provides methods for bulk SQL-style DML statement execution, in the form of Hibernate Query Language (HQL).


The FROM clause can only refer to a single entity, which can be aliased. If the entity name is aliased, any property references must be qualified using that alias. If the entity name is not aliased, then it is illegal for any property references to be qualified.

Joins, either implicit or explicit, are prohibited in a bulk HQL query. You can use sub-queries in the WHERE clause, and the sub-queries themselves can contain joins.


In keeping with the EJB3 specification, HQL UPDATE statements, by default, do not affect the version or the timestamp property values of the affected entities. You can use a versioned update to force Hibernate to reset the version or timestamp property values, by adding the VERSIONED keyword after the UPDATE keyword.


Note

If you use the VERSIONED keyword, you cannot use custom version types, that is, types which implement org.hibernate.usertype.UserVersionType.
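The rules above for bulk UPDATE and DELETE can be illustrated with a few HQL strings. This is a sketch under assumptions: the name property on Customer and the parameter names are hypothetical, and the strings are only printed here, not executed. With Hibernate they would typically be passed to session.createQuery(...), have their parameters bound, and be run with executeUpdate().

```java
class BulkHqlSketch {

    // Entity is aliased, so every property reference is qualified with "c".
    static final String UPDATE_ALIASED =
            "update Customer c set c.name = :newName where c.name = :oldName";

    // Entity is not aliased, so property references must stay unqualified.
    static final String UPDATE_UNALIASED =
            "update Customer set name = :newName where name = :oldName";

    // VERSIONED directly after UPDATE asks Hibernate to reset the
    // version or timestamp property of the affected entities.
    static final String UPDATE_VERSIONED =
            "update versioned Customer set name = :newName where name = :oldName";

    // Bulk delete against a single aliased entity, no joins.
    static final String DELETE_ALIASED =
            "delete from Customer c where c.name = :oldName";

    public static void main(String[] args) {
        System.out.println(UPDATE_ALIASED);
        System.out.println(UPDATE_UNALIASED);
        System.out.println(UPDATE_VERSIONED);
        System.out.println(DELETE_ALIASED);
    }
}
```

Mixing the two styles, for example aliasing the entity as c but writing an unqualified name in the WHERE clause, is what the alias rule prohibits.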


The Query.executeUpdate() method returns an int value, which indicates the number of entities affected by the operation. This may or may not correlate to the number of rows affected in the database. An HQL bulk operation might result in multiple SQL statements being executed, such as for joined-subclass. In the example of joined-subclass, a DELETE against one of the subclasses may actually result in deletes in the tables underlying the join, or further down the inheritance hierarchy.


Only the INSERT INTO ... SELECT ... form is supported. You cannot specify explicit values to insert.

The properties_list is analogous to the column specification in the SQL INSERT statement. For entities involved in mapped inheritance, you can only use properties defined directly on the given class level in the properties_list. Superclass properties are not allowed and subclass properties are irrelevant. In other words, INSERT statements are inherently non-polymorphic.

The select_statement can be any valid HQL select query, but the return types must match the types expected by the INSERT. Hibernate verifies the return types during query compilation, instead of expecting the database to check them. Problems might result from Hibernate types which are equivalent, rather than equal. One such example is a mismatch between a property defined as an org.hibernate.type.DateType and a property defined as an org.hibernate.type.TimestampType, even though the database may not make a distinction, or may be capable of handling the conversion.

If the id property is not specified in the properties_list, Hibernate generates a value automatically. Automatic generation is only available if you use ID generators which operate in the database. Otherwise, Hibernate throws an exception during parsing. Available in-database generators are org.hibernate.id.SequenceGenerator and its subclasses, and objects which implement org.hibernate.id.PostInsertIdentifierGenerator. The most notable generator that cannot be used this way is org.hibernate.id.TableHiLoGenerator, which does not expose a selectable way to get its values.

For properties mapped as either version or timestamp, the insert statement gives you two options. You can either specify the property in the properties_list, in which case its value is taken from the corresponding select expressions, or omit it from the properties_list, in which case the seed value defined by the org.hibernate.type.VersionType is used.
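The INSERT INTO ... SELECT ... form described above can be sketched as two HQL strings. The DelinquentAccount entity and the id and name properties are hypothetical, chosen only for illustration, and the strings are printed rather than executed. The second variant assumes DelinquentAccount uses an in-database id generator, as the text requires when the id is left out of the properties_list.

```java
class InsertHqlSketch {

    // id appears in the properties_list, so its value is taken from the
    // matching select expression (c.id).
    static final String INSERT_WITH_ID =
            "insert into DelinquentAccount (id, name) "
                    + "select c.id, c.name from Customer c";

    // id is omitted, so Hibernate has to generate it; this only works
    // with a generator that operates in the database.
    static final String INSERT_GENERATED_ID =
            "insert into DelinquentAccount (name) "
                    + "select c.name from Customer c";

    public static void main(String[] args) {
        System.out.println(INSERT_WITH_ID);
        System.out.println(INSERT_GENERATED_ID);
    }
}
```

In both strings the number and order of select expressions match the properties_list, which is the part Hibernate type-checks during query compilation. There is no VALUES clause, since explicit values cannot be specified.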