Introduction to Spring Data
Dr. Mark Pollack
• The current data landscape
• Project Goals
• Project Tour
Enterprise Data Trends
Unstructured Data
•No predefined data model
•Often doesn’t fit well in RDBMS
Pre-Aggregated Data
•Computed during data collection
•Running Averages
The Value of Data
• Value from Data Exceeds Hardware & Software costs
• Value in connecting data sets
– Grouping e-commerce users by user agent
Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en) AppleWebKit/418.9 (KHTML, like Gecko) Safari/419.3
The Data Revolution
• Extremely difficult/impossible to scale writes in RDBMS
– Vertical scaling is limited/expensive
– Horizontal scaling is limited or requires $$
• Shift from ACID to BASE
– Basically Available, Soft state, Eventually consistent
• NoSQL datastores emerge as “point solutions”
– Amazon/Google papers
– Facebook, LinkedIn …
“Not Only SQL”
NOSQL \no-seek-wool\ n. Describes ongoing trend where
developers increasingly opt for non-relational databases
to help solve their problems, in an effort to use the right
tool for the right job.
Query Mechanisms:
Key lookup, map-reduce, query-by-example, query language, traversals
Big Data
• “Big data” refers to datasets whose size is beyond the
ability of typical database software tools to capture, store,
manage, and analyze.
• A subjective and moving target.
• Big data in many sectors today ranges from tens of TB to
multiple PB
Reality Check
Project Goals
Spring Data - Background and Motivation
• Data access landscape has changed considerably
• RDBMS are still important and predominant
– but no longer considered a “one size fits all” solution
• But they have limitations
– Hard to scale
• New data access technologies are solving problems
RDBMS can’t
– Higher performance and scalability, different data models
– Often limited transactional model and relaxed consistency
• Polyglot persistence is becoming more prevalent
– Combine RDBMS + other DBs in a solution
Spring and Data Access
• Spring has always provided excellent data access support
Transaction Management
Portable data access exception hierarchy
JDBC – JdbcTemplate
ORM - Hibernate, JPA, JDO, iBATIS support
Cache support (Spring 3.1)
• Spring Data project started in 2010
• Goal is to “refresh” Spring’s Data Access support
– In light of new data access landscape
Spring Data Mission Statement
Provides a familiar and consistent Spring-based programming model
for Big Data, NoSQL, and relational stores while retaining
store-specific features and capabilities.
Spring Data – Supported Technologies
• Relational
– JDBC Extensions
• Big Data
– Hadoop
HDFS and M/R
– Splunk
• Access
– Repositories
– QueryDSL
Spring Data – Have it your way
• Database specific features
are accessed through
familiar Spring Template
• Shared programming
models and data access
– Repository Model
• Common CRUD across data
– Integration with QueryDSL
• Typesafe query language
– REST Exporter
• Expose repository over HTTP
in a RESTful manner.
Project Tour
Spring Data JDBC Extensions – Oracle Support
• Fast Connection Failover
• Simplified configuration for
Advanced Queuing JMS
support and DataSource
• Single local transaction for
messaging and database
• Easy Access to native XML,
Struct, Array data types
• API for customizing the
connection environment
Enables the construction of typesafe SQL-like queries for multiple
backends including JPA, JDO,
MongoDB, Lucene, SQL and plain
collections in Java - Open Source, Apache 2.0
Problems using Strings for a query language
• Using strings is error-prone
• Must remember query syntax, domain classes, properties
and relationships
• Verbose parameter binding by name or position
• Each back-end has its own query language and API
• Note: .NET has LINQ
QueryDSL Features
• Code completion in IDE
• Almost no syntactically invalid queries allowed
• Domain types and properties can be referenced safely (no
• Helper classes generated via Java annotation processor
• Much less verbose than JPA2 Criteria API
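The contrast between string queries and typesafe queries can be pictured in plain Java. The following is only an analogy, not the Querydsl API: `TypesafeQuerySketch`, `Customer`, `firstnameEq` and `lastnameEq` are hypothetical names invented for this sketch. It shows the property the bullets describe, namely that a misspelled field is rejected by the compiler rather than discovered at runtime.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Plain-Java analogy only; none of these names come from Querydsl. */
final class TypesafeQuerySketch {

    /** Hypothetical domain class. */
    static final class Customer {
        final String firstname;
        final String lastname;

        Customer(String firstname, String lastname) {
            this.firstname = firstname;
            this.lastname = lastname;
        }
    }

    private TypesafeQuerySketch() {}

    // Misspelling a field here is a compile error, unlike a typo inside a query string.
    static Predicate<Customer> firstnameEq(String value) {
        return c -> c.firstname.equals(value);
    }

    static Predicate<Customer> lastnameEq(String value) {
        return c -> c.lastname.equals(value);
    }

    /** Returns every customer matching the predicate, in input order. */
    static List<Customer> find(List<Customer> customers, Predicate<Customer> where) {
        List<Customer> matches = new ArrayList<>();
        for (Customer c : customers) {
            if (where.test(c)) {
                matches.add(c);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Customer> all = Arrays.asList(
                new Customer("Bob", "Smith"), new Customer("Ann", "Smith"));
        System.out.println(find(all, firstnameEq("Bob").and(lastnameEq("Smith"))).size());
    }
}
```

Querydsl goes further than this sketch: its generated Q-classes describe the query so it can be translated to the back-end's own query language instead of being evaluated in memory.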
QCustomer customer = QCustomer.customer;
JPQLQuery query = new JPAQuery(entityManager);
Customer bob = query.from(customer)
Using QueryDSL for JDBC
• Incorporate code-generation into your build process
– To create a query meta-model of domain classes or Tables (JDBC)
• For SQL
QAddress qAddress = QAddress.address;
SQLTemplates dialect = new HSQLDBTemplates();
SQLQuery query = new SQLQueryImpl(connection, dialect)
List<Address> results = query.list(new QBean<Address>(Address.class,
Querydsl Predicate
Spring JDBC Extension – QueryDslJdbcTemplate
• Wrapper around JdbcTemplate that supports
– Using Querydsl SQLQuery classes to execute queries
– Integrates with Spring’s transaction management
– Automatically detects DB type and sets SQLTemplates
– Spring RowMapper and ResultSetExtractors for
mapping to POJOs
– Executing inserts, updates and deletes with Querydsl’s
SQLInsertClause, SQLUpdateClause, and SQLDeleteClause
Spring JDBC Extension – QueryDslJdbcTemplate
// Query with join
QCustomer qCustomer = QCustomer.customer;
SQLQuery findByIdQuery = qdslTemplate.newSqlQuery()
.leftJoin(qCustomer._addressCustomerRef, qAddress)
JPA and Repositories
Mediates between the domain and data mapping layers using a
collection-like interface for accessing domain objects.
Spring Data Repositories
• We remove the busy work of developing a repository
For Example…
public interface CustomerRepository {
    Customer findOne(Long id);
    Customer save(Customer customer);
    Customer findByEmailAddress(EmailAddress emailAddress);
}
@Entity
public class Customer {
    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private Long id;
    @Column(unique = true)
    private EmailAddress emailAddress;
    @OneToMany(cascade = CascadeType.ALL, orphanRemoval = true)
    @JoinColumn(name = "customer_id")
    private Set<Address> addresses = new HashSet<Address>();
    // constructor, properties, equals, hashcode omitted for brevity
}
Traditional JPA Implementation
public class JpaCustomerRepository implements CustomerRepository {
    @PersistenceContext
    private EntityManager em;
    public Customer findOne(Long id) {
        return em.find(Customer.class, id);
    }
    public Customer save(Customer customer) {
        if (customer.getId() == null) {
            em.persist(customer); // new entity: make it managed
            return customer;
        } else {
            return em.merge(customer);
        }
    }
Traditional JPA Implementation
. . .
    public Customer findByEmailAddress(EmailAddress emailAddress) {
        TypedQuery<Customer> query =
            em.createQuery("select c from Customer c where c.emailAddress = :email", Customer.class);
        query.setParameter("email", emailAddress);
        return query.getSingleResult();
    }
}
Spring Data Repositories
• A simple recipe
Map your POJO using JPA
Extend a repository (marker) interface or use an annotation
Add finder methods
Configure Spring to scan for repository interfaces and
create implementations
• Inject implementations into your services and use as normal
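How an implementation can appear without hand-written code may be easier to see with a JDK dynamic proxy. The sketch below is a simplified illustration under stated assumptions, not Spring Data's actual mechanism: `NoteRepository` and `RepositoryProxySketch` are hypothetical names, and the "store" is just an in-memory map.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical repository interface; only the interface is written by hand. */
interface NoteRepository {
    String save(Long id, String text);
    String findOne(Long id);
}

final class RepositoryProxySketch {

    private RepositoryProxySketch() {}

    /** Creates an implementation of the interface at runtime, backed by an in-memory map. */
    static NoteRepository create() {
        final Map<Long, String> store = new HashMap<>();
        // Every call on the proxy lands here; the method name decides what to do.
        InvocationHandler handler = (proxy, method, args) -> {
            switch (method.getName()) {
                case "save":
                    store.put((Long) args[0], (String) args[1]);
                    return args[1];
                case "findOne":
                    return store.get((Long) args[0]);
                default:
                    throw new UnsupportedOperationException(method.getName());
            }
        };
        return (NoteRepository) Proxy.newProxyInstance(
                NoteRepository.class.getClassLoader(),
                new Class<?>[] { NoteRepository.class },
                handler);
    }

    public static void main(String[] args) {
        NoteRepository repository = create();
        repository.save(1L, "hello");
        System.out.println(repository.findOne(1L));
    }
}
```

The calling code depends only on the interface, which is why the generated implementation can be injected into services like any other bean.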
Spring Data Repository Example
// Marker interface
public interface CustomerRepository extends Repository<Customer, Long> {
    Customer findOne(Long id);
    Customer save(Customer customer);
    Customer findByEmailAddress(EmailAddress emailAddress);
}
// Or use the annotation instead of extending the marker interface
@RepositoryDefinition(domainClass=Customer.class, idClass=Long.class)
public interface CustomerRepository { . . . }
Spring Data Repository Example
• Bootstrap with JavaConfig
public class ApplicationConfig {
• Or XML
<jpa:repositories base-package="com.oreilly.springdata.jpa" />
• And Spring will create an implementation of the interface
Spring Data JPA - Usage
• Wire into your transactional service layer as normal
Query Method Keywords
• How does findByEmailAddress work…
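One way to picture the answer: the method name itself is parsed into property names and connectors. The following is a minimal plain-Java sketch of that idea. `FinderNameSketch` is a hypothetical helper, not Spring Data's parser, and it only understands the `findBy` prefix with `And` / `Or`, whereas the real keyword set is larger (Like, Between, OrderBy and so on).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a finder method name into the properties it filters on. */
final class FinderNameSketch {

    private FinderNameSketch() {}

    /** Returns the property names found in a name such as "findByLastnameOrFirstname". */
    static List<String> properties(String methodName) {
        String prefix = "findBy";
        if (!methodName.startsWith(prefix) || methodName.length() == prefix.length()) {
            throw new IllegalArgumentException("Not a finder method: " + methodName);
        }
        String criteria = methodName.substring(prefix.length());
        List<String> result = new ArrayList<>();
        // Split where "And"/"Or" sits between a lower-case letter and the
        // upper-case start of the next property, so "Order" is not split at "Or".
        for (String part : criteria.split("(?<=[a-z])(?:And|Or)(?=[A-Z])")) {
            result.add(Character.toLowerCase(part.charAt(0)) + part.substring(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(properties("findByEmailAddress"));
        System.out.println(properties("findByLastnameOrFirstname"));
    }
}
```

Once the property names are known, a query for the target store can be assembled from them and the method arguments bound by position.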
Spring Data Repositories - CRUD
public interface CrudRepository<T, ID extends Serializable> extends Repository<T, ID> {
    T save(T entity);
    Iterable<T> save(Iterable<? extends T> entities);
    T findOne(ID id);
    boolean exists(ID id);
    Iterable<T> findAll();
    long count();
    void delete(ID id);
    void delete(T entity);
    void delete(Iterable<? extends T> entities);
    void deleteAll();
}
Paging, Sorting, and custom finders
public interface PagingAndSortingRepository<T, ID extends Serializable> extends
        CrudRepository<T, ID> {
    Iterable<T> findAll(Sort sort);
    Page<T> findAll(Pageable pageable);
}
public interface PersonRepository extends CrudRepository<Person,BigInteger> {
    // Finder for a single entity
    Person findByEmailAddress(String emailAddress);
    // Finder for multiple entities
    List<Person> findByLastnameLike(String lastName);
    // Finder with pagination
    Page<Person> findByFirstnameLike(String firstName, Pageable page);
}
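What a paged finder returns can be illustrated without any store at all. The sketch below assumes zero-based page numbers and slices an in-memory list; `PagingSketch`, `page` and `totalPages` are hypothetical names for illustration, and a real repository would push the offset and limit down to the database rather than load everything.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PagingSketch {

    private PagingSketch() {}

    /** Returns the zero-based page of at most `size` elements, or an empty list past the end. */
    static <T> List<T> page(List<T> all, int page, int size) {
        if (page < 0 || size < 1) {
            throw new IllegalArgumentException("page must be >= 0 and size >= 1");
        }
        long from = (long) page * size; // long avoids int overflow for large page numbers
        if (from >= all.size()) {
            return new ArrayList<>();
        }
        int to = (int) Math.min(from + size, all.size());
        return new ArrayList<>(all.subList((int) from, to));
    }

    /** Number of pages needed for the given element count (rounds up). */
    static int totalPages(int totalElements, int size) {
        return (totalElements + size - 1) / size;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(page(names, 1, 2));           // second page
        System.out.println(totalPages(names.size(), 2)); // pages in total
    }
}
```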
Spring Data JPA – Customize Query Methods
• Query methods use method naming conventions
– Can override with Query annotation
– Or method name references JPA named query
public interface CustomerRepository extends CrudRepository<Customer,Long> {
    // previous methods omitted…
    @Query("select p from Person p where p.emailAddress = ?1")
    Person findByEmailAddress(String emailAddress);
    @Query("select p from Person p where p.firstname = :firstname or p.lastname = :lastname")
    Person findByLastnameOrFirstname(@Param("lastname") String lastname,
                                     @Param("firstname") String firstname);
}
Spring Data JPA – Other features
Specifications using JPA Criteria API
LockMode, override Transactional metadata, QueryHints
Auditing, CDI Integration
QueryDSL support
Querydsl and JPA
• Easier and less verbose than JPA2 Criteria API
– “equals property value” vs. “property equals value”
– Operations via a builder object
CriteriaBuilder builder = entityManagerFactory.getCriteriaBuilder();
CriteriaQuery<Person> query = builder.createQuery(Person.class);
Root<Person> men = query.from( Person.class );
Root<Person> women = query.from( Person.class );
Predicate menRestriction = builder.and(
    builder.equal( men.get( Person_.gender ), Gender.MALE ),
    builder.equal( men.get( Person_.relationshipStatus ), RelationshipStatus.SINGLE )
);
Predicate womenRestriction = builder.and(
    builder.equal( women.get( Person_.gender ), Gender.FEMALE ),
    builder.equal( women.get( Person_.relationshipStatus ), RelationshipStatus.SINGLE )
);
query.where( builder.and( menRestriction, womenRestriction ) );
Querydsl and JPA
JPAQuery query = new JPAQuery(entityManager);
QPerson men = new QPerson("men");
QPerson women = new QPerson("women");
Querydsl Predicates
query.from(men, women).where(men.gender.eq(Gender.MALE),
QueryDSL - Repositories
public interface QueryDslPredicateExecutor<T> {
    long count(com.mysema.query.types.Predicate predicate);
    T findOne(Predicate predicate);
    List<T> findAll(Predicate predicate);
    List<T> findAll(Predicate predicate, OrderSpecifier<?>... orders);
    Page<T> findAll(Predicate predicate, Pageable pageable);
}
public interface ProductRepository extends Repository<Product,Long>,
QueryDslPredicateExecutor<Product> { … }
Product iPad = productRepository.findOne(product.name.eq("iPad"));
Predicate tablets = product.description.contains("tablet");
Iterable<Product> result = productRepository.findAll(tablets);
Tooling Support
Code Tour - JPA
NoSQL Data Models
Key/Value
• Familiar, much like a hash table
• Redis, Riak, Voldemort,…
• Amazon Dynamo inspired
Column Family
• Extended key/value model
– values can also be key/value pairs
• HBase, Cassandra
• Google Bigtable inspired
Document
• Collections that contain semi-structured data: XML/JSON
• CouchDB, MongoDB
{ _id: ObjectId("4b2b9f67a1f631733d917a7b"),
  author: 'joe',
  tags: ['example', 'db'],
  comments: [ { author: 'jim', comment: 'OK' },
              { author: 'ida', comment: 'Bad' } ]
}
{ _id: ObjectId("4b2b9f67a1f631733d917a7c"),
  author: 'ida', ...
{ _id: ObjectId("4b2b9f67a1f631733d917a7d"),
  author: 'jim', ...
Graph
• Nodes and Edges, each of which may have properties
• Neo4j, Sones, InfiniteGraph
Redis
• Advanced key-value store
• Values can be
– Strings (like in a plain key-value store).
– Lists of strings, with O(1) pop and push operations.
– Sets of strings, with O(1) element add, remove, and existence test
– Sorted sets that are like Sets but with a score to take elements
in order.
– Hashes that are composed of string fields set to string values.
• Operations
– Unique to each data type – appending to list/set, retrieve slice
of a list…
– Many operations performed in O(1) time – 100k ops/sec on
entry-level hardware
– Intersection, union, difference of sets
– Redis is single-threaded, atomic operations
• Optional persistence
• Master-slave replication
• HA support coming soon
Spring Data Redis
• Provide a de facto API on top of multiple drivers
• RedisTemplate
– Connection resource management
– Descriptive method names, grouped into data type categories
• ListOps, ZSetOps, HashOps, …
– No need to deal with byte arrays
• Support for Java JDK, String, JSON, XML, and Custom serialization
– Translation to Spring’s DataAccessException hierarchy
• Redis backed Set, List, Map, capped Collections, Atomic Counters
• Redis Messaging
• Spring 3.1 @Cacheable support
• List Operations
RedisTemplate<String, Person> redisTemplate;
Person p = new Person("George", "Carlin");
redisTemplate.opsForList().leftPush("hosts", p);
Redis Support Classes
• JDK collections (java.util & java.util.concurrent)
– List/Set/(Blocking)Queues/(Blocking)Deque
Set<Post> t = new DefaultRedisSet<Post>("timeline", connection);
t.add(new Post("john", "Hello World"));
RedisSet<String> fJ = new DefaultRedisSet<String>("john:following", template);
RedisSet<String> fB = new DefaultRedisSet<String>("bob:following", template);
// followers in common
Set s3 = fJ.intersect(fB);
• Atomic Counters
– AtomicLong & AtomicInteger backed by Redis
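The "in common" computation above is a set intersection. As a plain-Java analogy (no Redis involved), the same result can be sketched with `java.util` collections; `FollowingSketch` and `inCommon` are hypothetical names. With the Redis-backed sets the intersection is computed by the server (the `SINTER` command) rather than in the JVM.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

final class FollowingSketch {

    private FollowingSketch() {}

    /** Returns the members present in both sets, leaving the inputs untouched. */
    static Set<String> inCommon(Set<String> first, Set<String> second) {
        Set<String> result = new TreeSet<>(first); // copy, then keep only shared members
        result.retainAll(second);
        return result;
    }

    public static void main(String[] args) {
        Set<String> john = new TreeSet<>(Arrays.asList("alice", "bob", "carol"));
        Set<String> bob = new TreeSet<>(Arrays.asList("bob", "carol", "dave"));
        System.out.println(inCommon(john, bob)); // prints [bob, carol]
    }
}
```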
Code Tour - Redis
HBase
• Column-oriented database
– Row points to “columns” which are actually key-value pairs
– Columns can be grouped together into “column families”
• Optimized storage and I/O
• Data stored in HDFS, modeled after Google BigTable
• Need to define a schema for column families up front
– Key-value pairs inside a column-family are not defined up front
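The data model in these bullets can be pictured as nested sorted maps: row key, then column family, then qualifier, then value. The sketch below is an in-memory illustration only, with the hypothetical class `ColumnFamilyTableSketch`; it ignores timestamps, versions and byte-array encoding. It does capture the schema rule: families are fixed when the table is created, qualifiers are not.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class ColumnFamilyTableSketch {

    private final Set<String> families;
    // row key -> column family -> qualifier -> value
    private final Map<String, Map<String, Map<String, String>>> rows = new TreeMap<>();

    /** Column families are declared up front, as in "create 'users', ...". */
    ColumnFamilyTableSketch(String... families) {
        this.families = new HashSet<>(Arrays.asList(families));
    }

    /** Stores a value; any qualifier is accepted, but the family must already exist. */
    void put(String rowKey, String family, String qualifier, String value) {
        if (!families.contains(family)) {
            throw new IllegalArgumentException("Unknown column family: " + family);
        }
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(family, k -> new TreeMap<>())
            .put(qualifier, value);
    }

    /** Returns the stored value, or null when the row, family or qualifier is absent. */
    String get(String rowKey, String family, String qualifier) {
        Map<String, Map<String, String>> row = rows.get(rowKey);
        if (row == null || !row.containsKey(family)) {
            return null;
        }
        return row.get(family).get(qualifier);
    }

    public static void main(String[] args) {
        ColumnFamilyTableSketch users = new ColumnFamilyTableSketch("cfInfo", "cfStatus");
        users.put("row-1", "cfInfo", "qUser", "user1");
        System.out.println(users.get("row-1", "cfInfo", "qUser"));
    }
}
```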
Using HBase
$ ./bin/hbase shell
> create 'users', { NAME => 'cfInfo'}, { NAME => 'cfStatus' }
> put 'users', 'row-1', 'cfInfo:qUser', 'user1'
> put 'users', 'row-1', 'cfInfo:qEmail', '[email protected]'
> put 'users', 'row-1', 'cfInfo:qPassword', 'user1pwd'
> put 'users', 'row-1', 'cfStatus:qEmailValidated', 'true'
> scan 'users'
row-1 column=cfInfo:qEmail, timestamp=1346326115599, value=[email protected]
row-1 column=cfInfo:qPassword, timestamp=1346326128125, value=user1pwd
row-1 column=cfInfo:qUser, timestamp=1346326078830, value=user1
row-1 column=cfStatus:
Configuration configuration = new Configuration(); // Hadoop configuration
HTable table = new HTable(configuration, "users");
Put p = new Put(Bytes.toBytes("user1"));
p.add(Bytes.toBytes("cfInfo"), Bytes.toBytes("qUser"), Bytes.toBytes("user1"));
p.add(Bytes.toBytes("cfInfo"), Bytes.toBytes("qEmail"),
    Bytes.toBytes("[email protected]"));
p.add(Bytes.toBytes("cfInfo"), Bytes.toBytes("qPassword"),
• HTable class is not thread safe
• Throws HBase-specific exceptions
Spring Hadoop - HBase
• Configuration support
• HBaseTemplate
– Resource Management
– Translation to Spring’s DataAccessException hierarchy
– Lightweight Object Mapping similar to JdbcTemplate
• RowMapper, ResultsExtractor
– Access to underlying resource
• TableCallback
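The template responsibilities listed above (resource management, exception translation, callback access to the underlying resource) follow a common pattern that can be sketched in plain Java. Everything here is a stand-in invented for illustration: `TemplateSketch`, `FakeTable`, `StoreException`, `TranslatedAccessException` and `TableAction` are hypothetical and are not the Spring for Hadoop classes.

```java
final class TemplateSketch {

    /** Stand-in for a checked, store-specific exception. */
    static final class StoreException extends Exception {
        StoreException(String message) { super(message); }
    }

    /** Stand-in for an unchecked exception from a common, portable hierarchy. */
    static final class TranslatedAccessException extends RuntimeException {
        TranslatedAccessException(Throwable cause) { super(cause); }
    }

    /** Stand-in for a table handle that must be released after use. */
    static final class FakeTable {
        boolean closed;
        void close() { closed = true; }
    }

    /** Callback that receives the managed resource. */
    interface TableAction<T> {
        T doInTable(FakeTable table) throws Exception;
    }

    private TemplateSketch() {}

    /** Acquires the table, runs the callback, translates checked failures, always releases. */
    static <T> T execute(TableAction<T> action) {
        FakeTable table = new FakeTable();
        try {
            return action.doInTable(table);
        } catch (RuntimeException e) {
            throw e; // already unchecked: pass through unchanged
        } catch (Exception e) {
            throw new TranslatedAccessException(e); // checked -> unchecked
        } finally {
            table.close(); // released on success and on failure
        }
    }

    public static void main(String[] args) {
        System.out.println(execute(table -> "row saved"));
    }
}
```

Callers never open or close the resource themselves and never see the store's checked exceptions, which is the benefit the template bullets describe.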
HBaseTemplate - Configuration
<configuration id="hadoopConfiguration">
<hbase-configuration id="hbaseConfiguration"
configuration-ref="hadoopConfiguration" />
<beans:bean id="hbaseTemplate"
<beans:property name="configuration" ref="hbaseConfiguration" />
HBaseTemplate - Save
public User save(final String userName, final String email, final String password) {
    return hbaseTemplate.execute(tableName, new TableCallback<User>() {
        public User doInTable(HTable table) throws Throwable {
            User user = new User(userName, email, password);
            Put p = new Put(Bytes.toBytes(user.getName()));
            p.add(CF_INFO, qUser, Bytes.toBytes(user.getName()));
            p.add(CF_INFO, qEmail, Bytes.toBytes(user.getEmail()));
            p.add(CF_INFO, qPassword, Bytes.toBytes(user.getPassword()));
            table.put(p); // write the row; without this nothing is stored
            return user;
        }
    });
}
HBaseTemplate – POJO Mapping
private byte[] qUser = Bytes.toBytes("user");
private byte[] qEmail = Bytes.toBytes("email");
private byte[] qPassword = Bytes.toBytes("password");
public List<User> findAll() {
    return hbaseTemplate.find(tableName, "cfInfo", new RowMapper<User>() {
        public User mapRow(Result result, int rowNum) throws Exception {
            return new User(Bytes.toString(result.getValue(CF_INFO, qUser)),
                            Bytes.toString(result.getValue(CF_INFO, qEmail)),
                            Bytes.toString(result.getValue(CF_INFO, qPassword)));
        }
    });
}
Code Tour - HBase
Document Database
– JSON-style documents
– Schema-less
Documents organized in collections
Full or partial document updates
Index support – secondary and compound
Rich query language for dynamic queries
GridFS for efficiently storing large files
Geo-spatial features
Map/Reduce for aggregation queries
– New Aggregation Framework in 2.2
Replication and Auto Sharding
Spring Data - MongoDB
• MongoTemplate
– Fluent Query, Criteria, Update APIs
– Translation to Spring’s DataAccessException hierarchy
Cross-store persistence
Log4J Logging Adapter
MongoOperations Interface
MongoTemplate - Usage
MongoTemplate - MapReduce
• Sample document
{ "_id" : ObjectId("4e5ff893c0277826074ec533"), "x" : [ "a", "b" ] }
{ "_id" : ObjectId("4e5ff893c0277826074ec534"), "x" : [ "b", "c" ] }
{ "_id" : ObjectId("4e5ff893c0277826074ec535"), "x" : [ "c", "d" ] }
• MapFunction – count the occurrence of each letter in the array
function () {
    for (var i = 0; i < this.x.length; i++) {
        emit(this.x[i], 1);
    }
}
MongoTemplate - MapReduce
• Reduce Function – sum up the occurrence of each letter
across all docs
function (key, values) {
    var sum = 0;
    for (var i = 0; i < values.length; i++)
        sum += values[i];
    return sum;
}
• Execute MapReduce
MapReduceResults<ValueObject> results = mongoOperations.mapReduce("collection",
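To make the two phases concrete, here is the same letter count worked through in plain Java over the three sample documents. It is an in-memory sketch of the map and reduce steps only, with the hypothetical class `LetterCountSketch`; it does not call MongoDB or `MongoTemplate`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class LetterCountSketch {

    private LetterCountSketch() {}

    /** Map phase: emit (letter, 1) for every element of each document's "x" array. */
    static Map<String, List<Integer>> map(List<List<String>> documents) {
        Map<String, List<Integer>> emitted = new TreeMap<>();
        for (List<String> x : documents) {
            for (String letter : x) {
                emitted.computeIfAbsent(letter, k -> new ArrayList<>()).add(1);
            }
        }
        return emitted;
    }

    /** Reduce phase: sum the values emitted for each key. */
    static Map<String, Integer> reduce(Map<String, List<Integer>> emitted) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : emitted.entrySet()) {
            int sum = 0;
            for (int value : entry.getValue()) {
                sum += value;
            }
            totals.put(entry.getKey(), sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<List<String>> documents = Arrays.asList(
                Arrays.asList("a", "b"), Arrays.asList("b", "c"), Arrays.asList("c", "d"));
        System.out.println(reduce(map(documents))); // prints {a=1, b=2, c=2, d=1}
    }
}
```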
Mapping Annotations
• @Document
– Marks an entity to be mapped to a document (optional)
– Allows definition of the collection the entity shall be
persisted to
– Collection name defaults to simple class name
• @Id
– Demarcates id properties
– Properties with names id and _id auto-detected
Mapping Annotations
• @Index / @CompoundIndex
– Creates Indexes for one or more properties
• @Field
– Allows customizing the key to be used inside the document
– Define field order
• @DBRef
– Creates references to entities in separate collection
– Opposite of embedding entities inside the document
Mongo Repositories
• Same as before with JPA
• Added functionality that is MongoDB specific
– Geolocation, @Query
public interface ProductRepository extends CrudRepository<Product, Long>,
        QueryDslPredicateExecutor<Product> {
    Page<Product> findByDescriptionContaining(String description, Pageable pageable);
    @Query("{ ?0 : ?1 }")
    List<Product> findByAttributes(String key, String value);
}
Code Tour - Mongo
• Graph Database – focus on connected data
– The social graph…
• Schema-free Property Graph
• ACID Transactions
• Indexing
• Scalable ~ 34 billion nodes and relationships, ~1M traversals/sec
• REST API or embeddable on JVM
• High-Availability
• Declarative Query Language - Cypher
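A property graph and a simple traversal can be sketched in a few lines of plain Java. This is an in-memory illustration under stated assumptions, not the Neo4j API: `PropertyGraphSketch`, `Node`, `Edge` and `reachable` are hypothetical names, and the traversal is a plain breadth-first search limited by depth.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class PropertyGraphSketch {

    /** A node with free-form properties and typed outgoing edges. */
    static final class Node {
        final String name;
        final Map<String, Object> properties = new HashMap<>();
        final List<Edge> outgoing = new ArrayList<>();

        Node(String name) { this.name = name; }

        Node connect(String type, Node target) {
            outgoing.add(new Edge(type, target));
            return this;
        }
    }

    /** A typed, directed edge; edges may carry properties as well. */
    static final class Edge {
        final String type;
        final Node target;
        final Map<String, Object> properties = new HashMap<>();

        Edge(String type, Node target) {
            this.type = type;
            this.target = target;
        }
    }

    private PropertyGraphSketch() {}

    /** Names of nodes reachable from start over edges of the given type within maxDepth hops. */
    static Set<String> reachable(Node start, String type, int maxDepth) {
        Set<String> found = new TreeSet<>();
        Map<Node, Integer> depth = new HashMap<>(); // also serves as the visited set
        Deque<Node> queue = new ArrayDeque<>();
        depth.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            Node current = queue.remove();
            int d = depth.get(current);
            if (d == maxDepth) {
                continue; // do not expand beyond the depth limit
            }
            for (Edge edge : current.outgoing) {
                if (edge.type.equals(type) && !depth.containsKey(edge.target)) {
                    depth.put(edge.target, d + 1);
                    found.add(edge.target.name);
                    queue.add(edge.target);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Node alice = new Node("alice");
        Node bob = new Node("bob");
        Node carol = new Node("carol");
        alice.connect("KNOWS", bob);
        bob.connect("KNOWS", carol);
        System.out.println(reachable(alice, "KNOWS", 2)); // prints [bob, carol]
    }
}
```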
Spring Data Neo4j
• Use annotations to define
graph entities
• Entity state backed by
graph database
• JSR-303 bean validation
• Query and Traversal API
• Cross-store persistence
– Part of object lives in
RDBMS, other in Neo4j
• Exception translation
• Declarative Transaction
• Repositories
• QueryDSL
• Spring XML namespace
• Neo4j-Server support
Classic Neo4j Domain class
Spring Data Neo4j Domain Class
public class Tag {
@GraphId private Long id;
@Indexed(unique = true)
private String name;
• @NodeEntity
– Represents a node in the graph
– Fields saved as properties on node
– Instantiated using Java ‘new’ keyword, like any other object
– Also returned by lookup
– Type information stored in the graph
Spring Data Neo4j Domain Class
public class Customer {
@GraphId private Long id;
private String firstName, lastName;
private String emailAddress;
private Set<Address> addresses = new HashSet<Address>();
Resource Management
Convenience Methods
Declarative Transaction Management
Exception Translation to DataAccessException hierarchy
Works also via REST with Neo4j-Server
Multiple Query Languages
– Cypher, Gremlin
• Fluent Query Result Handling
Neo4jTemplate - Usage
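// neo4jTemplate is an assumed variable name; the call prefix was lost from the slide
Customer dave = neo4jTemplate.save(new Customer("Dave", "Matthews", "[email protected]"));
Product iPad = neo4jTemplate.save(new Product("iPad", "Apple tablet device").withPrice(499));
Product mbp = neo4jTemplate.save(new Product("MacBook Pro", "Apple notebook").withPrice(1299));
neo4jTemplate.save(new Order(dave).withItem(iPad,2).withItem(mbp,1));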
<bean id="graphDatabaseService"
<constructor-arg value="http://localhost:7474/db/data" />
<neo4j:config graphDatabaseService="graphDatabaseService" />
• Implicitly creates a Neo4jTemplate instance in the app
Spring Data REST
• Export CrudRepository methods via REST semantics
– PUT, POST = save()
– GET = find*()
– DELETE = delete*()
• Support JSON as the first-class data format
• JSONP and JSONP+E support
• Implemented as Spring MVC application
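The verb-to-method mapping in the first bullet can be sketched as a small dispatcher. This is an illustration only, not the Spring Data REST implementation: `RestExportSketch` and `handle` are hypothetical, the "repository" is an in-memory map, and real requests carry paths, headers and JSON bodies that are ignored here.

```java
import java.util.HashMap;
import java.util.Map;

final class RestExportSketch {

    // Stand-in for a CrudRepository: id -> stored body.
    private final Map<Long, String> store = new HashMap<>();

    /** Dispatches an HTTP-like request to a CRUD operation; returns the entity or null. */
    String handle(String httpMethod, long id, String body) {
        switch (httpMethod) {
            case "PUT":
            case "POST":
                store.put(id, body);     // save()
                return body;
            case "GET":
                return store.get(id);    // find*()
            case "DELETE":
                return store.remove(id); // delete*()
            default:
                throw new IllegalArgumentException("Unsupported method: " + httpMethod);
        }
    }

    public static void main(String[] args) {
        RestExportSketch exporter = new RestExportSketch();
        exporter.handle("POST", 1L, "{\"name\":\"John Doe\"}");
        System.out.println(exporter.handle("GET", 1L, null));
    }
}
```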
Spring Data REST
• Discoverability
– “GET /” results in a list of resources available from this level
• Resources are related to one another by “links”
– Links have a specific meaning in different contexts
– HTML and Atom syndication formats have <link rel="" href=""/>
• Use Spring HATEOAS as basis for creating
Spring Data REST - Example
curl -v http://localhost:8080/spring-data-rest-webmvc/
{
  "links" : [ {
    "rel" : "person",
    "href" : "http://localhost:8080/spring-data-rest-webmvc/person"
  } ]
}
curl -v http://localhost:8080/spring-data-rest-webmvc/person
{
  "content": [ ],
  "links" : [ {
    "rel" : "",
    "href" : "http://localhost:8080/spring-data-rest-webmvc/person/search"
  } ]
}
Spring Data REST - Example
curl -v http://localhost:8080/spring-data-rest-webmvc/person/search
{
  "links" : [ {
    "rel" : "person.findByName",
    "href" : "http://localhost:8080/spring-data-rest-webmvc/person/search/findByName"
  } ]
}
curl -v http://localhost:8080/spring-data-rest-webmvc/person/search/findByName?name=John+Doe
[ {
  "rel" : "person.Person",
  "href" : "http://localhost:8080/spring-data-rest-webmvc/person/1"
} ]
Spring Data REST - Example
curl -v http://localhost:8080/spring-data-rest-webmvc/person/1
{
  "name" : "John Doe",
  "links" : [ {
    "rel" : "profiles",
    "href" : "http://localhost:8080/spring-data-rest-webmvc/person/1/profiles"
  }, {
    "rel" : "addresses",
    "href" : "http://localhost:8080/spring-data-rest-webmvc/person/1/addresses"
  }, {
    "rel" : "self",
    "href" : "http://localhost:8080/spring-data-rest-webmvc/person/1"
  } ],
  "version" : 1
}
Spring for Hadoop - Goals
• Hadoop has a poor out-of-the-box programming model
• Applications are generally a collection of scripts calling command-line apps
• Spring simplifies developing Hadoop applications
– By providing a familiar and consistent programming and configuration model
– Across a wide range of use cases: HDFS usage, data analysis, workflow (Spring Batch), event streams (Spring Integration)
– Allowing you to start small and
Relationship with other Spring Projects
Free Spring Data JPA Chapter –
O’Reilly Spring Data Book -
• Spring Data
• Querydsl
• Example Code
– Many more listed on individual project pages
Thank You!

SpringSource 2GX 2009