1. MONGODB
MongoDB is an open-source document database with built-in replication, high availability, auto-sharding, and map/reduce mechanisms. Install MongoDB by following the instructions at
http://docs.mongodb.org/manual/tutorial/install-mongodb-on-os-x/
2. ELASTICSEARCH
Elasticsearch is a powerful open-source, real-time search and analytics engine, designed from the ground up for distributed environments, with reliability and scalability as must-haves. It works well as a search engine. Set ES_HOME to the installation directory:
export ES_HOME=/Users/xxx/Downloads/elasticsearch-1.4.2
3. MONGODB RIVER PLUGIN FOR ES
Elasticsearch lets you extend its basic functionality with plugins, which are easy to use and develop. They can be used for analysis, discovery, monitoring, data synchronization, and more. Rivers are a group of plugins used for data synchronization between a database and Elasticsearch.
The first plugin to install is a dependency called Mapper Attachments. You can install it via the ES plugin script:
$ES_HOME/bin/plugin -install elasticsearch/elasticsearch-mapper-attachments/2.4.3
The second plugin is the ES 'river' for Mongo. The syntax to install it is slightly different because it's a third-party plugin:
$ES_HOME/bin/plugin -install com.github.richardwilly98.elasticsearch/elasticsearch-river-mongodb/2.0.9
Restart the server:
sh elasticsearch restart
Elasticsearch is now up and running; this took 3 minutes in my case. It's recommended to run it on a second machine to avoid sharing resources, but for test deployments a single machine is good enough.
Tell Elasticsearch to index the “person” collection in the testmongo database by issuing the following command in your terminal:
curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{
  "type": "mongodb",
  "mongodb": {
    "db": "testmongo",
    "collection": "person"
  },
  "index": {
    "name": "mongoindex",
    "type": "person"
  }
}'
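If you need rivers for more than one collection, the request body can be generated instead of typed by hand. The following is a minimal sketch, not part of the river plugin itself: the function names river_meta_json and register_river are hypothetical, and it assumes Elasticsearch is listening on localhost:9200 and that the index type should match the collection name, as in the command above.

```shell
# Hypothetical helper: print the river _meta document for a given
# MongoDB database ($1), collection ($2), and target index name ($3).
# Assumption: the index type is set to the collection name.
river_meta_json() {
  cat <<EOF
{
  "type": "mongodb",
  "mongodb": { "db": "$1", "collection": "$2" },
  "index": { "name": "$3", "type": "$2" }
}
EOF
}

# Hypothetical helper: register a river named $1 for database $2,
# collection $3, and index $4.
# Assumption: Elasticsearch is reachable at localhost:9200.
register_river() {
  curl -XPUT "http://localhost:9200/_river/$1/_meta" \
    -d "$(river_meta_json "$2" "$3" "$4")"
}
```

Calling register_river mongodb testmongo person mongoindex should send the same kind of request as the curl command in this step.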
Elasticsearch is now automatically synchronizing data with MongoDB; this took 2 minutes in my case.
4. SET UP MONGODB AS A REPLICA SET
To set up a replica set on the local host (Mac OS X), start mongod with the --replSet option:
mongod --port 27017 --dbpath /data/db --replSet rs0
Then, in the mongo shell, initiate the new replica set:
rs.initiate()
Create a Mongo database: testmongo
Create a Mongo collection under testmongo: person
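MongoDB creates a database and a collection implicitly on the first insert, so both can be created by adding one sample document. The sketch below is an illustration only: the function name seed_person_js is hypothetical, the field values are taken from the search response shown in this article, and it assumes the replica-set mongod from this step is listening on port 27017.

```shell
# Hypothetical helper: print the mongo shell statements that insert one
# sample document into the person collection and read it back.
# The field values match the search response shown in this article.
seed_person_js() {
  cat <<'EOF'
db.person.insert({ firstName: "John", lastName: "Doe" });
printjson(db.person.findOne());
EOF
}

# Usage (assumes mongod from this step is listening on port 27017):
#   seed_person_js | mongo --port 27017 testmongo
```

Piping the statements into the mongo shell keeps the example to a single command and leaves the statements easy to inspect before they run.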
5. START THE SEARCH
Search the data from the terminal, or open the search URL directly in a browser, to get the response below.
Response from Elasticsearch:
{
  "took": 42,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.4054651,
    "hits": [
      {
        "_index": "mongoindex",
        "_type": "person",
        "_id": "552ee22d9f829d905d5f180f",
        "_score": 1.4054651,
        "_source": {
          "lastName": "Doe",
          "_id": "552ee22d9f829d905d5f180f",
          "firstName": "John"
        }
      }
    ]
  }
}
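The exact search command is not shown in this article. As a hedged sketch, a response of this shape could come from an Elasticsearch URI search against the mongoindex index. The function name search_url is hypothetical, and the host, port, and the firstName:John query are assumptions chosen to match the sample document.

```shell
# Hypothetical helper: build a URI-search URL for index $1 that matches
# field $2 against value $3.
# Assumption: Elasticsearch is reachable at localhost:9200.
search_url() {
  printf 'http://localhost:9200/%s/_search?q=%s:%s&pretty\n' "$1" "$2" "$3"
}

# Usage:
#   curl "$(search_url mongoindex firstName John)"
```

The same URL can be opened directly in a browser; the pretty parameter only affects how the JSON is formatted.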
After a document is inserted into a MongoDB instance configured as a replica set, the operation is also recorded in the oplog collection. The oplog (operations log) is a capped collection that keeps a rolling record of all operations that modify the data stored in the databases. The river plugin monitors this collection and forwards new operations to Elasticsearch according to its configuration. That means all insert, update, and delete operations are forwarded to Elasticsearch automatically.
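To see what the river is reading, you can look at the most recent oplog entry for the collection. This is a minimal sketch, assuming the replica set from this article is running on port 27017; the function name oplog_latest_js is hypothetical. On a replica set the oplog lives in the oplog.rs collection of the local database, and each entry carries the namespace in its ns field.

```shell
# Hypothetical helper: print a mongo shell statement that shows the newest
# oplog entry for the testmongo.person namespace.
# The quoted heredoc delimiter keeps $natural from being expanded by the shell.
oplog_latest_js() {
  cat <<'EOF'
db.oplog.rs.find({ ns: "testmongo.person" }).sort({ $natural: -1 }).limit(1).forEach(printjson);
EOF
}

# Usage (assumes the replica-set mongod is listening on port 27017):
#   oplog_latest_js | mongo --port 27017 local
```

In the printed entry, the op field indicates the kind of operation (for example "i" for an insert), which is the information forwarded to the index.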
We can easily check what we have in ES using the head plugin, which can be installed with the following command (run from $ES_HOME/bin):
./plugin -install mobz/elasticsearch-head
Some Elasticsearch plugins provide a web interface that can be reached under the /_plugin endpoint.
To summarize, we have MongoDB configured as a replica set and Elasticsearch with a river that pulls data from the database into the index, and everything is prepared for sharding and replication.