Thursday, April 16, 2015

ElasticSearch and Mongo through river – scalable data store and search engine platform

1. MONGODB
MongoDB is an open-source document database with built-in replication, high availability, auto-sharding and map/reduce mechanisms. Install MongoDB by following the instructions at
http://docs.mongodb.org/manual/tutorial/install-mongodb-on-os-x/

2. ELASTICSEARCH
Elasticsearch is a powerful open-source, real-time search and analytics engine, designed from the ground up for distributed environments, with reliability and scalability as must-haves. It looks great as a search engine.

Download Elasticsearch from https://www.elastic.co, unpack it, and point ES_HOME at the unpacked directory from a terminal:
export ES_HOME=/Users/xxx/Downloads/elasticsearch-1.4.2

3. MONGODB RIVER PLUGIN FOR ES
Elasticsearch lets you extend its basic functionality with plugins, which are easy to use and develop. They can be used for analysis, discovery, monitoring, data synchronization and more. Rivers are a group of plugins used for data synchronization between a database and Elasticsearch.
Two plugins are needed here. The first is a dependency called Mapper Attachments. You can install it via the ES plugin script:
$ES_HOME/bin/plugin -install elasticsearch/elasticsearch-mapper-attachments/2.4.3

The second plugin is the ES 'river' for Mongo. The syntax to install it is slightly different as it's a third-party plugin:
$ES_HOME/bin/plugin -install com.github.richardwilly98.elasticsearch/elasticsearch-river-mongodb/2.0.9

Restart the Elasticsearch server so that the new plugins are loaded:
sh elasticsearch restart

Elasticsearch is now up and running (3 minutes in my case). It’s recommended to run it on a separate machine to avoid sharing resources with MongoDB, but for test deployments a single machine is good enough.

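Tell Elasticsearch to index the “person” collection in the testmongo database by issuing the following command in your terminal (replace <river_name> with a name of your choice for the river):

curl -XPUT 'http://localhost:9200/_river/<river_name>/_meta' -d '{
    "type": "mongodb",
    "mongodb": {
        "db": "testmongo",
        "collection": "person"
    },
    "index": {
        "name": "mongoindex",
        "type": "person"
    }
}'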
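
The same registration can also be sketched in Python. This is a minimal sketch, not the plugin's official client: it assumes Elasticsearch listens on http://localhost:9200 and that a river is registered with a PUT to /_river/<name>/_meta, as in the ES 1.x river API. The river name person_river and the helper names are hypothetical.

```python
import json
import urllib.request


def river_meta(db, collection, index_name, index_type):
    """Build the river definition from this step as a Python dict."""
    return {
        "type": "mongodb",
        "mongodb": {"db": db, "collection": collection},
        "index": {"name": index_name, "type": index_type},
    }


def river_request(river_name, meta, es_url="http://localhost:9200"):
    """Prepare (but do not send) the PUT request that registers the river."""
    return urllib.request.Request(
        url="{}/_river/{}/_meta".format(es_url, river_name),
        data=json.dumps(meta).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(request):
    """Send the prepared request and return the response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# "person_river" is a hypothetical river name; any name should work.
request = river_request(
    "person_river",
    river_meta("testmongo", "person", "mongoindex", "person"),
)
```

Calling send(request) against a running node performs the registration; building the request separately makes it easy to inspect the URL and body first.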

We’ve got Elasticsearch automatically synchronizing data with MongoDB (2 minutes in my case).

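4. SET UP MONGODB AS A REPLICA SET

The river reads changes from MongoDB's oplog, which is only kept when MongoDB runs as a replica set, so this step has to be in place before the river can synchronize anything. To set up a single-node replica set on localhost (Mac OS X), start mongod with the --replSet option:

mongod --port 27017 --dbpath /data/db --replSet rs0

Then connect with the mongo shell and initiate the new replica set:

rs.initiate()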

Create a Mongo Database: testmongo
Create a Mongo Collection under testmongo: person
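
MongoDB creates a database and a collection the first time a document is inserted into them, so one insert covers both steps. A minimal sketch in Python follows, assuming the third-party pymongo driver (3.0 or later) is installed and the replica set rs0 is running on localhost:27017. The helper names are hypothetical, and the John Doe document mirrors the one that appears in the search response in step 5.

```python
def person_document(first_name, last_name):
    """Build a document with the same fields as the one indexed in this post."""
    return {"firstName": first_name, "lastName": last_name}


def insert_person(document, uri="mongodb://localhost:27017/?replicaSet=rs0"):
    """Insert into testmongo.person; both are created on the first insert."""
    # Assumption: pymongo is installed (pip install pymongo). It is imported
    # here so the rest of the sketch works without the driver.
    from pymongo import MongoClient

    client = MongoClient(uri)
    try:
        result = client["testmongo"]["person"].insert_one(document)
        return result.inserted_id
    finally:
        client.close()


doc = person_document("John", "Doe")
```

Calling insert_person(doc) against the running replica set should return the generated _id of the new document.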

5. RUN THE SEARCH

Use this command to search the data from the terminal, or go to the URL directly in a browser, to see the response below.

Response from Elasticsearch:

{
  "took": 42,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.4054651,
    "hits": [
      {
        "_index": "mongoindex",
        "_type": "person",
        "_id": "552ee22d9f829d905d5f180f",
        "_score": 1.4054651,
        "_source": {
          "lastName": "Doe",
          "_id": "552ee22d9f829d905d5f180f",
          "firstName": "John"
        }
      }
    ]
  }
}
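
A response like this is easy to consume from code. The sketch below builds a URI-search URL and pulls the _source documents out of a response body. It assumes Elasticsearch on http://localhost:9200; the query firstName:John is a hypothetical example and not necessarily the query that produced the response above.

```python
import json
from urllib.parse import urlencode


def search_url(index, doc_type, query, es_url="http://localhost:9200"):
    """Build a URI-search URL of the form <es>/<index>/<type>/_search?q=..."""
    return "{}/{}/{}/_search?{}".format(
        es_url, index, doc_type, urlencode({"q": query})
    )


def extract_sources(response_text):
    """Return the _source document of every hit in a search response body."""
    response = json.loads(response_text)
    return [hit["_source"] for hit in response["hits"]["hits"]]


# Hypothetical query against the index and type used in this post.
url = search_url("mongoindex", "person", "firstName:John")
```

Fetching url (for example with urllib.request.urlopen) and passing the body to extract_sources yields a list of plain dictionaries, one per matching person.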
After a document is inserted into a MongoDB instance configured as a replica set, the operation is also recorded in the oplog collection. The oplog (operations log) is a capped collection that keeps a rolling record of all operations that modify the data stored in the databases. The river plugin monitors this collection and forwards new operations to Elasticsearch according to its configuration. That means that all insert, update and delete operations are forwarded to Elasticsearch automatically.
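
To make the idea concrete, here is a simplified illustration of how oplog entries could be mapped to index operations. It is not the river plugin's actual code. The fields op, ns, o and o2 are the ones MongoDB writes to the oplog; the function name and the shape of the returned dictionaries are hypothetical.

```python
def oplog_to_action(entry, watched_ns="testmongo.person"):
    """Map one oplog entry to an index action, or None if it is not relevant."""
    if entry.get("ns") != watched_ns:
        return None  # the entry belongs to another collection

    op = entry.get("op")
    if op == "i":
        # Insert: "o" holds the full new document.
        return {"action": "index", "id": str(entry["o"]["_id"]), "doc": entry["o"]}
    if op == "u":
        # Update: "o2" identifies the document, "o" holds the change.
        return {"action": "update", "id": str(entry["o2"]["_id"]), "doc": entry["o"]}
    if op == "d":
        # Delete: "o" holds only the _id of the removed document.
        return {"action": "delete", "id": str(entry["o"]["_id"]), "doc": None}
    return None  # no-ops, commands and other entry types are skipped here
```

A real synchronizer would also have to remember the timestamp of the last processed entry so that it can resume after a restart; that part is left out of this sketch.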

We can easily check what we have in ES using the head plugin, which can be installed with the following command (run from $ES_HOME/bin):
./plugin -install mobz/elasticsearch-head

Some Elasticsearch plugins provide a web interface that can be reached under the /_plugin endpoint. On a default local install, head is served at http://localhost:9200/_plugin/head/.





To summarize, we have MongoDB configured as a replica set and Elasticsearch with a river that pulls data from the database into the index, and everything is prepared for sharding and replication.

Create an Elasticsearch cluster on a single machine

I wanted to figure out how to create a multi-node Elasticsearch cluster on a single machine, so I followed these instructions. First I did...