I want to make life easier for job-changers and the recruiters who research them. I want to build an easily searchable view of projects. I want it on Elasticsearch, because I want to prove you can easily do something small with it; it's not only for very BIG data projects.
The following piece of code is available on GitHub.
Run on your Linux machine
It's a piece of cake. You only need docker and its younger brother, docker-compose. Then this little script lets me run the whole stack:
version: "2"
services:
  kibana:
    image: kibana
    ports:
      - 5601:5601
  elasticsearch:
    image: elasticsearch
    ports:
      - 9200:9200  # publish ES so scripts on the host can reach localhost:9200
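With the file above saved as docker-compose.yml, bringing the stack up is a single command (assuming Docker and docker-compose are installed):

```shell
docker-compose up -d   # start both containers in the background
docker-compose ps      # verify kibana and elasticsearch are running
```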
Write in YAML, index to ES
Thinking about the domain now: what do I need when I think of my bio? I will simplify it to a project object. This could be an exemplary bio of someone:
Projects:
  Commercial:
    Truck browser:
      started at: 2013-01
      finished at: 2013-06
      tasks:
        - writing code
        - testing my code
      learned:
        - JDK8
        - Junit Test version 5 is cool
      challenges:
        - dealing with frontside, CSS :(
        - hard to deploy on prod server
      technologies:
        - Spring 3.2
        - JDK8
        - handmade JS and CSS
      measure of success:
        - 90% of test coverage
        - 200 happy users in Polish workshops
  Private:
    Mini REST service for my CD Collection:
      started at: 2013-02
      # still working on it
      tasks:
        - care about whole app
      learned:
        - NodeJS + Express = fast web or REST app written in JS!
      challenges:
        - which library on npm to choose?!
      technologies:
        - NodeJS
        - Express
        - javascript
      measure of success:
        - REST app in 6 days
How to index it? First, make a JSON out of it.
Python for data wrangling and ES I/O
Data wrangling, or conversion, is easy. Let's make JSON out of the YAML notation here. With the yaml library it's trivial:
import yaml

fname = "projects.yml"
with open(fname) as f:
    doc = yaml.safe_load(f)  # safe_load never executes arbitrary YAML tags
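Once loaded, the parsed YAML is just nested dicts and lists, so `json.dumps` turns any project straight into the JSON body Elasticsearch expects. A small self-contained sketch, with the structure abridged from the bio above:

```python
import json

# abridged version of what yaml.safe_load() returns for the bio above
doc = {
    "Projects": {
        "Commercial": {
            "Truck browser": {
                "started at": "2013-01",
                "technologies": ["Spring 3.2", "JDK8", "handmade JS and CSS"],
            }
        }
    }
}

# one project == one JSON document to index
body = json.dumps(doc["Projects"]["Commercial"]["Truck browser"], indent=2)
print(body)
```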
Then let's prove we can connect to Elasticsearch. After importing the official library and asking the server to describe itself, we see this output:
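A minimal sketch of that check, assuming the official elasticsearch Python client (`pip install elasticsearch`) and the docker-compose stack running locally; the full indexmyprojects.py also lists the projects first:

```python
from elasticsearch import Elasticsearch

# connect to the local node started by docker-compose
es = Elasticsearch(["http://localhost:9200"])

# ask the cluster to describe itself; any reply proves the connection works
print(es.info())
```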
luk@luk-UX410UAK:~/prj/searchmybio$ python indexmyprojects.py
--Listing commercial projects--
Truck browser
--Listing side-projects--
Mini REST service for my CD Collection
{'name': 'HmNOtOU', 'cluster_name': 'elasticsearch', 'cluster_uuid': 'tRdhrV3gR0OnUfZMkPrpqQ', 'version': {'number': '5.5.0', 'build_hash': '260387d', 'build_date': '2017-06-30T23:16:05.735Z', 'build_snapshot': False, 'lucene_version': '6.6.0'}, 'tagline': 'You Know, for Search'}
Good, the connection works!
Index time!
Before we start indexing docs, we have to arrange a place for them. Any schema? No need: Elasticsearch can infer the mapping for us. We just have to create the index, easily done with a small Python script. After that..
res = es.indices.create("searchmybio_luke")
..we can query ES from Kibana's Dev Tools. So let's fill it! After a few refactors we are ready to insert, or PUT, the docs into their place; indexing from within the script is one line per document. All docs are there. Any proof?
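The indexing one-liner itself isn't shown in the post, so here is a hypothetical sketch, reusing the `es` client and the parsed `doc` from earlier (field and index names follow the examples above):

```python
# loop over both project categories and index each project as one document
for category in ("Commercial", "Private"):
    for name, project in doc["Projects"][category].items():
        project["name"] = name  # keep the project title inside the document
        es.index(index="searchmybio_luke", doc_type="project", body=project)
```

With the docs indexed, query away: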
GET searchmybio_luke/project/_search
{
  "query": {
    "match": {
      "technologies": "nodejs"
    }
  }
}
You can even write a test and run it against this index:
import unittest
from elasticsearch import Elasticsearch

class SearchMyBioTest(unittest.TestCase):
    def setUp(self):
        # same local node the indexing script used
        self.es = Elasticsearch(["http://localhost:9200"])

    def test_finds_exactly_one_nodejs_project(self):
        query_find_nodejs = {
            "query": {
                "match": {
                    "technologies": "nodejs"
                }
            }
        }
        res = self.es.search(index='searchmybio_luke',
                             doc_type='project', body=query_find_nodejs)
        hits = res['hits']['hits']
        self.assertEqual(len(hits), 1)
That's all. In less than an hour we were able to run Elasticsearch on our machine and index docs from YAML directly into an index.
Here is a screenshot from my Kibana: