DEV Community

ouzkagan

Some Useful PyMongo Snippets

## Table of Contents
1. Display all databases available
2. Display all collections available
3. Display one document from a collection
4. Display the number of documents in a collection
5. Display a specific field of the first 10 documents
6. Find the first 10 author names in ascending alphabetical order
7. Count the documents whose field does not match a regex pattern
8. Count the documents uploaded between two dates
9. Find documents by text search
10. Find documents whose field matches a pattern/regex
11. Group by a field, count the documents, then sort by count
12. Update a document if it exists, otherwise insert a new one

I have a papers collection stored in a MongoDB Atlas database. Here is an example document:

```python
{
    '_id': ObjectId('5fa9a4db76fdd8d66273c643'),
    'id': '0704.0001',
    'submitter': 'Pavel Nadolsky',
    'authors': "C. Bal\\'azs, E. L. Berger, P. M. Nadolsky, C.-P. Yuan",
    'title': 'Calculation of prompt diphoton production cross sections at Tevatron and\n  LHC energies',
    'comments': '37 pages, 15 figures; published version',
    'journal-ref': 'Phys.Rev.D76:013009,2007',
    'doi': '10.1103/PhysRevD.76.013009',
    'report-no': 'ANL-HEP-PR-07-12',
    'categories': 'hep-ph',
    'abstract': '  A fully differential calculation in perturbative quantum chromodynamics is\npresented for the production of massive photon pairs at hadron colliders. All\nnext-to-leading order perturbative contributions from quark-antiquark,\ngluon-(anti)quark, and gluon-gluon subprocesses are included, as well as\nall-orders resummation of initial-state gluon radiation valid at\nnext-to-next-to-leading logarithmic accuracy. The region of phase space is\nspecified in which the calculation is most reliable. Good agreement is\ndemonstrated with data from the Fermilab Tevatron, and predictions are made for\nmore detailed tests with CDF and DO data. Predictions are shown for\ndistributions of diphoton pairs produced at the energy of the Large Hadron\nCollider (LHC). Distributions of the diphoton pairs from the decay of a Higgs\nboson are contrasted with those produced from QCD processes at the LHC, showing\nthat enhanced sensitivity to the signal can be obtained with judicious\nselection of events.\n',
    'update_date': '2008-11-26',
    'authors_parsed': [
        ['Balázs', 'C.', ''],
        ['Berger', 'E. L.', ''],
        ['Nadolsky', 'P. M.', ''],
        ['Yuan', 'C. -P.', '']
    ]
}
```

These are common use cases in my database-related projects, so I prepared a list of snippets to have at hand when I need them.

Initializing PyMongo:

```python
from pymongo import MongoClient

# Connection to the database (host and credentials are placeholders)
full_dns_name = 'mongodb://***'
username = 'test'
password = 'test'
authSource = 'admin'  # database that stores the user's credentials

client = MongoClient(host=full_dns_name, username=username, password=password, authSource=authSource)
```
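If you prefer a single connection string over separate keyword arguments, the username and password have to be percent-escaped before they go into the URI, otherwise characters such as `@` or `:` break it. Here is a minimal sketch using the standard library's `urllib.parse.quote_plus`; the helper name `build_mongo_uri` and the host `cluster0.example.net` are hypothetical placeholders, not part of my setup.

```python
from urllib.parse import quote_plus


def build_mongo_uri(host, username, password, auth_source="admin"):
    """Build a mongodb:// URI with percent-escaped credentials (hypothetical helper)."""
    # quote_plus escapes characters such as '@', ':' and '/' that would
    # otherwise be parsed as URI delimiters
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"mongodb://{user}:{pwd}@{host}/?authSource={auth_source}"


# hypothetical host and credentials, for illustration only
uri = build_mongo_uri("cluster0.example.net", "test", "p@ss:word")
print(uri)  # mongodb://test:p%40ss%3Aword@cluster0.example.net/?authSource=admin
```

The resulting string can then be passed directly as `MongoClient(uri)`.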

1. Display all databases available:

```python
# display all databases available
db_list = list(client.list_databases())
print(db_list)

# or

for db in client.list_databases():
    print(db)
```

2. Display all collections available:

```python
# display all collections available
for db in client.list_databases():
    name = db['name']
    for col in client[name].list_collections():
        print(col)
```

3. Display one document from a collection:

```python
# display one document from the "papers" collection of the "arxiv" database
db = client.arxiv
papers_col = db.papers
doc = papers_col.find_one()
print(doc)
```

4. Display the number of documents in a collection:

```python
# display the number of documents in the collection
number_of_doc = papers_col.count_documents({})
print(number_of_doc)
```

5. Display a specific field of the first 10 documents:

```python
# display the titles of 10 articles
articles = list(papers_col.find({}, {'title': 1}).limit(10))
print(articles)
```
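Note that an inclusion projection such as `{'title': 1}` still returns `_id` unless you exclude it explicitly with `'_id': 0`. The pure-Python sketch below mimics that rule on a plain dict so the shape of the results is easy to see without a server; the `project` helper is hypothetical and only handles top-level fields.

```python
def project(doc, projection):
    """Mimic a MongoDB inclusion projection on top-level fields (simplified sketch)."""
    # fields requested with a truthy value, ignoring the special _id key
    wanted = [k for k, v in projection.items() if v and k != "_id"]
    result = {k: doc[k] for k in wanted if k in doc}
    # _id is included by default unless it is explicitly excluded with 0
    if projection.get("_id", 1) and "_id" in doc:
        result = {"_id": doc["_id"], **result}
    return result


paper = {"_id": "5fa9a4db76fdd8d66273c643", "id": "0704.0001",
         "title": "Calculation of prompt diphoton production cross sections"}
print(project(paper, {"title": 1}))            # keeps _id and title
print(project(paper, {"title": 1, "_id": 0}))  # keeps only title
```

In PyMongo the title-only variant would be `papers_col.find({}, {'title': 1, '_id': 0})`.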

6. Find the first 10 author names in ascending alphabetical order:

```python
from pymongo import ASCENDING

# the 'submitter' field holds the author's name
# display the first 10 author names in ascending alphabetical order

# sort by name, then keep the first 10 results
articles = list(
    papers_col.find({}, {'submitter': 1})
    .sort([('submitter', ASCENDING)])
    .limit(10)
)
print(articles)
```
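The server applies `sort` before `limit` regardless of the order in which the cursor methods are chained, so the query returns the 10 alphabetically smallest names rather than 10 arbitrary documents sorted afterwards. Without a collation, MongoDB compares strings in binary order, so uppercase letters come before lowercase ones. The sketch below reproduces that behaviour in plain Python; the helper `first_n_sorted` and the sample names are hypothetical, and it assumes plain ASCII names that are present in every document.

```python
def first_n_sorted(docs, field, n):
    """Return the first n values of `field` in ascending order (sort, then limit)."""
    # Python compares str by code point, which for plain ASCII matches
    # MongoDB's default binary string comparison: 'Z' < 'a'
    values = sorted(doc[field] for doc in docs)
    return values[:n]


submitters = [{"submitter": name} for name in ["bob", "Alice", "Carol", "dave"]]
print(first_n_sorted(submitters, "submitter", 3))  # ['Alice', 'Carol', 'bob']
```

If case-insensitive ordering is wanted, a collation on the query is the usual way to change it.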

7. Count the documents whose field does not match a regex pattern:

```python
import re

# display the number of articles that were not submitted by "Damien Chablat"
pattern = re.compile(r'Damien Chablat')
articles = papers_col.count_documents({'submitter': {'$not': pattern}})
print(articles)
```
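One subtlety: `$not` also selects documents that do not contain the field at all, so this count includes papers without a `submitter`. The in-memory sketch below mirrors that behaviour on a list of dicts, which can help when reasoning about the number you get back. The helper `count_not_matching` is hypothetical, and it assumes the field holds plain strings (arrays are not handled).

```python
import re


def count_not_matching(docs, field, pattern):
    """Count docs whose `field` does not match `pattern`, mirroring
    {field: {'$not': pattern}}: docs without the field are counted too."""
    count = 0
    for doc in docs:
        value = doc.get(field)
        # a missing field (or a non-string value) cannot match the regex
        if not isinstance(value, str) or not pattern.search(value):
            count += 1
    return count


docs = [
    {"submitter": "Damien Chablat"},
    {"submitter": "Pavel Nadolsky"},
    {"title": "no submitter field"},
]
print(count_not_matching(docs, "submitter", re.compile(r"Damien Chablat")))  # 2
```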

8. Count the documents uploaded between two dates:

```python
from datetime import date

# the 'update_date' field holds each document's upload date as a 'yyyy-mm-dd' string
# display the number of articles uploaded in 2014

# ISO-formatted date strings compare in chronological order, so string bounds work
first_date = date(2014, 1, 1).isoformat()  # '2014-01-01'
last_date = date(2015, 1, 1).isoformat()   # '2015-01-01', excluded by $lt

articles_count = papers_col.count_documents({'update_date': {'$gte': first_date, '$lt': last_date}})
print(articles_count)
```
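Because the dates are stored as zero-padded `yyyy-mm-dd` strings, lexicographic order matches chronological order, which is why `$gte`/`$lt` on strings works here. A small helper makes the half-open interval explicit and reusable for any year; `year_range_filter` is a hypothetical name, sketched under the assumption that the field always uses this string format.

```python
from datetime import date


def year_range_filter(field, year):
    """Build a filter matching ISO date strings within `year` (half-open interval)."""
    start = date(year, 1, 1).isoformat()    # first day of the year, included
    end = date(year + 1, 1, 1).isoformat()  # first day of the next year, excluded
    return {field: {"$gte": start, "$lt": end}}


query = year_range_filter("update_date", 2014)
print(query)  # {'update_date': {'$gte': '2014-01-01', '$lt': '2015-01-01'}}
# usage with the collection from above: papers_col.count_documents(query)
```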

9. Find documents by text search:

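```python
from pymongo import TEXT

# display the titles of articles whose abstract mentions "Machine Learning"

# $text queries need a text index (a collection can have only one)
papers_col.create_index([("abstract", TEXT)])
# the escaped quotes make $text match the exact phrase instead of either word
articles = papers_col.find({"$text": {"$search": "\"Machine Learning\""}}, {'title': 1})
print(list(articles))
```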
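`$text` results are not sorted by relevance unless you ask for it. A common pattern is to project the `textScore` metadata and sort on it. The sketch below only builds the filter, projection and sort arguments as plain Python objects (the helper name `text_search_args` is hypothetical); they are meant to be passed to `find(...)` and `.sort(...)` on a collection that already has a text index.

```python
def text_search_args(phrase, fields=("title",)):
    """Build (filter, projection, sort) for a phrase search ranked by relevance."""
    # escaped double quotes make $text match the exact phrase
    query = {"$text": {"$search": f'"{phrase}"'}}
    projection = {name: 1 for name in fields}
    # expose the relevance score computed by the text index
    projection["score"] = {"$meta": "textScore"}
    sort = [("score", {"$meta": "textScore"})]
    return query, projection, sort


query, projection, sort = text_search_args("Machine Learning")
print(query)  # {'$text': {'$search': '"Machine Learning"'}}
# usage: papers_col.find(query, projection).sort(sort).limit(10)
```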

10. Find documents whose field matches a pattern/regex:

```python
import re

# display the articles whose abstract mentions "Machine Learning" (case-sensitive)
pattern = re.compile(r'Machine Learning')
articles = papers_col.find({'abstract': {'$regex': pattern}})

print(list(articles))
```
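The pattern above is case-sensitive and its text is interpreted as a regular expression. If the search text comes from user input, escaping it and using the `i` option is usually safer. A minimal sketch, with a hypothetical helper `contains_filter`, that also sanity-checks the same pattern locally with Python's `re` module:

```python
import re


def contains_filter(field, text, ignore_case=True):
    """Build a $regex filter matching `text` literally anywhere in `field`."""
    # re.escape neutralises regex metacharacters such as '.', '+' or '('
    query = {field: {"$regex": re.escape(text)}}
    if ignore_case:
        query[field]["$options"] = "i"  # MongoDB's case-insensitive flag
    return query


query = contains_filter("abstract", "machine learning")
# local sanity check of the same pattern with Python's re module
local = re.compile(query["abstract"]["$regex"], re.IGNORECASE)
print(bool(local.search("Recent advances in Machine Learning")))  # True
# usage: papers_col.find(query, {'title': 1})
```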

11. Group by a field, count the documents, then sort by count:

```python
# display the number of articles for the 10 most prolific submitters

pipeline = [
    {"$group": {"_id": "$submitter", "count": {"$sum": 1}}},  # one group per submitter
    {"$sort": {"count": -1}},                                # most articles first
    {"$limit": 10}
]

articles = list(papers_col.aggregate(pipeline))
print(articles)
```
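To check what the pipeline computes, the same group, count, sort and limit logic can be reproduced on a small in-memory list with `collections.Counter`. This is only a verification sketch on hypothetical sample data (the helper `top_submitters` is also hypothetical), not a replacement for the server-side aggregation. One difference: `$sort` may return tied counts in any order, whereas `Counter.most_common` keeps first-seen order for ties.

```python
from collections import Counter


def top_submitters(docs, n=10):
    """Mimic [$group by submitter with $sum 1, $sort count desc, $limit n]."""
    # a missing submitter becomes None, like $group's null _id
    counts = Counter(doc.get("submitter") for doc in docs)
    # same shape as the aggregation output: {'_id': ..., 'count': ...}
    return [{"_id": name, "count": c} for name, c in counts.most_common(n)]


docs = [{"submitter": s} for s in ["Ann", "Bob", "Ann", "Cid", "Ann", "Bob"]]
print(top_submitters(docs, n=2))
# [{'_id': 'Ann', 'count': 3}, {'_id': 'Bob', 'count': 2}]
```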

12. Update a document if it exists, otherwise insert a new one:

```python
def update_or_create_paper(paper_data):
    # set the 'data' field if a document with this custom 'id' exists,
    # otherwise insert a new document (upsert=True)
    # returns the document as it was before the update, or None if one was inserted
    return papers_col.find_one_and_update(
        {"id": paper_data['id']},
        {"$set": {"data": {**paper_data}}},
        upsert=True,
    )
```
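To make the upsert semantics concrete, here is an in-memory sketch that works on a plain list instead of a collection (the helper `upsert_paper` is hypothetical, and it omits the `_id` MongoDB would generate). It mirrors `find_one_and_update` with PyMongo's default `ReturnDocument.BEFORE`: it returns the previous document when one matched and `None` when a new document was inserted.

```python
import copy


def upsert_paper(store, paper_data):
    """In-memory analogue of find_one_and_update({'id': ...},
    {'$set': {'data': paper_data}}, upsert=True) on a list of dicts."""
    for doc in store:
        if doc.get("id") == paper_data["id"]:
            before = copy.deepcopy(doc)     # snapshot of the pre-update state
            doc["data"] = dict(paper_data)  # $set replaces the whole 'data' field
            return before
    # no match: the upsert builds a new document from the filter plus the $set fields
    store.append({"id": paper_data["id"], "data": dict(paper_data)})
    return None


store = []
print(upsert_paper(store, {"id": "0704.0001", "title": "v1"}))  # None (inserted)
print(upsert_paper(store, {"id": "0704.0001", "title": "v2"})["data"]["title"])  # v1
print(len(store))  # 1
```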
