<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: MKSeymour</title>
    <description>The latest articles on DEV Community by MKSeymour (@mkseymour).</description>
    <link>https://dev.to/mkseymour</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F269314%2F484e8816-44e4-489e-bb43-28ac709dbf80.jpg</url>
      <title>DEV Community: MKSeymour</title>
      <link>https://dev.to/mkseymour</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mkseymour"/>
    <language>en</language>
    <item>
      <title>Love to write code, don't like writing comments...Intro to Docly</title>
      <dc:creator>MKSeymour</dc:creator>
      <pubDate>Mon, 19 Oct 2020 22:19:49 +0000</pubDate>
      <link>https://dev.to/mkseymour/love-to-write-code-don-t-like-writing-comments-intro-to-docly-3b2o</link>
      <guid>https://dev.to/mkseymour/love-to-write-code-don-t-like-writing-comments-intro-to-docly-3b2o</guid>
      <description>&lt;p&gt;&lt;strong&gt;Docly&lt;/strong&gt; is a useful code commenting tool. It generates docstrings for your python functions to improve readability and maintainability.&lt;/p&gt;

&lt;p&gt;It can be run in two ways: the first just &lt;strong&gt;checks&lt;/strong&gt; whether any comments need to be modified; the second actually &lt;strong&gt;modifies&lt;/strong&gt; them. &lt;/p&gt;
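&lt;p&gt;To give an idea of the result, here is a purely illustrative sketch: a function with the kind of summary docstring a generator like Docly would insert (the exact wording the tool produces will differ):&lt;/p&gt;

```python
def moving_average(values, window):
    """Compute the moving average of values over a sliding window.

    A one-line summary like this is what a docstring generator such as
    Docly would add; the exact wording it produces will differ.
    """
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]
```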

&lt;p&gt;The beta is now open; you can test Docly on your own code 👉&lt;a href="http://thedocly.io/"&gt;http://thedocly.io/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Docly yours 😉,&lt;/p&gt;

</description>
      <category>python</category>
      <category>codequality</category>
    </item>
    <item>
      <title>1 line of code to check all your code documentation</title>
      <dc:creator>MKSeymour</dc:creator>
      <pubDate>Mon, 13 Jul 2020 16:11:37 +0000</pubDate>
      <link>https://dev.to/mkseymour/1-line-of-code-to-check-all-your-code-documentation-3hdf</link>
      <guid>https://dev.to/mkseymour/1-line-of-code-to-check-all-your-code-documentation-3hdf</guid>
      <description>&lt;p&gt;If { &lt;br&gt;
   ✅ you have code in Python&lt;br&gt;
   ✅ you are not sure which documentation of the code is reliable&lt;br&gt;
   ✅ don't have time to check for it manually &lt;br&gt;
   }&lt;/p&gt;

&lt;p&gt;Then: definitely look at the &lt;a href="https://github.com/autosoft-dev/code-bert"&gt;codeBERT library&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;codeBERT is an open-source package that automatically checks whether your code documentation is up to date. codeBERT uses machine learning on code to classify the docstrings of functions. &lt;br&gt;
Given a function f and a docstring d, codeBERT predicts whether f and d match.&lt;/p&gt;
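&lt;p&gt;codeBERT learns this decision with a trained model. As a rough illustration of the (function, docstring) matching task — a toy heuristic based on identifier overlap, not the library's actual API — you could imagine something like:&lt;/p&gt;

```python
import re

def docstring_matches(func_source: str, docstring: str,
                      threshold: float = 0.25) -> bool:
    """Toy stand-in for codeBERT's (function, docstring) matching.

    codeBERT learns this from data; here we only check how many
    identifier fragments from the code also appear in the docstring.
    """
    # Break identifiers (snake_case, camelCase) into lowercase words.
    tokens = re.findall(r"[A-Za-z]+", func_source)
    words = {w.lower() for t in tokens for w in re.findall(r"[A-Z]?[a-z]+", t)}
    doc_words = {w.lower() for w in re.findall(r"[A-Za-z]+", docstring)}
    if not words:
        return False
    return len(words & doc_words) / len(words) >= threshold
```

&lt;p&gt;The real model, of course, captures semantic matches that no word-overlap heuristic can.&lt;/p&gt;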


&lt;blockquote class="ltag__twitter-tweet"&gt;
      &lt;div class="ltag__twitter-tweet__media ltag__twitter-tweet__media__video-wrapper"&gt;
        &lt;div class="ltag__twitter-tweet__media--video-preview"&gt;
          &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--x46hjQuO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://pbs.twimg.com/tweet_video_thumb/EcR9vusX0AcT-CU.jpg" alt="unknown tweet media content"&gt;
          &lt;img src="/assets/play-butt.svg" class="ltag__twitter-tweet__play-butt" alt="Play butt"&gt;
        &lt;/div&gt;
        &lt;div class="ltag__twitter-tweet__video"&gt;
          
            
          
        &lt;/div&gt;
      &lt;/div&gt;

  &lt;div class="ltag__twitter-tweet__main"&gt;
    &lt;div class="ltag__twitter-tweet__header"&gt;
      &lt;img class="ltag__twitter-tweet__profile-image" src="https://res.cloudinary.com/practicaldev/image/fetch/s--zyKM645J--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://pbs.twimg.com/profile_images/1275420642733670401/eFq3QB10_normal.jpg" alt="CodistAI profile image"&gt;
      &lt;div class="ltag__twitter-tweet__full-name"&gt;
        CodistAI
      &lt;/div&gt;
      &lt;div class="ltag__twitter-tweet__username"&gt;
        @aicodist
      &lt;/div&gt;
      &lt;div class="ltag__twitter-tweet__twitter-logo"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--P4t6ys1m--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://practicaldev-herokuapp-com.freetls.fastly.net/assets/twitter-f95605061196010f91e64806688390eb1a4dbc9e913682e043eb8b1e06ca484f.svg" alt="twitter logo"&gt;
      &lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="ltag__twitter-tweet__body"&gt;
      A few days ago &lt;a href="https://twitter.com/Altimor"&gt;@Altimor&lt;/a&gt; showed how GPT3 could write code.&lt;br&gt;Now, this is how &lt;a href="https://twitter.com/hashtag/codeBERT"&gt;#codeBERT&lt;/a&gt; can read your code and automatically review your documentation.&lt;br&gt;When &lt;a href="https://twitter.com/hashtag/MLonCode"&gt;#MLonCode&lt;/a&gt; can help get rid of the documentation pain!&lt;br&gt;🕹 Repo: &lt;a href="https://t.co/SpzjEzxSPD"&gt;github.com/autosoft-dev/c…&lt;/a&gt;&lt;br&gt;🎬&lt;a href="https://t.co/PkKmFcUaNa"&gt;youtu.be/oDqW1JHmaYY&lt;/a&gt;&lt;br&gt;&lt;br&gt;&lt;a href="https://twitter.com/hashtag/codequality"&gt;#codequality&lt;/a&gt; 
    &lt;/div&gt;
    &lt;div class="ltag__twitter-tweet__date"&gt;
23:46 - 06 Jul 2020
    &lt;/div&gt;


    &lt;div class="ltag__twitter-tweet__actions"&gt;
      &lt;a href="https://twitter.com/intent/tweet?in_reply_to=1280287022041698306" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="/assets/twitter-reply-action.svg" alt="Twitter reply action"&gt;
      &lt;/a&gt;
      &lt;a href="https://twitter.com/intent/retweet?tweet_id=1280287022041698306" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="/assets/twitter-retweet-action.svg" alt="Twitter retweet action"&gt;
      &lt;/a&gt;
      4
      &lt;a href="https://twitter.com/intent/like?tweet_id=1280287022041698306" class="ltag__twitter-tweet__actions__button"&gt;
        &lt;img src="/assets/twitter-like-action.svg" alt="Twitter like action"&gt;
      &lt;/a&gt;
      5
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/blockquote&gt;


&lt;p&gt;So far, the library's accuracy has been pretty stable at around 93-95% when it comes to detecting whether a docstring is outdated.&lt;/p&gt;

</description>
      <category>codequality</category>
      <category>python</category>
      <category>machinelearning</category>
      <category>productivity</category>
    </item>
    <item>
      <title>New NLP techniques increase gender bias by 40%</title>
      <dc:creator>MKSeymour</dc:creator>
      <pubDate>Sun, 21 Jun 2020 12:43:08 +0000</pubDate>
      <link>https://dev.to/mkseymour/new-nlp-technics-increase-gender-bias-by-40-2bgl</link>
      <guid>https://dev.to/mkseymour/new-nlp-technics-increase-gender-bias-by-40-2bgl</guid>
      <description>&lt;p&gt;I was looking for engineers' pictures on google images. Over the first 30 images: 24 were representing men only, 3 women only and 1 a group of developers. &lt;/p&gt;

&lt;p&gt;I'm pretty sure the proportion of women among developers is way higher than 10%. Yet my browser was utterly biased towards male representation. I tried other queries like pre-school teacher, college teacher, nurse... they were all gender-biased.&lt;/p&gt;

&lt;p&gt;This is not a big surprise: Information Retrieval models learn the biases of the dataset they are trained on. What is surprising is that this is not getting any better over time — quite the contrary. According to a recent study from &lt;a href="https://arxiv.org/pdf/2005.00372.pdf"&gt;Kepler University&lt;/a&gt;, the newest NLP models (like BERT) intensify gender bias.&lt;/p&gt;

&lt;h2&gt;
  
  
  NLP models ingest gender bias
&lt;/h2&gt;

&lt;p&gt;Over the last 2 years, increasingly performant Natural Language Processing (NLP) algorithms have come out (like BERT or BLEURT). Computers can learn more than ever about languages and their meaning. Give a model a text and it will summarize it for you and answer questions about it. State-of-the-art NLP models can infer a lot about the world we are living in. And this world is full of bias. &lt;/p&gt;

&lt;p&gt;The bias in AI models is the inevitable accumulation of biases in human history and language.&lt;/p&gt;

&lt;p&gt;When you train an NLP model, as with any ML model, you feed it data — in this case, text. The model learns the relationships between words. For example, {nurse - doctor - hospital} or {sand - beach} will be highly correlated. So with that model, if you search for 'hospital', you will also get results about doctors and nurses.&lt;/p&gt;

&lt;p&gt;The problem is that the data we feed in is biased. For example, a text about 'doctors' will mainly use words related to men (he, man, guy...). The trained model will therefore learn that there is a correlation between 'doctor' and 'man'. &lt;/p&gt;

&lt;p&gt;Documents have gender, and models learn them.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is the gender of my document?
&lt;/h2&gt;

&lt;p&gt;For any document, you can calculate the magnitude of the gender represented in it, i.e. measure the degree of presence of male and female concepts. There are different techniques; I'll take the TF (Term Frequency) method, which is similar to the TF used for word embeddings. With that method, you define a set of words correlated with each gender (e.g. &lt;em&gt;female&lt;/em&gt; = {she, woman, lady, pregnant...}, &lt;em&gt;male&lt;/em&gt; = {he, man, sir, father...}). The TF method then counts the frequency of the terms standing for each gender.&lt;br&gt;
For a document &lt;em&gt;d&lt;/em&gt;, we get the magnitude of female and male concepts in the document. &lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--rL9lBitb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://www.notion.so/Neural-Ranking-Models-Intensify-Gender-Bias-43383c0d9e0a4e64b793ccdd0ebc9759%233c1e626af3ac4641a81d156f751f8c80" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--rL9lBitb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://www.notion.so/Neural-Ranking-Models-Intensify-Gender-Bias-43383c0d9e0a4e64b793ccdd0ebc9759%233c1e626af3ac4641a81d156f751f8c80" alt="TF calculation"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By subtracting the female magnitude from the male magnitude, we get the overall magnitude of a document. This calculation treats gender as a signed value: if positive, the document is mainly about men; if negative, mainly about women.&lt;/p&gt;
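&lt;p&gt;A minimal sketch of this TF calculation (the term sets below are illustrative, not the curated lists used in the study):&lt;/p&gt;

```python
# Illustrative gender term sets; the study uses its own curated lists.
FEMALE_TERMS = {"she", "her", "woman", "women", "lady", "mother", "pregnant"}
MALE_TERMS = {"he", "his", "man", "men", "sir", "father"}

def gender_magnitude(document: str) -> float:
    """Signed gender magnitude of a document.

    Positive: the document is mainly about men;
    negative: mainly about women.
    """
    words = document.lower().split()
    tf_male = sum(w in MALE_TERMS for w in words)
    tf_female = sum(w in FEMALE_TERMS for w in words)
    # Normalize by document length, as term frequency does.
    return (tf_male - tf_female) / max(len(words), 1)
```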

&lt;h2&gt;
  
  
  Recent NLP models make gender bias worse
&lt;/h2&gt;

&lt;p&gt;The Kepler University study shows that the more elaborate and performant an NLP model is, the more it intensifies gender bias.&lt;/p&gt;

&lt;p&gt;They took a set of queries made on Bing and the answers provided. The same way you can classify the gender of a document, you can also classify the gender of a query. So they obtained sets of male, female, non-gendered, and multi-gender queries.&lt;/p&gt;

&lt;p&gt;They created Information Retrieval models based on different embedding techniques (BM25, KNRM, MatchPyramid, BERT...). For the same set of queries, they compared the gender bias induced by the different NLP embedding techniques.&lt;/p&gt;

&lt;p&gt;The results were pretty surprising: all query answers were biased towards men, and the latest models like BERT increased the bias by 40%.&lt;/p&gt;

&lt;p&gt;AI is a tool, and as AI engineers, we must always consider who will be using our systems and for what purpose. &lt;/p&gt;

&lt;h2&gt;
  
  
  How can we reduce gender bias in NLP systems?
&lt;/h2&gt;

&lt;p&gt;An 'easy' way to fight it is to use more balanced datasets. Gender swapping is one such technique: it consists of swapping the pronouns of a sentence to create more data for the female gender. For example, 'he is a doctor' would also yield 'she is a doctor'. &lt;/p&gt;
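&lt;p&gt;A minimal gender-swapping sketch (the word mapping is an illustrative assumption; real augmentation pipelines need part-of-speech and coreference information, e.g. to decide whether 'her' maps to 'him' or 'his'):&lt;/p&gt;

```python
# Toy word mapping for data augmentation; ambiguous cases
# ("her" -> "him" or "his") are fixed arbitrarily here.
SWAPS = {"he": "she", "she": "he", "him": "her", "her": "him",
         "man": "woman", "woman": "man"}

def gender_swap(sentence: str) -> str:
    """Return a copy of the sentence with gendered words swapped."""
    return " ".join(SWAPS.get(word, word) for word in sentence.lower().split())
```

&lt;p&gt;Both the original and the swapped sentence are then kept in the training set, balancing the gender signal the model sees.&lt;/p&gt;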

&lt;p&gt;This topic is becoming a quickly growing area of research. Google recently released their method to reduce gender bias in translation: &lt;a href="https://ai.googleblog.com/2020/04/a-scalable-approach-to-reducing-gender.html"&gt;A scalable approach to reducing gender bias&lt;/a&gt; (Apr 2020). Since 2018 they have been working on reducing gender bias for gender-neutral languages like English but were struggling to make it scale to other languages. To my knowledge, this technique is applied to translation and not yet to search.&lt;/p&gt;

&lt;p&gt;As our awareness grows, we must realize our role in creating technology that works for the many, not the few.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>programming</category>
      <category>nlp</category>
    </item>
    <item>
      <title>Dev tools 1.0 : we came a looong way 🐱‍👤</title>
      <dc:creator>MKSeymour</dc:creator>
      <pubDate>Mon, 04 May 2020 14:19:43 +0000</pubDate>
      <link>https://dev.to/mkseymour/do-you-know-what-first-dev-tools-looked-like-3epf</link>
      <guid>https://dev.to/mkseymour/do-you-know-what-first-dev-tools-looked-like-3epf</guid>
      <description>&lt;p&gt;Let’s do some digging into the first-generation of computers, and meet with punched-cards. Back then, they were the basic unit of data.&lt;/p&gt;

&lt;p&gt;If you dreamed about running a program, you’d better be patient and highly careful. Here is why. First, as a developer, you would write your instructions in an old programming language (like Cobol or Fortran) on a piece of paper, the coding sheet. Then you’d hand your coding sheet to a keypunch operator who would “encode” it into a source deck. The source deck was a stack of punched cards, and each card was one line of instruction 😥 Then (yes, I know, that’s a lot)… the source deck was compiled by the computer operator into a program deck. And finally, that program deck could be fed to a computer to run!&lt;/p&gt;

&lt;p&gt;😣 Have you ever seen a program run perfectly on the first try?… Well, you can be sure you had to repeat the whole process again and again and again before it worked.&lt;/p&gt;

&lt;p&gt;These steps were individually really long and cumbersome. The first generation of dev tools helped to solve these challenges.&lt;/p&gt;

&lt;p&gt;To fix bugs, programmers had to go through the source deck. Initially, it was just a stack of cards with some square holes, and reading back the original program took ages. So keypunch printers were invented: the natural-language translation was printed along the top of each punched card. It was the first program-reading interface.&lt;/p&gt;

&lt;p&gt;What about code complexity? With one instruction per card, it was easy to get entangled in complex programs. To keep complexity in check, programmers stored the source deck in boxes (the first source-code hosting system 😎). The boxes were actually built to fit the maximum batch size a computer could run — an easy way for programmers to see when they had to refactor their code.&lt;/p&gt;

&lt;p&gt;In the early days of computing, programmers dealt daily with cards, boxes, rubber bands, and request forms. That might not seem highly sexy, but dev tools had to be physical back then.&lt;/p&gt;

&lt;p&gt;🏓 If you want to play with old-school punched cards, you should give The Virtual Keypunch a try (&lt;a href="https://www.masswerk.at/keypunch/"&gt;https://www.masswerk.at/keypunch/&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;Thanks for reading! &lt;/p&gt;

</description>
      <category>tools</category>
      <category>development</category>
      <category>productivity</category>
      <category>devtips</category>
    </item>
  </channel>
</rss>
