DEV Community

Tom Byrer


GitHub's Copilot: Can the original author please stand up?

If something is free, you’re the product

~ Richard Serra, 1973

This famous line referred to the 'free' broadcast television programs that most people spent hours in front of every week, if not every day. (People watched news and entertainment on free broadcast stations before the internet and cable.) In the USA, almost all programming was supported by advertising. Early ads were simple, but as more psychological research was done, those ads became more and more effective at influencing public perceptions. Entire genres of TV catered to specific groups. Early 'soap operas', shown mid-day to target stay-at-home mothers, were (and still are) full of ads for household cleaning products. Saturday morning cartoons were full of ads designed to get children to ask their parents for sugary cereals and the latest toys. In fact, some cartoons were blatantly 30-minute ads for toys (G.I. Joe, Transformers, etc.).

Today, TV has become more targeted thanks to cable and satellite offering hundreds of channels for every market: channels just for sports fans, others just for women, children, sci-fi fans, etc. You can notice how each channel runs the same type of ads constantly; e.g., news channels run many ads related to medicine, since their viewers tend to be older folks concerned about their health.

Niche Marketing Techniques

I like to read newsletters for web developers (since I'm a webdev myself) to catch up on the latest articles, videos, and software related to the platforms and programming languages I'm interested in. Most of the newsletters I read have ads for products, services, and job offers that I am sometimes interested in. So it is a win-win-win: the newsletter author curates the advertisers, the advertisers reach a niche market, and readers get content they actually want to read, both free and paid.

Freemium: Niche Marketing for Web Services

There is another 'advertising' model behind many of the commercial services that we developers use: the 'free tier', or 'freemium'. This model makes a limited version of a service available to anyone with an email address, in the hope that you will buy a paid tier later. It lets users try out products while spending only their time. It also helps the service provider test their platform, collect contact info for later follow-ups, raise awareness, and grow their community. More on this later.

Code Sharing History

There were free code sharing platforms long before GitHub arrived. In the pre-internet days, programmers would use dial-up modems to connect to a BBS where you could download both programs and source code. There was 'sneakernet', where people would copy and swap disks and tapes full of code. The early internet opened the door for FTP sites (often hosted by universities) to host archives of source code. Forums, IRC, and Usenet provided more 'social' avenues for people to share code, tips, and suggested revisions.

SourceForge was one of the larger portals that let programmers share their source code and compiled programs, and gather feedback to add features and fix bugs. Its user base grew as larger projects (like Linux distros, and audio, video, and code editors) made SourceForge their home. It seems to be supported by ads, but I still wonder how deep in the red it runs. I still visit on occasion, mostly for FileOptimizer.

GitHub Hits the Jackpot

GitHub had great timing: they took a fairly new but powerful revision control system (git) and added a feedback forum (issues), hosting, and mini social profiles (to personalize things), all wrapped together by a back-end language that was quick to develop in (Ruby).

IMHO, their clean interface and generous ad-free freemium tier attracted a huge number of programmers and entire organizations to host their code on GitHub. They made git fun to use and made it easy to hack in quick changes. Their search and categories let people quickly find code that solved their problems, or even sparked new ideas. GitHub became the primary portal for code hosting and sharing, enabling many programmers and companies to flourish.

Though it is not thought of as a 'social media site' in the same league as Facebook, Twitter, etc., it is a place where many cultures intermingle. It is very common for people from different countries, religions, and beliefs to gather there and create solutions together.

GitHub does have a paid tier that many people pay for, but I doubt that alone was worth the $7.5 billion Microsoft paid for GitHub. More likely, it was the user base and all the code hosted there that made GitHub worth that much.

But Free Code Is Not (Usually) Free

Most (but not all) code on GitHub is protected by copyright in some way. MIT, Creative Commons, and GPL licenses all have 'strings attached'. Usually the requirement is that the copyright notice (and the authors' names) stays with copies of the code, but some licenses go further: the GPL requires derived works to be released under the GPL, and the 'NC' variants of Creative Commons forbid commercial use. These restrictions can carry over to derived programs. So if you use a library that, somewhere down its dependency chain, pulls in code under a 'no commercial use' license, and you then use it in your job or side hustle, you may find yourself in legal trouble.
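
To make the 'carries over' point concrete, here is a minimal Python sketch that walks a dependency tree and flags anything reachable under a non-commercial license. The package names, their licenses, and the `NON_COMMERCIAL` set are all hypothetical; a real check would pull this data from your package manager's lockfile or a license-scanning tool rather than hard-coded dictionaries.

```python
# Hypothetical dependency graph: package -> its direct dependencies (names are made up).
DEPENDENCIES = {
    "my-app": ["string-helpers", "chart-lib"],
    "string-helpers": [],
    "chart-lib": ["color-utils"],
    "color-utils": [],
}

# Hypothetical SPDX-style license identifier for each package.
LICENSES = {
    "my-app": "MIT",
    "string-helpers": "MIT",
    "chart-lib": "MIT",
    "color-utils": "CC-BY-NC-4.0",
}

# Licenses this sketch treats as forbidding commercial use (an assumption, not a complete list).
NON_COMMERCIAL = {"CC-BY-NC-4.0", "CC-BY-NC-SA-4.0"}


def transitive_deps(root, graph):
    """Return every package reachable from root (excluding root itself)."""
    seen = set()
    stack = list(graph.get(root, []))
    while stack:
        pkg = stack.pop()
        if pkg in seen:
            continue  # already visited; avoids loops in cyclic graphs
        seen.add(pkg)
        stack.extend(graph.get(pkg, []))
    return seen


def non_commercial_deps(root, graph, licenses):
    """List reachable packages whose license is in NON_COMMERCIAL, sorted by name."""
    return sorted(
        pkg for pkg in transitive_deps(root, graph)
        if licenses.get(pkg) in NON_COMMERCIAL
    )


if __name__ == "__main__":
    print(non_commercial_deps("my-app", DEPENDENCIES, LICENSES))  # prints ['color-utils']
```

Note that `my-app` never lists `color-utils` directly; it only arrives through `chart-lib`, which is exactly how a restrictive license can slip into a project unnoticed.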

Trust me, lawyers and company heads do not like legal ambiguity. I've lost several job opportunities as soon as I mentioned OSS.

GitHub Gives Away Your Code Without Permission

About a year ago (if I remember right), I started to see sites and tools that scraped GitHub repos showing up as top search results, presenting that copyrighted code as if it were uncopyrighted, without attribution. But I've seen scraper sites quote entire Stack Overflow answers long before that, so this wrong is nothing new.

What is new is a major corporation (Microsoft/GitHub) doing the same thing. They call it 'AI trained', but, like those 'find the traffic light' CAPTCHA tests, humans are the ones training the AI. And the results are sometimes verbatim copies of code, reproduced without authorization. Is some of the code it shows coming from fully 'all rights reserved' private codebases hosted on GitHub? One PM I received from a major contributor to npm (JavaScript libraries) says others are 'seeing your code signature everywhere'.

(Another example)

I'm not the only one concerned:




Hopefully Copilot is not giving away passwords... but maybe it is? Or at least encryption methods?

I'm Not a Lawyer

In music, there is legal precedent that taking even a few bars of someone else's song can be a copyright violation. We should keep this in mind...

Aside: Programmers are Almost Forced to 'Borrow'

The sad fact is that those using code-grabbing tools like this may end up being more productive, leaving those who don't use such tools behind...

Alternatives to GitHub

GitLab has seen an influx of users lately, since its git platform is OSS that you can self-host. It also has a solid commercial product that Goldman Sachs, Ticketmaster, and others use.

There are also other GitHub clones, written in Go (Gitea) and V (Gitly).

Or Do Not Share at All

If your code is on the internet, someone can take it. People are willing to steal; I know someone who had his hardware reverse-engineered and hundreds of clones made, putting him out of business.

The only real solution is not to share your code at all. That makes me a bit sad, but in a world where doctors can't share medical advice on Facebook anymore and scientists get their conversations censored, this is the world we live in. :(

Top comments (3)

Matija Sosic

This is a really good overview of the topic and this specific issue. Potentially leaking secrets from the code is especially worrying. Thanks for the informative write-up!

Ole-Martin Bratteng

I wonder how Copilot differs from similar AI-assisted tools like tabnine.com/ and kite.com/

Both brag about building their model on millions of files. They must've gotten them somewhere.
Tabnine shows it by allowing you to search for code on their site. tabnine.com/code/javascript/module...

Tom Byrer

Good question.

It seems Copilot will emit entire blocks of code at once, vs. just one line with the other tools?

There is another website/tool that displays entire blocks of code...