
Is GitHub Using Your Code to Train AI? Copilot Facts Every Developer Should Know in 2025

  • Writer: emrerdin0
  • Aug 18
  • 3 min read
Warning: After reading this post, you will probably want to change your GitHub settings.

If you've been coding for the past two years, you've almost certainly encountered GitHub Copilot. This "AI-powered programmer's assistant" can complete the line you are typing, fill in whole functions, and sometimes even write entire classes in the blink of an eye.


But have you ever wondered: Where does this miraculous power come from?

The answer may surprise you—and probably disturb you.


Copilot's "Secret" Data Source: Your Code

It's not really a secret at all. GitHub, which Microsoft owns, openly states that Copilot was trained on billions of lines of publicly available code, including public repositories on GitHub. This means that any code you have shared publicly on GitHub is likely part of the massive dataset that feeds Copilot's brain.



Technical Fact: How Does Copilot "Think"?

The original Copilot was built on OpenAI Codex, a large language model (LLM) derived from GPT-3. In broad strokes, it works as follows:


  1. Data Mining Process:

    1. Public repositories on GitHub were scanned

    2. Code in many languages was analyzed: Python, JavaScript, Go, Rust, and more

    3. Everything from syntax rules to design patterns was learned


  2. Statistical Learning:

    1. The model extracted patterns by analyzing millions of code samples

    2. It learned to calculate which piece of code is most likely to come next

    3. The result: a system that can predict the next step with surprising accuracy, based on the code you have already written
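
To make the "most likely next piece" idea concrete, here is a deliberately tiny sketch. It is not how Codex or Copilot is implemented (those use large neural networks); it is a simple bigram counter, and the function names and the three-line training corpus are made up for illustration. It only shows the underlying principle of predicting what comes next from what was seen before.

```python
from collections import Counter, defaultdict

def train_bigram(samples):
    """Count, for each token, which tokens follow it in the training samples."""
    following = defaultdict(Counter)
    for sample in samples:
        tokens = sample.split()
        for current, nxt in zip(tokens, tokens[1:]):
            following[current][nxt] += 1
    return following

def predict_next(model, token):
    """Return the most frequently seen follower of `token`, or None if unseen."""
    if token not in model:
        return None
    return model[token].most_common(1)[0][0]

# Hypothetical "training data": three tokenized lines of Python.
corpus = [
    "for i in range ( n ) :",
    "for item in items :",
    "for i in range ( 10 ) :",
]
model = train_bigram(corpus)
print(predict_next(model, "in"))  # "range" follows "in" twice, "items" once
```

A real model looks at far more context than one token and generalizes beyond exact matches, but the key point carries over: everything it suggests is derived from the code it was trained on.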


The problem starts right here.


The Great Legal Battle: Fair Use or License Violation?

This issue has become a massive debate that divides the software world in two. Both sides offer valid arguments:


Microsoft's Defense: "This Is Legal Transformation"

Microsoft and OpenAI argue that their actions fall under the "fair use" doctrine in US copyright law. Paraphrased, their position is:


"We took the raw code and created an entirely new service. It's a transformative use case, and it's legal."


Developers' Counter-Argument: "Licenses Are Being Ignored"

Here's the thing: Public code doesn't mean unclaimed code. Much of the code on GitHub is published under open source licenses that attach conditions to reuse:

  • MIT License: "Use my code, but keep my copyright and license notice"

  • GNU GPL: "Use it, but if you distribute a derivative work, it must be released under the GPL too"

  • Apache License 2.0: Attribution and notice requirements, plus an explicit patent grant


The problem: Copilot's suggestions arrive without the attribution or license notices these licenses require. The code it recommends doesn't include any original author information or license information.


2024-2025 Current Developments: How Has the Situation Changed?


2024 Results:

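  • Most DMCA claims in Doe v. GitHub, the class action brought by attorney Matthew Butterick and co-counsel, were dismissed

  • But some claims remain, including breach of contract over open source license terms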


2025 Bombshell Decision: In February 2025, in Thomson Reuters v. Ross Intelligence, a federal court ruled that using copyrighted material to train an AI tool was not fair use. It was widely seen as the first major US ruling on the question, although the AI involved was a legal research tool rather than a generative model like Copilot.


Official Institutions Step In

The US Copyright Office published a comprehensive report on generative AI training in May 2025. This topic is now being discussed at an official level, not just among tech companies.


How to Protect Yourself: Practical Steps

Let's move from theory to practice. What can you do as a developer?


Risk 1: The Possibility of Your Private Code Being Leaked

The real danger: The code snippets you write and the suggestions you receive while using Copilot can be sent to GitHub and Microsoft to improve the service.


Critical step, do this now:

  1. Log in to your GitHub account

  2. Go to Settings > Copilot

  3. Uncheck "Allow GitHub to use my code snippets for product improvements"

This simple change stops your code snippets from being used for product improvement.


Risk 2: License Contamination Trap

If the code Copilot recommends is GPL licensed and you include it in your closed source project, you could be required to release your entire project under the GPL as well.


Protection strategies:

  1. Never copy blindly: Understand each suggestion and rewrite it with your own logic

  2. Activate the filters: Set the "Suggestions matching public code" option to "Block"

  3. When in doubt, research: Search the web for distinctive snippets to find their original source and license
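
As a small aid for the "research" step, the sketch below scans a suggested snippet for obvious license markers before you paste it. The marker list and the function name are hypothetical and far from exhaustive; treat this as a first-pass check under those assumptions, not as license detection. An empty result proves nothing, since copied code rarely carries its header with it.

```python
import re

# Hypothetical, non-exhaustive markers; real license detection needs a dedicated tool.
LICENSE_MARKERS = [
    r"SPDX-License-Identifier",
    r"GNU (Lesser |Affero )?General Public License",
    r"Licensed under the Apache License",
    r"Copyright \(c\)",
]

def find_license_markers(snippet):
    """Return the marker patterns that match anywhere in the snippet."""
    return [p for p in LICENSE_MARKERS if re.search(p, snippet, re.IGNORECASE)]

# Example suggestion that still carries an SPDX header.
suggestion = '''
# SPDX-License-Identifier: GPL-3.0-or-later
def add(a, b):
    return a + b
'''

hits = find_license_markers(suggestion)
if hits:
    print("Review before use, matched:", hits)
```

If a marker does show up, that is a strong hint the suggestion was reproduced from a licensed file and deserves a closer look at its source.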


Conclusion: It's Time to Become a Conscious Developer

GitHub Copilot is a powerful tool, but using it blindly carries significant risks. In 2025, we no longer have the luxury of saying, "I didn't know."


If you found this article helpful, don't forget to share it with other developers. We all benefit from keeping each other informed.

 
 
 

© 2025 Emre Erdin All Rights Reserved
