foo for thought

I’ve been optimistic about the future of artificial intelligence for quite some time. For the past few months, I’ve let AI drive my engineering workflows in different ways. Whether as an operator or as a “pair programmer”, I’ve tried many different ways to incorporate these new AI tools into my work and life as a developer. I was especially excited/scared when Sam Altman and the OpenAI team announced that GPT-5 was the “death of software engineering”. Wait, I’m a software engineer… am I getting fired? I decided I needed to double down and put these insane claims to the test.

What I found was very interesting. Not only do GPT-5 and other AI tools make development harder, they actually make the results worse. I say this as a software engineer at Microsoft who’s been thoroughly invested in this technology since the “boom” of AI tools earlier this year – we are not there yet. In fact, these AI tools actually prove why we need software engineers more than ever.

What I’ve tried re: AI in coding

Here I’d like to immediately address the concern of “maybe you didn’t try the right tools”. I went through many phases with many different AI tools, including but not limited to:

  • Claude Code on Claude Max ($200/month plan) with Opus 4/Sonnet 4 – I used this as my main driver for side projects for over a month. I mainly used Opus since it had a much deeper understanding of context, falling back to Sonnet occasionally when I hit limits.
  • Cursor on/off for around 4 months – I started using Cursor back when Claude 3.7 Sonnet was the shiny new thing, and came back to it after their updates, replacing Claude Code as my daily driver for side projects. I’ve used Cursor mainly with the Claude models, but I’ve also been using it with GPT-5 recently.
  • GitHub Copilot w/ latest models – This is the one I’m most used to, because I can’t really use anything else for Microsoft code for security reasons. While subjectively it’s not as “good” as Cursor and the others, it always picks up the latest models the instant they come out because I have the beta or something for being an employee. I use this for all of my work – making test cases, finding bugs in the code, cleaning stuff up – everything.
  • ChatGPT/Claude Opus for reasoning – a lot of the time I turn to the desktop apps to reason through some complicated problem I’m trying to solve in my code. It gets it right once in a blue moon, but most of the time it just reassures me that yes, this problem is indeed hard, and not even the AI can give me an answer I want.

More important than just “using” the tools, I made an intentional effort to sit in the “cockpit” as a developer – letting AI do the heavy lifting, the reasoning, and everything else that Sam Altman claims replaces software engineers. I intentionally lived “the future” everyone is raving about to see if it’s as great as everyone says.

Problems at work

This is kind of the reason I’m writing this article. Not to place the blame entirely on the AI – obviously, I’m the driver – but it’s interesting to me how using AI as a coding companion actually made my code worse. Not only that, it introduced regressions! Thankfully, since we have insanely robust pipelines and error catching, they didn’t make it very far into deployment before being flagged and eventually funneled back to me.

So what exactly happened? A few days ago I got pinged on a sev 2 incident that was opened on behalf of another team. For those of you who aren’t familiar with severities in software engineering, sev 2 is basically “oh shit, this entire feature is completely broken”. And that’s exactly what it ended up being – my code was correctly identified as the cause of basically an entire page in Copilot failing to load. This struck me as very odd, considering my code had nothing at all to do with that particular page – it touched a pretty small feature in some small menu.

After freaking out the entire morning, I found the culprit. A function, listed as a dependency of a useEffect. This function was memoized via useCallback; however, it ALSO depended on several OTHER functions from different libraries, which I can only assume weren’t memoized. What this looked like was the useEffect firing off endlessly, because React compares functions by their reference! While you can list functions as dependencies in useEffects if they’re perfectly memoized, it’s often frowned upon for this exact reason – you want to trigger useEffects when something changes. It doesn’t make much sense to say “fire this off when a function changes”, anyway.
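To make the failure mode concrete, here’s a rough sketch of the pattern in TypeScript/React. The names and the useFormatter hook are made up for illustration – this is not the actual Copilot code, just the shape of the bug as I understand it:

```tsx
import { useCallback, useEffect, useState } from "react";

// Stand-in for a third-party hook that returns a NEW function on every
// render (i.e. it isn't memoized internally). Hypothetical name – the real
// incident involved helpers from other libraries.
function useFormatter() {
  return (value: number) => value.toFixed(2);
}

function Panel() {
  const [rows, setRows] = useState<string[]>([]);
  const format = useFormatter(); // new reference on every render

  // "Memoized"... but its dependency changes every render, so loadRows is
  // ALSO a brand-new reference every render. The useCallback buys nothing.
  const loadRows = useCallback(async () => {
    const data: number[] = await fetch("/api/rows").then((r) => r.json());
    setRows(data.map((n) => format(n)));
  }, [format]);

  // React compares dependencies by reference, so this effect re-runs after
  // every render; each run sets state, which causes another render, and the
  // page never settles.
  useEffect(() => {
    loadRows();
  }, [loadRows]);

  return (
    <ul>
      {rows.map((r, i) => (
        <li key={i}>{r}</li>
      ))}
    </ul>
  );
}

export default Panel;
```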

Anyways – guess who made that code? Definitely not me! It was GPT-5! I went back and read the chat pane from earlier in the week where it explained its reasoning: since it was using a memoized function, it’s fine! However, as I mentioned earlier, it absolutely was NOT fine! GPT-5 completely missed the broken code here – and since it’s technically one of those weird React quirks, there was no unit test to catch it.
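For what it’s worth, here’s roughly what a safer version of that sketch looks like – again, just my illustration of the general idea (depend on the data the effect cares about, not on function references), not the actual fix that shipped:

```tsx
import { useEffect, useState } from "react";

function Panel() {
  const [rows, setRows] = useState<string[]>([]);

  // Keep the fetch logic inside the effect and depend only on the values it
  // actually cares about (here: nothing, so it runs once on mount). No
  // function references in the dependency array, no render loop.
  useEffect(() => {
    let cancelled = false;
    fetch("/api/rows")
      .then((r) => r.json())
      .then((data: number[]) => {
        if (!cancelled) setRows(data.map((n) => n.toFixed(2)));
      });
    return () => {
      cancelled = true;
    };
  }, []);

  return (
    <ul>
      {rows.map((r, i) => (
        <li key={i}>{r}</li>
      ))}
    </ul>
  );
}

export default Panel;
```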

This situation was, in my experience, a perfect summary of using AI to engineer. It’s really, really good at getting most of the way there – after all, it did identify the need to memoize the function deps. However, engineering is just really complex. It’s hard. There are so many intermingling dependencies, so many things that you can’t just catch with your eye. Things that you have to learn through years of experience, trial and error. And AI doesn’t quite have that skill, in my mind. I might be guilty of writing bad code occasionally, but trust me, I wouldn’t be caught dead putting a function like that as a dependency inside a useEffect.

I lied, actually – that’s not what prompted me to write this! What actually prompted this was a bug filed on our team: a completely broken piece of functionality. It used to work before some unrelated code I wrote (telemetry, literally telemetry), but it doesn’t now. Weird, I thought, because why would I touch functionality-related code for a telemetry task? Turns out, I didn’t! The AI did! I forget if I was using Opus 4 or GPT-5 for this specific task, but yeah, I fully made and merged a PR that introduced a regression because I trusted the AI to add some stuff in that file. I do look through my code, and we also have a review process where multiple other people need to review our code as well – and none of us actually caught the issue the AI introduced.

Of course, I don’t love that all of this is happening. This is still “my code” – my name is still attached to all of this stuff. It’s not really that I’m writing code that breaks things; it’s that I’m using a tool I’m told “replaces software engineers” that actively makes my code worse.

Expectation vs reality

I’ve done my fair share of dogging on AI, ChatGPT in particular – but that’s not to say these aren’t extremely powerful tools. I’m someone who doesn’t like to spend a drop of time on anything boring, and this is where AI imo really shines. All of this to say: what bothers me isn’t necessarily the power of these tools, it’s the overconfidence that AI breeds in general, combined with the promises these companies make. Claims like “this replaces software engineering” or “bro, you can literally vibe code your whole app with this” are extremely incorrect and don’t accurately capture the strengths of AI.

If you don’t know how to engineer, AI will not make you a software engineer. This is notably different from saying “if you can’t code, AI can’t make you a coder” – I think, to an extent, AI can help you do things you don’t have the technical background for. I’ve coded in languages I know virtually nothing about with AI, and I’m able to make pretty solid programs out of them. But if I didn’t know how to engineer as a profession, I’d most definitely be cooked. This whole “vibe coding” culture, which is hopefully dying down by now, is I think a bit misleading. It’s definitely inspiring to have AI as a partner to take on tasks you wouldn’t otherwise want to do. But I think it’s important to note that you still need to grow the skills yourself. You still need to learn how to engineer – even though AI is a really good teacher, I don’t think it’s a really good “doer”. At least, the bugs it keeps introducing into my code at Microsoft validate that opinion.

The reality is, AI doesn’t have a brain. It’s incredibly smart – I mean, it’s trained on pretty much the entire world – but it can’t think for itself. It can’t remember experiences, it can’t associate correct answers with rewards. It can’t learn like us – both in the literal sense, and in the sense that we just don’t introduce random code for no reason. I’ve seen so many Twitter posts where Cursor, like, deletes all of someone’s system files or something? Just things that aren’t even conceivably possible for a human with a working brain to do.

How to best use AI right now

With all of this said, I still think there’s utility in this newfound technology. It’s still game changing, just not in the way you see in the news. Out of all the different ways I’ve tried to use AI, I’ve found that the second you start to question whether AI can replace “a human” in a certain role, it most definitely cannot.

AI can replace work, AI cannot replace people.

If you have a task that is insanely boring and you think, “maybe the AI can do this TASK for me”, it actually probably can! AI is very good at doing tasks. However, if you think “this job doesn’t seem needed, maybe an AI can replace an entire person with a working brain”, I think that’s wrong. Because it doesn’t have a brain, and unless it has an infinitely large context window (or one equivalent to the billions of neurons we have in our brains), it won’t be able to do complex reasoning well at all.

The best way to use AI is the best way to use a computer. You don’t sit at your computer and go, “Finally, now I can relax and do absolutely nothing”. You still need to, like, do stuff – it’s just easier to visit a website or run a search than to physically drag yourself to a library. Same with AI – it can help you interpret insanely large datasets or codebases, get you to a rudimentary level in a domain you aren’t familiar with, or even just be a better Google search. The number of times AI has saved me a Google search is genuinely impressive – I feel like ChatGPT is how Google is “intended” to be: a fully operational, infinite database of everything.

Conclusion

It’s interesting to be living through such a fundamental technology shift. Especially at Microsoft, the entire company is essentially turning “AI-first”, and that’s evident even in my day-to-day work. With all of this, though, there’s a certain level of fear that people have. What if I lose my job? What’s the point of doing anything if AI can do it? All of these fears are valid – this new technology resembles sentience in a way we haven’t really seen before, ever. The confidence ChatGPT has when spitting out the most incorrect hallucinations imaginable is comparable to a conspiracy theorist or something – it all feels real, and it speaks so confidently that, in our eyes, there very well could be a person behind it.

But the truth is, I think we as a human species will prevail. After all, if we don’t keep our humanity, what do we even have? It’s already abundantly clear that people refuse to accept AI in lots of domains, such as the arts. The amount of seething rage people feel when they drive by an AI-painted mural or look at an AI-generated advertisement is both hilarious and very real – people don’t like seeing things that could replace something they work really hard on, something they work their whole lives for.

People still value people. That’s why AI won’t replace people.
