
I guess it was too soon to call this 4.0 -- but don't let the 3.1 fool you.

This was way more than just a minor upgrade.

This was one of the biggest capability jumps we’ve seen in a while — especially if you care about reasoning, research, and actually shipping well-built, high-quality work.

Everyone has been talking about one particular, almost unbelievable improvement in this new update.

Imagine going from 31.1% on a reasoning test to 77.1% just a few months later -- and taking the top spot on that same test. That's exactly what Gemini 3.1 Pro just shocked the world with.

That's more than double the previous score.

And this is abstract reasoning we're talking about -- not memorization or "glorified autocomplete". It had to solve problems built on completely new logic patterns -- problems it had never seen before.

This is huge.

And this makes its 1 million-token context window even more powerful for coding and every other use case we can think of.

It's vastly superior to its predecessor in every way. Graphics and SVG generation in particular are dramatically better -- a huge win for web developers.

1. Web browsing got dramatically better: 59.2% → 85.9%

This one is just as important.

On BrowseComp — a benchmark that measures how well a model can use web tools and navigate information — Gemini 3.1 Pro jumped from 59.2% to 85.9% -- overtaking all Claude models, including the recently released Sonnet 4.6.

That’s huge.

The difference between those two numbers isn’t cosmetic. It’s the difference between:

  • Surface-level summaries vs. actual synthesis

  • Grabbing the first answer vs. cross-checking sources

  • Losing context across tabs vs. maintaining a clear research thread

If you use AI for research, competitive analysis, trend tracking, sourcing stats, or building content from multiple references, this upgrade matters a lot.

Better browsing doesn’t just mean “it can search.” It means it’s better at deciding what to search for, what to ignore, and how to combine findings into something coherent.

That’s a big shift.

2. This reasoning upgrade is not a joke

And neither was the test that measured it.

On ARC-AGI-2 — a standard benchmark designed to test abstract reasoning (not pattern regurgitation, but actual problem-solving) — Gemini jumped from 31.1% to 77.1%.

That’s not incremental improvement. That’s a different class of performance.

What does that mean in real life?

It means:

  • Fewer moments where the model “almost” understands your problem but misses a key constraint.

  • Better step-by-step thinking when tasks require multiple logical hops.

  • Stronger performance on planning, debugging, and structured workflows.

  • More reliable outputs when you're building agents or automation.

If you’ve ever felt like an AI model lost the thread halfway through a complex task -- this is the kind of upgrade that directly addresses that frustration.

Dictate prompts and tag files automatically

Stop typing reproductions and start vibe coding. Wispr Flow captures your spoken debugging flow and turns it into structured bug reports, acceptance tests, and PR descriptions. Say a file name or variable out loud, and Flow preserves it exactly, tags the correct file, and keeps inline code readable. Use voice to create Cursor and Warp prompts, call out a variable like user_id, and get copy you can paste straight into an issue or PR. The result: faster triage and fewer context gaps between engineers and QA. Learn how developers use voice-first workflows in our Vibe Coding article at wisprflow.ai. Try Wispr Flow for engineers.

Find out why 100K+ engineers read The Code twice a week.

That engineer who always knows what's next? This is their secret.

Here's how you can get ahead too:

  • Sign up for The Code -- a tech newsletter read by 100K+ engineers

  • Get the latest tech news, top research papers & resources

  • Become 10X more valuable
