🤖 Hallucinating AI

And how AI writes over 50% of code at Google

If you saw my email last week, I mentioned that I'm creating my own weekly newsletter. Here, I'll share the latest programming and tech news, deliver AI updates, and offer insights into the tech job market.

Today marks the first edition.

I'm always open to feedback on how I can improve this and provide maximum value, so feel free to reply to this email with any feedback.

AI

The Next Big Leap
Remember when OpenAI’s GPT-4 was released last March? It was a game-changer that could handle everything from chatting to coding. Well, Anthropic, one of OpenAI’s rivals, has just announced their own upgrade: Claude 3.5 Sonnet.

Claude 3.5 Sonnet isn’t quite the giant leap forward that some may have been hoping for, but it’s definitely a step in the right direction. It’s better at math, coding, and logic, and it’s faster, understands language nuances more deeply, and even has a better sense of humor.

That makes it perfect for anyone building apps and services on top of Anthropic’s models. But it’s still not the huge leap we’re all waiting for.

We’ve been expecting OpenAI’s GPT-5 for over a year now. GPT-4 was a beast, costing over $100 million to train, and GPT-5 is rumored to be even bigger and pricier. But, so far, progress in AI has been more about clever innovations in model design and training than sheer size and power.

Anthropic’s Claude 3.5 Sonnet is already outperforming models from OpenAI, Google, and Meta on benchmarks like GPQA and HumanEval. But the improvements are modest: just a few percentage points.

Finding new data sources to train these models isn’t exactly easy, and scaling them will run into the billions. OpenAI recently launched GPT-4o, which focused mostly on providing a natural, humanlike interface rather than groundbreaking problem-solving abilities.

So, what’s next? Companies like Scale AI are working on better ways to measure AI intelligence, with the goal of showcasing real-world applications with measurable business impacts.

Which AI do you use most?


BIG TECH

Google software engineers now use AI for over 50% of code
The application of AI in software development has changed enormously since 2019. Back then, developers were aware of advancements in machine learning, but many were not using it in their workflow.

Fast forward to 2024, and there's widespread excitement among software engineers about AI's role in coding.

Google's latest blog reveals how AI enhancements have transformed software engineering at the company. Now, 50% of code characters are written by AI, meaning that as much code is completed with AI assistance as is manually typed by developers. This shift allows developers to spend more time on code design, even though they still need to review AI suggestions.

[Chart: % of code written by AI at Google]

Significant advancements include AI addressing over 8% of code review comments and automatically adapting pasted code to fit the surrounding context, accounting for about 2% of code in the IDE.

Additional AI applications involve instructing the IDE to make code edits via natural language commands and predicting fixes for build failures. Other potential uses, such as offering tips for code readability, follow a similar trend.

Quick Links

🍪 Demystifying cookies and tokens - A deep dive into tokens and cookies and why they are important.

🖥️ The Benefits of Using RTK Query - Built on top of Redux Toolkit, RTK Query is a great way to manage data fetching and caching in React.

🤞 Promises From The Ground Up - In order to understand promises in JavaScript, you need to work from the ground up.

🦆 Stack Overflow Users Are Revolting - OpenAI and Stack Overflow announced a deal that will allow ChatGPT to learn from Stack Overflow’s answers; users are not happy.

RESEARCH

Detecting hallucinations in large language models
Large language models like ChatGPT and Gemini can sometimes produce "hallucinations": content that is simply untrue.

These issues, especially "confabulations" where models confidently provide wrong answers, pose significant risks in fields like law and medicine. Current methods to improve accuracy aren't fully effective. Researchers are now exploring statistical methods, like entropy-based uncertainty estimators, to better detect these critical errors.
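To give a feel for the entropy-based idea: sample the model several times on the same question and measure how much the answers disagree. This is a toy sketch, not the researchers' actual method; real approaches (such as semantic entropy) cluster answers by meaning rather than by exact string match, and the example answers below are purely illustrative.

```python
from collections import Counter
import math

def answer_entropy(samples):
    """Shannon entropy (in nats) of the distribution of sampled answers.

    High entropy means the model answers the same question inconsistently,
    a signal that any single answer may be a confabulation.
    """
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# Consistent answers -> zero entropy (the model "knows" something)
consistent = ["Paris", "Paris", "Paris", "Paris", "Paris"]
# Inconsistent answers -> high entropy (possible confabulation)
inconsistent = ["Paris", "Lyon", "Marseille", "Paris", "Nice"]

print(answer_entropy(consistent))    # 0.0
print(answer_entropy(inconsistent))  # ~1.33, much higher
```

In practice a threshold on this kind of uncertainty score is used to flag answers that shouldn't be trusted, which is exactly the failure mode that matters in law and medicine.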

Until next week,

Travis.
