Have the coding capabilities of ChatGPT-4 improved?

Martin Průcha, 17. 06. 2024


Just a year ago, one could hardly imagine that ChatGPT or one of its alternatives would be able to write actual working code, and those who claimed otherwise were dismissed as idealists. Yet now we stand in the midst of this revolution, and it is clear that the change has come.

Recent Benchmarks and Performance

In a comparison between Llama 3 and GPT-4, Prescouter notes that GPT-4 scored surprisingly high: it reached 85.9% accuracy on HumanEval, a programming benchmark in which a language model must write functions for a set of hand-written tasks, and each solution is checked against unit tests. As a side note, Llama 3 fell slightly behind on this benchmark with 81.7%.

Just about six months ago, approaches like Parsel (ChatGPT-4 + CodeT) and L2MAC achieved Pass@1 rates on HumanEval of around 85% and 90% respectively, already demonstrating impressive capabilities. However, the latest one from May, LDB (ChatGPT-4 based on seed programs from Reflexion), has surged ahead, attaining an exceptional Pass@1 rate of 96.9%, notes Papers with Code. We are approaching a situation in which the machine can code more precisely than a human!
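For readers unfamiliar with the metric, here is a minimal sketch of how a Pass@k score is commonly computed, using the unbiased estimator introduced together with HumanEval: 1 - C(n - c, k) / C(n, k), where n is the number of generated samples and c the number that pass the tests. The function names and the sample numbers below are illustrative assumptions, not the evaluation code behind any of the results quoted above.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate pass@k for one problem.

    n: samples generated for the problem
    c: samples that passed all unit tests
    k: number of attempts the metric allows
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # Probability that at least one of k samples drawn from n is correct.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results, k):
    """Average pass@k over problems; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical mini-benchmark: four problems, one sample each, three solved.
score = benchmark_pass_at_k([(1, 1), (1, 0), (1, 1), (1, 1)], k=1)
print(f"pass@1 = {score:.2%}")
```

With a single sample per problem, Pass@1 reduces to the plain fraction of problems solved on the first attempt, which is how the percentages above can be read.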

Personal Experience

In my own experience with ChatGPT-4, I asked for help writing code to plot a linear regression for control and treatment groups, with the data read from a .csv file. After two hours of prompting and clarifying my requirements, I obtained a working result. Interestingly, once ChatGPT understood what I wanted from it, it actually tried to plot the linear regression inside the session, though unsuccessfully. However, pasting the code into VS Code and editing it a little solved the problem, and the result worked seamlessly.

My own contribution consisted mainly of understanding the concepts, applying basic programming knowledge, and dealing with minor issues such as library loading errors. The experience underscored the ability of ChatGPT-4 to help with complex tasks when it is given precise guidance and iterative feedback.

Conclusion

ChatGPT-4 not only surpasses its predecessors in problem-solving and debugging but also produces more efficient and concise code. The key takeaway from these developments is that the future of programming lies in the synergy between human creativity and machine intelligence. By effectively communicating ideas to these powerful tools, we can unlock unprecedented levels of innovation and productivity.

author: Oldřich Příklenk

picture: ChatGPT

