7 months in, how are my predictions doing?

Well, The Economist's country of the year is Syria (with Argentina the runner-up). Let's look at my 12 predictions "between now (back when now was May 2025) and May 2026".

1. Did context windows become the bottleneck again? They definitely didn't grow as expected. Gemini has gone from 1.5 to 3.0 and still has a 1M-token context window (one can only assume they have been investing more of their time in getting back to the Gemini 2.5 Pro Preview 05-06 sweet spot).

Claude Code has pretty good compaction. Tool use relies on paged limits. In these and other ways, we have worked around context limits. It is not as obvious a bottleneck, because we can work around it in ways we couldn't back when limits were 4096 tokens. But I would say we are on track to have something of a bottleneck in this respect, though I am a bit less confident in this prediction than I was then, even though context windows seem to have hit a wall, in that model providers don't seem very interested in expanding them.
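
The "paged limits" idea can be sketched roughly as follows: instead of dumping a whole file or result set into context, a tool returns one page plus a cursor, and the agent asks for more only if it needs it. This is a minimal illustration, not how Claude Code actually implements it; `Page`, `read_paged`, and the default limit of 100 lines are all made-up names and values.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Page:
    lines: List[str]
    next_offset: Optional[int]  # None when there is nothing left to read


def read_paged(lines: List[str], offset: int = 0, limit: int = 100) -> Page:
    """Return at most `limit` lines starting at `offset`, plus a cursor."""
    if offset < 0 or limit <= 0:
        raise ValueError("offset must be >= 0 and limit must be > 0")
    chunk = lines[offset:offset + limit]
    end = offset + len(chunk)
    # Only hand back a cursor if there is more to read after this page.
    return Page(lines=chunk, next_offset=end if end < len(lines) else None)
```

The point of the cursor is that the cost of a big file is paid only as far as the agent actually reads, which is part of why the context limit feels less like a hard wall than it used to.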

Context windows are definitely a bottleneck. But I think 2026 will bring a lot of interesting techniques to address it. I will write an article in a few months about the techniques I have seen already.

2. Did "RAG roar back"? It either roared back or only got more important. But it is not something people talk about like they used to. It is more assumed and comes in many forms, but it is part of the standard agentic profile. In the last 6 months, the nature of my latest clients has put the broader AI landscape on par, on my radar, with the world of Gemini models, Vertex AI, ADK, etc. And that makes it even more stark. Progressive disclosure is a big part of why Claude Skills are good. Moving from ... I was surprised to find almost no good managed RAG on AWS. Retrieval is definitely part and parcel of almost any agentic system.

Claude Code does not have semantic search as a code-search tool, though maybe it does for certain specific parts of its architecture. I will get back to you about that, as I feel like I did see something like that with its memory architecture.

3. Did data science become a standard agent tool? Nope. And a lot of coding agents, including Claude Code, are relatively bad at operating within ipynb files. That is something that needs work. That could be a good project to do.

In large part, though, basically everything you can do in a terminal became a standard agent tool, so in that sense yes, but the point is moot. Agents got much better much faster than expected.

4. Did AI power scientific breakthroughs? Yes. 

6. Has slop continued? Yes, and yet people are starting to miss the old slop. People are nostalgic for Will Smith eating spaghetti, which is now sort of a CS AI benchmark.

7. Did coding styles and culture change as AI-first coding stacks and vibe-coding stacks became widespread? I would like to do some research on this, but I think so. My stack no longer uses an IDE. It also does not use agent frameworks such as LangChain. I use the filesystem a lot more than I used to.


Predictions that were too obvious to pat myself on the back about:

"AI will power major scientific breakthroughs. Obviously."


Predictions I got mostly right so far or still believe in:

"Context windows may become a bottleneck again. Context windows have grown, but our aspirations have, too."

"RAG will fade, then come roaring back. As a direct consequence of #1, RAG will start to swing back to being indispensable. The key difference is that this time around, most people will (hopefully) rely on out-of-the-box managed RAG solutions."

Agents will integrate more deeply with enterprise data.

Slop will continue. Platforms for community content, email clients, and any other type of content feed will rise and fall based on how good they are at not letting slop find its way into your feed and inbox.

"Coding styles and culture will change as AI-first coding stacks and vibe-coding stacks become widespread."

Predictions that haven't materialized yet, but I still believe in:

"Data science and traditional ML will become standard agent tools. AI Begetting AI."

"Another chip shortage will surely come. A safe enough bet. I hope not, though."


Predictions I have less faith in, in some way:

"Fine-tuned and quantized edge models will help solve many problems."


No native semantic search in Claude Code

 But it does have a new helper -- LSP.

But it occurs to me that Claude Code is maybe best at small repos that were themselves built with Claude Code. I would have to think about whether I have experienced exactly this... But it makes sense, because without semantic search, Claude Code just greps based on keywords and on folder names it deems standard.
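
To make the "just greps based on keywords" point concrete, here is a minimal sketch of keyword-only code search. It is an illustration of the general technique, not Claude Code's actual tooling; `keyword_search` and the default file suffixes are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def keyword_search(
    root: str,
    keywords: List[str],
    suffixes: Tuple[str, ...] = (".py", ".md"),
) -> Dict[str, int]:
    """Count case-insensitive keyword hits per file under `root`."""
    hits: Dict[str, int] = {}
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        count = sum(text.count(k.lower()) for k in keywords)
        if count:
            # Keys are paths relative to the search root.
            hits[str(path.relative_to(root))] = count
    return hits
```

A search for "login" finds a file that defines `login`, but a search for "sign in" finds nothing, even though it means the same thing. In a repo whose names match the words the agent would guess, that gap rarely shows; in a repo with unfamiliar naming, it might.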


Minevra Jobs, Minevra Quizzes, Minevra Consulting...

Minevra Jobs, Minevra Quizzes, Minevra Consulting... What do these products and services have in common? For one, they have no paying customers (though we do have our first few users and pro bono consulting clients!). But, no, what I wanted to say was: they are named the way Anthropic seems to name its product lines. Note the way Anthropic names its products: Claude Code and Claude Skills. It uses such generic and commonplace words that you are forced to say the full thing. You must say Claude. And you must capitalize both words, because otherwise people will misunderstand what you are referring to. Their naming strategy makes you repeat their main brand, Claude, and makes you write it in a well-formed way. This trend does not hold for their models, though (Opus, Haiku, Sonnet...), so they might not have thought about it the way I describe, given that it is only a partial trend.



Software Benchmarks: hall of fame

I love these types of things. 

Classics:

Tom's Diner

By Suzanne Vega. Due to this song, she has been called "The Mother of the MP3". It was used by the engineers who worked on MP3 compression as the perfect test. In the words of Karlheinz Brandenburg: "I was ready to fine-tune my compression algorithm...somewhere down the corridor, a radio was playing 'Tom's Diner.' I was electrified. I knew it would be nearly impossible to compress this warm a cappella voice." It is also nice and short, which was ideal when engineers wanted to test with it.

I am sitting in the morning at the diner on the corner

I am waiting at the counter for the man to pour the coffee

And he fills it only halfway, and before I even argue

He is looking out the window at somebody coming in

`curl -o suzanne_vega.mp3 https://museumofportablesound.com/wp-content/uploads/2020/07/mopswaxcylindersuzannevega.mp3`

Here is a video that I like, because it is 2 and a half minutes long, and because it has no words. https://www.youtube.com/watch?v=4-ISLpKhQJI 

Lena Forsén

A photo of a model. The image appeared in tens of thousands of publications and conference proceedings on imaging research, from its first use in the 1970s until it was retired in the 2010s for objectifying women. It had become popular in the imaging community because it contains several features – such as light and dark illumination regions, sharp edges, flat regions, smoothly varying regions and textured regions intertwined with each other – which made it useful when testing the behaviour of image-processing or reconstruction algorithms.

AI Specific:

Pelican on a Bicycle

Will Smith Pasta Video


----

I have a few of my own: 


DSPy is an exciting idea but is still in its infancy

Like all followers of the main AI bloggers, I saw Simon Willison's endorsement of DSPy. It reminds me of MLflow. Early on in my pivot to AI, I recall many a LangChain + MLflow tutorial. It is premature. Evals. Evals. Have I ever actually created a good evals system? Nope.

A few things are needed within DSPy. First of all, it has to handle complete garbage output from the student models better (and from the teacher models, for that matter, not to mention the judge reflection models). What I mean is: suppose I am creating an AI pipeline that does something. The example doesn't matter, but let's say it is categorizing emails in bulk. I should be able to use DSPy to come up with a prompt that works, through an optimizer like MIPROv2. And if I pick a model that, on DSPy's first attempt at a prompt for the task, returns something, the optimizer should expect not only that the answer may be the wrong category, but that it may be structurally nothing like what the system is built to handle.

When evaluating this library, I guess the problem I see is that it seems useful only in very unusual, specific circumstances. It is not robust enough. It needs easy ways to have the optimizer first solve an easy problem before solving the harder problem. It needs easy ways to start with smart models and move to dumb models, and vice versa. The DSPy signatures don't solve problems. They are for simple tasks.
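
One way to picture the garbage-output point, using the email-categorization example: a scoring function that separates "structurally unusable" from "well-formed but wrong" instead of treating both as a plain miss or crashing. This is a plain-Python sketch and does not call DSPy; the category labels, the function names, and the 0.25 partial-credit value are all assumptions made up for illustration.

```python
import json
from typing import Any, Optional

CATEGORIES = {"billing", "support", "spam", "other"}  # hypothetical labels


def parse_category(raw: Any) -> Optional[str]:
    """Try to pull a known category out of raw model output; None if garbage."""
    if not isinstance(raw, str):
        return None
    text = raw.strip()
    # Accept either a bare label or a JSON object like {"category": "spam"}.
    try:
        obj = json.loads(text)
        if isinstance(obj, dict):
            text = str(obj.get("category", ""))
    except json.JSONDecodeError:
        pass  # not JSON; fall through and treat it as a bare label
    label = text.strip().strip('"').lower()
    return label if label in CATEGORIES else None


def tolerant_metric(expected: str, raw_output: Any) -> float:
    """Score one output: 0.0 for garbage, partial credit for valid-but-wrong."""
    label = parse_category(raw_output)
    if label is None:
        return 0.0   # structurally unusable output
    if label != expected:
        return 0.25  # arbitrary partial credit: right shape, wrong answer
    return 1.0
```

With a score like this, a weak model that at least produces a valid label ranks above one that rambles, which gives an optimizer something to climb even before any answer is correct.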

The coolest thing about this stream-of-consciousness article was that when writing it had the links filled out. Kind of neat.




ARTificial: Why Copyright Is Not the Right Policy Tool to Deal with Generative AI

 Mantegna argues that copyright law is the wrong tool for governing generative AI, especially for text, because copyright was never designed to regulate systems that learn statistical structures rather than copy expressive works. Attempts to stretch copyright to cover AI training and outputs risk (1) failing to meaningfully protect authors, (2) entrenching corporate intermediaries, (3) destabilizing fair use, (4) shrinking the public domain, and (5) degrading AI quality through defensive data practices. She insists that ethical harms (consent, attribution, labor displacement) are real—but conflating “unethical” with “illegal” produces bad law and ultimately worsens outcomes for creators and society.


https://yalelawjournal.org/forum/artificial-why-copyright-is-not-the-right-policy-tool-to-deal-with-generative-ai

IYKYK

https://gist.github.com/GideonPotok/9d8de616ee20571d1d38ea760c5b99a2