Tell HN: Is this the AI doom loop? I thought o1-preview was amazing, until it wasn't

12 points by moomoo11 4 days ago

I get that we expect more from the tools we use. But honestly.. I'm so fed up with AI at this point.

Every new model I've used, I thought was amazing for a few days. Until... it stopped being useful, started hallucinating, and gave me non-working, terrible code. I feel like I always start by giving it simple/rote problems, and when I'm fed up with something difficult I really wish I could turn to an AI and have it help me..

The latest of this fad is o1-preview. At first, I thought wow. This thing is amazing. It could "come up" with answers for me, and for the first time in many months I felt like I was going to use AI tools again.

Until...

Yesterday I was struggling with some event-handler hell (lots of event synchronization on a very complex UI) in a web app I'm working on. One thing I do appreciate about these AI tools is that describing my issue in detail gives me a chance to understand my own problem better. But I was stuck, and I gave as much detail as possible.

I ran out of credits trying to see if the AI could help me (I hate JS, sorry) but it kept giving me, in order:

1. Code that didn't work at all.

2. Code that didn't work at all.

3. Completely new code that didn't work at all.

4. The previous code that didn't work at all.

5. Completely new code that didn't work at all.

6. The same code that didn't work at all, with phantom variables this time that weren't used..

7. Completely new code that didn't work at all, again with phantom variables.

I gave up at this point, and decided f this, I will do it myself.

I woke up the next morning (today), had my morning coffee and exercised a bit, and just took out a couple sheets of paper and diagrammed my problem out and solved it myself. It took me about 40 minutes to get a completely working solution.
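
For the curious, the general shape of this kind of fix, serializing handlers that were stepping on each other, looks something like this (a generic sketch, not my actual code; the element and function names are made up):

    // Generic sketch: overlapping async event handlers mutating shared UI
    // state. Serializing them through a queue makes each one finish before
    // the next starts. All names here are illustrative.
    const queue = [];
    let draining = false;

    async function runSerialized(task) {
      queue.push(task);
      if (draining) return;      // another handler is already draining
      draining = true;
      while (queue.length > 0) {
        await queue.shift()();   // run handlers one at a time, in order
      }
      draining = false;
    }

    // Wrap each handler instead of attaching it directly:
    searchInput.addEventListener("input", () => runSerialized(refreshResults));
    filterSelect.addEventListener("change", () => runSerialized(reloadFilters));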

Why does every AI model inevitably lead to this? Am I expecting too much?

What I have found it is great for, and what I am using it for in my own product, is generating database queries. It handles these really well.
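
To be concrete, it's the rote "English in, SQL out" translation that it nails (a made-up example, not a real query from my product):

    // Illustrative only: prompt "monthly signup counts for the last 6
    // months, newest first" reliably comes back as something like this.
    const sql = `
      SELECT date_trunc('month', created_at) AS month,
             COUNT(*) AS signups
      FROM users
      WHERE created_at >= now() - interval '6 months'
      GROUP BY 1
      ORDER BY month DESC
    `;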

But for solving actual, complicated problems? I think that is something humans will always have to do.

When people hype up that the AI solved something for them, I wonder: were they lazy like me, working on something complex, or were they lazy and didn't even try on something simple?

proc0 4 days ago

Right, I think it's a limitation of deep learning. Transformers and gigantic leaps in scaling have allowed AI models to reach impressive capabilities, but at their core they're limited by their training data. This is not how actual intelligence works. People don't need terabytes of information to learn how to communicate and reason.

For this reason the current AI trend will have a bigger impact on creative tasks, rather than critical technical ones. It's already great for generating art quickly, at least for the concepting phase, and other creative assets that can afford to be generic. Solving technical problems, on the other hand, requires reasoning beyond what can be extracted from training data.

We'll need a new paradigm of AI to have a chance at creating models that properly reason. Even without detailed knowledge of the brain, we can safely speculate that its reasoning and language areas are extremely efficient compared to cutting-edge LLMs, which means there are algorithms more sophisticated and efficient than simple artificial neural connections that just sum weights and add a bias.
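
To make "sum weights and add a bias" concrete, a single artificial neuron is roughly just this (a minimal sketch; the numbers are arbitrary):

    // One artificial "neuron": a weighted sum of inputs plus a bias, pushed
    // through a nonlinearity. LLMs are enormous stacks of units like this.
    function neuron(inputs, weights, bias) {
      const sum = inputs.reduce((acc, x, i) => acc + x * weights[i], bias);
      return Math.max(0, sum); // ReLU activation
    }

    neuron([0.5, -1.2], [0.8, 0.3], 0.1); // ≈ 0.14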

eschneider 4 days ago

Yes. You're expecting too much. Generative AI models don't "understand" your problem, they don't even "understand" how to program. They're just fitting whatever data they've seen to your input.

  • gwoolhurme 3 days ago

    In his defense, that is how it's marketed: that this new model can reason.

    • loveparade 3 days ago

      Yeah, but "reason" is not a well-defined term. It means different things to different people in different contexts. It's just marketing speech. You can easily argue that all ML models, even those from 50 years ago, can reason to some extent.

      • gwoolhurme 3 days ago

        Fully agree, that's kind of my point though. It's a very tall order for some people, like the OP.

mnk47 2 days ago

> The latest of this fad is o1-preview

Not for programming it's not. It's confusing, but o1-preview is currently pretty broken for many tasks, or in the words of Sam Altman [0], "deeply flawed". o1-mini is the recommended model [1] for programming exercises and is superior to o1-preview in OpenAI's programming benchmarks [2].

[0] - https://analyticsindiamag.com/ai-news-updates/sam-altman-say...

[1] - https://help.openai.com/en/articles/9824965-using-openai-o1-...

[2] - https://openai.com/index/openai-o1-mini-advancing-cost-effic...

Personally, I'm sticking with Claude Sonnet 3.5 until more people figure out how to use these new models effectively. OpenAI employees said on launch day that traditional prompting techniques might not work with this one so we'll just have to experiment.

jprete 4 days ago

It never occurred to me before that chatbot randomness might have the same reward structure as a slot machine, but apparently it does. OpenAI got you to spend all your credits on seven attempts at this one problem. I'm not saying you're addicted, but I wonder about the people who absolutely gush over it.

mergisi 3 days ago

AI can be frustrating when it falls short on complex tasks, especially with long or intricate code. To get the most out of it, you need to break big problems down into smaller pieces and choose the right AI for your needs, like OpenAI o1 or Claude Sonnet. Also, leveraging dev tools like Cursor AI can help enhance productivity. For tasks like SQL generation, specialized tools like https://ai2sql.io/ work great. AI isn't perfect for everything, but when used selectively, it can still be super helpful.

  • skydhash 2 days ago

    Gambling can be frustrating when it falls short on achieving wealth, especially when buying a big and beautiful house. To get the most out of it, you need to break big objectives down into smaller ones and choose the right game for your needs, like Poker or Blackjack. Also, leveraging techniques like card counting can help increase your chances. For objectives like buying a nice gift, specialized games like sports betting work great. Gambling isn't perfect for everything, but when used selectively, it can still be super helpful.

    • meiraleal 20 hours ago

      You are replying to ChatGPT

segmondy 3 days ago

It's a limitation of your approach. If you have an idea of what you're trying to do, a good, detailed prompt might produce the correct code. If it doesn't, your next prompt shouldn't just ask for it again: explain to the AI how it went wrong, and suggest and guide it towards a different path. That will often help you resolve the issue. If you have no domain knowledge, or just keep regenerating responses in the hope that one will be correct, you are going to be wasting time and money.

resource0x 4 days ago

It's time for "How many programmers does it take to screw in a lightbulb using AI" jokes.

  • infamouscow 13 hours ago

    None, because it's a hardware problem.

rvz 4 days ago

> When people hype up that the AI solved something for them, I wonder: were they lazy like me, working on something complex, or were they lazy and didn't even try on something simple?

The truth they won't tell you is that they have likely invested in that AI tool and are hyping it up to their VC friends, insisting 'it works' even when they know it doesn't.

Each time I talk to the AI bros about these limitations, they resort to whataboutisms: 'But humans hallucinate too!', 'The human brain is the same as an LLM', and similar nonsense excuses.

LLMs do not 'understand' your problems, nor can they reason about them. o1 is no different. Instead of buying into the scam, prompting endlessly for garbage results, and trying to replace your co-workers, actual programmers can write the plan for the code themselves and solve it, especially for unseen code or new, changing syntax.

Whenever I see someone promoting another AI tool, I always see who invested and 9/10 of the time it is funded by VCs and ex-FAANG engineers yet again on the snake oil grift. (And they know it but will never admit it.)

muzani 4 days ago

It might be just an OpenAI thing where they ramp up the power a week after the demo to get more subscriptions, then bring it down gradually. The forums now happily gaslight you into thinking it's a conspiracy theory, despite the evidence. It's easy to catch - just share an amazing input/output response with a friend, then try the exact same thing a month later.
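
Something like the following makes that test concrete (a rough sketch against the standard chat completions endpoint; the model name and the savedPrompt/savedOutput variables are placeholders, and even temperature 0 doesn't guarantee identical outputs):

    // Rough sketch: replay a saved prompt and diff against last month's
    // reply. savedPrompt and savedOutput you'd load from wherever you
    // stashed them.
    const res = await fetch("https://api.openai.com/v1/chat/completions", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Authorization": `Bearer ${process.env.OPENAI_API_KEY}`,
      },
      body: JSON.stringify({
        model: "gpt-4o",    // placeholder model name
        temperature: 0,     // reduces, but doesn't eliminate, sampling noise
        messages: [{ role: "user", content: savedPrompt }],
      }),
    });
    const data = await res.json();
    console.log(data.choices[0].message.content === savedOutput
      ? "same as before"
      : "output drifted");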

It's one of the arguments for using open source AI even though it's still a little behind - at least when you're running it on your own system, you know if you're the problem.

solardev 4 days ago

Does Claude work any better for you?

  • moomoo11 3 days ago

    I have tried Claude and I find that it is more "to the point", but it still suffers from giving wrong/unusable answers.

    Both Claude and ChatGPT are good for rote tasks, but Claude is definitely more succinct.

b20000 4 days ago

these models are search engines with some interpolation thrown in

AnimalMuppet 2 days ago

It sounds like you're finding AI to be a useful (if infuriating) rubber duck. If you don't expect it to be more than that, it's useful for what it is.