We need a clearer framework for AI-assisted contributions to open source
It’s an extension of the asymmetric bullshit principle IMO, and I think now all workplaces / projects need norms about this.
I wonder how it would look if open source projects required $5 to submit a PR or ticket and then paid out a bounty to the successful or at least reasonable PRs. Essentially a "paid proof of legitimacy".
Unfortunately, there is no community equivalent of PoS—the only alternative is introducing different barriers, like ID verification, payment, in-person interviews, private invite system, etc., which often conflict with the nature of anonymous volunteer communities.
Such communities are perhaps one of the greatest things the Web has given us, and it is sad to see them struggle.
(I can imagine LLM operators jumping on the opportunity to sell some of these new barriers, to profit from selling both the problematic product and a product to work around those problems.)
That is their business model. Use AI to create posts on LinkedIn, emails in a corporate environment, etc., and then use AI to summarize all that text.
AI creates a problem and then offers a solution.
My current approach is to look at news sources like The Guardian, Le Monde, AP News, etc. I know that they put in the work; sadly, places like Reddit are just becoming forums that discuss garbage news with bot comments. (I could use AI to identify non-bot comments and news sources, but it does not really work even if it says that it does, and I should not have to do that in the first place either.)
Badly. You will alienate most legitimate contributors and be left with only spam bots subsidized by revenue from other sources.
If I can say I trust you, the websites you trust will be prioritised for me and marked as reliable (no AI slop, actual humans writing content).
But why should this expectation be honored? If someone spends close to zero effort generating a piece of code and lobs it over the fence to me, why would I even look at it? Particularly if it doesn't even meet the requirements for a pull request (which is what it seems like the article is talking about)?
I don't think the definition of collaboration includes making close to zero effort and expecting someone else to expend considerable effort in return.
But if you stop looking at PRs entirely, you eliminate the ability for new contributors to join a project or make changes that improve the project. This is where the conflict comes from.
After a minute (or whatever length of time makes sense for the project), decide whether you're fully confident the PR is worth your time to keep reviewing, with the default answer being "no" if you're on the fence. Unless it's a clear yes, you got a bad vibe: close it and move on. Getting a PR merged will require more effort in making the case that there's value in keeping it open, which restores some of the balance that's been lost with the effort having been pushed to the review side.
No more drive-by PRs.
Where is the problem? If I don't have the time to review a PR, I simply reject it. Or if I am flooded with PRs, I only take those from people whose PRs I know to be of high quality. In other words: your assumption "expecting people to review and act upon it" is wrong.
Even though I would bet that for the kind of code that I voluntarily write in my free time, using an LLM to generate lots of code is much less helpful because I use such private projects to try out novel things that are typically not "digested stuff from the internet".
So, the central problem that I rather see is the license uncertainties for AI-generated code.
Don't throw the baby out with the bathwater.
Like a recognition that there's value there, but we're passing the frothing-at-the-mouth stage of replacing all software engineers?
I still don't see how it's useful for generating features and codebases, but as a rubber ducky it ain't half bad.
What has helped has been to turn off ALL automatic AI, e.g. auto complete, and bind it to a shortcut key to show up on request... And forget it exists.
Until I feel I need it, and then it's throw shit at the wall type moment but we've all been there.
It does save a lot of time as Google on steroids and as a wtf-solver. But it's a tool best kept in its box, with a safety lock.
That's one way of looking at it.
Another way to look at it is GPT3.5 was $600,000,000,000 ago.
Today's AIs are better, but are they $600B better? Does it feel like that investment was sound? And if not, how much slower will future investments be?
This just smells like classic VC churn and burn. You are given it and have to spend it. And most of that money wasn't actually money, it was free infrastructure. Who knows the actual "cost" of the investments, but my uneducated brain (while trying to make a point) would say it is 20% of the stated value of the investments. And maybe GPT-5 + the other features OpenAI has enabled are $100B better.
But everyone who chipped in $$$ is counting against these top line figures, as stock prices are based on $$$ specifically.
> but my uneducated brain (while trying to make a point) would say it is 20% of the stated value of the investments
An 80% drop in valuations as people snap back to reality would be devastating to the market. But that's the implication of your line here.
I'm sure there's still some improvements that can be made to the current LLMs, but most of those improvements are not in making the models actually better at getting the things they generate right.
If we want more significant improvements in what generative AI can do, we're going to need new breakthroughs in theory or technique, and that's not going to come by simply iterating on the transformers paper or throwing more compute at it. Breakthroughs, almost by definition, aren't predictable, either in when or whether they will come.
E.g. OpenAI went from "AGI has been achieved internally" to lying with graphs (where they cut off graphs at 50% or 70% to present minor improvements as breakthroughs).
The growth can easily be logarithmic
A different way to say it. Imagine if programming a computer was more like training a child or a teenager to perform a task that requires a lot of human interaction; and that interaction requires presenting data / making drawings.
As a parent, this sounds miserable.
GPT-5 and GPT-5-codex are significantly cheaper than the o-series full models from OpenAI, but outperform them.
I won't get into whether the improvements we're seeing are marginal or not, but whether or not that's the case, these examples clearly show you can get improved performance with decreasing resource cost as techniques advance.
But that's exactly the problem!
Right now, AI performs poorly enough that only a small fraction of users is willing to pay money for it, and (despite tech companies constantly shoving it in everyone's face) a large portion of the user base doesn't even want to adopt it for free.
You can't spend hundreds of billions of dollars on marginal improvements in the hope that it'll hopefully eventually become good enough for widespread adoption. Nobody is going to give OpenAI a trillion dollars to grow their user base 50x over the next 15 years. They are going to need to show significant improvements - and soon, or the bubble will pop.
You mean what they have conceded so far. With every new model, they start to see that they have to give up a little more.
I get value from it everyday like a lawyer gets value from LexisNexis. I look forward to the vibe coded slop era like a real lawyer looks forward to a defendant with no actual legal training that obviously did it using LexisNexis.
The funny thing is you're clearly within the hyperbolic pattern that I've described. It could plateau, but denying that you're there is incorrect.
You assume the curve is exponential.
We assume the curve is logarithmic.
We are not the same
I'm genuinely curious as to what's going through your mind and if people readily give you this.
I suspect you're asking dishonestly but I can't simply assume that.
You should delete this comment.
It feels like people and projects are moving from a pure “get that slop out of here” attitude toward more nuance, more confidence articulating how to integrate the valuable stuff while excluding the lazy stuff.
I really like the way Discourse uses "levels" to slowly open up features as new people interact with the community, and I wonder if GitHub could build in a way of allowing people to only be able to open PRs after a certain amount of interaction, too (for example, you can only raise a large PR if you have spent enough time raising small PRs).
This could of course be abused and/or lead to unintended restrictions (e.g. a small change in lots of places), but that's also true of Discourse and it seems to work pretty well regardless.
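To make that idea concrete, here's a rough sketch of what such a gate could look like as a CI check. Everything in it is hypothetical: GitHub has no built-in trust levels for PRs, and the repo name, thresholds, and environment variables are just placeholders.

```python
# Hypothetical "trust level" gate for PRs, run as a CI step.
# Assumes REPO, PR_AUTHOR, PR_NUMBER and GITHUB_TOKEN are provided by the CI job.
import os
import requests

REPO = os.environ["REPO"]              # e.g. "owner/project"
AUTHOR = os.environ["PR_AUTHOR"]
PR_NUMBER = os.environ["PR_NUMBER"]
HEADERS = {
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    "Accept": "application/vnd.github+json",
}

# How many PRs from this author were already merged into the repo?
search = requests.get(
    "https://api.github.com/search/issues",
    params={"q": f"repo:{REPO} type:pr author:{AUTHOR} is:merged"},
    headers=HEADERS,
)
merged_before = search.json().get("total_count", 0)

# How big is the PR being opened now?
pr = requests.get(
    f"https://api.github.com/repos/{REPO}/pulls/{PR_NUMBER}", headers=HEADERS
).json()
size = pr.get("additions", 0) + pr.get("deletions", 0)

# Made-up policy: newcomers can only open small PRs until a few have been merged.
if merged_before < 3 and size > 200:
    print(f"{AUTHOR} has {merged_before} merged PRs here; a {size}-line change "
          f"needs a few smaller contributions first.")
    raise SystemExit(1)
print("Contribution-history gate passed.")
```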
I use it like this: If a PR is LLM-generated, you as a maintainer either merge it if it's good or close if it's not. If it's human-written, you may spend some time reviewing the code and iterating on the PR as you used to.
Saves your time without discarding LLM PRs completely.
It's like the ship of Theseus.
I don't like it but I can hardly blame them.
Usually engagement-bait titles are cover for uninteresting articles, but yeah in this case it's way more interesting than the title to me anyway.
I guess it makes it even more obvious when people are discussing the title instead of the actual piece, which is routine on HN but not always obvious! Although to be fair, the title describes one part of the piece, sure: the part with the least original insight.
From https://news.ycombinator.com/newsguidelines.html: "Please use the original title, unless it is misleading or linkbait" (note that word unless)
First of all, if you want innovation, why are you forcing it into a single week? You very likely have smart people with very good ideas, but they’re held back by your number-driven bullshit. These orgs actively kill innovation by reducing talent to quantifiable rows of data.
A product cobbled together from shit prototype code very obviously stands out. It has various pages that don't quite look/work the same; cross-functional things that "work everywhere else" don't in some parts.
It rewards only the people who make good presentations, or pick the "current hype thing" to work on. Occasionally something good that addresses real problems is at least mentioned, but the hype thing will always win (if judged by your SLT).
Shame on you if the slop prototype is handed off to some other team than the hackathon presenters. Presenters take all the promotion points, then implementers have to sort out a bunch of bullshit code, very likely being told to just ship the prototype “it works you idiots, I saw it in the demo, just ship it.” Which is so incredibly short sighted.
I think the depressing truth is your executives know it's all cobbled-together bullshit, but that it will sell anyway, so why invest time making it actually good? They all have their golden parachutes; what do they care about the suckers stuck on-call for the house-of-cards they were forced to build, despite possessing the talent to make it stable? All this stupidity happens over and over again, not because it is wise, or even the best way to do this; the truth is just a flaccid "eh, it'll work though, fuck it, let's get paid."
We have to do better than that before congratulating ourselves about all the wonderful "innovation".
The biggest place I've seen AI created code with tests produce a false positive is when a specific feature is being tested, but the test case overwrites a global data structure. Fixing the test reveals the implementation to be flawed.
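To make that failure mode concrete, here's a made-up minimal example (not the actual code I hit): the test clobbers a module-level global with values the implementation happens to handle, so it goes green while the real code path stays broken.

```python
# Global configuration that the production code reads at call time.
CONFIG = {"currency": "USD", "tax_rate": 0.07}  # rate stored as a fraction

def total_price(subtotal: float) -> float:
    # Flawed: assumes the rate is a percentage, not a fraction.
    return round(subtotal * (1 + CONFIG["tax_rate"] / 100), 2)

def test_total_price():
    # The generated test overwrites the shared global before asserting...
    CONFIG["tax_rate"] = 7
    assert total_price(100.0) == 107.0
    # ...so it passes. With the real CONFIG the function returns 100.07;
    # fixing the test to use the default value exposes the flawed implementation.
```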
Now imagine you get rewarded for shipping new features and test code, but are derided for refactoring old code. The person who goes to fix the AI slop is frowned upon while the AI slop driver gets recognition for being a great coder. This dynamic, caused by AI coding tools, is creating perverse workplace incentives.
Now, that being said, a person should feel free to do what they want with their code. It's somewhat tough to justify the work of setting up infrastructure to do that on small projects, but AI PRs aren't likely a big issue for small projects.
Some people will absolutely just run something, let the AI work like a wizard and push it in hopes of getting an "open source contribution".
They need to understand due diligence and reduce the overhead on maintainers, so that maintainers don't have to review things before it's really needed.
It's a hard balance to strike, because you do want to make it easy on new contributors, but this is a great conversation to have.
Alas…
...that's just scratching the surface.
The problem is that LLMs make mistakes that no single human would make, and coding conventions should anyway never be the focus of a code review and should usually be enforced by tooling.
E.g. when reading/reviewing other people's code you tune into their brain and thought process: after reading a few lines of (non-trivial) code you know subconsciously what 'programming character' this person has and what type of problems to expect and look for.
With LLM generated code it's like trying to tune into a thousand brains at the same time, since the code is a mishmash of what a thousand people have written and published on the internet. Reading a person's thought process via reading their code doesn't work anymore, because there is no coherent thought process.
Personally I'm very hesitant to merge PRs into my open source projects that are more than small changes of a couple dozen lines at most, unless I know and trust the contributor to not fuck things up. E.g. for the PRs I'm accepting I don't really care if they are vibe-coded or not, because the complexity for accepted PRs is so low that the difference shouldn't matter much.
A couple of weeks ago I needed to stuff some binary data into a string, in a way where it wouldn't be corrupted by whitespace changes.
I wrote some Rust code to generate the string. After I typed "}" to end the method: 1: Copilot suggested a 100% correct method to parse the string back to binary data, and then 2: Suggested a 100% correct unit test.
I read both methods, and they were identical to what I would write. It was as if Copilot could read my brain.
BUT: If I relied on Copilot to come up with the serialization form, or even know that it needed to pick something that wouldn't be corrupted by whitespace, it might have picked something completely wrong, that didn't meet what the project needed.
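For anyone curious what "stuff binary data into a string that survives whitespace changes" means in practice, here's a rough Python equivalent of the round trip (the original was Rust, and base64 is just one reasonable whitespace-safe choice, not necessarily what the project used):

```python
import base64

def to_string(data: bytes) -> str:
    # Base64 uses no whitespace, so reflowing the string can't corrupt it.
    return base64.b64encode(data).decode("ascii")

def from_string(text: str) -> bytes:
    # Drop any whitespace an editor or formatter may have inserted.
    return base64.b64decode("".join(text.split()))

def test_round_trip():
    payload = bytes(range(256))
    encoded = to_string(payload)
    assert from_string(encoded) == payload
    # Wrapping the string across lines must not change the decoded bytes.
    wrapped = "\n".join(encoded[i:i + 40] for i in range(0, len(encoded), 40))
    assert from_string(wrapped) == payload
```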
I cannot justify to myself writing code by hand when there is literally no difference in the output from how I would have done it myself. It might as well be reading my mind, that's what it feels like.
For me, vibe coding is essentially a 5x speed increase with no downside. I cannot believe how fast I can churn out features. All the stuff I used to type out by hand now seems impossibly boring. I just don't have the patience to hand-code anymore.
I've stuck to vanilla JavaScript because I don't have the patience to wait for the TypeScript transpiler. TS iteration speed is too slow. By the time it finishes transpiling, I can't even remember what I was trying to do. So you bet I don't have the patience to write by hand now. I really need momentum (fast iteration speed) when I code and LLMs provide that.
Obviously, I suck at business and marketing. I only had one relatively financially successful product (my open source project, ironically) but I'm definitely able to build features quickly and in a stable way according to spec.
I really liked the paragraph about LLMs being "alien intelligence"
> Many engineers I know fall into 2 camps, either the camp that find the new class of LLMs intelligent, groundbreaking and shockingly good. In the other camp are engineers that think of all LLM generated content as “the emperor’s new clothes”, the code they generate is “naked”, fundamentally flawed and poison.
> I like to think of the new systems as neither. I like to think about the new class of intelligence as “Alien Intelligence”. It is both shockingly good and shockingly terrible at the exact same time.
> Framing LLMs as “Super competent interns” or some other type of human analogy is incorrect. These systems are aliens and the sooner we accept this the sooner we will be able to navigate the complexity that injecting alien intelligence into our engineering process leads to.
It's a comparison I find compelling. The way they produce code and the way you have to interact with them really feels "alien", and when you start humanizing them, you get emotional when interacting with them, and that's not right.
I mean, I do get emotional and frustrated even when good old deterministic programs misbehaved and there was some bug to find and squash or work around, but LLM interactions can bring the game to a whole new level. So, we need to remember they are "alien". These new submarines are a lot closer to human swimming than the old ones were, but they're still very different.
If we agree that we are all humans, and assume that all other humans are conscious as one is, I think we can extrapolate that there is a generic "human intelligence" concept, even if it's pretty hard to nail it down and even if there are several definitions of human intelligence out there.
For the other part of the comment: I'm not too familiar with Discourse's open source approach, but I guess those rules are there mainly for employees, and since they develop in the open and public, they make the rules public as well.
I've found myself wanting line-level blame for LLMs. If my teammate committed something that was written directly by Claude Code, it's more useful to me to know that than to have the blame assigned to the human through the squash+merge PR process.
Ultimately somebody needs to be on the hook. But if my teammate doesn't understand it any better than I do, I'd rather that be explicit and avoid the dance of "you committed it, therefore you own it," which is better in principle than in practice IMO.
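Git doesn't track this today, but as a sketch of what I mean: if LLM-assisted commits carried an agreed-upon trailer (say "Assisted-by:", purely a team convention I'm inventing here, not a git or Claude Code feature), a small script could fold it into blame output:

```python
# Sketch: prefix each blamed line with the "Assisted-by:" trailer of the
# commit that introduced it, falling back to "human" when there is none.
import subprocess
import sys

def assisted_by(commit: str) -> str:
    value = subprocess.run(
        ["git", "show", "-s",
         "--format=%(trailers:key=Assisted-by,valueonly)", commit],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return value or "human"

def blame_with_assist(path: str) -> None:
    out = subprocess.run(
        ["git", "blame", "--line-porcelain", path],
        capture_output=True, text=True, check=True,
    ).stdout
    commit, cache = None, {}
    for line in out.splitlines():
        if line.startswith("\t"):
            # A source line: print who (or what) wrote it next to the code.
            who = cache.setdefault(commit, assisted_by(commit))
            print(f"{who:>12} | {line[1:]}")
        elif line[:40].isalnum() and len(line.split()[0]) == 40:
            commit = line[:40]  # start of a new per-line header block

if __name__ == "__main__":
    blame_with_assist(sys.argv[1])
```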
1. Someone raises a PR
2. Entry-level maintainers skim through it and either reject or pass higher up
3. If the PR has sufficient quality, the PR gets reviewed by someone who actually has merge permissions
[pedantry] It bothers me that the photo for "think of prototype PRs as movie sets" is clearly not a movie set but rather the set of the TV show Seinfeld. Anyone who watched the show would immediately recognize Jerry's apartment.
https://nypost.com/2015/06/23/you-can-now-visit-the-iconic-s...
It looks a bit different wrt. the stuff on the fridge and the items in the cupboard
https://www.reddit.com/r/seinfeld/comments/yfbmn2/sony_pictu...
In any case, though, neither one is a movie set.
Will the contributor respond to code-review feedback? Will they follow-up on work? Will they work within the code-of-conduct and learn the contributor guidelines? All great things to figure out on small bugs, rather than after the contributor has done significant feature work.
There are plenty of open source projects where it is difficult to get up to speed with the intricacies of the architecture, which limits the ability of talented coders to contribute on a small scale.
There might be merit in having a channel for AI contributions that casual helpers can assess to see if they pass a minimum threshold before passing on to a project maintainer to assess how the change works within the context of the overall architecture.
It would also be fascinating to see how good an AI would be at assessing the quality of a set of AI generated changes absent the instructions that generated them. They may not be able to clearly identify whether the change would work, but can they at least rank a collection of submissions to select the ones most worth looking at?
At the very least, the pile of PRs counts as data about things that people wanted to do. Even if the code is completely unusable, placing it into a pile somewhere might make it minable for the intentions of would-be contributors.
This feels extremely counterproductive and fundamentally unenforceable to me.
But it's trivially enforceable. Accept PRs from unverified contributors, look at them for inspiration if you like, but don't ever merge one. It's probably not a satisfying answer, but if you want or need to ensure your project hasn't been infected by AI generated code you need to only accept contributions from people you know and trust.
> This feels extremely counterproductive and fundamentally unenforceable to me.

Much of the code AI generates is indistinguishable from human code anyway. You can usually tell a prototype that is pretending to be a human PR, but a real PR a human makes with AI assistance can be indistinguishable.
Isn't that exactly the point? Doesn't this achieve exactly what the whole article is arguing for?
A hard "No AI" rule filters out all the slop, and all the actually good stuff (which may or may not have been made with AI) makes it in.
When the AI assisted code is indistinguishable from human code, that's mission accomplished, yeah?
Although I can see two counterarguments. First, it might just be Covert Slop. Slop that goes under the radar.
And second, there might be a lot of baby thrown out with that bathwater. Stuff that was made in conjunction with AI, contains a lot of "obviously AI", but a human did indeed put in the work to review it.
I guess the problem is there's no way of knowing that? Is there a Proof of Work for code review? (And a proof of competence, to boot?)
And from the point of view of the maintainers, it seems a terrible idea to set up rules with the expectation that they will be broken.
Or, the decentralized, no rulers solution: clone the repo on your own website and put your patches there instead.
"Forced you to lie"?? Are you serious?
If the project says "no AI", and you insist on using AI, that's not "forcing you to lie"; that's you not respecting their rules and choosing to lie, rather than just go contribute to something else.
In a live setting, you could ask the submitter to explain various parts of the code. Async, that doesn’t work, because presumably someone who used AI without disclosing that would do the same for the explanation.
There is NOTHING inevitable about this stuff.
https://discuss.samsaffron.com/t/your-vibe-coded-slop-pr-is-...
It's not rocket science.
/s
I am the founder and a product person so it helps in reducing the number of needed engineers at my business. We are currently doing $2.5M ARR and the engineers aren't complaining, in fact it is the opposite, they are actually more productive.
We still prioritize architecture planning, testing and having a CI, but code is getting less and less important in our team, so we don't need many engineers.
That's a bit reductive. Programmers write code; engineers build systems.
I'd argue that you still need engineers for architecture, system design, protocol design, API design, tech stack evaluation & selection, rollout strategies, etc, and most of this has to be unambiguously documented in a format LLMs can understand.
While I agree that the value of code has decreased now that we can generate and regenerate code from specs, we still need a substantial number of experienced engineers to curate all the specs and inputs that we feed into LLMs.
We can (unreliably) write more code in natural English now. At its core it's the same thing: detailed instructions telling the computer what it should do.
More productive isn't the opposite of complaining.
Tells me all I need to know about your ability for sound judgement on technical topics right there.
> the engineers aren't complaining
You're missing a piece of the puzzle here, Mr business person.
> code is getting less and less important in our team
> the engineers aren't complaining
Lays off engineers in favor of AI trained on other engineers' code, then says code is less important and the engineers aren't complaining.
They can focus on other things that are more impactful in the business rather than just slinging code all day, they can actually look at design and the product!
Maximum headcount for engineers is around 7, no more than that now. I used to have 20, but with AI we don't need that many for our size.
If I survived having 65% of my colleagues laid off you'd better believe I wouldn't complain in public.
I'd also be looking for a new job that values the skills I've spent a decade building.
I wonder if the remaining engineers' salaries increased by the salaries of the laid-off coworkers.
Someone barking orders at you to generate code because they are too stupid to be able to read it is not very fun.
These people hire developers because their own brains are inferior, and now they think they can replace them because they don't want to share the wages with them.
Never does.
I don't see how you could think 7 engineers would love the workload of 20 engineers, extra tooling or not.
Have fun with the tech debt in a few years.
Management may see a churn of a few years as acceptable. If management makes $1M in that time... they won't care. "Once I get mine, I don't care."
Like my old CEO who moved out of state to avoid a massive tax bill, got his payout, became hands off, and let the company slide to be almost worthless.
Or at my current company there is no care for quality, since we're just going to launch a new generation of product in 3 years. We're doing things here that will CAUSE a ground-up rewrite. We're writing code that relies on undocumented features of the MCU, which the vendor has said "we cannot guarantee it will always behave this way." But our management cycles out every 3-4 years: just enough time to kill the old, champion the new, get their bonus, and move on. Bonuses are handed out every January. Like clockwork, there are between 3 and 7 directors and above who either get promoted or leave in February.
I don't see how any business person would see value in engineering that extends past their tenure. They see value in launching/delivering/selling, and are rolling the dice that we're JUST able to not cause a nationwide outage or brick every device.
So AI is great... as long as I've 'gotten mine' before it explodes
If your attitude is consistently "idk, the AI made it" and you refuse to review it yourself, then: one, I am insulted that you think I should pick up your slack, and two, I'm going to judge you and everything you put out even more harshly, for my own sanity and to keep debt under control.
Judgement isn't a bad thing; it's how we decide good from bad. Pretending that it is, because it uniquely discriminates against bad practice, only proves to me that it's worth doubling down on that judgement.
* - I won't necessarily say/do anything different, but I am more careful - and I do start to look for patterns / ways to help.
Your opinion, if I had to guess, is that generative “AI” can be good and useful. My opinion is that it’s an insult to humanity that causes considerable harm and should not be used. These are both valid opinions to have, although they disagree with each other.
Don’t fall into the trap of “I’m objectively correct, everyone else just has opinions”.
You and I know that using AI is a metric to consider when judging ability and quality.
The difference is that it's not judgment but a broadcast, an announcement.
In this case a snotty one from Discourse.
I mention that it lingers because I think that is a real psychological effect that happens.
Small announcements like this carry over into the future and flood any evaluation of yourself, which can be described as torture and sabotage, since it affects the decisions you make, sometimes destroying things.
Your comparison to torture and sabotage is unfounded to the point of being simply bizarre.
"Slop" doesn't seem to be Yiddish: https://www.etymonline.com/word/slop, and even if it was, so what?