A little insane, but in a good way.
If I remember correctly, the properties the API returns are `comment_score` and `post_score`.
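If you want to pull them out yourself, something like this works on the JSON the user endpoint returns. The field names and response shape here are from memory, so treat them as an assumption and check your instance's API docs:

```python
def extract_karma(user_response: dict) -> dict:
    """Pull the aggregate scores out of a Lemmy /api/v3/user response.

    Assumes the scores live under person_view.counts, which is my best
    recollection of the v3 response shape.
    """
    counts = user_response["person_view"]["counts"]
    return {
        "comment_score": counts["comment_score"],
        "post_score": counts["post_score"],
    }
```

Summing the two gives you a Reddit-style total karma number.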
Lemmy does have karma: it's stored in the DB, and the API returns it. It just isn't displayed in the UI.
It only handles HTML currently, but I like your idea, thank you! I'll look into implementing PDF reading as well. One problem with scientific articles, however, is that they are often quite long and don't fit into the model's context. I would need to do recursive summarization, which would use many more tokens and could become pretty expensive. (Of course, the same problem occurs if a web page is too long; currently I just truncate it, which is a rather barbaric solution.)
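For the curious, the recursive approach is roughly this: split, summarize each chunk, concatenate the partial summaries, and recurse until the result fits in one call. This is just a sketch with a character-based stand-in for a real token limit, and `summarize` is a hypothetical callable wrapping the LLM call:

```python
CHUNK_CHARS = 8000  # placeholder; a real implementation would count tokens

def summarize_long(text: str, summarize) -> str:
    """Recursively condense text until it fits into a single model call.

    `summarize` is an injected function that sends one chunk to the LLM
    and returns its summary (not shown here).
    """
    if len(text) <= CHUNK_CHARS:
        return summarize(text)
    chunks = [text[i:i + CHUNK_CHARS] for i in range(0, len(text), CHUNK_CHARS)]
    partial_summaries = "\n".join(summarize(chunk) for chunk in chunks)
    return summarize_long(partial_summaries, summarize)
```

You can see why it gets expensive: a long paper means one LLM call per chunk, plus further calls for each level of recursion.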
I think the incentives are a bit different here. If we can keep the threadiverse nonprofit, and contribute to the maintenance costs of the servers, it might stay a much friendlier place than Reddit.
Lemmy actually has a really good API. Moderation tools are pretty simple though.
Did I miss something? Or is this still about Beehaw?
Made the switch 4 years ago. No regrets.
This describes 99% of AI startups.
The company I work for was considering using Mendable for AI-powered documentation search. In a day, I built a prototype using OpenAI embeddings and GPT-3.5 that was just as good as their product. They didn't buy Mendable :)
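The core of the prototype really is that small. Embed each doc chunk, embed the query, rank by cosine similarity, and paste the top chunks into the GPT-3.5 prompt. Here's a sketch of the retrieval step with the embedding calls stubbed out (in the real thing they were OpenAI's embedding endpoint):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_chunks(query_vec, chunk_vecs, chunks, k=3):
    """Return the k doc chunks most similar to the query embedding."""
    ranked = sorted(
        zip(chunks, chunk_vecs),
        key=lambda pair: cosine(query_vec, pair[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in ranked[:k]]
```

The retrieved chunks then go into the prompt as context, and the model answers from them.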
First, thank you for the detailed response.
Second, I think you finally convinced me to delete my FB. I will link to this comment wherever possible to show people what a terrible company Meta is.
After all, they said we need quality content to attract new users
They got gregnant
I’m the author of that bot. It will have an opt-out option, I implemented it as soon as someone suggested it:
https://programming.dev/comment/305938
Don’t spread sensationalist lies.
Oh wow, I’ve just realized it was OP I talked to in the comments. I immediately replied to their suggestion. What a clown 🤡
Can you tell us more about what they are like?
Thank you, that’s a reasonable suggestion, I added it to the comment template:
TL;DR: (AI-generated 🤖)
Yes, they have promised explicitly not to use API data for training.
Thank you, I’ll take a look at these models, I hope I can find something a bit cheaper but still high-quality.
I implemented it. The feature will be available right from the start. The bot will reply with this if the user has disabled it:
🔒 The author of this post or comment has the #nobot hashtag in their profile. Out of respect for their privacy settings, I am unable to summarize their posts or comments.
Oh, I've just realized that it's also possible even if the video doesn't have a transcript. You can download the audio, feed it into OpenAI Whisper (which is currently the best available audio transcription model), and pass the transcript to the LLM. And Whisper isn't even too expensive.
Not sure about the legality of it though.
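The pipeline would look roughly like this. I've injected the actual API calls as callables to keep the sketch self-contained; in practice `transcribe` would wrap the whisper-1 transcription endpoint and `summarize` a chat-completion call, and the audio download would be something like yt-dlp:

```python
def summarize_video(audio_path: str, transcribe, summarize) -> str:
    """Audio file -> Whisper transcript -> LLM summary.

    `transcribe` and `summarize` are hypothetical callables standing in
    for the real API calls, which aren't shown here.
    """
    transcript = transcribe(audio_path)
    prompt = f"Summarize this video transcript:\n\n{transcript}"
    return summarize(prompt)
```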
LLMs can do a surprisingly good job even if the text extracted from the PDF isn’t in the right reading order.
Another thing I've noticed is that figures are usually explained thoroughly in the text, so there is no need for the model to see them to generate a good summary. Human communication is very redundant, and we don't realize it.