Show HN: Analyzing top HN posts with language models https://ift.tt/ku3IHTb

Hi HN,

I spent a few weeks looking at the top HN posts of all time. This included exploration, clustering, creating visualizations, and zooming in on what (to me personally) seem like some of the best discussions on here.

Three things in this post:
1- The interesting groups of HN posts
2- The interactive visualizations that you can explore in your browser
3- The data from this exploration -- this includes a CSV of the titles as well as the text embeddings of 3,000 Ask HN articles.

Blog post about this whole process here: [1]

============
1- The interesting groups of HN posts

From the exploration, Ask HN proved the most interesting. These are the top four groups of topics I found insightful. Each group contains about 400 posts.

- Life experiences and advice threads [2]
- Technical and personal development [3]
- Software career insights, advice, and discussions [4]
- General content recommendations (blogs/podcasts) [5]

============
2- The interactive visualizations that you can explore in your browser

- Top 10,000 Hacker News articles of all time [6]
- Top 3,000 posts in Ask HN [7]

============
3- The data from this exploration

CSV file of top 3K Ask HN posts: [8]
The sentence embeddings of the titles of those posts: [9]
This is a colab notebook containing the code examples (including loading these two data files): [10]

============
If you've ever wanted to get into language models, this is a good place to start. Happy to answer any questions.

June 10, 2022 at 05:47AM
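
For readers curious what "clustering the title embeddings" might look like in practice, here is a minimal sketch. It is not the code from the linked notebook, and the post does not say which clustering algorithm was used; k-means with cosine distance is simply one common choice. The titles and 2-D vectors below are made-up stand-ins (real sentence embeddings have hundreds of dimensions), and the names `toy_titles`, `toy_embeddings`, and `kmeans` are hypothetical.

```python
import math
import random


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain k-means using cosine distance; returns one cluster id per vector."""
    rng = random.Random(seed)
    # Start from k distinct input vectors chosen at random.
    centroids = rng.sample(vectors, k)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its closest centroid.
        labels = [
            min(range(k), key=lambda c: cosine_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_vector(members)
    return labels


# Hypothetical stand-in data, NOT taken from the real CSV or embeddings file.
toy_titles = [
    "Ask HN: What career advice would you give your younger self?",
    "Ask HN: How did you negotiate your salary?",
    "Ask HN: How do I move from junior to senior engineer?",
    "Ask HN: What are your favorite podcasts?",
    "Ask HN: Which blogs do you read regularly?",
    "Ask HN: Best books you read this year?",
]
toy_embeddings = [
    [1.0, 0.1], [0.9, 0.2], [1.0, 0.0],  # "career" direction
    [0.1, 1.0], [0.2, 0.9], [0.0, 1.0],  # "recommendations" direction
]

labels = kmeans(toy_embeddings, k=2)
for cluster_id in sorted(set(labels)):
    print(f"Cluster {cluster_id}:")
    for title, label in zip(toy_titles, labels):
        if label == cluster_id:
            print("  ", title)
```

With the real data you would swap the toy lists for the titles from the CSV and the vectors from the embeddings file, and pick a larger `k`. Cluster ids are arbitrary, so only the grouping itself is meaningful.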
