Down the Rabbit Hole
26 Oct 2020
I've always wondered how far I can travel on YouTube. If I spent enough time on the platform with the intention of getting lost, where would I end up? Unfortunately, I haven't had the time to do so, and YouTube's algorithms are quite good at keeping me in my content circle. I also want to document and quantify this journey across the YouTube landscape to see the inflection points within it. When does the content start to change dramatically? Does the content ever break out of the grasp of the recommendation engine? Will I encounter some strange, as yet unseen content? Only one way to find out.
A few years ago I had tried a similar experiment, but without the pointed rigour or technical know-how. This time, armed with some basic Python and advanced search skills, I set about figuring out how to traverse the platform while documenting my journey. The idea was simple enough. Open a video on YouTube, and then click on the video under 'Up Next' on the right side of the screen. Repeat, recording the metadata and a screenshot of each video. I hacked together a quick little Selenium-powered Python script, which would store all the data in a CSV file.
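The loop described above can be sketched roughly as follows. This is not the original script: the browser step is hidden behind a hypothetical `fetch` callable (in a real run it would wrap the Selenium calls), and the names `walk`, `save_csv` and the CSV columns are assumptions made for illustration.

```python
import csv
from typing import Callable, Dict, List

# Hypothetical shape of one recorded video; the field names are assumptions.
Video = Dict[str, str]


def walk(start_url: str, steps: int, fetch: Callable[[str], Video]) -> List[Video]:
    """Follow the 'Up Next' link `steps` times, starting at `start_url`.

    `fetch(url)` must return a dict with the keys 'url', 'title' and
    'next_url'. In a real run it would drive the browser.
    """
    rows: List[Video] = []
    url = start_url
    for step in range(steps):
        video = fetch(url)
        rows.append({"step": str(step), "url": video["url"], "title": video["title"]})
        url = video["next_url"]  # move on to the recommended video
    return rows


def save_csv(rows: List[Video], path: str) -> None:
    """Write the recorded rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["step", "url", "title"])
        writer.writeheader()
        writer.writerows(rows)
```

Keeping the browser behind `fetch` means the walking and recording logic can be tried out with a fake lookup table before pointing it at the real site.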
However, I immediately ran into a strange problem. For reasons I don't understand, the videos would either repeat, or only two videos would play one after another. I assume this has to do either with my script or with some strange behaviour of YouTube's algorithms.
First attempt
To get around this issue, I changed the code so that it would select any video from the suggestions on the right-hand side. This fixed the problem, and during my testing I also figured out a way to get metadata such as the title, views, tags, and thumbnail of each video. In the previous iteration, the screenshot would often capture an image of the ad, and not the video. With the new algorithm set up, I took another shot at travelling across the internet.
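A minimal sketch of that selection step, under some assumptions: the suggestions have already been scraped into a list of URLs, and the function name `pick_next` is hypothetical. Preferring videos that have not been visited yet is an addition of this sketch, not something the original script is known to do.

```python
import random
from typing import List, Optional, Set


def pick_next(suggestions: List[str], visited: Set[str],
              rng: Optional[random.Random] = None) -> Optional[str]:
    """Pick a random suggested URL, preferring ones not seen before.

    Falls back to any suggestion if all of them were already visited,
    and returns None if the list of suggestions is empty.
    """
    chooser = rng if rng is not None else random.Random()
    if not suggestions:
        return None
    fresh = [url for url in suggestions if url not in visited]
    return chooser.choice(fresh or suggestions)
```

Passing a seeded `random.Random` makes a run repeatable, which helps when comparing two walks from the same starting video.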
New code
Added a progress bar
The first run wasn't too spectacular, with the videos staying largely within the same genre. A hundred iterations don't appear to be enough to derail or test the algorithm.
100 iterations. See full file
here.
I tried 300 iterations on the next attempt. This time the results were a little more interesting. Right towards the end there were a few videos which weren't very popular, as evident from their view counts. Another, longer run would be needed to explore this.
300 iterations. See full file
here.
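Spotting those unpopular videos by eye gets tedious in a long file, so here is a small sketch for pulling them out of the recorded rows. It assumes each row has a `views` field holding text such as "1,234 views" with the number written out in full; the column name and both function names are hypothetical, and abbreviated counts like "1.2M" are not handled.

```python
import re
from typing import Dict, List


def parse_views(text: str) -> int:
    """Turn a string such as '1,234 views' into 1234 (0 if it has no digits)."""
    digits = re.sub(r"[^\d]", "", text)
    return int(digits) if digits else 0


def least_viewed(rows: List[Dict[str, str]], n: int = 5) -> List[Dict[str, str]]:
    """Return the n rows with the smallest view counts, lowest first."""
    return sorted(rows, key=lambda row: parse_views(row["views"]))[:n]
```

The rows could come straight from `csv.DictReader` on one of the result files.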
Each website visit takes about 10 seconds, which includes an induced delay to make sure the website loads properly. So 500 iterations would take a little under 1.5 hours, maybe more depending on the CPU load.
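The estimate is simple arithmetic, sketched here so that other run lengths are easy to check. The function name is hypothetical, and the 10-second default is the rough per-visit figure quoted above, not a measured value.

```python
def estimated_hours(iterations: int, seconds_per_visit: float = 10.0) -> float:
    """Rough wall-clock time for a run, ignoring any extra CPU-related slowdown."""
    return iterations * seconds_per_visit / 3600


print(round(estimated_hours(500), 2))  # prints 1.39
```

By the same arithmetic, the 100- and 300-iteration runs would take roughly 17 and 50 minutes.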
500 iterations. See full file
here.
The last run with 500 iterations did not produce any noteworthy results, perhaps due to the expansive nature of gaming content. Perhaps more runs are needed, with diverse topics and
The results may not be as satisfying as I'd have wanted them to be, but it's an interesting research process, and the CSV format is conducive to creating an interactive website from the data.