We want to build an AI agent that can retrieve and read online information and build a knowledge graph from it. So it makes sense, as a first step, to retrieve that information. The question now is what online information we should retrieve and how we are going to retrieve it.
In terms of WHAT, over the last year or so I have found myself reading Medium articles every single day. They are short and easy to consume, and some of the practical programming ones are also very useful. So I thought scraping and analysing Medium articles would be a good place to start, focusing specifically on technology-related articles!
We are going to build the whole thing in Python. There are three popular Python packages to choose from: Selenium, Scrapy, and BeautifulSoup. That’s going to be our HOW. Specifically, we will use Selenium to log in to my premium Medium account, go to each Medium article, scrape all the headings and paragraphs, and save them to a text file for future analysis. Okay, that’s the game plan. Let’s see how it goes on the execution part. More on this in the next blog post.
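In the meantime, here is a rough sketch of the game plan, just to make the steps concrete. It is not the final implementation: the login step is left out entirely, the function names (`extract_blocks`, `save_blocks`, `fetch_page_source`) and the set of tags counted as "headings and paragraphs" are my own assumptions, and the parsing uses Python's built-in `html.parser` rather than Selenium's element lookups so that part can be tried without a browser. Selenium is only used to load the page and hand back its HTML, and it assumes Chrome is installed.

```python
from html.parser import HTMLParser
from pathlib import Path

# Tags treated as "headings and paragraphs". This set is an assumption.
TEXT_TAGS = {"h1", "h2", "h3", "h4", "p"}


class HeadingParagraphParser(HTMLParser):
    """Collects the text of each heading or paragraph element, in page order."""

    def __init__(self):
        super().__init__()
        self.blocks = []      # finished text blocks
        self._current = None  # pieces of the block being read, or None

    def handle_starttag(self, tag, attrs):
        if tag in TEXT_TAGS:
            self._current = []

    def handle_data(self, data):
        # Keep text only while we are inside a heading or paragraph.
        if self._current is not None:
            self._current.append(data)

    def handle_endtag(self, tag):
        if tag in TEXT_TAGS and self._current is not None:
            # Join the pieces and collapse runs of whitespace.
            text = " ".join("".join(self._current).split())
            if text:
                self.blocks.append(text)
            self._current = None


def extract_blocks(html):
    """Return the heading and paragraph texts found in an HTML string."""
    parser = HeadingParagraphParser()
    parser.feed(html)
    parser.close()
    return parser.blocks


def save_blocks(blocks, path):
    """Write the text blocks to a UTF-8 text file, one blank line apart."""
    Path(path).write_text("\n\n".join(blocks) + "\n", encoding="utf-8")


def fetch_page_source(url):
    """Load a page in Chrome via Selenium and return its rendered HTML."""
    # Imported here so the helpers above work without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()
```

With an article URL in hand, the pieces would chain together as `save_blocks(extract_blocks(fetch_page_source(url)), "article.txt")`, where `url` and the output filename are placeholders. Keeping the parsing separate from the browser also means the extraction can be checked against a small HTML string before pointing it at a real page.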