A long time ago, Craigslist allowed access to its posts via RSS. It was possible to append
For my experiment, python-craigslist was not an option anymore and I needed a different solution. I spent a few minutes looking at the network requests Craigslist was sending, and it was clear that making sense of them would require a lot of effort. What I wanted was something that acts the same way a browser does but can be driven programmatically.
Enter the headless browser
When I described what I wanted, I realized this was the exact definition of a headless browser: a browser that can run without a graphical user interface. I knew Chrome could run in headless mode and could be controlled from a Node.js project, as I had played with it a few years earlier. Because it had been a while, I wanted to check how people do this these days. Sure enough, I quickly found Puppeteer, a Node.js library for interacting with headless Chrome. I created a new Node.js project, configured it to use TypeScript and, voilà, with a few lines of code I was able to get links to listings from my query.
Obviously, this is only a simple prototype, but it could be useful for conducting simple experiments.
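A prototype along these lines can be sketched roughly as follows. This is an illustrative sketch, not the original code: the search URL, the function names `pickListingLinks` and `fetchListingLinks`, and the rule that listing links end in `.html` are all assumptions made for the example, and the live site's markup may differ. It assumes the `puppeteer` package is installed in the project.

```typescript
// Hypothetical search URL; the query used in the original experiment is not shown.
const SEARCH_URL = "https://seattle.craigslist.org/search/sss?query=bicycle";

// Keep only absolute links that look like individual listings, without duplicates.
// Assumption: listing pages end in ".html"; this may not hold for the live site.
function pickListingLinks(hrefs: string[]): string[] {
  const listings = hrefs.filter((href) => /^https?:\/\/.+\.html$/.test(href));
  return Array.from(new Set(listings));
}

// Open the search page in headless Chrome and return the listing links found on it.
async function fetchListingLinks(searchUrl: string): Promise<string[]> {
  // Loaded lazily so that pickListingLinks can be used without a browser installed.
  const moduleName = "puppeteer";
  const mod = await import(moduleName);
  const puppeteer = mod.default ?? mod;

  const browser = await puppeteer.launch(); // headless by default
  try {
    const page = await browser.newPage();
    // Wait until network activity settles so client-rendered results are present.
    await page.goto(searchUrl, { waitUntil: "networkidle2" });
    // Runs inside the page: collect the resolved href of every anchor element.
    const hrefs: string[] = await page.$$eval(
      "a",
      (anchors: Array<{ href: string }>) => anchors.map((a) => a.href)
    );
    return pickListingLinks(hrefs);
  } finally {
    // Always close the browser, even if navigation or evaluation fails.
    await browser.close();
  }
}
```

Calling `fetchListingLinks(SEARCH_URL)` and logging the resolved array would print the links. Keeping the filtering in a separate pure function means the link-selection rule can be adjusted and checked without starting a browser.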