JavaScript – Code, the Universe and Everything…

In the last few months, I have published more than a dozen posts on dev.to. Soon after I started, I realized that the built-in analytics were missing some features. The one I have missed the most is the ability to see a daily breakdown of reads per post.

Fortunately, the UI is not the only way to access stats. They are also available via the DEV Community API (Dev.to API). I decided to spend a few hours this weekend to see what it would take to use the Dev.to API to build the feature I was missing the most. I wrote this post to share my learnings.

Project overview

For my project, I decided to build a JavaScript web application with the Express framework and EJS templates. I did this because I wanted a dashboard with some nice-looking graphs. After I started, I realized that building a dashboard would be a waste of time because printing the stats would yield almost the same result (i.e. I could ship without it). In retrospect, my prototype could have been just a command-line application, which would have halved my effort.

DEV Community API crash course

I learned most of what I needed by investigating how the DEV dashboard worked. Using Chrome Developer Tools, I discovered two endpoints that were key to achieving my goal:

retrieving a list of articles
getting historical stats for a post

Both endpoints require authorization. API authorization mandates setting the api-key header in the HTTP requests. To get your API Key, go to Settings, click on Extensions on the left side, and scroll to the DEV Community API Keys at the bottom of the page. Here, you can see your active keys or generate a new key:

Once you have your API key, you can send API requests using fetch as follows:

function initiateDevToRequest(url, apiKey) {
  return fetch(url, {
    headers: {
      "api-key": apiKey,
    },
  });
}

Retrieving posts

To retrieve a list of published articles, we need to send an HTTP request to the articles endpoint:

https://dev.to/api/articles/me/published

A successful response is a JSON payload that contains details about published articles, including their IDs and titles.
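As a rough illustration, the request and the extraction of IDs and titles could look like the sketch below. The function name listPublishedArticles and the injectable fetchImpl parameter are hypothetical (they are not part of the project), and the id and title field names are assumptions based on the description above.

```javascript
// Sketch: list the authenticated user's published articles.
// `fetchImpl` is injectable so the function can be exercised without network access;
// by default it falls back to the global fetch.
async function listPublishedArticles(apiKey, fetchImpl = fetch) {
  const response = await fetchImpl("https://dev.to/api/articles/me/published", {
    headers: { "api-key": apiKey },
  });
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  const articles = await response.json();
  // Assumption: every entry carries at least `id` and `title` fields.
  return articles.map(({ id, title }) => ({ id, title }));
}
```

Only the two fields needed later are kept, which makes the rest of the pipeline independent of whatever else the payload contains.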

Side note: there is a version of this API that does not require authorization. You can request a list of articles for any user with the following URL: https://dev.to/api/articles?username=moozzyk

Fetching stats

To fetch stats, we need to send a request like this:

https://dev.to/api/analytics/historical?start=2024-02-10&article_id=1769817

The start parameter indicates the start date, while the article_id defines which article we want the stats for.
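One way to assemble this URL without concatenating strings by hand is the standard URL class. This is only a sketch: buildHistoricalStatsUrl is a hypothetical helper name, and it assumes the start date should be formatted as a UTC YYYY-MM-DD string, as in the example above.

```javascript
// Sketch: build the analytics URL for one article.
// `startDate` is a Date; only its UTC calendar date (YYYY-MM-DD) is used.
function buildHistoricalStatsUrl(articleId, startDate) {
  const url = new URL("https://dev.to/api/analytics/historical");
  url.searchParams.set("start", startDate.toISOString().slice(0, 10));
  url.searchParams.set("article_id", String(articleId));
  return url.toString();
}
```

For example, buildHistoricalStatsUrl(1769817, new Date("2024-02-10T00:00:00Z")) produces the URL shown above.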

Productivity tip

You can test APIs requested with the GET method by pasting the URL directly into the browser while you are logged in to dev.to. Requests made by the browser are authorized through your login session and do not rely on the api-key header.

<rant>

I found the DEV Community API situation quite confusing. I was pointed to Forem by a web search and initially did not understand the connection between dev.to and Forem. In addition, Forem's API page contradicts itself about which API version to use. Finally, it turned out that the API documentation does not include the endpoints I use in my project (but hey, they work!).

</rant>

Implementation

Once I figured out the APIs, I concluded that I could implement my idea in three steps:

send a request to the articles endpoint to retrieve the list of articles
for each article, send a request to the analytics endpoint to fetch the stats
group stats by date and show them to the user
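The third step, grouping, could be sketched as follows. Note the assumptions: groupViewsByDate is a hypothetical helper, and the shape of the stats payload (an object keyed by YYYY-MM-DD dates, each holding page_views.total) is my assumption about the analytics response, not something the text above confirms. Adjust the property access if the real payload differs.

```javascript
// Sketch: combine per-article stats into a per-day view.
// Assumed shape of each entry in `statResponses` (one per article, same order):
//   { "2024-02-10": { page_views: { total: 3 } }, ... }
// An empty object (e.g. for a failed request) contributes nothing.
function groupViewsByDate(articles, statResponses) {
  const byDate = new Map();
  articles.forEach((article, index) => {
    const stats = statResponses[index] ?? {};
    for (const [date, dayStats] of Object.entries(stats)) {
      const views = dayStats?.page_views?.total ?? 0;
      if (views === 0) continue; // skip days without any reads
      if (!byDate.has(date)) byDate.set(date, []);
      byDate.get(date).push({ title: article.title, views });
    }
  });
  // ISO dates sort chronologically when compared as strings.
  return new Map([...byDate.entries()].sort(([a], [b]) => a.localeCompare(b)));
}
```

The result is a Map from date to a list of { title, views } entries, which is easy to print or to pass to a template.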

Throttling

In my first implementation, I created a fetch request for each article and used Promise.all to wait for all of them in parallel. I knew it was generally not a brilliant idea because Promise.all does not allow limiting concurrency, but I hoped it would work in my case as I had fewer than 20 articles. I was wrong. With this approach, I got stats for at most two articles. All other requests were rejected with 429: Too Many Requests errors. My requests were throttled even after I changed my code to send one request at a time. To fix this problem, I added a delay between requests like this:

  const statResponses = [];
  // Poor man's rate limiting to avoid 429: Too Many Requests
  for (const article of articles) {
    const resp = 
      await initiateArticleStatRequest(article, startDate);
    statResponses.push(resp.ok ? await resp.json() : {});
    await new Promise((resolve) => setTimeout(resolve, 200));
  }

This is not great but works well enough for a handful of articles.
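A slightly more defensive variant would retry a request that still comes back with 429 instead of dropping its stats. The sketch below is not what the project does; requestWithRetry, maxAttempts, and baseDelayMs are hypothetical names, and the backoff values are arbitrary.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Sketch: call `sendRequest` (a function returning a promise of a response)
// and retry with exponential backoff while the server answers 429.
// Returns the last response, which may still be a 429 after `maxAttempts` tries.
async function requestWithRetry(sendRequest, { maxAttempts = 4, baseDelayMs = 500 } = {}) {
  for (let attempt = 1; ; attempt++) {
    const response = await sendRequest();
    if (response.status !== 429 || attempt >= maxAttempts) {
      return response;
    }
    // Wait baseDelayMs, then 2x, then 4x, ... before the next attempt.
    await sleep(baseDelayMs * 2 ** (attempt - 1));
  }
}
```

Inside the loop above, the call could become requestWithRetry(() => initiateArticleStatRequest(article, startDate)), with the fixed delay kept between articles.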

Side note: I noticed that even the UI Dashboard fails to load quite frequently due to throttling.

Result

Here is the result – stats for my posts for the past seven days, broken down by day:

It is not pretty, but it does have all the information I wanted to get.

Do It Yourself

If you want to learn more about the implementation or just try the project, the code is available on GitHub.

Important: The app reads the API Key from the DEVTO_API_KEY environment variable. You can either set it before starting the app, or configure it in the .env file and start the app with node --env-file=.env index.js
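Reading the key can fail silently if the variable is missing, so it may help to check it up front. A minimal sketch follows; readApiKey is a hypothetical helper and not necessarily how the app does it. Keep in mind that the --env-file flag needs Node.js 20.6 or later.

```javascript
// Sketch: read the API key from the environment and fail early if it is missing.
// `env` defaults to process.env; passing another object is handy for testing.
function readApiKey(env = process.env) {
  const apiKey = (env.DEVTO_API_KEY ?? "").trim();
  if (apiKey === "") {
    throw new Error(
      "DEVTO_API_KEY is not set. Export it or put it in a .env file and run: node --env-file=.env index.js"
    );
  }
  return apiKey;
}
```

The .env file itself would contain a single line of the form DEVTO_API_KEY=followed by your key.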

Hopefully, you found this useful. If you have any questions, drop a comment below.

Update: this project is now available on npm: https://www.npmjs.com/package/craigslist-automation

A long time ago, Craigslist allowed accessing its posts via RSS. It was possible to append &format=rss to the query string of a Craigslist URL to get programmatic access to posts. Unfortunately, Craigslist stopped supporting RSS a few years ago, and it does not seem like it (or a replacement) is going to be available anytime soon, if ever. With RSS gone, the community stepped up and created python-craigslist, a Python package that allows accessing Craigslist posts from a Python program. I remember experimenting with it some time ago, and it worked pretty well.

I tried it again last night and, to my surprise, I couldn't get any results for my queries. I checked the project's repo and quickly found an issue that looked exactly like mine. The issue points out that the HTML Craigslist returns no longer contains posts, only a message saying that a browser with JavaScript support is required to see the page. This breaks the python-craigslist library because it just sends HTTP requests and parses the returned HTML. It seems Craigslist no longer serves results as plain old HTML but uses JavaScript to build the post gallery dynamically.

Not being a web developer, I was surprised to see the same behavior when using a browser: out of curiosity, I loaded the “cars+trucks” for-sale post gallery, checked the page source, and saw the same message as mentioned in the GitHub issue. However, after inspecting the DOM with the built-in developer tools, I could see individual posts.

For my experiment, python-craigslist was not an option anymore, and I needed a different solution. I spent a few minutes looking at the network requests Craigslist was sending, and it was clear that making sense of them would require a lot of effort. What I wanted was something that acts the same way as a browser but can be driven programmatically.

Enter the headless browser

When I described what I wanted, I realized this was an exact definition of a headless browser – a browser that can run without a graphical user interface. I knew Chrome could run in the headless mode and could be controlled from a Node.js project as I had played with it a few years earlier. Because it had been a while, I wanted to check how people do this these days. Sure enough, I quickly found puppeteer – a Node.js library that allows interacting with headless Chrome. I quickly created a new Node.js project, configured it to use TypeScript and voila – with a few lines of code:

	import * as puppeteer from "puppeteer";

	(async () => {
	  // Launch headless Chrome and open a new tab
	  const browser = await puppeteer.launch();
	  const page = await browser.newPage();
	  // Wait for the network to go idle so the JavaScript-built gallery has rendered
	  await page.goto(
	    "https://seattle.craigslist.org/search/cta?query=blazer%20k5",
	    {
	      waitUntil: "networkidle0",
	    }
	  );
	  // Each listing in the gallery is an anchor with the post-title class
	  const elements = await page.$$("a.post-title");
	  console.log(elements.length);
	  // Print the link of every listing
	  await Promise.all(
	    elements.map(async (e) => {
	      const href = await e.getProperty("href");
	      console.log(await href.jsonValue());
	    })
	  );
	  await browser.close();
	})();


I was able to get links to listings from my query:

Obviously, this is only a simple prototype, but it could be useful for conducting simple experiments.
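If the links need to be collected rather than printed, one option is to pull them out in a single call with puppeteer's page.$$eval, which runs a callback inside the page against all elements matching a selector. The sketch below is an assumption-laden variation on the prototype: extractPostLinks is a hypothetical helper, the a.post-title selector is taken from the code above and may stop matching if Craigslist changes its markup, and the de-duplication step is my addition.

```javascript
// Sketch: collect listing links from an already loaded results page.
// `page` is expected to be a puppeteer Page (anything exposing $$eval works).
async function extractPostLinks(page, selector = "a.post-title") {
  // The callback runs in the browser context and returns plain strings.
  const links = await page.$$eval(selector, (anchors) =>
    anchors.map((anchor) => anchor.href)
  );
  // Drop duplicates while keeping the original order.
  return [...new Set(links)];
}
```

In the prototype above, it would be called as extractPostLinks(page) right after page.goto resolves and before browser.close().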


Better DEV stats with Dev.to API