Category Archives: NodeJS

Craigslist automation

Long time ago Craigslist allowed accessing their post via RSS. It was possible to append &format=rss to the Craigslist’s URL query string to get a programmatic access to posts. Unfortunately, Craigslist stopped supporting RSS a few years ago and it does not seem like it (or a replacement) is going to be available anytime soon, if ever. With RSS gone, the community stepped up and created python-craigslist  – a Python package that allows accessing Craigslist posts from a Python program. I remember experimenting with it some time ago and it worked pretty well. I tried it again last night and to my surprise I couldn’t get any results for my queries. I checked the project’s repo, and I quickly found an issue that looked exactly like mine. The issue points out that the HTML that Craigslist returns no longer contains posts but a message mentioning that to see the page a browser with JavaScript support is required. This breaks the python-craigslist library as it just sends HTTP requests and simply parses the returned HTML. It seems, Craigslist no longer serves results as plain old HTML but is using JavaScript to build the post gallery dynamically. Not being a web developer, it surprised me to see the same behavior when using a browser – out of curiosity I loaded the “cars+trucks” for sale post gallery, checked the page source, and saw the same message as mentioned in the GitHub issue. However, after inspecting the DOM with the built-in developer tools, I could see individual posts.  

For my experiment, the python-craigslist was an option anymore and I needed a different solution. I spend a few minutes looking at network request Craigslist was sending, and it was clear that making sense out of it would require a lot of effort. What I wanted was something that can act the same way as a browser only can be driven programmatically.  

Enter the headless browser 

When I described what I wanted, I realized this was an exact definition of a headless browser – a browser that can run without a graphical user interface. I knew Chrome could run in the headless mode and could be controlled from a Node.js project as I had played with it a few years earlier. Because it had been a while, I wanted to check how people do this these days. Sure enough, I quickly found puppeteer – a Node.js library that allows interacting with headless Chrome. I quickly created a new Node.js project, configured it to use TypeScript and voila – with a few lines of code:

import * as puppeteer from "puppeteer";
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(
"https://seattle.craigslist.org/search/cta?query=blazer%20k5",
{
waitUntil: "networkidle0",
}
);
let elements = await page.$$("a.post-title");
console.log(elements.length);
await Promise.all(
elements.map(async (e) => {
let href = await e.getProperty("href");
console.log(await href.jsonValue());
})
);
await browser.close();
})();

I was able to get links to listings from my query:

Obviously this is only a simple prototype but could be useful to conduct simple experiments.

Advertisement
Tagged , , ,

The SignalR for ASP.NET Core JavaScript Client, Part 2 – Outside the Browser

Last time we looked at using the ASP.NET Core SignalR TypeScript/JavaScript client in the browser. I mentioned, however, that the new client no longer has dependencies that prevent from using it outside the browser. So, today we will try taking the client outside the browser and use it in a NodeJS application. We will add a NodeJS client for the SignalR Chat service we created last time. Initially we will write the client in JavaScript and then we will convert it to TypeScript.

Let’s start from creating a new folder in the SignalRChat repo and adding a new node project:

mkdir SignalRChatNode
cd SignalRChatNode
npm init

We will call the application signarlchatnode and we will leave all other options set to default values. (6425ec1)

Our application will read messages typed by the user and send them to the server. To handle user input we will use node’s readline module. To see that things, work, let’s just add code to prompts the user for the name and displays it in the console. We will use it a starting point of our application (34bc493).

const readline = require('readline');
let rl = readline.createInterface(process.stdin, process.stdout)

rl.question('Enter your name: ', name => {
console.log(name);
  rl.close();
});

To communicate with the SignalR server we need to add the SignalR JavaScript client to the project using the following command (7875c07):

npm install @aspnet/signalr-client --save

We can now try starting the connection like this (3228a10):

const readline = require('readline');
const signalR = require('@aspnet/signalr-client');

let rl = readline.createInterface(process.stdin, process.stdout);

rl.question('Enter your name: ', name => {
  console.log(name);

  let connection = new signalR.HubConnection('http://localhost:5000/chat');
  connection.start()
  .catch(error => {
    console.error(error);
    rl.close();
  });
});

The code looks good but if you try running it, it will immediately fail with the following error:

Error: Failed to start the connection. ReferenceError: XMLHttpRequest is not defined
ReferenceError: XMLHttpRequest is not defined

What happened? The new JavaScript client no longer depends on the browser but still uses standard libraries like XmlHttpRequest or WebSocket to communicate with the server. If these libraries are not provided the client will fail. Fortunately, the required functionality can be easily polyfilled in the NodeJS environment. For now, we will just stick the polyfills on the global object. It’s not beautiful by any means but will do the trick. We are discussing how to make it better in the future but at the moment this is the way to go.

Depending on the features of SignalR you plan to use you will need to provide appropriate polyfills. Currently the absolute minimum is XmlHttpRequest. SignalR client uses it to send the initial OPTIONS HTTP request which initializes the connection on the server side and for the long polling transport. So, if use the long polling transport only, XmlHttpRequest is the only polyfill you will need to provide . If you want to use the WebSockets transport you will need a WebSocket polyfill in addition to XmlHttpRequest. (We are thinking about skipping sending the OPTIONS request for WebSockets. If this is implemented you will not need the XmlHttpRequest polyfill when using the WebSockets transport.) For ServerSentEvents transport you will need an EventSource polyfill. Finally, if you happen to use binary protocols (e.g. MessagePack) over the ServerSentEvent transport you will need polyfills for atob/btoa functions. For simplicity, we will use the WebSocket transport in our application so we will add only polyfills for XmlHttpRequest and WebSockets:

npm install websocket xmlhttprequest --save

and make them available globally via:

XMLHttpRequest = require('xmlhttprequest').XMLHttpRequest;
WebSocket = require('websocket').w3cwebsocket;

If we run the code now we will see something like this:

moozzyk:~/source/SignalRChat/SignalRChatNode$ node index.js
Enter your name: moozzyk
moozzyk
Information: WebSocket connected to ws://localhost:5000/chat?id=0d015ce4-3a78-4313-9343-cb6183a5e8ea
Information: Using HubProtocol 'json'.

which tells us that the client was able to connect successfully to the server. (946f85d)

Now, we need to add some code to handle user input and interact with the server and our Node SignalR Chat client is ready. (I admit that the user interface is not very robust but should be enough for the purpose of this post). You can now talk to browser clients from your node client and vice versa (0f7f71f):

Screen Shot 2017-09-30 at 6.57.14 PM

Now let’s convert our client to TypeScript. We will start from creating a new TypeScript project with tsc --init. In the generated tsconfig.json file we will change the target to es6. We will also add an empty index.ts file and delete the existing index.js file (we will no longer need the index.js file since we will now be generating one by compiling the newly created index.ts). (b83cf92) If you now run tsc you should see an empty index.js file created as a result of compiling the index.ts file.  The last thing to do is to actually convert our JavaScript code to TypeScript. We could just translate it one-to-one but we can do a little better. TypeScript supports async/await which makes writing asynchronous code much easier. Since many of SignalR client methods return Promises we can just await these calls instead of using .then/.catch functions. Here is how our node SignalRChat client written in TypeScript looks like (2a6d0e9):

import * as readline from "readline"
import * as signalR from "@aspnet/signalr-client"

(<any>global).XMLHttpRequest = require("xmlhttprequest").XMLHttpRequest;
(<any>global).WebSocket = require("websocket").w3cwebsocket;

let rl = readline.createInterface(process.stdin, process.stdout);

rl.question("Enter your name: ", async name => {
  console.log(name);
  let connection = new signalR.HubConnection("http://localhost:5000/chat");

  connection.on("broadcastMessage", (name, message) => {
    console.log(`${name}: ${message}`);
    rl.prompt(true);
  });

  try {
    await connection.start();
    rl.prompt();

    rl.on("line", async input => {
      if (input === "!q") {
        console.log("Stopping connection...");
        connection.stop();
        rl.close();
        return;
      }
      await connection.send("send", name, input);
    });
  }
  catch (error) {
    console.error(error);
    rl.close();
  }
});

You can run it by executing the following commands:
tsc
node index.js

Today we learned how to use the ASP.NET Core SignalR client in the NodeJS environment. We created a small node JavaScript application that was able to communicate with browser clients which. Finally, we converted the JavaScript code to TypeScript and learn a little bit about the TypeScript’s async/await feature.