Craigslist automation

Update: this project is now available on npm: https://www.npmjs.com/package/craigslist-automation

Long time ago Craigslist allowed accessing their post via RSS. It was possible to append &format=rss to the Craigslist’s URL query string to get a programmatic access to posts. Unfortunately, Craigslist stopped supporting RSS a few years ago and it does not seem like it (or a replacement) is going to be available anytime soon, if ever. With RSS gone, the community stepped up and created python-craigslist  – a Python package that allows accessing Craigslist posts from a Python program. I remember experimenting with it some time ago and it worked pretty well. I tried it again last night and to my surprise I couldn’t get any results for my queries. I checked the project’s repo, and I quickly found an issue that looked exactly like mine. The issue points out that the HTML that Craigslist returns no longer contains posts but a message mentioning that to see the page a browser with JavaScript support is required. This breaks the python-craigslist library as it just sends HTTP requests and simply parses the returned HTML. It seems, Craigslist no longer serves results as plain old HTML but is using JavaScript to build the post gallery dynamically. Not being a web developer, it surprised me to see the same behavior when using a browser – out of curiosity I loaded the “cars+trucks” for sale post gallery, checked the page source, and saw the same message as mentioned in the GitHub issue. However, after inspecting the DOM with the built-in developer tools, I could see individual posts.  

For my experiment, the python-craigslist was an option anymore and I needed a different solution. I spend a few minutes looking at network request Craigslist was sending, and it was clear that making sense out of it would require a lot of effort. What I wanted was something that can act the same way as a browser only can be driven programmatically.  

Enter the headless browser 

When I described what I wanted, I realized this was an exact definition of a headless browser – a browser that can run without a graphical user interface. I knew Chrome could run in the headless mode and could be controlled from a Node.js project as I had played with it a few years earlier. Because it had been a while, I wanted to check how people do this these days. Sure enough, I quickly found puppeteer – a Node.js library that allows interacting with headless Chrome. I quickly created a new Node.js project, configured it to use TypeScript and voila – with a few lines of code:

import * as puppeteer from "puppeteer";
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(
"https://seattle.craigslist.org/search/cta?query=blazer%20k5",
{
waitUntil: "networkidle0",
}
);
let elements = await page.$$("a.post-title");
console.log(elements.length);
await Promise.all(
elements.map(async (e) => {
let href = await e.getProperty("href");
console.log(await href.jsonValue());
})
);
await browser.close();
})();

I was able to get links to listings from my query:

Obviously this is only a simple prototype but could be useful to conduct simple experiments.

The SignalR for ASP.NET Core JavaScript Client, Part 2 – Outside the Browser

Last time we looked at using the ASP.NET Core SignalR TypeScript/JavaScript client in the browser. I mentioned, however, that the new client no longer has dependencies that prevent from using it outside the browser. So, today we will try taking the client outside the browser and use it in a NodeJS application. We will add a NodeJS client for the SignalR Chat service we created last time. Initially we will write the client in JavaScript and then we will convert it to TypeScript.

Let’s start from creating a new folder in the SignalRChat repo and adding a new node project:

mkdir SignalRChatNode
cd SignalRChatNode
npm init

We will call the application signarlchatnode and we will leave all other options set to default values. (6425ec1)

Our application will read messages typed by the user and send them to the server. To handle user input we will use node’s readline module. To see that things, work, let’s just add code to prompts the user for the name and displays it in the console. We will use it a starting point of our application (34bc493).

const readline = require('readline');
let rl = readline.createInterface(process.stdin, process.stdout)

rl.question('Enter your name: ', name => {
console.log(name);
  rl.close();
});

To communicate with the SignalR server we need to add the SignalR JavaScript client to the project using the following command (7875c07):

npm install @aspnet/signalr-client --save

We can now try starting the connection like this (3228a10):

const readline = require('readline');
const signalR = require('@aspnet/signalr-client');

let rl = readline.createInterface(process.stdin, process.stdout);

rl.question('Enter your name: ', name => {
  console.log(name);

  let connection = new signalR.HubConnection('http://localhost:5000/chat');
  connection.start()
  .catch(error => {
    console.error(error);
    rl.close();
  });
});

The code looks good but if you try running it, it will immediately fail with the following error:

Error: Failed to start the connection. ReferenceError: XMLHttpRequest is not defined
ReferenceError: XMLHttpRequest is not defined

What happened? The new JavaScript client no longer depends on the browser but still uses standard libraries like XmlHttpRequest or WebSocket to communicate with the server. If these libraries are not provided the client will fail. Fortunately, the required functionality can be easily polyfilled in the NodeJS environment. For now, we will just stick the polyfills on the global object. It’s not beautiful by any means but will do the trick. We are discussing how to make it better in the future but at the moment this is the way to go.

Depending on the features of SignalR you plan to use you will need to provide appropriate polyfills. Currently the absolute minimum is XmlHttpRequest. SignalR client uses it to send the initial OPTIONS HTTP request which initializes the connection on the server side and for the long polling transport. So, if use the long polling transport only, XmlHttpRequest is the only polyfill you will need to provide . If you want to use the WebSockets transport you will need a WebSocket polyfill in addition to XmlHttpRequest. (We are thinking about skipping sending the OPTIONS request for WebSockets. If this is implemented you will not need the XmlHttpRequest polyfill when using the WebSockets transport.) For ServerSentEvents transport you will need an EventSource polyfill. Finally, if you happen to use binary protocols (e.g. MessagePack) over the ServerSentEvent transport you will need polyfills for atob/btoa functions. For simplicity, we will use the WebSocket transport in our application so we will add only polyfills for XmlHttpRequest and WebSockets:

npm install websocket xmlhttprequest --save

and make them available globally via:

XMLHttpRequest = require('xmlhttprequest').XMLHttpRequest;
WebSocket = require('websocket').w3cwebsocket;

If we run the code now we will see something like this:

moozzyk:~/source/SignalRChat/SignalRChatNode$ node index.js
Enter your name: moozzyk
moozzyk
Information: WebSocket connected to ws://localhost:5000/chat?id=0d015ce4-3a78-4313-9343-cb6183a5e8ea
Information: Using HubProtocol 'json'.

which tells us that the client was able to connect successfully to the server. (946f85d)

Now, we need to add some code to handle user input and interact with the server and our Node SignalR Chat client is ready. (I admit that the user interface is not very robust but should be enough for the purpose of this post). You can now talk to browser clients from your node client and vice versa (0f7f71f):

Screen Shot 2017-09-30 at 6.57.14 PM

Now let’s convert our client to TypeScript. We will start from creating a new TypeScript project with tsc --init. In the generated tsconfig.json file we will change the target to es6. We will also add an empty index.ts file and delete the existing index.js file (we will no longer need the index.js file since we will now be generating one by compiling the newly created index.ts). (b83cf92) If you now run tsc you should see an empty index.js file created as a result of compiling the index.ts file.  The last thing to do is to actually convert our JavaScript code to TypeScript. We could just translate it one-to-one but we can do a little better. TypeScript supports async/await which makes writing asynchronous code much easier. Since many of SignalR client methods return Promises we can just await these calls instead of using .then/.catch functions. Here is how our node SignalRChat client written in TypeScript looks like (2a6d0e9):

import * as readline from "readline"
import * as signalR from "@aspnet/signalr-client"

(<any>global).XMLHttpRequest = require("xmlhttprequest").XMLHttpRequest;
(<any>global).WebSocket = require("websocket").w3cwebsocket;

let rl = readline.createInterface(process.stdin, process.stdout);

rl.question("Enter your name: ", async name => {
  console.log(name);
  let connection = new signalR.HubConnection("http://localhost:5000/chat");

  connection.on("broadcastMessage", (name, message) => {
    console.log(`${name}: ${message}`);
    rl.prompt(true);
  });

  try {
    await connection.start();
    rl.prompt();

    rl.on("line", async input => {
      if (input === "!q") {
        console.log("Stopping connection...");
        connection.stop();
        rl.close();
        return;
      }
      await connection.send("send", name, input);
    });
  }
  catch (error) {
    console.error(error);
    rl.close();
  }
});

You can run it by executing the following commands:
tsc
node index.js

Today we learned how to use the ASP.NET Core SignalR client in the NodeJS environment. We created a small node JavaScript application that was able to communicate with browser clients which. Finally, we converted the JavaScript code to TypeScript and learn a little bit about the TypeScript’s async/await feature.

The SignalR for ASP.NET Core JavaScript Client, Part 1 – Web Applications

The first official release of SignalR for ASP.NET Core – alpha1 – was just released. In this release, all SignalR components were rewritten to make SignalR simpler, easier to use and more reliable.

The SignalR JavaScript client has always been a fundamental part of SignalR. Unfortunately, it has a few limitations which made it hard to extend or use outside the browser. The rewrite allowed to introduce changes which allow to take the client outside the browser (no more dependency on jQuery, YAY!) and open new scenarios. And this is what this blog post will focus on. I split the post to two parts. In the first part I will show how to use the client in a web application from both JavaScript and TypeScript. In the second, part we will look at NodeJS.

The plan for this part is to recreate the chat application from the tutorial on the previous version of SignalR and then to convert it to use the new SignalR Server and JavaScript client. The sample is simple enough to allow us to focus on SignalR aspects rather than on application intricacies. As a bonus, we will see what the experience of porting an application from the previous version of SignalR is. I created a github repo for the application where each commit is a step described in this post. I will refer to particular commits from this post to show changes for a given step.

Setting up the Server

Let’s start from creating an empty ASP.NET Core application. We can do that from command line by running the dotnet new web command. (See this step on github).

Once the application is created we can start the server with dotnet run and make sure it works by navigating to http://localhost:5000 from a browser.

After we ensured that the application runs we can add SignalR server components. First, we need to add a reference to the SignalR package to the SignalRChat.csproj file (See this step on github).

Now we can add the Chat Hub class – we will just copy the code from tutorial and tweak a few things. This is how the hub class looks after the changes:

using System;
using Microsoft.AspNetCore.SignalR;
namespace SignalRChat
{
    public class ChatHub : Hub
    {
        public void Send(string name, string message)
        {
            // Call the broadcastMessage method to update clients.
            Clients.All.InvokeAsync("broadcastMessage", name, message);
        }
    }
}

The changes we made were only cosmetic – we removed the reference to the System.Web namespace, added 'Core' to the Microsoft.AspNet.SignalR so that it reads Microsoft.AspNetCore.SignalR. We also changed how we invoke the client-side method by passing the method name as the first parameter to the InvokeAsync call. (See this step on github).

Now that we created a hub we need to configure the application to be aware of SignalR and to forward SignalR related messages to our hub. It’s as easy as calling AddSignalR extension method in the ConfigureServices method of our Startup class and mapping the hub with the UseSignalR method. We will also add the static files middleware which will be responsible for serving static files. The Startup class should look like this:

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        services.AddSignalR();
    } 

    public void Configure(IApplicationBuilder app, IHostingEnvironment env)
    {
        if (env.IsDevelopment())
        {
            app.UseDeveloperExceptionPage();
        }

        app.UseFileServer();

        app.UseSignalR(routes =>
        {
            routes.MapHub<ChatHub>("chat");
        });
    }
}

(See this step on github).

And this is all the work we had to do create a functional SignalR chat server. Now we can focus on the client side.

The JavaScript Client

In the new version of SignalR the JavaScript client is distributed using npm. The npm module contains a version of the client that can be just included in a web page using the tag, as well as, typings and modules that can be consumed from TypeScript. To get the client to your machine you need to install npm if you haven’t already and run:

npm install @aspnet/signalr-client

The client will be installed in the node_modules folder and you can find the necessary files to include in the node_modules/@aspnet/signalr-client/dist/browser folder. You may wonder why there are so many files in this folder and what purpose they serve. Let’s go over them then and explain.

First, you will find that there are two sets of files – files that contain ES5 in the names and files that do not contain ES5 in the names. SignalR JavaScript uses ES6 (a.k.a EcmaScript 2015) features like Promises or arrow functions. Not all browsers however, support ES6 (looking at you Internet Explorer). The files without ES5 in the names are meant to be used in browsers that support ES6. The files that contain ES5 in the names are the ES6 files transpiled to ES5. They are ES5 compatible and include all required dependencies. The downside of the ES5 files is that they are much bigger than ES6 files.

Another interesting set of files are files containing msgpackprotocol in the name. The new version of SignalR supports custom hub protocols – including binary protocols – and has built-in support for a binary protocol based on MessagePack. The JavaScript implementation of the MessagePack based hub protocol (using the msgpack5) turned out to be quite big so we moved it to a separate file. This way you can include the MessagePack hub protocol only if you want to use it and will not pay the price if you don’t care.

You will also find that each file has a min counterpart. These are just minified versions of the corresponding files. You will want to use the minified versions in production but debugging is much easier with non-minified files so you may want to use non-minified versions during development.

Finally, there is also the third-party-notices.txt file. These are notices for the msgpack5 library and its dependencies used in the MessagePack hub protocol implementation.

Using the SignalR JavaScript Client from JavaScript

Now, that we know a little bit about the JavaScript client let’s update our application to use it.

First, let’s copy all the files from the node_modules/@aspnet/signalr-client/dist/browser folder to a new ​scritps/signalr folder under the wwwroot. (See this step on github).

After the files are copied, let’s create the index.html file in the wwwroot folder and paste the contents of the html file from the tutorial. (See this step on github).

If you try to run the application at this point it will not work. The index.html has references to files like the jQuery library or the old SignalR client which don’t exist. Let’s fix that. Note that even though jQuery is no longer required to the new SignalR client I will continue to use it to minimize the number of changes I need to make. All in all this is not a tutorial on how to remove jQuery from your app so let’s not get sidetracked. Let’s start from sorting out the scripts situation. For jQuery, I will replace the link with the one to the jQuery CDN. For SignalR, I will replace the link to the signalR-2.2.1.min.js file with signalR-client-1.0.0-alpha1.js (feel free to use the ES5 version if you are using a browser that don’t support ES6 features) and remove the link to hubs since hub proxies are currently not supported. (See this step on github (github trick – notice that the link ends with ?w=1 – try removing it and see what happens. Very useful when reviewing some PRs)).

Now we can finally fix the code. Fortunately, this is not a lot of changes:

  • Instead of using proxies we will just create a new HubConnection
  • To register the callback for the client side broadcastMessage method we will use the on function
  • We will replace the done method used by jQuery deferreds to the then used by ES6 promises
  • We will invoke hub methods with the invoke function

(See this step on github).

That’s pretty much it. If you run the application now you should be able to send and receive messages.

Using the JavaScript Client from TypeScript

We now know how to use the new JavaScript SignalR client from JavaScript code. The SignalR client module contains also all necessary bits that make it possible to be consumed from TypeScript. To see how it works let’s take our chat application a bit further and convert it TypeScript.

First, make sure that you have a recent TypeScript compiler installed – run tsc --version from command line. If running the command fails or you have an older version installed install the latest one using this command:

npm install typescript -g

After installing or updating the typescript compiler we will initialize a new project by running

tsc --init

in the project folder. This will create a tsconfig.json file which will look like this:

{
  "compilerOptions": {
    "target": "es6",
    "module": "commonjs",
    "strict": true,
    "noImplicitAny": true
  }
}

after performing some cleanup. We will also add a new chat.ts file which we will leave empty for now. If you run the tsc command from project root you should see an almost empty chat.js file generated from your chat.ts file. (See this step on github).

Because we are using TypeScript and will bring dependencies using npm we will no longer need JavaScript files for the browser so let’s delete them. (See this step on github).

To be able to add and restore dependencies the client will need, let’s create a package.json file by executint the npm init command. We will leave default values for almost all settings except for the project name which needs to be lowercase.

PS C:\source\SignalRChat\SignalRChat> npm init
This utility will walk you through creating a package.json file.
It only covers the most common items, and tries to guess sensible defaults.
See `npm help json` for definitive documentation on these fields
and exactly what they do.

Use `npm install <pkg> --save` afterwards to install a package and
save it as a dependency in the package.json file.

Press ^C at any time to quit.
name: (SignalRChat) signalrchat
version: (1.0.0)
description:
entry point: (chat.js)
test command:
git repository:
keywords:
author:
license: (ISC)
About to write to C:\source\SignalRChat\SignalRChat\package.json:

{
"name": "signalrchat",
"version": "1.0.0",
"description": "",
"main": "chat.js",
"dependencies": {},
"devDependencies": {},
"scripts": {
"test": "echo \"Error: no test specified\" && exit 1"
},
"author": "",
"license": "ISC"
}

Is this ok? (yes)
PS C:\source\SignalRChat\SignalRChat>

Now let’s add our dependencies – signalr-client, jquery and jquery typings (they enable using jquery from TypeScript). We will use the --save-dev option to save the dependencies as dev dependencies in the package.json file.

npm install @aspnet/signalr-client --save-dev
npm install jquery --save-dev
npm install @types/jquery --save-dev

We also need to install browserify – a tool which we will use to create the final script to be used by the browser:

npm install -g browserify

(See this step on github).

We can now start working on the code. First, we need to import the dependencies we are going to use. We can do that by adding the following two lines at the top of our chat.ts file:

import * as signalR from "@aspnet/signalr-client"
import * as $ from "jquery

Now we can move the script from our .html file to the .ts file. If you do that and play a little bit with the code you will notice that intellisense now tells you about class members and function parameters and if you press F12 (in Visual Studio Code) it will take you to the function header. Another thing, you will see is an error on line 5.  This TypeScript telling you that there is a type mismatch for the parameter passed to the jQuery val() function – the prompt() function can return null which is not a valid input for the val() function.

VSCodeSignalR

In our case we know that prompt will return string so we will just cast the result to string to suppress the error.

Since we moved the function to the .ts file we can now remove all the JavaScript code from our index.html file. We can also remove all the tags since we no longer depend on them to bring dependencies (we also already deleted the scripts). (See this step on github).

Let’s compile our chat.ts file now by running tsc command. If you look at the generated chat.js file you will notice that it looks pretty much the same as the source chat.ts file with some additional lines at the top. You will also notice that it does not have the required dependencies (i.e. signalr-client and jquery). This is where browserify comes into play. We will use browserify to generate the final version of the file with all the dependencies. Let’s run the following command (you may need to create the wwwroot/scripts folder if one does not exist) from the project folder:

browserify .\chat.js -o .\wwwroot\scripts\chat.js

Take a look at the chat.js file that was created by browserify and now you will see that the file is much bigger and contains all the required dependencies. If we include this file in our index.html with the tag, start the application and open in the browser you will see that it works and you can send and receive messages. (See this step on github). We could even automate build steps (e.g. with gulp) but it’s out of scope for this post.

Summary

In this post, we looked at using the new SignalR JavaScript client in web applications. We learned how to use the client from both JavaScript and TypeScript. We tried to port an application using the previous version of SignalR to see how hard it is. In the next part, we will take a look at using the client in NodeJS applications.