Performance of array filter in JavaScript

Today a coworker remarked to me that he had been using the JavaScript native array method .filter, but found that writing a custom for loop to do the filtering was much faster. “How can this be?” we wondered, “Native is always faster!”

Since I’m skeptical by nature, my first move was to confirm my coworker’s claim in a controlled environment.

I created an array of a few million random numbers, and a function to filter then based on even or odd.

var vals = [];
for (var i = 0; i < 5000000; i++) {
    vals.push(Math.floor((Math.random() * 100) + 1));
}

var f = function(x) { return x % 2 === 0; };

Then I applied a performance measurement:

function measure(f) {
    var start = performance.now();
    f();
    var end = performance.now();
    var diff = end - start
    document.write(f + " took " + diff + " ms.<br />");    
}

I also wrote a completely naive for-loop based filter method which I attached to the Array prototype.

Array.prototype.naiveForFilter = function(f)   {    
    'use strict';
    var results = [];
    var len = this.length;
    for (var i = 0; i < len; i++) {        
        if (f(this[i])) {
            results.push(this[i]);
        }
    }
    return results;  
};

Finally, I compared the execution time of the two:

measure(function() { vals.filter(f) });
measure(function() { vals.naiveForFilter(f) });

The outcome (in Chrome) was shocking: the naive hand-rolled for loop ran in 1/5 the time of the native filter method. This seemed opposed to all possible common-sense. I asked my coworker if he had done any search, and he said there was an article from 2011, but he figured since it was so many years ago, it wouldn't still apply.

The article claims the slow-down is due to the additional safeties offered by the native method:

  1. It ignores deleted values and gaps in the array
  2. It optionally sets the execution context of the predicate function
  3. It prevents the predicate function from mutating the data

This still seems a little suspect, given that the native method was 5x slower. It also raises the question (just a curiosity) of how much impact each of those safeties costs. Is there one in particular that incurs a higher cost? To dig further, I pulled the Mozilla polyfill reference implementation for filter.

Adding this entry to my profiling, I found it about five times slower than the native method. Wow! So native, doing the same thing, is definitely faster. But still, the question is: where is the expense? I identified two possible cost centers visually in the polyfill: membership checking in the array using "in" and binding the context of the filter function using "call":

if (i in this) {
  if (f.call(thisArg, val, i, this)) {

I guessed that the context binding was the slow part. But to find out, I made multiple versions of the filter function, beginning the naive implementation (known to be very fast), and incrementally adding checks until it approached the complexity of the reference polyfill (known to be very slow). I tested all implementations on both browsers on my machine (Chrome and IE 11). I also compared to an implementation using ES6 foreach.

filter speed

The performance differences are by far most significant in Chrome, where the use of "in" to validate if an element is present in the array is an extremely expensive operation. Introducing this check consumes the vast majority of time in the implementation. Note: this is not the same as an element being undefined, which is much cheaper check.

Interestingly, in IE 11 the performance is much more normalized, and it appears that the function binding (.call) is the most expensive incremental addition to the function, with the in being fairly cheap. IE 11 is also exposed as having poor JavaScript optimization capability, as even a tight naive JS loop is much much slower than a more involved native implementation, whereas in Chrome the opposite is true. I'm assuming Edge is much more performant but haven't tested.

So, even today, if performance is critical, it might be faster to hand roll a custom-tailored function as opposed to using the built-in natives, as the built-in natives may have special support (like filter does for sparse arrays) that are known to be not needed in your specific case. But as Knuth says, "premature optimization is the root of all evil", so always default to using the native functions unless you're sure you need something different.

JavaScript scoping strikes again

We have a for loop that populates an observable array in knockout. For some reason, only the first item started being loaded, even though we hadn’t made any changes to the loop, and all the data was still being sent down.

The loop starts off:

for (i = 0; i < ts.Weeks.length; i++) {

I stepped through the loop using Chrome debugger and found that i=0 consistently, all the way until the very end of the body, when we added the data object to the observable array:

self.Weeks.push(week);

At this point, it jumped to i=10 right before going back to the top. Of course, with only a half-dozen data items, the loop then exited. But, nothing else in the loop modifies i, and where was 10 coming from anyways??

Since the array was observable, when it was updated with push, knockout also called all the various “computed” functions. Also since the for loop variable “i” was declared without var, it was actually in the global scope! One of those computed (I didn’t figure out which one), or a library that they used, also had an “i” that was in the global scope, and the two variables were clobbering each other.

Although I didn’t determine the other perpetrator code, simply updating the loop, by adding var, to read

for (var i = 0; i < ts.Weeks.length; i++) {

solved the issue by ensuring the variable i was function scoped.

It’s easy to forget in languages like C# that don’t do global scope by default… JavaScript does do global scope by default!

Lightweight ASP.NET Background Processes

Imagine a tool which performs some long running background task on the server (on the order of several minutes) and wants to notify users of completion. This notification may take the form of an email, or it may be that the user remains on the website and receives a visual confirmation that the task is complete. The “industrial strength” solution to this is Signal R, which provides all sorts of client/server connectivity capabilities.

In many cases, this may be an excessive solution. Running a background task, with both backend and frontend notifications, is straightforward and doesn’t require a lot of wiring. This post shows one simple approach that will work with ASP.NET Web API or MVC.

Core Concept

Use a dictionary in application state to track background tasks and their status. Match each task to an ID for reportability.

This technique will not work if the runtime of the background task is too long compared to the application idle time-out on IIS. For this reason, the technique is appropriate only for non-essential tasks which don’t expect to run more than a few minutes.

Long Running Job Class

Initiating, checking, and maintaining jobs is handled by the long running jobs class. The class has static methods to create and check a job; and instance values of an ID (which can be sent to/from the client) and the task itself.

Non-static fields and methods are the easiest, primarily just wrappers around the ID and Task.

public class LongRunningJob
{

  public enum JobStatus { Running, Done, Failed }

  /// <summary>
  /// A unique identifier for this job.  Send this ID to the client, 
  /// and it may use it to query the status of the job.
  /// </summary>
  public readonly Guid ID;

  /// <summary>
  /// The actual background task.  Use a continuation to send email 
  /// or perform other server-side operations when the task completes.
  /// </summary>
  public readonly Task Task;
                
  protected LongRunningJob(Guid id, Task task)
  {
    ID = id;
    Task = task;
  }

  /// <summary>
  /// Returns a status based on the Task
  /// </summary>
  public JobStatus Status
  {
    get
    {
      if (Task.IsFaulted)
        return JobStatus.Failed;
      else if (Task.IsCompleted || Task.IsCanceled)
        return JobStatus.Done;
      else
        return JobStatus.Running;
    }

  }

}

The real work is done in the static methods which provide factory construction of a long running job, and status querying via the application state dictionary. The dictionary is accessed via a “Tasks” static property.

Starting a new job is done via the factory method:

/// <summary>
/// Start a new job.  The method provided will be placed into the task 
/// queue and may be started immediately.
/// </summary>
/// <param name="job">The method to run in the background</param>
/// <returns>An object containing the task and a unique identifier that
/// can be used to retrieve the job (and check its status)</returns>
public static LongRunningJob StartJob(Action job)
{
  Guid id = Guid.NewGuid();
  var task = Task.Factory.StartNew(job);
  var lrj = new LongRunningJob(id, task);
  Tasks.Add(id, lrj);
  return lrj;
}

Retrieving a job by ID queries the Tasks dictionary:

/// <summary>
/// Retrieve a job by ID.  If no matching job is found, 
/// returns null.
/// </summary>
/// <param name="id">The ID from LongRunningJob.ID</param>
/// <returns>The LongRunningJob, 
/// or null if no matching job found</returns>
public static LongRunningJob RetrieveJob(Guid id)
{
  if (Tasks.ContainsKey(id))
  {
    return Tasks[id];
  }
  else
  {
    return null;
  }
}

We define the application state dictionary of tasks via a property which instantiates it on first request.

protected static IDictionary<Guid, LongRunningJob> Tasks
{
  get
  {
    var dict = HttpContext.Current.Application["_LongRunningJob"] as 
      IDictionary<Guid, LongRunningJob>;
    if (dict == null)
    {
      dict = new Dictionary<Guid, LongRunningJob>();
      HttpContext.Current.Application["_LongRunningJob"] = dict;
    }
    return dict;
  }
}

Sample Usage

To demonstrate usage, we create a sample application which performs long-running jobs as sleeping threads of various lengths. Failures can be introduced by intentionally throwing exceptions.

Here is the sample Web API controller methods:

[HttpPost]
public Guid CreateRegularJob([FromBody] int seconds)
{
  var job = Models.LongRunningJob.StartJob(new Action(() => 
  {
    // TODO: some long running task
    System.Threading.Thread.Sleep(seconds * 1000);
  }));
  job.Task.ContinueWith(new Action<System.Threading.Tasks.Task>(t => {
    // TODO: send email notification of completion
  }));
  return job.ID;
}

[HttpPost]
public Guid CreateFailJob([FromBody] int seconds)
{
  var job = Models.LongRunningJob.StartJob(new Action(() =>
  {
    System.Threading.Thread.Sleep(seconds * 1000);
    throw new Exception();
  }));
  return job.ID;
}

[HttpGet]
public Models.LongRunningJob.JobStatus CheckStatus(Guid id)
{
  var job = Models.LongRunningJob.RetrieveJob(id);
  return job.Status;               
}

On the client side, sample jobs are created to POST’ing to the appropriate create jobs methods. Each ID is placed in a table, and a polling routine queries the status of each running ID each second until they complete or fail. The ajax calls return without blocking for the long running task to complete.

A frontend sample shows several jobs queued up to run simultaneously for various lengths:
longjobpoll

Future Work

  • Tasks can be extended to return data.
  • Old jobs should be removed from the dictionary to avoid memory leaks.

“Effective JavaScript”, a Review

David Herman’s book “Effective JavaScript” is written for developers who are already comfortable with basic JavaScript syntax and semantics, but want to learn the idioms (he calls it “pragmatics”) of the language. There’s an old saying, “You can write FORTRAN in any language,” meaning that developers who learn the best practices of one (possibly older) platform may not update their style with a newer platform and may simply to continue to code in the older style in a more (or differently) capable language.

Egregious examples of this weakness include non-object oriented style in C++ or non-functional style in C#. As an experienced C# developer who writes enough JavaScript to get by, and no more, I often worry that I’m “writing C# in JavaScript” (see also “How Good C# Habits can Encourage Bad JavaScript Habits“). Herman writes his book to developers in a similar situation.

The book is divided into a series of items, where each item addresses one facet of JavaScript that might be misunderstand, misused, or unknown to a developer inexperienced in JavaScript.

Most Interesting Items

Coming from C#, I found several items to be particularly valuable:

Item 13: Use Immediately Invoked Function Expressions to Create Local Scopes

In C# there are (somewhat rare) occasions where a variable needs to be declared inside a loop to avoid an incorrect reference. In JavaScript, however, two language features that conspire to make this difficulty substantially more dangerous: all “outer” variables are stored by reference in the closure, not by value; and JavaScript does not support block scoping. In other words, you can’t have a value type, and you can’t declare a local variable scoped inside a loop (or other block).

Since functions are the only scoping type in JavaScript, a workaround is to create and immediately execute a function to enclose the scope.

Item 18: Understand the Difference between Function, Method, and Constructor Calls

In C#, classes (as opposed to objects) are a real distinction, and a constructor is a specific “kind” of thing with special compiler treatment. In JavaScript, it becomes clear through exposure that functions can serve as classes (that is, be instantiated with “new”) but many of the subtleties are unclear. When is a function also a constructor? Since there are no classes, only functions and objects, what does “new” actually do? This item helps clear up some of these confusions.

There’s a lot that isn’t answered by this “item”, but that alludes to be answered to by future items (mostly in Chapter 4, “Objects and Prototypes”)

Item 21: Use apply to Call Functions with Different Numbers of Arguments

In C#, methods usually have a fixed number of parameters. A method can be made to accept variable parameters using a params array. Consider the String.Format method:

public static string Format(
	string format,
	params Object[] args
)

Such a method could then be called either with an actual array or with variable parameters. The “work” of allowing this flexibility is done by the method definition, in the use of the “params” keyword.

String.Format("{0} - {1} - {2}", someArrayOfThreeObjects);
// OR
String.Format("{0} - {1} - {2}", object1, object2, object3);

In JavaScript, the same capability exists, but this work is instead done by the caller using the “apply” syntax. An interesting comparison.

Item 36: Store Instance State Only on Instance Objects

What the author here shows as an “accident” could easily be reclassified as the JavaScript distinction between static and instance members. Instance members go on the instance. Static members go on the prototype. I suppose the distinction is only equivalent for fields/properties, not for methods.

Item 49: Prefer for Loops to for…in Loops for Array Iteration

C# programmers are very accustomed to using foreach loops for iterating over the items in a collection (such as an array). This naturally leads to the temptation to use the similar-seeming for…in loop in JavaScript to iterate over an array. Bad plan. Use .forEach method on arrays instead (in ES5).

Concurrency

The final chapter is on concurrency. Overall, this section made me appreciate the .NET Task Parallel Library and C#’s async/await so much more. It is also effectively exposed the dangers of trying to handle concurrency “on your own” without using appropriate high-level constructs.

Overall

An excellent review of JavaScript “gotcha’s” and best practices that might not be obvious to a developer approaching from another language, such as C# or Java. It’s a short book, a quick read, but worthwhile.

“Bolt-On” CSRF Protection in Intranet Web API Windows Authentication Scenarios

Web API is a powerful tool for constructing web services, but also for separating concerns of a web application. However, like any web application, security concerns must be addressed. One of the most common security concerns with authenticated operations is cross-site request forgery. In a traditional ASP.NET MVC application, we can use the built-in AntiForgeryToken mechanism to place a unique token within each page served to the client, and require that token be included in any requests back to the server. Since an attacker cannot access the contents of the actual served page, they are unable to acquire the token; and since the token is not a cookie or credential, the browser will not send it automatically. Thus, an attacking page cannot craft an effective CSRF attack.

When moving into the Web API world, things are a little more muddy. The general tack has been to take a different angle on authentication and authorization all together. For example, the use of certificates to sign each request, bearer tokens, oAuth, or similar approaches. However, for purely intranet applications, we may still wish to use Windows authentication. This means the end-user of a website which calls our API will automatically do so with the user’s windows credential. (It is also possible for server-side applications to use impersonation to call the API with a particular Windows service account, however, this is not subject to CSRF vulnerability).

There are also ways to make a joint MVC/Web API application use the MVC anti-forgery tokens, which is really neat, but only works if the API and application are one cohesive web application solution, as otherwise the Web API’s instance of ASP.NET would differ from the MVC instance of ASP.NET, and they would not share the set of valid anti-forgery tokens. It also doesn’t work if “plain” HTML/JS is being used to access the API.

However, by using Windows authentication, we open our API up to CSRF attackers. An intranet user may be using a web application which uses our API (and thus, carries their credential), while at the same time they are browsing evilattacksites.com which can construct a form with the intranet URL and post it to the API’s intranet address. This post will carry the user’s credentials with it, and execute a successful attack.

You might note that the API is on a intranet site and the attacker is (probably) external, so CORS might be suggested as an answer. However, CORS only controls what data can be read, it does not protect against CSRF submissions.

We want Web API CSRF protection that:

  • Works with any client (doesn’t require pages generated by MVC)
  • Works with Windows authentication/intranet
  • Is easy to “bolt-on” to existing Web API services AND clients

Closing the CSRF Vulnerability

The solution we will use is to provide a per-IP CSRF token that must be attached to the HTTP header and is validated on all POST/PUT/DELETE requests.

The technique here is to construct a message handler which will process all Web API requests before they go to the controller. It will validate the presence of a CSRF token when needed, and produce it when requested. Thus, there will be no changes to the controllers! The only server-side application change (besides importing the message handler), is to add it in the WebApiConfig Register method.

config.MessageHandlers.Add(new Infrastructure.CSRFMessageHandler());

On the client side, none of the individual requests to the API need to be altered. An initial login call (handled by the CSRFMessageHandler on the server) acquires the CSRF token and places it into the headers for all subsequent ajax calls. There will be no other changes needed to the client. However, clients which consume multiple Web API services secured in this way will have a more complex setup procedure. 🙂

$(function () {
   $.ajax("../api/login")
      .done(function (data) { $.ajaxSetup({ headers: 
         { "X-CSRF-Key": data } }) })
      .fail(function () { /* DO SOMETHING */ });
});

With no further changes, the API is now secure, and the clients receive and use the CSRF token to gain access. This protection can easily be “bolted on” to existing APIs and clients.

Implementation of CSRFMessageHandler

The real magic of this technique is in CSRFMessageHandler. This handler intercepts each call to the Web API before it is sent to the controller. There are two parts: the CSRF token verification, and the token generation.

First, the overall class structure of the handler. We prepare a RNGCryptoServiceProvider for generating the token, and extract the request context so we have access to the ASP.NET application state object, where the CSRF tokens will be stored (you could also store them in a database, or other repository).

public class CSRFMessageHandler : DelegatingHandler
{
   private static readonly RNGCryptoServiceProvider rng = 
      new RNGCryptoServiceProvider();
   public const string CSRF_HEADER_NAME = "X-CSRF-Key";      

   protected override async Task<HttpResponseMessage> SendAsync(
      HttpRequestMessage request, CancellationToken cancellationToken)
   {            
      var method = request.Method.Method.Trim().ToUpper();

      // Extract the context from the request property
      // so that application "state" can be accessed
      var context = ((HttpContextBase)request.
         Properties["MS_HttpContext"]);

      // PART ONE: On "login" request, create token.  
      // Put this here so no need to modify controller.
      ... see below
      // PART TWO: For update methods, enforce the CSRF key
      ... see below

      return await base.SendAsync(request, cancellationToken);
   }
}

When the client sends a “login” request (any request ending in /login), it will be intercepted by the handler and a CSRF token will be created and returned. If desired, you could check the dictionary to see if the client already has a key and reuse it. You could send back the username along with the key. You could set an expiration time for the key and make a process to remove old (expired) keys. Some additional improvement is definitely possible.

if (method == "GET" && 
   request.RequestUri.AbsolutePath.ToUpper().EndsWith("/LOGIN"))
{
   var keys = context.Application[CSRF_HEADER_NAME] as 
      IDictionary<string, string>;
   if (keys == null)
   {
      keys = new Dictionary<string, string>();
      context.Application[CSRF_HEADER_NAME] = keys;
   }

   byte[] bkey = new byte[16];
   rng.GetBytes(bkey);
   var key = Convert.ToBase64String(bkey); 
   keys[context.Request.UserHostAddress] = key;
   return new HttpResponseMessage(HttpStatusCode.OK) 
   { 
      Content = new StringContent(key) 
   };
}

For all modification requests (POST, PUT, DELETE), the handler will validate that the appropriate key is included in the headers.

if (method == "POST" || method == "PUT" || method == "DELETE")
{
   HttpResponseMessage response = request.CreateErrorResponse(
      HttpStatusCode.Forbidden,
      "POST/PUT/DELETE require valid and matching anti-CSRF key, use Login method");

   string key = null;
                
   if (request.Headers.Contains(CSRF_HEADER_NAME))
      key = request.Headers.GetValues(CSRF_HEADER_NAME).Single();

   var keys = context.Application[CSRF_HEADER_NAME] as 
      IDictionary<string, string>;
   string ipaddr = context.Request.UserHostAddress;
   
   // match to the key for user's IP address
   if (keys == null ||
      !keys.ContainsKey(ipaddr) ||
      keys[ipaddr] != key) throw new HttpResponseException(response);
}

“Class”-like Inheritance, Overriding, and Superclass Calls in JavaScript

JavaScript is a prototype-based language with objects, rather than classes, as its fundamental construct. However, the use of functions to approximate classes (complete with the “new” keyword) is very common. Although we’re often told that JavaScript supports inheritance, it’s not entirely obvious how the class-like inheritance works in a prototype-based language such as JavaScript. Adding to the confusion is various syntax approaches that may be used to devise objects and “classes”.

As an aside, the techniques shown in this post require ES5 support, which, for example, excludes IE 8 and below. There are fairly simple shims that can be used to enable support in older browsers.

We’ll first consider a “naive” approach to defining a class in JavaScript, then discuss why this approach isn’t suitable for an inheritance scenario.

function BaseClass(initialValue) {
  this.value = initialValue;
  this.compute = function() { return this.value + 1; };
};

This class seems to work fine:

var instance = new BaseClass(3);
console.log(instance.compute()); // outputs 4

However, this approach is not suitable for inheritance. The function “compute” (and any other functions defined in this way) will be defined not on the “class” (prototype) but on the object itself. When overridden, there will be no reasonable way to access the base function, because the override will actually replace the original function on that object. Thus, functions should be defined on the prototype, while fields are defined on the object itself. If you define a field on the prototype, it is like a static member in C#.

function BaseClass(initialValue) {
  this.value = initialValue;
};
BaseClass.prototype.compute = function() { return this.value + 1; };

This produces the same result as before, but now the function is defined on the prototype instead of the object, so it can be referenced by derived classes.

Now imagine we want to construct a derived class. There are two important considerations: ensuring that the super-constructor is called correctly, and ensuring that “instanceof” continues to give the correct result. There is no “base” or “super” keyword in JavaScript, so you must use the actual name of the base class whenever you want to refer to it.

function DerivedClass(initialValue, secondValue) {
  BaseClass.call(this, initialValue); // *** Call super-constructor
  this.anotherValue = secondValue; // Additional field
};
// *** Establish prototype relationship
DerivedClass.prototype = Object.create(BaseClass.prototype); 

// Additional function
DerivedClass.prototype.anotherCompute = 
   function() { return this.compute() + this.anotherValue; }; 

Now let’s look at the things we can do with an instance of the derived class. They should all work as expected.

var instance2 = new DerivedClass(4, 6);
// base method with base fields: outputs 5 (4+1)
console.log(instance2.compute()); 

// derived method with base method and derived field: 
// outputs 11 ((4+1)+6)
console.log(instance2.anotherCompute()); 

// outputs true
console.log(instance2 instanceof DerivedClass); 

// ALSO outputs true
console.log(instance2 instanceof BaseClass); 

Now imagine we want to create a derived class that overrides some behavior of the base class. First, we will create a class inheriting from the previous derived class, and override the compute method.

function ThirdClass(initialValue, secondValue) {
  // In this case, the "base" class 
  // of this class is called "DerivedClass"
  DerivedClass.call(this, initialValue, secondValue); 

  // no new member fields
};
ThirdClass.prototype = Object.create(DerivedClass.prototype); 

// override base function compute
ThirdClass.prototype.compute = function() { return 100; }; 

Let’s see this overridden function in action. The override also impacts, for example, the “anotherCompute” function since it calls compute. The overriding works exactly as you would expect as in, say, C#.

var instance3 = new ThirdClass(4, 6);
// overridden method always outputs 100
console.log(instance3.compute()); 

// derived method with OVERRIDDEN method (inside) and derived field: 
// outputs 106 (100+6)
console.log(instance3.anotherCompute()); 

// confirm that the compute has only been overridden 
// for instances of ThirdClass: 
// outputs 5, as before
console.log(instance2.compute()); 

One final task remains: can we override a function, and then still make a call to the base class version of that function? In this case, we’ll replace the definition of compute in the third class with one that overrides compute while also making a class to the base class definition of compute. This is where the technique for function definition (defining on the prototype vs. defining on the object) becomes critical. Since the functions are actually defined on the prototypes, we can call back to the base class prototype for that version of the function. Remember that “ThirdClass” derives from “DerivedClass”

// override base function compute, but still use it in this function
ThirdClass.prototype.compute = function() { 
  return DerivedClass.prototype.compute.call(this) * 5; 
}; 

Now we’ll retry the two instance3 calls from earlier:

// overridden method calls base implementation of compute, 
// which returns 5, and multiplies it by 5. output: 25
console.log(instance3.compute()); 

// derived method with overridden method
// (inside, which calls base implementation) 
// and derived field: outputs 31 ((5*5)+6)
console.log(instance3.anotherCompute()); 

We now have a reliable technique for constructing “class”es in JavaScript which fully support inheritance, overriding, and superclass calls in the way conventional from class-based object-oriented languages.

Remember, though, that this is “writing C#/Java in JavaScript” and should be used sparingly. Also, some authors believe implementation inheritance is usually bad even in languages which support it as a primary form of reuse.