Improving Forecasting Through Ensembles

There’s this wonderful article I want to share on building prediction models using ensembles. “Ensembles” in this case simply means the combination of two or more prediction models.

I’ve personally had great success bringing several (relatively) poorly performing models together into one ensemble model, with prediction accuracy far greater than that of any of the individual models.
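To make the idea concrete, here is a minimal sketch of the simplest kind of ensemble: averaging the predictions of several models. The numbers and the model names (model_a, model_b, model_c) are invented purely for illustration and do not come from the article or from any real project. The gain shown here relies on the models being wrong in different directions; models that all make the same mistakes would not benefit nearly as much.

```python
# Toy illustration only: these numbers are invented, not taken from any real model.
actual = [10.0, 20.0, 30.0, 40.0]

# Three hypothetical models, each noticeably off in its own way.
predictions = {
    "model_a": [12.0, 18.0, 33.0, 37.0],
    "model_b": [8.0, 23.0, 28.0, 42.0],
    "model_c": [11.0, 19.0, 29.0, 44.0],
}


def mean_absolute_error(truth, predicted):
    """Average size of the prediction errors, ignoring their sign."""
    return sum(abs(t - p) for t, p in zip(truth, predicted)) / len(truth)


# The simplest possible ensemble: average the models' predictions point by point.
ensemble = [sum(values) / len(values) for values in zip(*predictions.values())]

individual_errors = {
    name: mean_absolute_error(actual, preds) for name, preds in predictions.items()
}
ensemble_error = mean_absolute_error(actual, ensemble)

for name, error in individual_errors.items():
    print(f"{name}: MAE = {error:.2f}")
print(f"ensemble: MAE = {ensemble_error:.2f}")
```

In this made-up case the averaged prediction has a lower mean absolute error than any single model, because the individual over- and under-estimates partly cancel out.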

Definitely something to check out if you’re into this sort of thing.

The net is set for the fish

The following passage is taken from the beautiful book Master of the Three Ways by Hing Ying Ming (which libraries might classify as “Eastern Philosophy”):

The net is set for the fish,
But catches the swan in its mesh.

The praying mantis covets its prey,
While the sparrow approaches from the rear.

Within one contrivance hides another;
Beyond one change, another is born.

Are wisdom and skill enough to put your hopes on?

Just a little reminder for my future self on the uncertainty of life (r-squared is never 100%).

Update: For the uninitiated, my comment on “r-squared” above was just a little statistical quip. R-squared is a number between 0 and 1 that represents the proportion of the variability in the outcome that a linear model explains, often quoted as a percentage. Whatever is left over, 1 minus r-squared, is the part the model cannot explain: the uncertainty.
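For anyone who prefers to see it computed, here is a small sketch of r-squared for a straight-line fit, using only plain Python. The x and y values are invented for illustration; the point is only that the explained and unexplained shares add up to 1.

```python
# Toy data, invented for illustration: y is roughly 2 * x plus a little noise.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Ordinary least-squares fit of the line y = intercept + slope * x.
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
intercept = mean_y - slope * mean_x
fitted = [intercept + slope * xi for xi in x]

# Total variability in y, and the part the fitted line fails to capture.
ss_total = sum((yi - mean_y) ** 2 for yi in y)
ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

r_squared = 1 - ss_residual / ss_total  # share of variability explained
unexplained = 1 - r_squared             # the "uncertainty" left over

print(f"r-squared:   {r_squared:.4f}")
print(f"unexplained: {unexplained:.4f}")
```

Even with data this tidy, the fit falls slightly short of 1, which is the whole point of the quip.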

Tell me what you want to see

Caught this magnificent optical illusion on kottke.org today. I’d say this is definitely worth a minute or two of your time.

Was in my “data” frame of mind when I watched this, and couldn’t help thinking that this is exactly how data works: control the content, control the angle (i.e. perception), and you can make a square block look like a cylinder.

Expensive Software and Consultants

They took our data, ran it through their software, and they got the answers that eluded us for so long.

I was told they were a big consulting company, which meant they probably had great, prohibitively expensive software that could do the job. That’s why.

But I don’t buy that argument.

Great software needn’t be expensive.

I’ve lived and breathed great open-source, free technologies growing up. Linux; Apache; PHP; MySQL; WordPress; Python; R.

Are any of these free technologies inferior to their paid counterparts? In development (including data science) work, I don’t think so.

So why were they “successful”? Why could they come up with an answer we couldn’t?

My guess: they were a consulting company with less vested interest.

They came up with an answer. But would it have been better than the one we would have come up with if we were in their shoes? I don’t know.

As a consultant I’d have been much more liberal with my analyses. No matter how badly I messed up, the worst that would happen would be that my company would lose a contract. And chances are good I could push the blame onto the data that was provided, the wrong context I was given, or information that was withheld.

When you’re part of the company, you have far more vested interest. Not just in your job, but in your network, both social and professional. Consequences extend far beyond what they would if you were an external consultant working on “just another project”. I’d be far more meticulous in ensuring everything was covered and the analyses were properly done.

Business Implications of Analysis

“And,” she said, “we found that the more rooms a hotel has, the higher the positive rating.”

I was at NUS (National University of Singapore) in my Master’s class — listening to my peers present their analysis on the relationship between hotel class (e.g. budget, mid-scale and luxury) and the ratings of several key attributes (e.g. location, value, service) based on online reviews.

By now, having been through ten presentations on the same topic in the last couple of hours, it was clear that there was a link between hotel class and attribute ratings: higher class hotels tended to get better reviews.

But something was missing in most of these presentations (mine included, unfortunately): there wasn’t a business problem to be solved. It was simply analysis for analysis’ sake. Through it all I couldn’t help but think, “so what?”

So what if I knew that a budget customer’s standard of “service quality” was different from that of the patron of a luxury class hotel? So what if I knew that economy-class hotels didn’t differ from mid-scale hotels but differed from upper-scale hotels? So what if I knew that hotels with more rooms tended to have more positive reviews?

(This last point was a rather common “finding”: hotels with more rooms tended to have higher ratings. It was presented as if, should you want higher ratings, you might want to build hotels with more rooms. The problem, of course, is that larger hotels tend to be of the higher-end variety, while budget and independent hotels tend to have fewer rooms. Would the business implication then be that even budget hotels could improve their ratings by adding rooms? Probably not.)
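The trap in that “finding” can be sketched with a few made-up numbers. The hotels below are entirely hypothetical (not the class dataset): room counts and ratings are invented so that hotel class drives both. Across all hotels, rooms and ratings look strongly related; within each class, the relationship disappears or even reverses.

```python
# Hypothetical hotels, invented for illustration: (hotel class, rooms, average rating).
hotels = [
    ("budget", 40, 3.2), ("budget", 60, 3.0), ("budget", 80, 3.1), ("budget", 100, 2.9),
    ("luxury", 200, 4.5), ("luxury", 250, 4.6), ("luxury", 300, 4.4), ("luxury", 350, 4.5),
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def rooms_vs_rating(rows):
    """Correlation between room count and rating for the given hotels."""
    return pearson([rooms for _, rooms, _ in rows], [rating for _, _, rating in rows])


# Pooled across every hotel, ignoring class.
overall = rooms_vs_rating(hotels)

# Separately within each hotel class.
within_class = {
    hotel_class: rooms_vs_rating([h for h in hotels if h[0] == hotel_class])
    for hotel_class in ("budget", "luxury")
}

print(f"all hotels: r = {overall:.2f}")
for hotel_class, r in within_class.items():
    print(f"{hotel_class} only: r = {r:.2f}")
```

In this toy data, the pooled correlation is strongly positive only because luxury hotels are both bigger and better rated; adding rooms to a budget hotel does nothing for its rating.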

In the end the 15 presentations or so that we went through just felt like a whole lot of fluff. Sure, the analytical conclusions were technically correct; statistically sound. But so what?

It reminded me that you can be great at analysis, but without an understanding of the business, without a mindset of constantly questioning “so what does this mean — what are the implications on the business?”, all your analytical prowess would be for naught.