Transparency & Innovation: Open Data For Green Building
By: Bomee Jung
July 01, 2009
I'm not old enough to have enjoyed the first heyday of energy efficiency and alternative power back in the '70s and '80s, but I do love chocolate and have a vivid recollection of the classic Reese's Peanut Butter Cups commercials from those days. There were several variations, but basically, a person holding a chocolate bar runs into a person holding an open jar of peanut butter, causing the chocolate bar to drop into the peanut butter. They exclaim in dismay:
— "You got peanut butter in my chocolate!"
— "You got chocolate in my peanut butter!"
But, as the slogan goes, they discover "two great tastes that taste great together," and candy lovers everywhere rejoice in the finding.
Not unlike the chocolate-peanut butter collision, two transformative movements of our time are poised to slam together into a concoction no less delightful than the Peanut Butter Cup (particularly to green enthusiasts of geekly tendencies): the Open Data movement and high-performance green building.
Smart Machines and Smart Data
This section gets a little geeky, perhaps even a bit sci-fi, but bear with me.
You might think of the Open Data movement as a key piece of the next iteration (Web 3.0) in the evolution of the information technology that brought us email, hypertext and the World Wide Web (collectively Web 1.0), and YouTube, blogs like this one, and iPhone apps (Web 2.0).
There are literally shelves of books written on this topic, but to boil it down to a few sentences: Web 1.0 arose from open access at the physical and networking levels (the laying of cables, plus TCP/IP and an alphabet soup of its friends that allowed networks to find and talk to each other). Web 2.0 is about open access to Web-based applications (via Application Programming Interfaces, or APIs) and to some data via data feeds (the RSS feed of this blog, for example). APIs and data feeds allow folks other than those who came up with a particular tool (say, Google Maps) to tap into its capabilities, combine it with some other set of information (say, tweets about the passing of Michael Jackson), and output something entirely new and unanticipated by the original authors (The Michael Jackson Tributes Twitter Map).
According to the gurus (and who's more of a guru than Tim Berners-Lee, the "father of the World Wide Web"?), Web 3.0 will be about getting machines to smarten up and give us useful, actionable information by "understanding" vast amounts of data and how those data relate to each other.
As much as I love my computers, they're really not all that smart at the moment. When I google "building", for example, what the computer "sees" is 'b-u-i-l-d-i-n-g'. It doesn't know if I mean the verb or the noun, and if I were to say "green building", it doesn't know that those two words together connote a particular paradigm in construction and management. When it shows me ads for LEED exam prep courses, that's because a person somewhere told it to display those ads when the text "green" and "building" appear next to each other in a search request, and if the top search result happens to be the USGBC's website, it's only because that's the one most clicked on by other people who searched for "green building".
The amazing thing that Web 3.0 (or the Semantic Web) proposes to do is to let computers "understand" meaning, not just "see" text. In the 3.0 world (year after next?) the computer would know that "buildings" are things that take up space and have attributes that distinguish them from other things that take up space, like cats or shoe boxes.
When my iPhone gets really smart and Semantic Web-enabled, it ought to be able to deliver user-centric, context-sensitive information, doing all the tedious correlation of information behind the scenes in its little machine brain, without any help from me, the user. For example, when I land in Almaty, Kazakhstan, it could present me with a map of the high-performance green buildings within walking distance of my hotel, show me who designed them, and let me send dinner invitations to the ones who aren't Sir Norman Foster.
When my iPhone goes looking for green buildings in Almaty, it needs to be able to access a source of data that identifies itself as representing buildings in the real world and gives the appropriate amount of detail so that the iPhone can figure out that I might be interested in it. Actually, today's machines are plenty smart. What's needed for Web 3.0 is smarter data, and a whole lot of it. For example, a publicly accessible semantic dataset published by the local university on the Almaty building stock might include an entry like:
<building>
  <building:name>Almaty Twin Towers</building:name>
  <building:heat-index>3 BTU/SF/HDD</building:heat-index>
  <building:designed-by>
    <person>
      <person:name>Norman Foster</person:name>
    </person>
  </building:designed-by>
</building>
Thus, when my iPhone finds this bit of information out in the Web, it can tell that the item relates to a Building, that it has a name, a heating performance index that is very low (and thus "green"), and that it is associated with an item of type Person, which in turn has a name of 'Norman Foster'. Let's assume that somewhere out there is another document that describes what attributes items of type Building or type Person can have, and that the iPhone knows to go looking for more Person information, like Sir Norman's email address. It's this ability to relate text to things and things to each other that makes Web 3.0 smart.
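To make the idea a bit more concrete, here is a minimal Python sketch of the machine's side of that exchange, using only the standard library's `xml.etree.ElementTree`. It assumes the record above has been given namespace declarations: the snippet in this post uses the `building:` and `person:` prefixes without declaring them, which a strict XML parser would reject, so the `example.org` URIs below are hypothetical placeholders. The `is_low_heat_index` threshold is likewise an invented illustration, not any real rating standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; the record in the post uses these prefixes undeclared.
NS = {
    "building": "http://example.org/ns/building#",
    "person": "http://example.org/ns/person#",
}

RECORD = """
<building xmlns:building="http://example.org/ns/building#"
          xmlns:person="http://example.org/ns/person#">
  <building:name>Almaty Twin Towers</building:name>
  <building:heat-index>3 BTU/SF/HDD</building:heat-index>
  <building:designed-by>
    <person>
      <person:name>Norman Foster</person:name>
    </person>
  </building:designed-by>
</building>
"""


def parse_building(xml_text):
    """Turn one building record into a plain dict a program can reason about."""
    root = ET.fromstring(xml_text.strip())
    # The heat index is published as "<number> <unit>", so split the two apart.
    value, unit = root.findtext("building:heat-index", namespaces=NS).split(maxsplit=1)
    # Follow the designed-by relationship to every linked Person item.
    designers = [
        person.findtext("person:name", namespaces=NS)
        for person in root.findall("building:designed-by/person", namespaces=NS)
    ]
    return {
        "name": root.findtext("building:name", namespaces=NS),
        "heat_index": float(value),
        "heat_unit": unit,
        "designed_by": designers,
    }


def is_low_heat_index(record, threshold=5.0):
    """Hypothetical 'green' test: heat index at or below an arbitrary threshold."""
    return record["heat_index"] <= threshold


building = parse_building(RECORD)
print(building["name"], building["designed_by"], is_low_heat_index(building))
```

The point of parsing the structure rather than the raw text is that the program can hop from the building to the linked Person item, and from there (given another dataset) to details such as the designer's contact information.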
As you can see, for this kind of scheme to be useful, you're going to need a lot of data that is capable of telling machines what it refers to in the real world, and a lot of relationships linking one item to another. This is where the Open Data movement comes into play. As some clever contributor to Wikipedia phrases it, the Open Data movement is "a philosophy and practice requiring that certain data are freely available to everyone, without restrictions from copyright, patents or other mechanisms of control." Here are the principles of Open Data in pithy verse form, thanks to Frank Hebbert of the RPA:
Open data should be COMPLETE! Get that info out to the street.
TIMELY data is useful data. Think about sharing it sooner not later.
PRIMARY data are disaggregated, and have infinite potential to be re-tabulated.
Make your data ACCESSIBLE! MACHINE PROCESSABLE!
SHARE it baby! Be UNPROPRIETARY!
LICENSE FREE does not have to be scary!
There's a lot packed in there, but let's just sum it up by saying that the point of all this is to make available data that can be re-used in ways unanticipated by the original owner of the data, and that we want machines to be able to do the hard work, so that people can have real-time, context-sensitive information tailored to their specific needs.
Does this all sound a bit far-fetched? You may be surprised to know that our Federal government, through President Obama's Chief Information Officer Vivek Kundra, is already taking the lead: as of June 2009, Data.gov is providing "public access to high value, machine readable datasets generated by the Executive Branch of the Federal Government," including the Toxics Release Inventory from the EPA, H1N1 flu information from the CDC, Project-based Section 8 information from HUD, and hundreds of others.
To its credit, New York City is hot on the heels of the Feds and could become one of the first cities in the country (after Washington, D.C.) to adopt a mandate for open data standards for public records. Also under consideration is Intro 476-A, part of the package of four green building bills announced by Mayor Bloomberg on Earth Day.
So... What Does This Have To Do With Green Buildings?
Everyone's top-10 list of reasons we don't build more green buildings includes some version of "we don't have enough data about green buildings." What data do we need? Why do we need data? Who will benefit? The short answer goes something like this: Knowledge makes profits, but sharing makes markets.
First, what kind of data are we talking about? Let's start with energy use in buildings, though you could just as well say empirical measures of indoor environmental quality, or measurable health impacts like number of asthma-free days. Energy use should be easy because we already measure it and store it, or at least our utilities do.
One purpose of having publicly available energy performance data would be to verify performance claims. Today, for instance, "green buildings" are those designed or rehabilitated with the intent of achieving high performance, whether or not they meet those performance goals once they are occupied and operating.
Typically, incentive programs and certification schemes like USGBC's LEED, Enterprise Green Communities, and many others use models to compare the expected performance of two versions of a building: the one that would be built to meet code and the one as designed by the development team. This is a critically important exercise in designing for energy efficiency, but without empirical post-occupancy performance data, claims made about performance can be misleading.
Take, for example, a fictional building called Efficiency Place, a Class A office building that claims to be 25% more efficient than comparable buildings. You might imagine that someone out there has a large set of data about the energy performance of Class A office buildings, and in fact, someone does: it's the Department of Energy's Commercial Buildings Energy Consumption Survey (CBECS), the data set that underlies the EPA's Portfolio Manager benchmarking tool. If Efficiency Place is making the claim based on past usage entered into the Portfolio Manager calculator, then the 25% claim would be a fair and accurate one. If, however, the claim is based on comparing design-stage energy models, then a perfectly accurate statement would have to say something like "Efficiency Place was modeled to perform 25% better than if it had been built to meet code minimums, and actual performance may be better or worse than the model; no one can possibly know until we benchmark the usage at stable occupancy."
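The difference between the two kinds of claim is easy to see in a few lines of arithmetic. The Python sketch below uses invented energy use intensities (EUI, in kBtu per square foot per year) for Efficiency Place; the numbers and the `Claim` type are hypothetical, chosen only so that both comparisons produce the same headline figure.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    """One 'X% better' statement: what is being compared against what."""
    basis: str
    baseline_eui: float  # kBtu per square foot per year
    subject_eui: float   # kBtu per square foot per year
    measured: bool       # True only if subject_eui comes from metered usage

    def percent_better(self) -> float:
        # Percent reduction relative to the baseline.
        return 100.0 * (self.baseline_eui - self.subject_eui) / self.baseline_eui

    def statement(self) -> str:
        verb = "performs" if self.measured else "was modeled to perform"
        return f"Efficiency Place {verb} {self.percent_better():.0f}% better ({self.basis})"


# Hypothetical figures for illustration only.
benchmarked = Claim("metered usage vs. peer buildings", 100.0, 75.0, measured=True)
modeled = Claim("design model vs. code-minimum model", 80.0, 60.0, measured=False)

for claim in (benchmarked, modeled):
    print(claim.statement())
```

Both claims come out to "25% better," but only the first describes how the building actually runs. The second compares two models, and the building's metered usage could land on either side of it.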
Ultimately, the data sharing question goes to the heart of what we mean when we talk about "green buildings". If you want "green buildings" to mean buildings that demonstrate (not just shoot for) high energy and environmental performance, you need to know two things: first, how other "like kind" buildings perform, and second, how the particular building performs. Knowing both these things will entail gathering and making publicly available empirical data.
The benefits of verifiable claims are obvious: First, just as the buyer of a used car would want the vehicle's history to inform how much money she's willing to pay for the car, potential tenants, buyers, or underwriters who are concerned about the impact of a building's energy performance on the bottom line would benefit from a verifiable performance claim. Indeed, this may be happening in places (like Europe) where performance disclosure is required at the time of property transfer. Second, if you are a portfolio owner of properties, knowing how your buildings are doing in comparison with each other and with other like buildings will allow you to make prudent decisions about where to invest in improvements. Third, if you are a public body that sets goals and monitors progress toward energy performance improvements, you need this data so you can set ambitious but achievable targets that ratchet up over time as improvements are realized.
Consider the proposed bill for benchmarking New York City's buildings, which would require buildings of 50,000 gross square feet or larger to submit annual benchmarking information using a city-specified benchmarking tool. The city would then make the outputs of the benchmarking tool publicly available. This is a wonderful first step toward realizing the benefits of publicly available building performance information. It falls short, however, of achieving true Open Data status. Recall from above that Open Data entails data collected at the source of the information with as little aggregation and modification as possible, made available in formats that allow automated processing. If the information that is made available is limited to the outputs of the benchmarking tool (i.e., energy per square foot rather than total usage and gross square feet as separate quantities), any future analysis that requires knowing the values before calculation by the benchmarking tool would be hobbled.
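A small Python sketch shows what gets lost. The two buildings and their figures are hypothetical. If only the tool's output (energy use intensity, or total energy divided by gross square feet) is published, a simple average of those ratios is still easy to compute, but the floor-area-weighted figure an analyst would want for a whole portfolio or district, and the total energy consumed, can't be recovered.

```python
# Hypothetical raw records: what the owner (or utility) actually knows.
raw_records = [
    {"id": "A", "total_kbtu": 1_000_000, "gross_sf": 50_000},
    {"id": "B", "total_kbtu": 12_000_000, "gross_sf": 100_000},
]


def eui(record):
    """Energy use intensity in kBtu per gross square foot per year."""
    return record["total_kbtu"] / record["gross_sf"]


# A release limited to tool outputs: one ratio per building, inputs discarded.
published_eui = {r["id"]: eui(r) for r in raw_records}

# What the published ratios alone support: an unweighted average.
simple_mean_eui = sum(published_eui.values()) / len(published_eui)

# Only possible when the raw quantities are kept as separate values.
total_kbtu = sum(r["total_kbtu"] for r in raw_records)
area_weighted_eui = total_kbtu / sum(r["gross_sf"] for r in raw_records)

print(published_eui)                # {'A': 20.0, 'B': 120.0}
print(simple_mean_eui)              # 70.0
print(round(area_weighted_eui, 2))  # 86.67
print(total_kbtu)                   # 13000000
```

The two averages differ because the larger, hungrier building accounts for most of the real total; with only the ratios in hand, there is no way to tell.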
If we have the tools to compare a particular building against other like buildings, why do we need the raw data? The act of aggregating and processing raw data into more generalized forms entails making assumptions about how that data will be used. Raw data is desirable specifically because it's assumption-free and easily manipulated to serve new, interesting analysis.
Take, as an example, a bank that wants to offer a building performance-tied loan product, one in which the availability and terms of the loan are directly influenced by the energy performance of a building. This bank must design an underwriting model and origination and servicing protocols that enable it to assess and price the financial risk associated with this product. In developing this framework, the bank will need data to feed into its model that allows it to slice and dice information about the anticipated energy performance of a building to suit its needs, and to do this, it may need to aggregate information from several sources.
Let's say the bank wants to know what performance improvements it can reasonably expect from retrofits of 40-unit residential buildings heated by single-pipe steam. It may need to pull together information about building attributes from Department of Buildings and property tax records, match it against energy use data from the utilities, then separate the buildings that have undergone retrofits from those that haven't, so that it has both a post-retrofit population and a control group. If the bank can access raw data, it can apply its own assumptions and arrive at conclusions that fit its needs. If it can access only information interpreted for other purposes, its efforts to provide a product may be stymied by the need to collect its own data from scratch.
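The join-and-filter step the bank would need might look something like the Python sketch below. Everything in it is hypothetical: the shared building ID, the field names, the usage figures, and the retrofit flags stand in for whatever the Department of Buildings, tax, and utility datasets actually contain. A comparison of medians like this is only a first look, not a rigorous control-group study, but it is exactly the kind of analysis that depends on having the raw values.

```python
from statistics import median

# Hypothetical building attributes (standing in for buildings and tax records).
attributes = {
    "1001": {"units": 40, "heating": "single-pipe steam", "gross_sf": 36_000},
    "1002": {"units": 40, "heating": "single-pipe steam", "gross_sf": 34_000},
    "1003": {"units": 40, "heating": "two-pipe steam", "gross_sf": 38_000},
    "1004": {"units": 40, "heating": "single-pipe steam", "gross_sf": 35_000},
    "1005": {"units": 12, "heating": "single-pipe steam", "gross_sf": 10_000},
    "1006": {"units": 40, "heating": "single-pipe steam", "gross_sf": 40_000},
    "1007": {"units": 40, "heating": "single-pipe steam", "gross_sf": 37_000},
}
# Hypothetical annual utility data in kBtu; building 1007 has no usage on file.
usage_kbtu = {
    "1001": 3_600_000, "1002": 2_720_000, "1003": 3_800_000,
    "1004": 3_150_000, "1005": 1_100_000, "1006": 2_800_000,
}
# Hypothetical set of buildings that completed a retrofit.
retrofitted = {"1002", "1006"}


def compare_retrofits(attributes, usage_kbtu, retrofitted, units, heating):
    """Return (control median EUI, post-retrofit median EUI) for one building type."""
    control, post_retrofit = [], []
    for building_id, attrs in attributes.items():
        # Filter to the building type of interest.
        if attrs["units"] != units or attrs["heating"] != heating:
            continue
        # Inner join: skip buildings with no utility data.
        if building_id not in usage_kbtu:
            continue
        eui = usage_kbtu[building_id] / attrs["gross_sf"]
        (post_retrofit if building_id in retrofitted else control).append(eui)
    return median(control), median(post_retrofit)


control_eui, retrofit_eui = compare_retrofits(
    attributes, usage_kbtu, retrofitted, units=40, heating="single-pipe steam"
)
print(control_eui, retrofit_eui)  # 95.0 75.0
```

Because the sketch starts from total usage and floor area rather than precomputed ratios, the bank could just as easily swap in its own normalization (per unit, per heating degree day) without going back to collect data.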
The goal of Open Data is to enable innovation: to make available data that can be re-used in ways unanticipated by the original owner of the data, whether it's an iPhone app for locating green buildings or underwriting models for new financial products. It's anyone's guess what uses will be found for publicly available data that is collected as close to the source as possible and made available in formats appropriate for automated processing. I would argue that if you let the data roam free, motivated souls will create products and services to make use of these resources. Given the pressing need to rapidly scale up our efforts to create sustainable urban environments, we need all the innovation we can muster.
In 2002, Bomee Jung founded GreenHomeNYC, a non-profit that connects NYC residents with local experts and actionable information to help them improve the energy and environmental performance of the city's homes and buildings. She is currently the Program Director for Green Communities in the New York office of Enterprise, a national leader in greening affordable housing. Before turning her attention to green building, Bomee developed Web-based applications.