Caswell.org

XML - Creating Order Out of Chaos

Relational databases have rigid structures. Indexes allow the computer to rapidly filter and sort the data by limiting the information it processes. Database designers work hard to make sure that the right fields are indexed to return results quickly for common searches. They also make sure that data are grouped correctly into tables, which reduces the storage space needed by eliminating duplication. This process of normalizing the data prepares it to work well with the computer's style.

But does that style work well with messy data? What if you have differing amounts of information for each record? What if some records include details that others do not? In a database, you have to create a field for each type of information you might want to store, and leave those fields blank when you don't have it. But if those fields are numeric, how do you differentiate between a zero and a blank? Many databases fail to account for this, and as a result the statistics produced from them are wrong. Also, users who see that a field exists and search on it may get the wrong impression when the results include only a limited set of records. Experienced database designers work with users to eliminate fields that won't be consistently populated with data.
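To make the zero-versus-blank problem concrete, here is a minimal Python sketch. The numbers are made up for illustration, and `None` stands in for a blank field; it is not taken from any particular database.

```python
# Hypothetical vacancy counts: None means "not reported" (a blank),
# while 0 means "measured, and nothing was vacant".
reported = [12, 0, None, 8, None]

# Treating blanks as zero pulls the average down.
as_zero = [v if v is not None else 0 for v in reported]
blank_as_zero = sum(as_zero) / len(as_zero)

# Skipping blanks averages only the records that actually have data.
known = [v for v in reported if v is not None]
blank_skipped = sum(known) / len(known)

print("Blanks counted as zero:", blank_as_zero)
print("Blanks skipped:", round(blank_skipped, 2))
print(len(known), "of", len(reported), "records populated")
```

The two averages differ noticeably, and the last line shows how few records actually back the statistic, which is the detail a user searching on that field would want to know.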

This is where XML solves the problem. XML stands for Extensible Markup Language. It is a relative of HTML, the basic language of the World Wide Web. The key is the extensible part. In XML, if you need an extra field sometimes, you just add it. You only populate a record with the fields you have. Missing data leaves no gaps. Each record can possess a different number of fields. While this method of storing information requires a lot more computer space and processing power, the flexibility is much better suited to messy data.

The definition of vacancy can mean anything from physically empty, to not currently competitively marketed for lease.
Exchange rates for currency conversions are reported as of a certain date. Comparisons using different conversion dates need to adjust for the rate changes.
Aggregating rounded observations increases the rounding error.
Averages for variables like rent should be weighted by total available space rather than computed as a simple average.
Watch for posted revisions to reports.
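Two of the points above, weighting averages by space and the error from aggregating rounded figures, are easy to see with a few numbers. The Python sketch below uses invented buildings and values purely for illustration.

```python
# Hypothetical buildings: (available space in square feet, asking rent per square foot).
buildings = [(100_000, 20.0), (10_000, 40.0), (5_000, 45.0)]

# Simple average: every building counts equally, regardless of size.
simple_avg = sum(rent for _, rent in buildings) / len(buildings)

# Weighted average: each rent counts in proportion to its available space.
total_space = sum(space for space, _ in buildings)
weighted_avg = sum(space * rent for space, rent in buildings) / total_space

print("Simple average rent:", simple_avg)
print("Space-weighted average rent:", round(weighted_avg, 2))

# Rounding error: hypothetical observations in thousands of square feet.
observations = [1.4, 1.4, 1.4, 1.4]
sum_of_rounded = sum(round(v) for v in observations)  # round each, then add
rounded_sum = round(sum(observations))                # add, then round once

print("Sum of rounded observations:", sum_of_rounded)
print("Rounded sum of observations:", rounded_sum)
```

In this made-up case the two small, expensive buildings dominate the simple average, while the weighted figure stays close to the rent on the large block of space that most tenants would actually encounter. Likewise, rounding each observation before adding loses more than rounding the total once.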

There are as many mistakes to make in reporting data as there are methods to compute it. Frequent turnover in the staff hired to do the job contributes to the problem.

You need to make adjustments when combining data from different sources.

These adjustments can produce plausible data series, or result in even more errors, so be careful. A checklist is recommended.

The best solution for anyone truly interested in understanding the markets is to subscribe to long-term, cross-market data from a source such as PPR, TWR, REIS, RCA, or others. Each takes the time to cleanse the underlying issues and produce outputs that are usable in your own models. Many also provide econometric forecasts that tie the CRE variables to macro-economic forecasts from Moody's Economy.com or Global Insight. Some users complain that the data do not represent what they see on the ground in a particular market, but these differences are usually explained by the steps taken to bring each market into the consistent set of practices needed for accurate cross-market, long-term comparisons. The biggest hurdle for most users is the high cost of accessing these sources.

Another approach is to access the property-level data directly. You could create your own survey of individual properties, contacting building owners and leasing agents regularly for updated data on availability, and keeping up with new construction and zoning changes with data from First American CoreLogic or directly from the counties. Prepare to spend about twenty minutes per property per quarter. Or, you could subscribe to CoStar, Xceligent, or other property-level data providers. This too is expensive but, depending upon the size of the area you want to cover, may be less expensive than doing the survey work yourself. Here, though, you will run into problems trying to export the data to work in your own databases and spreadsheets. Some of the sources contractually and practically prohibit exporting and storing the data and limit the derivative works you can create from it. Additionally, few of the property data sources covered many markets prior to 2000. Since 2000 we have experienced a full economic cycle. Forecast models are more reliable when inputs span more cycles.

<?xml version="1.0"?>
<!-- A single root element (named Records here for illustration) wraps all records -->
<Records>
  <Record>
    <Name>Bill McNasty</Name>
    <Company>Acme Inc.</Company>
    <Title>Associate</Title>
    <Birthday>1966-05-15</Birthday>
  </Record>
  <Record>
    <Name>Suzie Sunshine</Name>
    <Company>Sprockets Co.</Company>
    <Title>Vice President</Title>
  </Record>
</Records>

In the simple XML example above, one record has a Birthday field while the other does not.
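Reading records like these is straightforward even when fields come and go. The Python sketch below uses the standard library's `xml.etree.ElementTree` module and assumes the records sit inside a single root element, here given the illustrative name `Records`, since a well-formed XML document needs exactly one root.

```python
import xml.etree.ElementTree as ET

# Assumption: the records are wrapped in one root element, named
# <Records> here purely for illustration.
xml_text = """<?xml version="1.0"?>
<Records>
  <Record>
    <Name>Bill McNasty</Name>
    <Company>Acme Inc.</Company>
    <Title>Associate</Title>
    <Birthday>1966-05-15</Birthday>
  </Record>
  <Record>
    <Name>Suzie Sunshine</Name>
    <Company>Sprockets Co.</Company>
    <Title>Vice President</Title>
  </Record>
</Records>
"""

root = ET.fromstring(xml_text)

# findtext returns None when a record has no such element, so a missing
# birthday is never confused with an empty or zero value.
birthdays = {}
for record in root.findall("Record"):
    name = record.findtext("Name")
    birthdays[name] = record.findtext("Birthday")

for name, birthday in birthdays.items():
    print(name, "-", birthday if birthday is not None else "no birthday recorded")
```

The missing field simply comes back as `None`, so the program can tell "not recorded" apart from any real value without a placeholder column.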

Messy data requires experience to turn into reliable answers. One of my favorite sources for economic data is the Blue Chip series, which compiles economic forecasts from different sources and creates a consolidated, consensus view. This provides a less biased forecast. Since economic forecasts can be something of a self-fulfilling prophecy, getting this type of answer is the best method of sensing what the market sees. A similar approach to CRE data is equally useful, even though it takes a lot of work and know-how to complete. Luckily for you, I have done the work and have the inside experience on how the data are collected, prepared, and released. Thus, I can save you time and produce a better result. Let's talk about what you want to better understand. I can help you get there a lot quicker and avoid costly mistakes.
