Sunday, June 10, 2012

My first 4 days with gdal



I'm new to using the command line to get things done. Last month, I installed Lubuntu on a very small SSD in a very old MacBook because I wanted to save money, and I've read in a few places online that the best way to learn how to code (besides spending the time to learn how to do it) is to use Linux.

This past week at work, I spent most of my time attempting to translate Esri to OGC (in Windows). I realize I'm no spatial Neil Armstrong, but nearly all of my in-depth GIS experience has been in the Esri context. This was the first GIS project where I had no idea where to begin (or even what the issues might be), or whether, once I did start, I would be able to finish successfully. Scary stuff!

I'm working on a project to scan, georeference, and serve over 1,100 historical maps of the San Francisco Bay Area. I need to create a workflow that interns and library student employees can learn and improve upon. Here's how I solved the problem of Esri not playing nicely with others:


  1. Georeference the scanned maps in ArcMap (the base maps can't be beat)
  2. Make sure the spatial reference is WGS84; run the Project Raster tool on any maps that aren't
  3. Export the newly georeferenced map as a TIFF with a world file (might need to find some more horsepower for a cubic transformation)
  4. Move the files over to gdal to create a real GeoTIFF: gdal_translate -of GTiff -a_srs EPSG:4326 G4362_S223_1914_L3.tif G4362_S223_1914_L3gdal.tif (a sketch of this step appears after the list)
  5. Load the file into the GeoServer data directory, publish it, then move on to creating metadata
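
For anyone following along, here's a minimal sketch of the gdal side of steps 2 and 4, reusing the filename from step 4. The gdal_translate line is straight from the list; the gdalinfo check and the gdalwarp alternative are my untested stand-ins for the ArcMap steps, so treat them as a sketch, not the finished workflow:

  # See what spatial reference (if any) the exported TIFF carries
  gdalinfo G4362_S223_1914_L3.tif

  # Already WGS84 but only a world file? Bake the SRS into a real GeoTIFF:
  gdal_translate -of GTiff -a_srs EPSG:4326 G4362_S223_1914_L3.tif G4362_S223_1914_L3gdal.tif

  # Not WGS84? gdalwarp reprojects (cubic resampling) and writes a GeoTIFF
  # with the SRS already set, so no gdal_translate pass is needed; add
  # -s_srs <source projection> if the file carries only a world file
  gdalwarp -t_srs EPSG:4326 -r cubic G4362_S223_1914_L3.tif G4362_S223_1914_L3gdal.tif
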
It took me faaaaar too long to realize step 2 was the key. Now that I've taken that one small step for (a) man, I'm hoping to get a real workflow together starting with a paper map and ending up with a georeferenced raster in GeoData@UC Berkeley.

Friday, June 1, 2012

Some day my data will link



On Friday, May 18, I attended the Northern California Technical Processes Group (NCTPG) annual meeting and presentation on linked data at the UC Berkeley Alumni House. I was impressed that NCTPG would host the three speakers (Walter Nelson, Phil Schreur, and Karen Coyle) at Berkeley, since I haven't heard anything about linked data efforts going on in the library. I'm hopeful this was the first in a series of ongoing discussions about what UCB can do to link our records.

First up was Walter Nelson, who rocked the house. He spoke very candidly about how integrated library systems (ILSs) don't really do what libraries need them to do. Nelson discussed how library data is no different from other data on the web, and how libraries and library software vendors are unwilling to recognize this. In the end, he advocated for using Drupal, which was nice to hear. I did some looking around, and, of course, there are RDF modules for Drupal 7. Some day, my data will link.

Second was Phil Schreur, who spoke about Stanford's efforts in working with European libraries on linked data. I don't remember Schreur's official title, but it's certainly something about metadata, a clear indication that metadata, not bibliographic data, is on the minds of Stanford librarians and administrators. Some day, my data will link?

Finally, Karen Coyle showed the group some more practical ways to engage with linked data for libraries, including the interesting Linking Open Data cloud diagram. Since I'm not afraid of linked data or non-MARC standards, Coyle's discussion was interesting and motivating, not intimidating. Some day, my data will link!

Overall, I'm concerned about how to move forward and link data. I'm working on a geoportal project that uses FGDC and a homebrewed schema in Solr. It's not linked data, but it's some of the only metadata I'm in charge of at the moment. My greater concern is how to persuade people that linked data is the future. It seems to be a proposition that one either inherently understands and values, or one that seems frightening, especially when compounded with the time and money concerns of those who control such things.

MARC is an OK output format, but it locks library data away from the way many potential library users look for data. When I find myself trying to explain to library users why they can't find material in our library by searching the internet, I have to stop short and admit it's a failing on the library's part, not the user's. It's for them that I will link my data (someday).