Tuesday, March 30, 2010

Datamining is the MVP of the future

Data Mining is the process of extracting patterns from data (Go, Go, Wikipedia!) and I firmly believe that being a Data Miner is the Most Valuable Profession of the future.

I bring this up because I watched a Ted presentation from 2006 last night by Hans Rosling on the topic of Social Health Statistics. This Ted talk is the best presentation I've seen in forever. I'm still puzzling over whatever gorgeous program he used for the graphics that did such an excellent job conveying information instantaneously with very little explanation. The two points of the talk that are absolutely required are at 3:50 and 11:30. Rosling stands in front of the graph while it is animated and gestures at the data points, explaining why they are moving in the direction they are moving. Later in the video, he states that his interpretation of the data is made possible because the data has been aggregated and formatted in a way that is easy for humans to grok.

What is marvelous about this talk is that, in 2006, Rosling speaks of Data Mining without referencing it by name. He ends his talk by saying we need a 'garden' of interfaces to the vast amount of data we've been hoarding like misers against a time when we will actually be able to use and understand it. Our recent history is boiled down to data points, numbers and keywords, and stored in banks within organizations who, as Rosling says, have an attitude like the Head of UN Statistics he mentions. These organizations say 'we can't do it', at least as of this talk in 2006, but the option is there for others to try.

Data Miners are already digging fast and well, trying their hand at these banks of information, and it is only getting more prevalent as 2010 flows past. Data Mining, the type that shoots for human readability, combines art and statistics while providing relevance and suggesting relationships. This budding profession requires someone who has a notion of presenting information in understandable ways as well as someone who can wrangle the analysis required. Most of the data representation ends up in a graphic, if not a graph, and if it moves or is interactive, so much the better. Data Mining leads to Chart Porn. (Totally Safe-For-Work despite the name!)

Sites like Data Mining, Strange Maps, Information is Beautiful, and Weather Sealed present data in visual form, extracting meaning (or at least interesting relationships) out of meaningless heaps we've saved just in case we might need them. Additionally, ever more specific datasets can provide very specific information about just how broad or narrow certain trends can be. For WoW geeks like me, there is Armory Data Mining, which mines through the huge database called the WoW Armory that Blizzard provides for public access.

To rephrase the earlier definition, Data Mining (and subsequent Chart Porn ^^) is the process by which we take large quantities of data and poke at it until it makes a picture.

Science Fiction has been pointing at this idea and jumping up and down about it for years. The books/series that I recall off the top of my head that mention the idea of Data Mining and the representation of data for easy consumption are Otherland by Tad Williams, and Gun With Occasional Music by Jonathan Lethem. Each of these books contains the seed of an idea in which data (news, in this case) is aggregated in a visual or aural fashion, through what I assume is a process very like Data Mining, and presented to a character.

Otherland is especially interesting because Williams uses the same metaphor that Rosling does in his Ted presentation, a garden. This garden has roses and weeds, and each plant and flower is weighted so that color, health, and other variables correspond to frequency, reliability, and whether or not the observer considers it a positive or negative trait.

Gun With Occasional Music takes a different route, as a science fiction dystopia that extrapolates from a point where visual media never really took off, and a morning symphony provides the daily news. A low, ominous tremolo suggests violence and percussion suggests murder, which sparks the main character to pick up an investigation. Both of these representations utilize Data Mining for information presentation, except they pull from a constantly changing dataset.

With these representations comes the idea that any representation of sufficient complexity could be utilized, especially to assist in keeping the differently-abled up to date using sound, visuals, numbers, and patterns. Humans are wild awesome at finding patterns and connections even where none exist.

Designing weighted patterns to put value and meaning in context, Data Miners provide one of the most valuable tools that our future has. With the volume of information at our fingertips getting more and more overwhelming, another layer of abstraction is absolutely necessary to be able to see the big picture. The 'big picture' requires us to zoom out, and the bigger the picture the further out we must go.

The Wikipedia article on Data Mining does not mention this sort of 'social' application and sticks mostly to business and science. However, Google proves that the social applications are already being explored. Google's algorithms operate on the basic assumption that you can mine almost any dataset for relationships, and outside of their intense search algorithms, their Google Reader has a feature where you ask it to find interesting things based on what you've already said you liked. Similarly, most music databases can stream radio for you based on a 'seed' song or artist because they have gathered data from others who like the same things that you do. Data Mining as a concept has been around since the 80s and is being applied everywhere in small ways now that the internet has provided access to data to work with.
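To make the 'seed' idea concrete, here is a minimal sketch in Python of how a radio station could be grown from one artist. It is only my guess at the general shape of the thing, not how any real music service works: the listener data, the artist names, and the `recommend` function are all made up for illustration, and it simply counts which artists show up alongside the seed in other people's likes.

```python
from collections import Counter

# Hypothetical data: each listener mapped to the set of artists they like.
likes = {
    "listener_1": {"artist_a", "artist_b", "artist_c"},
    "listener_2": {"artist_a", "artist_b", "artist_d"},
    "listener_3": {"artist_a", "artist_c"},
    "listener_4": {"artist_e", "artist_f"},
}


def recommend(seed, likes, top_n=3):
    """Rank artists by how many listeners like them alongside the seed."""
    counts = Counter()
    for artists in likes.values():
        if seed in artists:
            # Everything this listener likes, other than the seed itself,
            # gets one vote.
            counts.update(artists - {seed})
    return [artist for artist, _ in counts.most_common(top_n)]


print(recommend("artist_a", likes))
```

With this toy data, artist_b and artist_c come out on top (two shared listeners each) and artist_d trails with one. A real system would weigh far more than raw co-occurrence, but the relationship being mined is the same.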

But what is it GOOD for? How is this useful beyond 'information is fascinating' and the 'big picture'?

One limited example is that Data Mining can be coupled with sufficiently complex Artificial Intelligence to look for flagged relationships within datasets. Credit card companies and other large corporations that deal with fraud in billing look for things like 'Purchase in Houston', with an immediate 'Purchase in Anchorage', and then again a 'Purchase in Houston'. The AI needs to be able to discern whether or not that is suspicious behavior (maybe they're a local business selling through the internet?) and if someone's card needs to be frozen. My mother once got a call asking her if she'd used her credit card for gas. She had the previous week, for the very first time, and it was atypical enough that the credit card company gave her a courtesy call to make sure her card hadn't been stolen. Even ten years ago it was not usually people flagging these instances. AI Data Mining of this sort is worth hundreds of thousands of dollars each year as it cuts down fraud.
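The Houston/Anchorage/Houston pattern can be sketched as a simple rule: if two consecutive purchases are farther apart than anyone could plausibly travel in the time between them, flag the pair. The Python below is a toy illustration under my own assumptions, not what any credit card company actually runs. The `Purchase` record, the `flag_suspicious` function, the approximate city coordinates, and the 900 km/h speed threshold are all hypothetical, and a real system would also have to handle the 'local business selling through the internet' case that this rule cannot see.

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt

MAX_SPEED_KMH = 900.0  # assumed threshold, roughly an airliner's cruising speed


@dataclass
class Purchase:
    city: str
    lat: float
    lon: float
    when: datetime


def distance_km(a, b):
    """Great-circle distance between two purchases (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(radians, (a.lat, a.lon, b.lat, b.lon))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def flag_suspicious(purchases):
    """Return consecutive purchase pairs whose implied travel speed is too high."""
    ordered = sorted(purchases, key=lambda p: p.when)
    flagged = []
    for prev, cur in zip(ordered, ordered[1:]):
        hours = (cur.when - prev.when).total_seconds() / 3600
        km = distance_km(prev, cur)
        # Same-place purchases (km == 0) are never flagged by this rule.
        if km > 0 and (hours == 0 or km / hours > MAX_SPEED_KMH):
            flagged.append((prev, cur))
    return flagged


# Hypothetical card history with approximate coordinates.
history = [
    Purchase("Houston", 29.76, -95.37, datetime(2010, 3, 30, 12, 0)),
    Purchase("Anchorage", 61.22, -149.90, datetime(2010, 3, 30, 12, 30)),
    Purchase("Houston", 29.76, -95.37, datetime(2010, 3, 30, 13, 0)),
]

for prev, cur in flag_suspicious(history):
    print(f"{prev.city} -> {cur.city}")
```

Both hops in this history get flagged, since nobody crosses that distance in half an hour. The interesting (and hard) part is everything after the flag: deciding whether to freeze the card or just make a courtesy call.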

Another example of what Data Mining is useful for comes about through Facebook, Myspace, and other high-volume social networks which have huge marketable databases. However, how ethical is it for the company that owns this database full of such an enormous volume of personal information to sell it to marketing firms or other places who might want to peek at all that delicious data? Not only does Data Mining present opportunities for marketing, but it also brings up huge glaring questions on how this information can be used while still protecting the individuals who provided the information.

I'm of the opinion that some data is meant to be mined, as if someone brought a dump truck of ore to your foundry and said, "find me something useful!" As a caveat, however, I think that the data should have been collected specifically to be mined, like the US Census and the daily news. With consent being the biggest factor, I believe that personal information should be protected as a type of media.

Professor Lev Manovich of the University of California, San Diego, suggested in 2001 (Nearly a decade ago!) that Databases are a Symbolic Form. In other words, he posits that databases, because of their unique structure without a beginning or an end, should be considered a new form of media. In this, I use the word database rather than dataset, primarily because database includes some element of structure and meaning to the collection of data and can include objects instead of datapoints, while dataset is a more mathematical term where the data within the set must have meaning applied.

Traversing a database, then, according to Manovich, is a non-linear way of navigating our shared experience. Where a story has a narrative and a photograph encourages movement of the eye to each point of interest, a database uses a webbed approach to convey information driven more by the user than the artist who created it. As anyone who has lost hours on TvTropes, Youtube, or Wikipedia could tell you, this type of presentation - a media full of other media, a meta-media - can be either edifying or a total waste of time. Manovich ends his essay by stating that there is nothing inherent within a database that 'fosters a narrative', but there are hundreds of thousands of narratives - linear paths compatible with human interpretation - lurking within any one database. These narratives, which he also refers to as interfaces, are the 'garden' of Rosling's Ted talk. Manovich is talking about how the seeds from which the flowers grow are an essentially different type of new media.

With the idea that databases are a new media comes the idea that they can be treated ethically as media. In this vein, databases and datasets should be subject to a new form of copyright law incorporating basic rights of privacy wherein the individuals providing their information have the right to withhold it from the repository. If by simply living we are constantly creating and generating information for the consumption of others, then we should be able to hold the copyright on those creations like we do other types of created media.

With these concerns in mind, the internet and its ever-growing network of fascinating information - mostly unintentional and full of meaningful implied spaces - now gives us the capability of taking a stab at something very much like Asimov's Psychohistory to predict the general trend of events. We're approaching the point where everyone's personal data on the web, tracking their online lives and the gaps representing their offline lives, could feasibly be used to predict the future the same way we predict the weather. Google searches can already anticipate outbreaks of the flu and it is only a matter of time before judicious Data Mining turns up other trends and behavior we can identify as important.

We desperately need this type of data acquisition and make-sensey-ness in every field under the sun. The social sciences, government, business, medicine, education, computer science, news, the internet, and I need data mining. And, really, how can you say no to the internet?

This will be a baseline skill required everywhere once people figure out that this IS a skill, an art, and a profession. It is a field that requires new, developing structures and a way to compartmentalize so that people who do not know HOW this works still know they can hire a Data Miner to perform crucial functions. Even with an official title and trying to wedge themselves into other professions while requiring similar skillsets, Data Miners will be hired to the AI/Fraud department of credit card companies, to high-profile research projects that need code monkeys (for science!), to marketing companies hoping to pinpoint their demographics by prodding Twitter, and to news outlets to produce pretty graphics for multimedia news presentation.

We need this. It is a skill and a profession that should be taught as such. Just as there are programmers, researchers, artists, and statisticians, there should be data miners who incorporate elements of each of these other professions and combine them into a field the future will find essential.

Friday, March 26, 2010

Computer Science and Institutional Sexism

My senior thesis was a rather pathetic grasping towards a concept that never ended up fully coalescing over the course of my final year at school. What I wanted to write and what I was writing were so completely different that I gave up trying to reconcile the two.

To make it clear: I'm very bitter about my actions during this period and am frustrated at my pure, unadulterated failure to produce something worthwhile.

The man who mentored me was a typical thesis adviser: distant, busy, and overawing. To his credit, he tried to steer me out of my directionless flailing, but the suggestions only heightened my stubbornness (Woo. I love it when my Contrariness kicks in.) and left me scrounging for data and digging through research barely related to my desired thesis. The paper ended up being a short, thinly-veiled rant about my educational experience.

An example of how vague I ended up being was how I dealt with the question, "Well, WHY do we need more women?" I had a proto-answer, but it wasn't clear, firm, or easy to articulate. Of course now, when it no longer matters for my thesis, I've got an uncomplicated answer that I firmly believe in. We need more women because men don't have a monopoly on good ideas. Every individual able to contribute means a bigger pool of good ideas that might solve problems in ways that someone else might never think of. The top 1% of a larger group is a larger pool of talent to draw from. With the future in the hands of technologists, we need all the pool we can get.

That I could not answer that question says a lot about how hard I worked when I was writing my thesis. The results of my flailing were that I had no thoughts on curricular improvement, nothing to show for my research except an ambiguous hand-waving solution of 'teacher reeducation' (to what purpose I could not say) and the goal of further data acquisition. Not my finest hour. I admit that I was lazy - pure and simple - and there was so much more I could have done.

Coulda-woulda-shouldas aside, I have been continuing my research in a haphazard fashion since I graduated. I can only imagine that this preoccupation is an attempt to salve my bruised ego by rectifying a failure, if only in my own understanding of the topic.

My original concept of my thesis was this:
"Women - as a group - have such a goddamned hard time forcing themselves over the learning curve because the theory and books and logic for most institutionally taught CS were written for men, by men."

Self-indulgent, pretty narrow, and not precisely true. With the proper background information and framework around it, it would be a fascinating thing to study in and of itself, but I had neither background information nor framework to even contemplate a study. Before, my understanding of this topic was just some nebulous 'wait- I think... er-' and there was no supporting evidence. I did find papers about the physical and mental differences between genders, some interesting research about spatial vs. relational learning, and gender differences in how students connect to material, but I had no idea how to use the information in them to clarify my point.

My thesis was rooted in the disconnect that I experienced in what I wanted out of my CS degree and what was available to me, similar to how the thesis I wanted was not the thesis I wrote. Because of my lack of knowledge, I conflated my disconnect with the constant sexism I experienced.

I would like to believe that attributing my frustration with the structure of CS to gender is understandable considering that, currently, CS curriculum is directly related to CS culture, which is festooned in sexist trappings. However, curriculum can be separated from culture on a structural level, if not an individual level, so the tendrils of sexism can be severed, with time and effort, once they're identified. A sexist culture can be hard to change, but modifying the method of transmission can make it equally hard to perpetuate.

I've been to Grace Hopper all of once, and at the conference I don't remember hearing the word sexism. It is very possible that the conference did tackle this topic over and over again and that I was still self-absorbed enough to ignore mentions of sexism and feminism 'cause they were scary *isms. I do remember, however, that I was very focused on my thesis at the time and I went to every panel/discussion/lecture even remotely related to how to improve CS so that women were more comfortable there. No mention of feminists and the substantial work done to chart and define sexism.

Instead, we talked about how rough it was to be part of a work community because we're disrespected, how important it was to educate girls early to show that science was cool, and what kind of effort we put forth to support women who don't leak from the pipeline. Every single inspirational story had a woman kicking ass and taking names and conquering a male field because she was the BEST at what she did. I am in awe of the sheer tenacity and obstinacy it took for every single one of the panelists or keynote speakers to become the amazing women they are today.

Sexism, if it was mentioned at all, was never discussed but obliquely even though every topic was related. Since I'm oblivious to reality, I never picked up on it. (I am pretty oblivious most of the time.)

Women working in Science, Technology, Engineering, and Math (STEM) deal with carving out a space and getting results through action. Every counter-move is a practical, algorithmic approach to systematically subverting sexism. Without referencing feminism, however, there is a significant lack of shared definitions except for the above-linked 'pipeline'. Since there is such a focus on the 'How to Fix' - we are scientists and engineers, after all, and sometimes the stereotypes have the ring of truth - there is very little emphasis on 'How to Talk About It'. It's hard to come up with feminist language without the benefit of a feminist vocabulary.

All I had was the everyday, ongoing, low-grade, and omnipresent enemy that I had no name for, while the people on the other side of campus were talking about *isms and the history of *isms and how they're viewed today. Every thought I had that touched upon sexism was along the lines of, 'My goodness, I'm the only girl in my class and my teacher is surprised and gratified that I'm not asking him to do my homework for me'. Or, 'we lost one of the three freshman girls this year'. There were no words for this, only an unease that had no outlet other than to prove that I can do it just as well as the boys can.

I had to wrestle with how little I cared about my own degree, bogged down by being financially unable to justify a switch out and no longer having a passion for the field. I wish I could say that my fire was never smothered, but it was. Thoroughly and without conscious malice.

I even envied some of the foreign students in a passive, frustrated fashion. English was their second language and if something was lost in the translation it wasn't a fault of their gender, but merely that they hadn't learned that vocabulary word yet. My failures were blamed on being female; my successes were blamed on being a fluke of my gender. When I was not feminine, I was intelligent.

There was no crossover between the floofy feminism of Women's Studies - the only view I had of Women's Studies and feminism. Convenient - and the sharp discrimination that I had to deal with on a class-by-class basis. It lurked, silently underneath the curricula and the assumption that I would treat the discipline how it was meant to be treated according to the culture established by a very small slice of tinkerer-type men. I spent my share of 3am lab nights in laughing camaraderie, always with the underlying knowledge that it was /3am/ and I was on campus and, at some point, I would need to walk home in the dark, alone.

There was no crossover. No way to explain even to people who knew about feminism that this is what was going on for me every single day that I walked into the classroom. I could share my experiences, but there were no words to describe why it was happening and no solutions but to tough it out. All the ones who made it had run the gauntlet before I did and I wanted to be one of those who made it.

In the end, I didn't quit. I gave up instead. I rejected both the box and the labels. On one side of campus, I coasted through a theatre double-degree with great interest and little effort, acing most of my classes. On the other side of campus, I struggled to care in most of my CS classes and walked away with 'c's. Not as good as I could do, but about as much as I was willing to do. I began to deliberately remove myself from the path of opportunities. All of this and I am still fascinated with the concept of artificial intelligence, and natural language processing, and the zillions of ways that humans can interact with their machines. My passion, however, is now reserved for writing speculative fiction and following the development of new technology at a spectator's distance.

My boyfriend wanted to know why I went to school for something that I didn't want to do. The problem is that I did want to do it. I just didn't want to do it their way. Everything got complicated in ways that I did not know how to deal with, both because of my immaturity and the complete left-fieldness of such blatant sexism to someone oblivious like me. This starts to sound like an excuse. In a way it is an excuse, because I could be pursuing my computer-related interests right now and I am not. I also stopped caring about them years ago and once you stop caring it is hard to start again.

My conversations about sexism until this point have been swapping incidents like vets swap war stories. Beyond the stories, however, we just kind of look at each other and tentatively suggest that we can form organizations. Then we do social things, enjoy being women in CS, and generally try and give ourselves a culture to replace the one giving us so much shit. That only works so far, I'm afraid, and I want more permanent solutions that address both the structure and the culture of CS.

My ultimate goal is to find a way to educate both male and female CS students early about the language of feminism and how it can - in very practical ways - benefit everyone learning and studying one of my favorite topics, computers and technology. Being able to tell a young man, "do not disrespect me by insinuating that I'm only here because my gender was needed to fill a quota," and having him understand what I am talking about would have made my college years so much more pleasant. Education is key, in both sexism and feminism, and dispelling damaging inaccuracies benefits men just as much as it benefits women.

This isn't so much about dismantling the pervasive sexism (though it could help), as much as it is me desperately wishing that I had known and been exposed to this so that the wheels could have started turning earlier. I was a senior or a super-senior before I was ever sat down and educated about racism, and it still has taken me three or four years after finding the related subject of sexism to reach this point even though I was well-acquainted with the concept through experience. The earlier this formal introduction and education happens, the better. I want to take the work done in one field, Women's Studies, and apply it practically to the steps that STEM women are already taking.

If anyone knows how to design a class, I'd love any advice you might be willing to give. :)

Friday, March 5, 2010

Slowly Updating

It occurred to me that I had just shy of a dozen unfinished posts in my draft cache, created about one a month since December of 2008. I'll be trying to finish and post them over the next week or so.

<3

An Open Letter For a Second Chance

Dear friends,

This letter is not about me. This letter is about all the people who offended you years ago, and with whom you severed ties. This letter is about people who were going through something difficult that, while it did not excuse their actions, meant that their actions were a temporary response to a greater situation. This letter is about the people who destroyed their first chance with a rocket launcher.

The price of immaturity is the loss of second chances.

I ask for your understanding and for a little forgiveness. The person who harmed you in the past has grown, changed, and is now different and yet the same as they once were. They are stronger, more aware, and more mature. They have learned from their mistakes and while they are not perfect, they have discovered that both you and they are valuable. One disaster does not mean the disaster will repeat forever.

Sometimes there is no way to be friends. Sometimes there is no way to come to middle ground and start over. Sometimes you or the other person understand the world completely differently. Sometimes the disaster really will repeat forever and you must rid yourself of the toxicity.

Sometimes.

But not always.

Sometimes the cause is this: Tragedy. Depression. Misunderstanding. Uncertainty. Frustration. Abuse. Solitude. Pressure. Bullying. Living a Lie. Miscommunication. Mistakes.

These are not excuses. The behavior that harmed you cannot be excused, only taken responsibility for and rectified if possible.

But I ask for forgiveness for those with depression. I ask for forgiveness for those suffering from abuse - at their own hands or others. I ask for forgiveness for those whose circumstances leave no room for understanding how to deal with something that is eating them alive. I ask for forgiveness for those whose circumstances make their lives incomprehensible to outsiders and so they cannot receive reliable help.

The desperate flailing of the drowning can break noses and wound hearts.

I ask for forgiveness because sometimes, the way to be in other people's company can be consumed by living life. The emotions and sensitivity others demand dissolve, leaving only the sharp edges of hurt and pain for everyone else to deal with. There is nothing left for others when all of the person's power is being poured into simply surviving. When surviving or trying to fight their way to a life more honest, more truthful, less confusing, more joyful, sometimes the only way someone sees to move forward is to defend against all comers.

When a person defends themselves, the defense does not always differentiate between friends and attackers. And, then, when a mistake is made, sapped self-confidence destroys the idea of reconciliation. There's no way to say, "I'm sorry" if the wound is deep enough.

There's no way to know there's a wound if there's no feedback, and a second chance is feedback. Not everyone can offer a second chance, and not everyone should. But if you can, I implore you to try. You might be the better for it.

I just want you to remember, not all harm is caused by cruelty. At some point, it may be you desperately seeking forgiveness for the past.