Megducation: databases

Sunday, May 23, 2010

Stop Thinking of Internet Privacy in Human Terms

David Hurley from the Intellectual Freedom Round Table (IFRT) writes about how liking Star Trek could hinder your chances at a job in childcare. Sort of. Hypothetically!

Rather than a person knowing discrete facts, the database allows your data to be carefully analyzed as part of the aggregate. And when you analyze such a huge pot of data, you start finding odd correlations.

Wednesday, May 12, 2010

Stephen Wolfram - Making All Knowledge Computational

Stephen Wolfram talks about his ideas to make all knowledge computational. I'll be honest, a lot of it goes right over my head, but it's worth watching. Stephen Wolfram designed Wolfram Alpha, a curious little search engine that provides very up-to-date computational facts.

Parameta Data

Start with meta, get your head around metadata, and then consider parameta data. I think I get it.

Friday, April 30, 2010

How Much Information is There?

How much information is there? Not counting books, just digital storage.

For most of us, “a crapload” is a sufficiently accurate answer. But for a few obsessive data analysts, more precision is necessary. According to a recent study by market-research company IDC, and sponsored by storage company EMC, the size of the information universe is currently 800,000 petabytes. Each petabyte is a million gigabytes, or the equivalent of 1,000 one-terabyte hard drives.

If you stored all of this data on DVDs, the study’s authors say, the stack would reach from the Earth to the moon and back.
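
The unit arithmetic in that quote is easy to check for yourself. Here is a minimal Python sketch, assuming decimal (SI) prefixes, which is my assumption and not something the study excerpt states. The variable names are made up for illustration.

```python
# Decimal (SI) storage units, expressed in bytes (an assumption; the
# quoted excerpt does not say whether it uses decimal or binary units).
GIGABYTE = 10**9
TERABYTE = 10**12
PETABYTE = 10**15
ZETTABYTE = 10**21

# Figure quoted from the IDC study: 800,000 petabytes.
digital_universe_bytes = 800_000 * PETABYTE

gigabytes_per_petabyte = PETABYTE // GIGABYTE
terabyte_drives_per_petabyte = PETABYTE // TERABYTE
total_terabyte_drives = digital_universe_bytes // TERABYTE
total_zettabytes = digital_universe_bytes / ZETTABYTE

print(f"1 PB = {gigabytes_per_petabyte:,} GB")
print(f"1 PB = {terabyte_drives_per_petabyte:,} one-terabyte drives")
print(f"800,000 PB = {total_terabyte_drives:,} one-terabyte drives")
print(f"800,000 PB = {total_zettabytes} ZB")
```

So the quoted conversions hold up: a million gigabytes per petabyte, 1,000 one-terabyte drives per petabyte, and 800 million such drives for the whole pile.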

Wednesday, April 14, 2010

Library of Congress and Twitter

Every Tweet Will Be Preserved for History. Seriously?

I wonder what category tweets go under in the Library of Congress Classification system.

Saturday, April 3, 2010

FAQ: Google, China, and Censorship

WIRED.com has made an FAQ (a list of Frequently Asked Questions) regarding Google, China, and Censorship. It is useful!

Saturday, March 13, 2010

Wikipedia Collaborators and Their Roles

A paper written by a University of Arizona professor and a graduate student found that quality Wikipedia articles are the result of the work of different kinds of collaborators.

Starters, for example, create sentences but seldom engage in other actions. Content justifiers create sentences and justify them with resources and links. Copy editors contribute primarily through modifying existing sentences. Some users – the all-round contributors – perform many different functions.

So basically, really good Wikipedia articles come about by embracing the core tenet of the project: many people working together provide better information.

The paper is also available for download.

Tuesday, February 23, 2010

Google's Algorithm

It's been far too long since the Google tag has come up. Here, learn about their search algorithms.

Take, for instance, the way Google’s engine learns which words are synonyms. “We discovered a nifty thing very early on,” Singhal says. “People change words in their queries. So someone would say, ‘pictures of dogs,’ and then they’d say, ‘pictures of puppies.’ So that told us that maybe ‘dogs’ and ‘puppies’ were interchangeable. We also learned that when you boil water, it’s hot water. We were relearning semantics from humans, and that was a great advance.”

But there were obstacles. Google’s synonym system understood that a dog was similar to a puppy and that boiling water was hot. But it also concluded that a hot dog was the same as a boiling puppy. The problem was fixed in late 2002 by a breakthrough based on philosopher Ludwig Wittgenstein’s theories about how words are defined by context. As Google crawled and archived billions of documents and Web pages, it analyzed what words were close to each other. “Hot dog” would be found in searches that also contained “bread” and “mustard” and “baseball games” — not poached pooches. That helped the algorithm understand what “hot dog” — and millions of other terms — meant. “Today, if you type ‘Gandhi bio,’ we know that bio means biography,” Singhal says. “And if you type ‘bio warfare,’ it means biological.”
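
To get my head around the "words are defined by context" idea, here is a toy Python sketch. It is not Google's algorithm, just a simple co-occurrence count under my own assumptions: the function name `neighbors_of_phrase` and the four sample documents are invented for illustration.

```python
from collections import Counter


def neighbors_of_phrase(documents, phrase):
    """Count the words that share a document with `phrase`.

    Hypothetical helper for illustration only: it lowercases, splits on
    whitespace, and ignores the phrase's own words.
    """
    phrase_words = phrase.lower().split()
    n = len(phrase_words)
    counts = Counter()
    for doc in documents:
        words = doc.lower().split()
        # Does the phrase appear as a run of consecutive words?
        found = any(
            words[i:i + n] == phrase_words
            for i in range(len(words) - n + 1)
        )
        if found:
            counts.update(w for w in words if w not in phrase_words)
    return counts


# Invented sample documents, not real crawl data.
documents = [
    "hot dog with mustard and bread",
    "baseball fans eat a hot dog with mustard",
    "the puppy is a young dog",
    "boiling water is hot water",
]

hot_dog_context = neighbors_of_phrase(documents, "hot dog")
print(hot_dog_context.most_common(3))
```

In this tiny example "mustard" turns up alongside "hot dog" twice, while "puppy" and "boiling" never do, which is the gist of how context can keep a hot dog from becoming a boiling puppy.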

Saturday, January 30, 2010

123 Hack Me

A New York Times article reveals many people are still using simple, easily guessed passwords.

Back at the dawn of the Web, the most popular account password was “12345.” Today, it’s one digit longer but hardly safer: “123456.”

The ranking comes from a list of 32 million passwords that a hacker took from a company that makes software used on social networking sites like Facebook and MySpace. The list was only briefly posted online, but it was downloaded and examined by hackers and security specialists alike. What a great resource!

According to the article, here are the top 32 passwords:

123456
12345
123456789
password
iloveyou
princess
rockyou
1234567
12345678
abc123
nicole
daniel
babygirl
monkey
jessica
lovely
michael
ashley
654321
qwerty
iloveu
michelle
111111
0
tigger
password1
sunshine
chocolate
anthony
angel
FRIENDS
soccer
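
If you want to check whether a password is on that list, a small Python sketch might look like this. The function name `is_common_password` is made up, and comparing case-insensitively is my own choice, not something the article describes.

```python
# The 32 passwords as listed above, lowercased for comparison.
COMMON_PASSWORDS = {
    "123456", "12345", "123456789", "password", "iloveyou", "princess",
    "rockyou", "1234567", "12345678", "abc123", "nicole", "daniel",
    "babygirl", "monkey", "jessica", "lovely", "michael", "ashley",
    "654321", "qwerty", "iloveu", "michelle", "111111", "0", "tigger",
    "password1", "sunshine", "chocolate", "anthony", "angel", "friends",
    "soccer",
}


def is_common_password(candidate):
    """Return True if `candidate` matches a listed password, ignoring case."""
    return candidate.lower() in COMMON_PASSWORDS


print(is_common_password("123456"))
print(is_common_password("Friends"))
print(is_common_password("correct horse battery staple"))
```

Passing this check only means a password is not one of these 32. It says nothing about whether the password is actually strong.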

I knew this guy in high school with a PDA. I was intrigued by it; he let me play with it. I remember asking him what the password was and he would say, "It's a secret." After a sadly long time, I clued in that the password was 'itsasecret'.

Monday, January 25, 2010

Haiti People Finder

Alright, it may not have anything to do with libraries, but this is the Haiti people finder project. It does feature working with databases, which is a librarian thing, and also helping people - also a librarian thing. If you have time and can follow instructions, you can help.

Librarian fail: I neglected to notice where I got this link from.