LISTSERV 16.5 - CODE4LIB Archives

On Oct 27, 2014, at 12:38 PM, Bigwood, David wrote:

> Learning UNIX is fine. However, I do think learning SQL might be a better investment. So many of our resources are in databases. Understanding indexing, sorting and relevancy ranking of our databases is also crucial. With linked data being all the rage knowing about sparql endpoints is important. The presentation of the information from databases under our control needs work. Is the information we present actionable or just strings?

Quite likely. I wouldn't teach people SQL (and I've done plenty of pl/sql and t/sql programming) unless:

1. They had data they wanted to use that's already on an SQL server.
2. They had a (read-only) account on that server, so they could actually use it.

If they had to set up a server (even if it's an installable application) and ingest their data before they could analyze anything, they could get frustrated before they even start to see any useful results.

If they have some scenario where they need multiple tables and joins, then sure, teach them SQL ... but over the years, I've had weeks of SQL-related training*, and I don't know that I'd want to make anyone go through all of that if they're just trying to do some simple reports that could be done in other ways. I wouldn't even suggest teaching people about indexing until they've tried doing stuff in SQL and wondered why it's so slow.

Likewise, if there were some sort of non-SQL database for them to play with (even an LDAP server) that might have information of use to them, I'd teach them that first ... but I'd likely start w/ unix command line stuff (see below).

> Or maybe I just like those topics better and find the work being done there fascinating?

Quite likely. I still haven't found a good reason to wrap my head around sparql ... I guess in part because the stuff I'm dealing with isn't served as linked data.

...

On Oct 27, 2014, at 11:15 AM, Tod Olson wrote:

> There’s also something to be said for the Unix pipeline/filter model of processing. That way of breaking down a task into small steps, wiring little programs to filter the data for each step, building up the solution iteratively, essentially a form of function composition. Immediately, you can do a lot of powerful one-off or scripting tasks right from the command line. More generally, it’s a very powerful model to have in your head, can transform your thinking.

I 100% agree.

If I were to try to teach "unix" to a group, I'd come up with some scenarios
where command line tools can actually help them, and show them how to automate
things that they'd have to do anyway (or tried to do, and gave up on).

For instance, if there's some sort of metric that they need, you can show
how simple `cut | sort | uniq | wc` can be used...

e.g., if I have a 'common' or 'common+' webserver log file, I can get a quick
count of today's unique hosts via:

cut -d" " -f1 /var/log/httpd/access_log-2014.10.27 | sort | uniq | wc -l

If I wanted to see the top 10 hosts hitting us:

cut -d" " -f1 /var/log/httpd/access_log-2014.10.27 | sort | uniq -c | sort -rn | head -10

If you're lazy, and want to alias this so you don't have to hard-code today's date:

cut -d" " -f1 `ls -1t /var/log/httpd/access_log* | head -1` | sort | uniq | wc -l
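To see what each stage of those pipelines contributes, here's a minimal sketch run against a made-up six-line log, so nobody needs access to a real server to try it. The hosts, requests and temp file are all assumptions for illustration; only the `cut | sort | uniq` stages come from the commands above.

```shell
# Build a tiny sample log in common log format (hypothetical hosts and requests).
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - [27/Oct/2014:09:00:01 -0400] "GET / HTTP/1.1" 200 512
10.0.0.2 - - [27/Oct/2014:09:00:05 -0400] "GET /a HTTP/1.1" 200 128
10.0.0.1 - - [27/Oct/2014:09:01:10 -0400] "GET /b HTTP/1.1" 404 64
10.0.0.3 - - [27/Oct/2014:09:02:00 -0400] "GET / HTTP/1.1" 200 512
10.0.0.1 - - [27/Oct/2014:09:03:30 -0400] "GET /c HTTP/1.1" 200 256
10.0.0.2 - - [27/Oct/2014:09:04:00 -0400] "GET / HTTP/1.1" 200 512
EOF

# cut keeps only the first space-delimited field (the client host).
# sort puts identical hosts next to each other, because uniq only
# collapses adjacent duplicates. wc -l then counts the remaining lines.
unique_hosts=$(cut -d" " -f1 "$log" | sort | uniq | wc -l)

# uniq -c prefixes each host with how often it appeared;
# sort -rn orders by that count, largest first.
top_hosts=$(cut -d" " -f1 "$log" | sort | uniq -c | sort -rn | head -10)

echo "unique hosts:" $unique_hosts
echo "$top_hosts"
rm -f "$log"
```

With this sample there are three distinct hosts, and 10.0.0.1 comes out on top with 3 hits. Dropping stages off the end of the pipeline one at a time is a decent way to show a class what each piece does.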

If your log files are rolled weekly, and you need to extract just today:
(note that it's easier if you're sure that something looking like today's date won't show up in requests)
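A minimal sketch of one way to do that, not necessarily the only or best one: filter on the date with `grep` before counting hosts. The function name `hosts_on_day` is made up for illustration, and the default date assumes English month abbreviations from `date`. Matching on `[DD/Mon/YYYY:` rather than the bare date ties the match to the timestamp field, which lowers (but doesn't remove) the chance of matching a date that shows up inside a request.

```shell
# Hypothetical helper: count unique hosts for one day in a multi-day log.
#   $1 = log file
#   $2 = date as written in the log, e.g. 27/Oct/2014 (defaults to today;
#        %b depends on locale, so this assumes English month names)
hosts_on_day() {
    day=${2:-$(date +%d/%b/%Y)}
    # -F treats the pattern as a fixed string, so "[" is not a regex bracket.
    grep -F "[$day:" "$1" | cut -d" " -f1 | sort | uniq | wc -l
}

# Usage (the path is an assumption; substitute your own weekly log):
#   hosts_on_day /var/log/httpd/access_log 27/Oct/2014
```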

If you just wanted a quick report of hits per day, and your log files aren't rolled and compressed:

cat `ls -1tr /var/log/httpd/access_log*` | cut -d\[ -f2 | cut -d: -f1 | uniq -c | more

(note that that last one isn't always clean ... the dates logged are when the request started, but they're logged when the script finishes, so sometimes you'll get something strange like:

12354 23/Oct/2014
3 24/Oct/2014
1 23/Oct/2014
14593 24/Oct/2014

... but if you try to use `sort`, and you cross months, it'll sort alphabetically, not chronologically)
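One possible way around both problems, sketched here as an assumption rather than something from the commands above, is to let `awk` do the counting. It adds up every occurrence of a date no matter where it appears, and prints the dates in the order they were first seen, so the output stays chronological across month boundaries as long as the log itself is roughly in time order. The function name `hits_per_day` is made up for illustration.

```shell
# Hypothetical helper: hits per day, reading log lines on stdin.
hits_per_day() {
    # Same extraction as before: text after the first "[", up to the first ":".
    cut -d\[ -f2 | cut -d: -f1 |
    awk '
        # First time a date shows up, remember its position.
        !($0 in count) { order[++n] = $0 }
        # Count every occurrence, adjacent or not.
        { count[$0]++ }
        # Print in first-seen order instead of alphabetical order.
        END { for (i = 1; i <= n; i++) print count[order[i]], order[i] }
    '
}

# Usage (the path is an assumption):
#   cat `ls -1tr /var/log/httpd/access_log*` | hits_per_day
```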

You could probably dedicate another full day to sed & awk, if you wanted ... or teach them enough perl to be dangerous.

-Joe

* I've taken all of the Oracle DBA classes back in the 8i days (normally 4 weeks if taken as full-day classes), plus Oracle's data modeling and sql tuning classes (4-5 days each?)