LISTSERV 16.5 - CODE4LIB Archives

Just to play Simplicity Devil's Advocate, and admittedly not having followed this whole thread or your whole design. 

What if the model was nothing but two entities:

Software
Person/Group (Yes, used either for an individual or a group of any sort). 

With a directed 'related' relationship between any pair of entities, including reflexive ones (Software -> Person/Group; Software -> Software; Person/Group -> Software; Person/Group -> Person/Group).

That 'related' relationship can be annotated with a relationship type from a controlled vocabulary, as well as free-entered user tags.  Controlled vocabulary would include Person/Group *uses* Software; Person/Group *develops* Software; Software *component of* Software; Person/Group *member of* Person/Group.

People could enter 'tags' on the relationship for anything else they wanted. You could develop the controlled vocabulary further organically as you get more data and see what's actually needed -- and what people free tag, if they do so.  
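
A minimal sketch of that two-entity model, in Python purely for illustration. The names (`Kind`, `Entity`, `Relation`, `targets_of`) and the sample data are assumptions made up for this example, not part of any proposed design:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Set


class Kind(Enum):
    SOFTWARE = "Software"
    PERSON_GROUP = "Person/Group"


# Controlled vocabulary of relationship types; meant to grow as data comes in.
CONTROLLED_TYPES = {"uses", "develops", "component of", "member of"}


@dataclass(frozen=True)
class Entity:
    name: str
    kind: Kind


@dataclass
class Relation:
    """A directed 'related' link, optionally typed and freely taggable."""
    source: Entity
    target: Entity
    rel_type: Optional[str] = None               # None means "just related"
    tags: Set[str] = field(default_factory=set)  # free-entered user tags

    def __post_init__(self) -> None:
        # Only the type is validated; tags are deliberately unconstrained.
        if self.rel_type is not None and self.rel_type not in CONTROLLED_TYPES:
            raise ValueError(f"unknown relationship type: {self.rel_type!r}")


def targets_of(relations: List[Relation], source: Entity, rel_type: str) -> List[Entity]:
    """Return every entity that `source` points at with the given type."""
    return [r.target for r in relations
            if r.source == source and r.rel_type == rel_type]


# Hypothetical sample data.
library = Entity("Example Library", Kind.PERSON_GROUP)
catalog = Entity("Example Catalog", Kind.SOFTWARE)
indexer = Entity("Example Indexer", Kind.SOFTWARE)

relations = [
    Relation(library, catalog, "uses", tags={"production"}),
    Relation(indexer, catalog, "component of"),
    Relation(library, indexer, tags={"evaluating"}),  # tag-only, no controlled type
]
```

Here `targets_of(relations, library, "uses")` returns only the catalog entry; the tag-only link to the indexer stays out of typed queries until someone assigns it a controlled type.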

Additional attributes are likely needed on Software; probably not too many more on Person/Group.  But to encourage crowdsourcing, you can enter a Software without filling out all of those attributes. It's as easy as filling out a simple form and, if you want, making a couple of relationships to other Software or Person/Group entries; those can also be made later by other people, if it catches on and people actually edit this.

Things like URLs to software (or people!) home pages can really just be entered in a big free text field -- using wiki syntax, or better yet, Markdown. 

I think if the success of the project depends on volunteer crowd sourcing, you've got to keep things simple and make it as easy as possible to enter data in seconds. Really, even setting data entry aside, keeping it simple will lead you to a simple interface, which will be usable and more likely to catch on.



________________________________________
From: Code for Libraries on behalf of Peter Murray
Sent: Tuesday, August 09, 2011 5:45 PM
Subject: Re: [CODE4LIB] Seeking feedback on database design for an open source software registry

I'm combining several responses into one.  Apologies for the delay in getting back to folks; I'm technically on vacation at the moment...


On Aug 9, 2011, at 12:47 PM, Brice Stacey wrote:
> I'd be curious to know if this project itself would be open source.

Yes, we'll post the registry code and configuration in an open code repository.

> Second, I'm intrigued because I've never seen a UML diagram so close before in the wild and it's fascinating to discover the jokes are true (I kid, I kid...). Let's get serious and pull out your Refactoring book by Fowler and turn to page 336... you can "Extract superclass" to get Provider/Institution/Person to inherit from Entity. Then "Merge Hierarchy" to tear it down into a single Entity class and add a self-referencing association for employs. ProviderType should be renamed to Services and be made an association allowing 0..* services. At that point, the DB design is pretty straightforward and the architecture astronauts can come back down to earth.

Hmmm, interesting.  It has several modeling implications.  For instance, the model was forcing a Person to be associated with a Provider, and it came up in an earlier discussion (about using the word "Provider" for independent developers and consultants) that this was an unnecessary constraint.  And I think there is a difference between "uses" and "supports" that would need to be captured.
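
For what it's worth, the merged single-Entity shape Brice describes might look roughly like the following. This is only a sketch: the `Entity` class, its field names, the `employees_of` helper, and the sample entries are illustrative assumptions. Making `employer` optional is one way to drop the constraint that every Person be tied to a Provider.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Entity:
    """Single class standing in for Provider, Institution, and Person."""
    name: str
    services: Set[str] = field(default_factory=set)  # 0..* services (formerly ProviderType)
    employer: Optional["Entity"] = None              # self-referencing "employs" association


def employees_of(entities: List[Entity], employer: Entity) -> List[Entity]:
    """Return the entities whose employer is the given entity."""
    return [e for e in entities if e.employer is employer]


# Hypothetical sample data.
vendor = Entity("Example Support Co.", services={"hosting", "support"})
staffer = Entity("Staff Developer", employer=vendor)
freelancer = Entity("Independent Consultant", services={"consulting"})  # no employer needed

entities = [vendor, staffer, freelancer]
```

Note that this shape alone does not capture the "uses" versus "supports" distinction; that would still need a typed link between an Entity and a Package.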

> Seriously though, I think that technically, you might be overthinking this. If you replace Package with Blog, Release with Post, Technology with Tag, Provider/Institution/Person with User, Keep Comment as Comment, and ignore Event for now.... It's just a simple collection of blogs with posts with tags and users that have roles and can leave comments.

Possible, but I think simplifying it that far will make it hard to answer some of the target questions.  The relationships ("uses") are necessary to tease out peer institutions who are using the package of interest.
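
As a rough illustration of the kind of target question I mean, here is a sketch in Python. The `Relation` tuple, the `peers` function, and all of the sample names are hypothetical, invented only for this example:

```python
from typing import List, NamedTuple


class Relation(NamedTuple):
    source: str    # the institution or provider
    rel_type: str  # e.g. "uses" or "supports"
    target: str    # the package


def peers(relations: List[Relation], rel_type: str, package: str) -> List[str]:
    """Return the sorted, de-duplicated sources with `rel_type` on `package`."""
    return sorted({r.source for r in relations
                   if r.rel_type == rel_type and r.target == package})


# Hypothetical sample data.
relations = [
    Relation("Example University Library", "uses", "Example Package"),
    Relation("Example Public Library", "uses", "Example Package"),
    Relation("Example Support Co.", "supports", "Example Package"),
    Relation("Example Public Library", "uses", "Other Package"),
]

print(peers(relations, "uses", "Example Package"))
print(peers(relations, "supports", "Example Package"))
```

Without an explicit relationship type, the two calls could not tell the institutions running the package apart from the company offering support for it.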

> Lastly, you may want to look into Drupal's project module. I think that's what they use to run their module directory. It seems like it would be a good starting point and may work out of the box.

Cool -- thanks for the tip!

> It's a bold project. The library needs it and it's something no single institution would ever pay to have done, so I'm glad to see there is a grant for it.

I appreciate that.  I think so, too, and the external validation is useful.



On Aug 9, 2011, at 2:21 PM, Jonathan Rochkind wrote:
> I agree with Brice; I think you might be over-thinking/over-architecting
> it, although over-thinking is one of my sins too and I'm not always sure
> how to get out of it.

Oh, geez -- and as I was coming up with the draft I kept thinking of other entities for other features that would be useful, but put all of that off for possible later work.

> But am I correct that you're going to be relying on user-submitted
> content in large part? Then it's important to keep it simple, so it's
> easy for users to add content without having to go through a million
> steps and understand a complicated data model.  If you can keep it
> simple in a way that is flexible (the 'tags' idea for instance), you
> also may find users using it in ways that you didn't anticipate, but
> which are still useful.

It will be user-submitted content with a little oversight from a volunteer group of interested individuals.  (The group would curate the various vocabularies, for instance -- this became a much lighter burden with the characteristics/features function removed.)

I didn't include a "tag" function.  Do you think it would be useful?  With such a small set of entities I wasn't sure if it would be.



On Aug 9, 2011, at 2:42 PM, Matt Jones wrote:
> As some points for comparison, you might look at two existing and similar
> systems for registering software...
>
> First,  a software tools database that is maintained for the environmental
> sciences community:
> http://ebmtoolsdatabase.org/
>
> An example of one of my tool entries in this system is here:
> http://ebmtoolsdatabase.org/tool/kepler-scientific-workflow-system-0
>
> The system is easy to use, has some nice descriptions of the software, and
> is user-maintained.  Maybe some of their use cases and yours overlap?  I'm
> not sure which CMS they use, but I found it easy to edit entries myself.

Cool!  I'm still looking for more exemplars as points of comparison.  EBMTools seems to be based on Drupal as well.  It doesn't seem to be taking advantage of the built-in taxonomy structure; at least, there isn't a browse functionality beyond the alphabetical list.  I'll need to take a deeper look.

> Second, the open source site Ohloh has some nice features for characterizing
> a project, such as languages used, licenses, etc. Here's the page for the
> same Kepler system in Ohloh:
> https://www.ohloh.net/p/kepler
>
> Ohloh is nice because much of its information is harvested directly from
> links to the open source code repositories for the project, which allows it
> to show some nice trends in the software project's life.

A colleague e-mailed me privately about Ohloh as well, and in particular the metrics function to tell how viable a project is.  I haven't looked at Ohloh yet to see if it is possible to call into its service to get the metrics for registered projects, but at the very least these kinds of project activity statistics are an important factor in evaluating an open source package, and I'd like to find a way to get them into this registry.



On Aug 7, 2011, at 4:10 PM, stuart yeates wrote:
> On 06/08/11 10:27, Peter Murray wrote:
>
>> Well, we certainly don't want to get into a situation where we find it is turtles all of the way down.
>
> Am I right in parsing that as "we have consciously decided to make the
> registry blind to the concept of visualisation." ?
>
> Given that visualisation is such a huge trend at the moment, good luck
> with that.

Stuart -- I apologize for not fully understanding your point; I think we are talking past each other.  I don't see how limiting the scope of the definition of "Package" to just library-related or library-specific entities makes a statement one way or another on visualization.


Peter
--
Peter Murray                                         tel:+1-678-235-2955
Ass't Director, Technology Services Development   http://dltj.org/about/
LYRASIS   --    Great Libraries. Strong Communities. Innovative Answers.
The Disruptive Library Technology Jester                http://dltj.org/
Attrib-Noncomm-Share   http://creativecommons.org/licenses/by-nc-sa/2.5/