CODE4LIB Archives
CODE4LIB@LISTS.CLIR.ORG
CODE4LIB August 2011

Subject: Re: Seeking feedback on database design for an open source software registry
From: Jonathan Rochkind <[log in to unmask]>
Reply-To: Code for Libraries <[log in to unmask]>
Date: Tue, 9 Aug 2011 23:34:25 +0000
Content-Type: text/plain


Just to play Simplicity Devil's Advocate, and admittedly not having followed this whole thread or your whole design. 

What if the model was nothing but two entities:

Software
Person/Group (Yes, used either for an individual or a group of any sort). 

With a directed 'related' relationship between any two entities, including an entity of the same kind (Software -> Person/Group; Software -> Software; Person/Group -> Software; Person/Group -> Person/Group).

That 'related' relationship can be annotated with a relationship type from a controlled vocabulary, as well as with free-entered user tags.  The controlled vocabulary would include: Person/Group *uses* Software; Person/Group *develops* Software; Software *component of* Software; Person/Group *member of* Person/Group.

People could enter 'tags' on the relationship for anything else they wanted. You could develop the controlled vocabulary further organically as you get more data and see what's actually needed -- and what people free tag, if they do so.  
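
To make the two-entity idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a proposed implementation: the names `Kind`, `Entity`, `Related`, and `CONTROLLED_VOCABULARY` are hypothetical, and the only vocabulary terms included are the four examples given above. A relationship with no controlled type but some free tags is allowed, so directions not yet covered by the vocabulary (such as Software -> Person/Group) can still be recorded.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Set


class Kind(Enum):
    SOFTWARE = "software"
    PERSON_OR_GROUP = "person/group"


# Controlled vocabulary (only the four examples from the message):
# relationship type -> (required source kind, required target kind).
CONTROLLED_VOCABULARY = {
    "uses": (Kind.PERSON_OR_GROUP, Kind.SOFTWARE),
    "develops": (Kind.PERSON_OR_GROUP, Kind.SOFTWARE),
    "component of": (Kind.SOFTWARE, Kind.SOFTWARE),
    "member of": (Kind.PERSON_OR_GROUP, Kind.PERSON_OR_GROUP),
}


@dataclass(frozen=True)
class Entity:
    name: str
    kind: Kind


@dataclass
class Related:
    """A directed 'related' edge from source to target."""
    source: Entity
    target: Entity
    rel_type: Optional[str] = None               # controlled term, or None
    tags: Set[str] = field(default_factory=set)  # free-entered user tags

    def __post_init__(self) -> None:
        # Untyped relationships are fine; they rely on free tags only.
        if self.rel_type is None:
            return
        if self.rel_type not in CONTROLLED_VOCABULARY:
            raise ValueError(f"unknown relationship type: {self.rel_type!r}")
        expected = CONTROLLED_VOCABULARY[self.rel_type]
        if (self.source.kind, self.target.kind) != expected:
            raise ValueError(f"{self.rel_type!r} does not apply in this direction")


# Made-up entries, for illustration only.
package = Entity("Example Package", Kind.SOFTWARE)
library = Entity("Example Library", Kind.PERSON_OR_GROUP)
typed = Related(library, package, "uses", {"in production"})
untyped = Related(package, library, tags={"hosted by"})
```

Growing the vocabulary organically then amounts to adding a row to the dictionary once a free tag turns out to be common.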

Additional attributes are likely needed on Software; probably not too many more on Person/Group.  But to encourage crowdsourcing, you should be able to enter a Software without filling out all of those attributes. It should be as easy as filling out a simple form and, if you want, making a couple of relationships to other Software or Person/Group entries; or those can be made later by other people, if it catches on and people actually edit this.

Things like URLs to software (or people!) home pages can really just be entered in a big free text field -- using wiki syntax, or better yet, Markdown. 

I think if the success of the project depends on volunteer crowdsourcing, you've got to keep things simple and make it as easy as possible to enter data in seconds. Really, even apart from data entry, keeping it simple will lead you to a simple interface, which will be usable and more likely to catch on.



________________________________________
From: Code for Libraries [[log in to unmask]] on behalf of Peter Murray [[log in to unmask]]
Sent: Tuesday, August 09, 2011 5:45 PM
To: [log in to unmask]
Subject: Re: [CODE4LIB] Seeking feedback on database design for an open source software registry

I'm combining several responses into one.  Apologies for the delay in getting back to folks; I'm technically on vacation at the moment...


On Aug 9, 2011, at 12:47 PM, Brice Stacey wrote:
> I'd be curious to know if this project itself would be open source.

Yes, we'll post the registry code and configuration in an open code repository.

> Second, I'm intrigued because I've never seen a UML diagram so close before in the wild and it's fascinating to discover the jokes are true (I kid, I kid...). Let's get serious and pull out your Refactoring book by Fowler and turn to page 336... you can "Extract superclass" to get Provider/Institution/Person to inherit from Entity. Then "Merge Hierarchy" to tear it down into a single Entity class and add a self-referencing association for employs. ProviderType should be renamed to Services and be made an association allowing 0..* services. At that point, the DB design is pretty straightforward and the architecture astronauts can come back down to earth.

Hmmm, interesting.  It has several modeling implications.  For instance, the model was forcing a Person to be associated with a Provider, and it came up in an earlier discussion (about using the word "Provider" for independent developers and consultants) that this was an unnecessary constraint.  And I think there is a difference between "uses" and "supports" that would need to be captured.
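
As a rough illustration of the merged design Brice describes (a single Entity class, a self-referencing "employs" association, and 0..* services), here is a hedged Python sketch. The class layout, the `employers_among` helper, and the sample names and service labels are all assumptions made for the example, not part of the draft model. It shows the modeling implication noted above: a person need not be attached to any provider.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    """Single merged class standing in for Provider, Institution, and Person."""
    name: str
    services: List[str] = field(default_factory=list)      # 0..* services
    employs: List["Entity"] = field(default_factory=list)  # self-referencing association

    def employers_among(self, candidates: List["Entity"]) -> List["Entity"]:
        """Return the candidates that list this entity under 'employs'."""
        return [c for c in candidates if any(e is self for e in c.employs)]


# Made-up entries, for illustration only.
consultant = Entity("Independent Consultant", services=["hosting", "training"])
staffer = Entity("Staff Developer")
vendor = Entity("Example Vendor", services=["support"], employs=[staffer])

print([e.name for e in staffer.employers_among([vendor])])     # ['Example Vendor']
print([e.name for e in consultant.employers_among([vendor])])  # []
```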

> Seriously though, I think that technically, you might be over thinking this. If you replace Package with Blog, Release with Post, Technology with Tag, Provider/Institution/Person with User, Keep Comment as Comment, and ignore Event for now.... It's just a simple collection of blogs with posts with tags and users that have roles and can leave comments.

Possible, but I think simplifying it that far will make it hard to answer some of the target questions.  The relationships ("uses") are necessary to tease out peer institutions who are using the package of interest.
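
A small Python sketch of the kind of target question meant here, assuming (hypothetically) that relationships are stored as flat (source, relationship, target) records. The `Relation` shape, the function name, and the sample data are invented for illustration; "supports" appears only to show that it stays distinct from "uses".

```python
from typing import Iterable, List, Tuple

# (source, relationship, target); a hypothetical flat shape for stored relationships.
Relation = Tuple[str, str, str]


def institutions_with(relationship: str, package: str,
                      relations: Iterable[Relation]) -> List[str]:
    """Return the sorted, de-duplicated sources that have `relationship` to `package`."""
    return sorted({src for src, rel, tgt in relations
                   if rel == relationship and tgt == package})


# Made-up sample data, for illustration only.
SAMPLE = [
    ("Library A", "uses", "Package X"),
    ("Library B", "uses", "Package X"),
    ("Vendor C", "supports", "Package X"),
    ("Library A", "uses", "Package Y"),
]

print(institutions_with("uses", "Package X", SAMPLE))      # ['Library A', 'Library B']
print(institutions_with("supports", "Package X", SAMPLE))  # ['Vendor C']
```

Without an explicit relationship record, a blog-style model would have no obvious place to ask this question.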

> Lastly, you may want to look into Drupal's project module. I think that's what they use to run their module directory. It seems like it would be a good starting point and may work out of the box.

Cool -- thanks for the tip!

> It's a bold project. The library needs it and it's something no single institution would ever pay to have done, so I'm glad to see there is a grant for it.

I appreciate that.  I think so, too, and the external validation is useful.



On Aug 9, 2011, at 2:21 PM, Jonathan Rochkind wrote:
> I agree with Brice; I think you might be over-thinking/over-architecting
> it, although over-thinking is one of my sins too and I'm not always sure
> how to get out of it.

Oh, geez -- and as I was coming up with the draft I kept thinking of other entities for other features that would be useful, but put all of that off for possible later work.

> But am I correct that you're going to be relying on user-submitted
> content in large part? Then it's important to keep it simple, so it's
> easy for users to add content without having to go through a million
> steps and understand a complicated data model.  If you can keep it
> simple in a way that is flexible (the 'tags' idea for instance), you
> also may find users using it in ways that you didn't anticipate, but
> which are still useful.

It will be user-submitted content with a little oversight from a volunteer group of interested individuals.  (The group would curate the various vocabularies, for instance -- this became a much lighter burden with the characteristics/features function removed.)

I didn't include a "tag" function.  Do you think it would be useful?  With such a small set of entities I wasn't sure if it would be.



On Aug 9, 2011, at 2:42 PM, Matt Jones wrote:
> As some points for comparison, you might look at two existing and similar
> systems for registering software...
>
> First,  a software tools database that is maintained for the environmental
> sciences community:
> http://ebmtoolsdatabase.org/
>
> An example of one of my tool entries in this system is here:
> http://ebmtoolsdatabase.org/tool/kepler-scientific-workflow-system-0
>
> The system is easy to use, has some nice descriptions of the software, and
> is user-maintained.  Maybe some of their use cases and yours overlap?  I'm
> not sure which CMS they use, but I found it easy to edit entries myself.

Cool!  I'm still looking for more exemplars as points of comparison.  EBMTools seems to be based on Drupal as well.  It doesn't seem to be taking advantage of the built-in taxonomy structure; at least, there isn't a browse functionality beyond the alphabetical list.  I'll need to take a deeper look.

> Second, the open source site Ohloh has some nice features for characterizing
> a project, such as languages used, licenses, etc. Here's the page for the
> same Kepler system in Ohloh:
> https://www.ohloh.net/p/kepler
>
> Ohloh is nice because much of its information is harvested directly from
> links to the open source code repositories for the project, which allows it
> to show some nice trends in the software project's life.

A colleague e-mailed me privately about Ohloh as well, and in particular about the metrics function that indicates how viable a project is.  I haven't looked at Ohloh yet to see if it is possible to call into its service to get the metrics for registered projects, but at the very least these kinds of project activity statistics are an important point when considering an open source package, and I'd like to find a way to get them into this registry.



On Aug 7, 2011, at 4:10 PM, stuart yeates wrote:
> On 06/08/11 10:27, Peter Murray wrote:
>
>> Well, we certainly don't want to get into a situation where we find it is turtles all of the way down.
>
> Am I right in parsing that as "we have consciously decided to make the
> registry blind to the concept of visualisation." ?
>
> Given that visualisation is such a huge trend at the moment, good luck
> with that.

Stuart -- I apologize for not fully understanding your point; I think we are talking past each other.  I don't see how limiting the scope of the definition of "Package" to just library-related or library-specific entities makes a statement one way or another on visualization.


Peter
--
Peter Murray         [log in to unmask]        tel:+1-678-235-2955
Ass't Director, Technology Services Development   http://dltj.org/about/
LYRASIS   --    Great Libraries. Strong Communities. Innovative Answers.
The Disruptive Library Technology Jester                http://dltj.org/
Attrib-Noncomm-Share   http://creativecommons.org/licenses/by-nc-sa/2.5/
