(Back from vacation now.  Thanks again for everyone's thoughts and suggestions.)

On Aug 9, 2011, at 7:34 PM, Jonathan Rochkind wrote:
> Just to play Simplicity Devil's Advocate, and admittedly not having followed this whole thread or your whole design. 
> What if the model was nothing but two entities:
> Software
> Person/Group (Yes, used either for an individual or a group of any sort). 
> With a directed 'related' relationship between each entity and reflexive. (Software -> Person/Group ; Software -> Software; Person/Group -> Software ; Person/Group-> Person/Group ). 
> That 'related' relationship can be annotated with a relationship type from a controlled vocabulary, as well as free-entered user tags.  Controlled vocabulary would include Person/Group *uses* Software;  Person/Group *develops* Software;  Software *component of* Software.  Person/Group *member of* Person/Group.  
> People could enter 'tags' on the relationship for anything else they wanted. You could develop the controlled vocabulary further organically as you get more data and see what's actually needed -- and what people free tag, if they do so.  
> Additional attributes are likely needed on Software; probably not too many more on Person/Group.  But to encourage crowd sourcing, you can enter a Software without filling out all of those attributes; it's as easy as filling out a simple form and, if you want, making a couple of relationships to other Software or Person/Group -- or those can be made later by other people, if it catches on and people actually edit this. 
> Things like URLs to software (or people!) home pages can really just be entered in a big free text field -- using wiki syntax, or better yet, Markdown. 
> I think if the success of the project depends on volunteer crowd sourcing, you've got to keep things simple and make it as easy as possible to enter data in seconds. Really, even apart from the data entry, keeping it simple will lead you to a simple interface, which will be usable and more likely to catch on. 

Interesting model.  I'd like to think this through a little more; my first thoughts are that while it might make the user interface and the data model simpler, enforcement of consistency in the data itself would diminish, which might result in a hodgepodge of data that would be difficult to page through.  Simplicity in search/browse might be sacrificed for simplicity in data entry.
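To make the trade-off concrete, here is a minimal sketch of the two-entity model Jonathan describes -- two entity kinds, a directed typed relationship in any of the four directions, a small controlled vocabulary, and free tags.  The names and structure are my own illustration, not anyone's actual schema:

```python
# Sketch of the two-entity model: Software and Person/Group, with a
# directed 'related' link carrying a controlled type plus free tags.
from dataclasses import dataclass, field

# The starter controlled vocabulary from Jonathan's message.
CONTROLLED_VOCAB = {"uses", "develops", "component of", "member of"}

@dataclass(frozen=True)
class Entity:
    kind: str   # "software" or "person/group"
    name: str

@dataclass
class Relation:
    source: Entity
    target: Entity
    rel_type: str                           # from CONTROLLED_VOCAB
    tags: set = field(default_factory=set)  # free-entered user tags

def relate(source, target, rel_type, tags=None):
    """Create a directed, typed relationship between any two entities."""
    if rel_type not in CONTROLLED_VOCAB:
        raise ValueError(f"unknown relationship type: {rel_type!r}")
    return Relation(source, target, rel_type, set(tags or ()))

# Any direction is allowed: Software->Software, Person/Group->Software, etc.
toolkit = Entity("software", "MARC toolkit")
library = Entity("person/group", "Example Library")
link = relate(library, toolkit, "uses", tags={"cataloging"})
```

Note that nothing here constrains which relationship types make sense between which entity kinds -- that looseness is exactly the consistency-enforcement gap I worry about above.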

On Aug 9, 2011, at 7:50 PM, stuart yeates wrote:
> You may also be interested in the (older?) work at 
> and For example:
> /
> Interoperability with RDF/DOAP lets you build on others' work and lets 
> others in turn pick over your work.
> At the very least it allows you to suck in the latest and greatest 
> releases automatically.

Ah, yes!  That is the sort of linked data interoperability I was thinking would be possible.  Thanks for the pointers to those efforts.

On Aug 9, 2011, at 8:23 PM, Matt Jones wrote:
> On Tue, Aug 9, 2011 at 3:50 PM, stuart yeates <[log in to unmask]> wrote:
>> ...
>> Ohloh is great. However it relies almost completely on metrics which are
>> easily gamed by the technically competent. Use of these kinds of metrics in
>> ways which encourage gaming will only be productive in the short term,
>> perhaps the very short term.
>> For example: it's easy to set up dummy version control accounts and there
>> can be good technical reasons for doing so. It's easy to set up a build/test
>> suite to update a file in the version control after its daily run and there
>> can be good technical reasons for doing so. But doing these things can also
>> transform a very-low activity single user project into a high-activity dual
>> user project, in the eyes of ohloh.
>> Turning on template-derived comments in the next big migration handles the
>> "is the code commented?" metric.
>> The more metrics are used, the more motivation there is to use tools (which
>> admittedly have other motivations) which make a project look good.
> I agree the ohloh metrics are easily gamed.  What metrics do you recommend
> that can't be gamed but still provide a synopsis of the project for
> evaluation, comparison, and selection? I think there is some utility even
> though they can be gamed.  The metrics are not a substitute for critical
> evaluation, but provide a nice synopsis as a jumping off point.  For
> example, if I am interested in projects that have a demonstrable lifespan >
> 5 years, and that have had more than 10 developers contribute, I can find
> that via these metrics.  I can then assess for myself if any of the
> resulting projects are false positives (e.g., the commit log will give some
> idea of the types of commits made by each person).
> If you're concerned about the system being gamed via metrics, then you
> should also be concerned about user-submitted project descriptions.
> Projects have a tendency to over-generalize on what their software does,
> under-report defects, and generally paint a rosy picture.  Will there be
> some sort of quality control/editing/verification of the claims made by
> submitters? Will it matter if some of the projects are described more
> generously than in reality?  Won't the system still be useful even if they
> are?

I'm interested to hear more about what others think would be good metrics.  I agree with Matt that they serve as a useful rough sorting mechanism (perhaps as a way to cull projects which clearly have no active community, or at least not one that is actively gaming the metrics -- but even gaming shows some activity, doesn't it?).  
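Matt's example filter -- lifespan greater than 5 years and more than 10 contributors -- could be expressed as a simple query over metric records.  The field names below are purely illustrative; they don't correspond to ohloh's actual data model:

```python
# Hypothetical per-project metric records; field names are invented
# for illustration only.
projects = [
    {"name": "alpha", "first_commit_year": 2001,
     "last_commit_year": 2011, "contributors": 14},
    {"name": "beta", "first_commit_year": 2009,
     "last_commit_year": 2011, "contributors": 3},
]

def demonstrable_lifespan(project):
    """Years between first and last recorded commit."""
    return project["last_commit_year"] - project["first_commit_year"]

# Matt's filter: lifespan > 5 years and more than 10 contributors.
candidates = [
    p["name"] for p in projects
    if demonstrable_lifespan(p) > 5 and p["contributors"] > 10
]
```

As Matt says, the resulting list is a jumping-off point, not a verdict: each candidate still needs a human look at the commit log to weed out false positives (and gamed numbers).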

Peter Murray         [log in to unmask]        tel:+1-678-235-2955                 
Ass't Director, Technology Services Development
LYRASIS   --    Great Libraries. Strong Communities. Innovative Answers.
The Disruptive Library Technology Jester