Right, what I'm saying is that this survey is subject to "response bias" (http://en.wikipedia.org/wiki/Response_bias - "It also occurs in situations of voluntary response, such as phone-in polls, where the people who care enough to call are not necessarily a statistically representative sample of the actual population"). That doesn't render it irrelevant; it just can't, by itself, be declared representative of the non-participating community's demographics.
My point here isn't that it's not representative; it's that we can't know, because the subject matter of the survey (gender inequality, especially as it affects women) inherently biases who chooses to respond.
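To make both sides of this concrete, here is a rough Python sketch. The function names are my own, and I'm assuming the +/- 4.6% figure came from the usual worst-case proportion (p = 0.5) with a finite population correction, since that reproduces it; I don't know what calculator was actually used. The second function takes the 57% share from the 57/42 split mentioned below and asks how far the true list-wide share could be from it if we assume nothing at all about the people who didn't respond.

```python
import math

def margin_of_error(n, population, p=0.5, z=1.96):
    """Margin of error for a proportion, assuming a simple RANDOM sample.

    Uses the worst-case p = 0.5 by default and applies a finite
    population correction. z = 1.96 corresponds to 95% confidence.
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    fpc = math.sqrt((population - n) / (population - 1))
    return z * standard_error * fpc

def worst_case_bounds(n, population, respondent_share):
    """Possible range for the population share of a group when nothing
    is assumed about the non-respondents (hypothetical helper)."""
    responded = n / population
    low = responded * respondent_share   # no non-respondent is in the group
    high = low + (1 - responded)         # every non-respondent is in the group
    return low, high

# Figures quoted in the thread: 378 responses from a list of 2,250.
moe = margin_of_error(378, 2250)
low, high = worst_case_bounds(378, 2250, 0.57)

print(f"margin of error (random sample assumed): +/- {moe:.1%}")
print(f"worst-case range (no assumption):        {low:.1%} to {high:.1%}")
```

The first number comes out at about +/- 4.6%, matching the write-up, but it only means something if the 378 are a random draw. The second range runs from roughly 10% to 93%, which is the "we have no idea" case. The truth is presumably somewhere in between, and the survey alone can't tell us where.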
On Dec 5, 2012, at 2:23 PM, Jonathan Rochkind <[log in to unmask]> wrote:
> Hmm, it's quite possible you know more about statistics than me, but...
> Usually equations for calculating confidence level are based on the assumption of a random sample, not a volunteering self-selected sample.
> If you have a self-selected sample, then the equations for "how likely is this to be a fluke" are only accurate if your self-selected sample is representative; and there aren't really any equations that can tell you how likely your self-selected sample is to be representative, it depends on the circumstances (which is why for the statistical equations to be completely valid, you need a random sample).
> Is my understanding.
> On 12/5/2012 2:18 PM, Rosalyn Metz wrote:
>> I totally get what you're saying, I thought of all of that too, but
>> according to everything I was reading through, the likelihood that the
>> survey's results are a fluke is extremely low. It's actually the reason I
>> put information in the write up about the sample size (378), population
>> size (2,250), response rate (16.8%), confidence level (95%), and confidence
>> interval (+/- 4.6%).
>> On Wed, Dec 5, 2012 at 1:52 PM, Ross Singer <[log in to unmask]> wrote:
>>> Thanks, Rosalyn, for setting this up and compiling the results!
>>> While it doesn't change my default position, "yes we need more diversity
>>> among Code4lib presenters!", I'm not sure, statistically speaking, that you
>>> can draw the conclusions you have based on the sample size, especially
>>> given the survey's topic (note, I am not saying that women aren't
>>> underrepresented in the Code4lib program).
>>> If 83% of the mailing list didn't respond, we simply know nothing about their
>>> demographics. They could be 95% male, they could be 99% female, we have no
>>> idea. I think it is safe to say that the breakdown of the 16% is probably
>>> biased towards females simply given the subject matter and the dialogue
>>> that surrounded it. We simply cannot project that the mailing list is
>>> 57/42 from this, I don't think.
>>> What is interesting, however, is that the number roughly corresponds to
>>> the number of seats in the conference. I think it would be interesting to
>>> see how this compares to the gender breakdown at the conference.
>>> This doesn't diminish how awesome it is that you put this together,
>>> though. Thanks again to you and Karen!
>>> On Dec 5, 2012, at 1:28 PM, Rosalyn Metz <[log in to unmask]> wrote:
>>>> Hi Friends,
>>>> I put together the data and a summary for the gender survey. Now that
>>>> conference and hotel registration has subsided, it's a perfect time for
>>>> you to kick back and read through.
>>>> [Code4Lib] Gender Survey
>>>> Gender Survey Data is the raw data for the survey. Not very interesting,
>>>> but you can use it to view my Pivot Tables and charts.
>>>> [Code4Lib] Gender Survey
>>>> Gender Survey Summary is an easy-to-read version of the above -- it's the
>>>> summary I wrote about the results. Included is a brief intro, charts
>>>> above), and a summary of the results.
>>>> Let the discussion begin,
>>>> P.S. Much thanks to Karen Coyle for reviewing the summary for me before I
>>>> sent it out. Also, if there are any typos or grammar mistakes, please
>>>> blame my friend Abigail, who behaved as my editor.