Print

Print


I'd just do this the old fashioned way. Awk is great for problems like
this. For example, if your file is tab delimited, the following should work

awk '{FS="\t"}{if ($2 != "") question = $2;}{print $1,question,$3}''
yourfile

In the example above, I just print the fields but you could easily encase
them in tags.

kyle


On Mon, Jun 16, 2014 at 9:29 AM, Ryan Engel <[log in to unmask]> wrote:

> Thanks for the responses, on the list and off, so far.
>
> As I'm sure is true for so many of us, my interest in learning more about
> how to solve this type of problem is balanced against my need to just get
> the project done so I can move on to other things.  One of the great things
> about this list is the ability to learn from the collective experiences of
> colleagues.  For this project specifically, even clues about better search
> terms is useful; as Chris Gray pointed out, basic Google searches present
> too many hits.
>
> I did try following the "Create an XML data file and XML schema file from
> worksheet data" instructions on the Microsoft site.  And it did produce an
> XML document, but it wasn't able to transform this:
> Row1    Question1    Q1Answer1
> Row2                        Q1Answer2
>
> ...into something like this:
> <row1>Row One Data</row1>
> <question1>This is a question</question1>
> <answers>
> <q1answer1>Answer 1</q1answer1>
> <q1answer2>Answer2</q1answer2>
> </answers
>
> Instead, I could get it to either convert every row into its own XML
> entry, meaning that I had a lot of answers with no associated questions, or
> I got an XML file that had 1 question with EVERY SINGLE answer nested
> beneath it -- effectively all questions after the first question were
> ignored.  Based on those results, I wasn't sure if there is more tweaking I
> could do in Excel, or if there is some programmed logic in Excel that can't
> be accounted for when associating a schema.
>
>
> Another suggestion I received was to "fill" the question column so that
> every row had a question listed.  I did consider this, but the problem then
> is during the data import, I'd have to convince my CMS to put all the
> answers back together based on the question, something I'm sure Drupal
> COULD do, but I'm not sure how to do that either.
>
>
> Finally, this project is a spreadsheet with 225,270 rows, so you can
> imagine why I'd like a process that is reasonably trustworthy AND that can
> run locally.
>
>
> Anyway, any/all additional suggestions appreciated, even if they are "try
> searching for "blah blah python parser", or "I made something that solves a
> similar process, and you can download it from Git".
>
> Ryan
> ___
>
> Ryan Engel
> Web Stuff
> UW-Madison
>
>> Dana Pearson <mailto:[log in to unmask]>
>> June 13, 2014 at 7:14 PM
>> I don't use Excel but a client did who wanted to use XSL I had created
>> ONIX
>> to MARC to transform bibliographic metadata in Excel to XML. The built
>> in Excel XML converter was not very helpful since empty cells were skipped
>> so that it was impossible to use that result.
>>
>> There is an add on that allow you to map your data to XML elements by
>> creating a schema which is pretty cool.
>>
>> http://bit.ly/1jpwtqM
>>
>> This might be helpful.
>>
>> regards,
>> dana
>>
>>
>>
>>
>>
>> Terry Brady <mailto:[log in to unmask]>
>> June 13, 2014 at 6:53 PM
>> The current version of Excel offers a save as XML option.
>>
>> It will produce something like this. There is other wrapping metadata, but
>> the table is pretty easy to parse.
>>
>> <Table ss:ExpandedColumnCount="3" ss:ExpandedRowCount="7"
>> x:FullColumns="1"
>> x:FullRows="1" ss:DefaultRowHeight="15">
>> <Row>
>> <Cell ss:StyleID="s62"><Data ss:Type="String">row 1</Data></Cell>
>> <Cell><Data ss:Type="String">question 1</Data></Cell>
>> <Cell><Data ss:Type="String">answer 1</Data></Cell>
>> </Row>
>> <Row>
>> <Cell ss:StyleID="s62"><Data ss:Type="String">row 2</Data></Cell>
>> <Cell ss:Index="3"><Data ss:Type="String">answer 2</Data></Cell>
>> </Row>
>> <Row>
>> <Cell ss:StyleID="s62"><Data ss:Type="String">row 3</Data></Cell>
>> <Cell ss:Index="3"><Data ss:Type="String">answer 3</Data></Cell>
>> </Row>
>> <Row>
>> <Cell ss:StyleID="s62"><Data ss:Type="String">row 4</Data></Cell>
>> <Cell><Data ss:Type="String">question 2</Data></Cell>
>> <Cell><Data ss:Type="String">answer 1</Data></Cell>
>> </Row>
>> <Row>
>> <Cell ss:StyleID="s62"><Data ss:Type="String">row 5 </Data></Cell>
>> <Cell ss:Index="3"><Data ss:Type="String">answer 2</Data></Cell>
>> </Row>
>> <Row>
>> <Cell ss:StyleID="s62"><Data ss:Type="String">row 6</Data></Cell>
>> <Cell><Data ss:Type="String">quest </Data></Cell>
>> <Cell><Data ss:Type="String">answer 3</Data></Cell>
>> </Row>
>> <Row>
>> <Cell ss:StyleID="s62"/>
>> </Row>
>> </Table>
>>
>>
>>
>>
>>
>> Ryan Engel <mailto:[log in to unmask]>
>> June 13, 2014 at 4:28 PM
>> Hello -
>>
>> I have an Excel spreadsheet that, for the purposes of an easy import into
>> a Drupal site, I'd like to convert to XML.  I know people more
>> knowledgeable than I could code up something in Python or Perl to convert a
>> CSV version of the data to XML (and I have a colleague who offered to do
>> just that for me), but I am looking for recommendations for something more
>> immediately accessible.
>>
>> Here's an idea of how the spreadsheet is structured:
>>
>> Row1    Question1    Q1Answer1
>> Row2                        Q1Answer2
>> Row3                        Q1Answer3
>> Row4    Question2    Q2Answer1
>> Row5                        Q2Answer2
>> Row6    Question3    Q3Answer1
>> etc.
>>
>> How do other people approach this?  Import the data to an SQL database,
>> write some clever queries, and then export that to XML?  Work some wizardry
>> in GoogleRefine/OpenRefine?  Are scripting languages really the best all
>> around solution?  Excel's built in XML mapping function wasn't able to
>> process the one-to-many relationship of questions to answers, though maybe
>> I just don't know how to build the mapping structure correctly.
>>
>> In the interest immanent deadlines, I have handed the spreadsheet off to
>> my Perl-writing colleague.  But as a professional growth opportunity, I'm
>> interested in suggestions from Libraryland about ways others have
>> approached this successfully.
>>
>> Thanks!
>>
>> Ryan Engel
>> Web Stuff
>> UW-Madison
>>
>
> --
>
>
>