Print

Print


Ryan, it sounds like you simply want to pull two relational tables into 
drupal using referenced entities in a one (question) to many (answers) 
relationship?

This can be accomplished, albeit unintuitively (it is Drupal afterall), 
by using the Feeds CSV parser with Feeds Tamper, and I would skip what 
seems like an unnecessary conversion to XML step.

First, you import your questions and set a GUID target (typically your 
db id).  Then import your answers -- each should have a reference to the 
id mapped using the "Entity Reference by Feeds GUID".  I have done it in 
reverse order too, in which case you'd import all your answers first. In 
this case, your second (questions) import needs to contain a single 
column of delimited answer GUIDs into a single column.  Use the Feeds 
Tamper explode plugin on that field during mapping as explained (rather 
vaguely) here:

http://drupal.stackexchange.com/questions/32234/how-to-use-feeds-module-to-import-multi-value-fields

If that's not what you're trying to do, can you clarify?

-Shaun

On 6/16/14 3:35 PM, Joshua Welker wrote:
> Sorry, the last line got messed up by outlook.
>
> #now save the whole thing as an xml file
>
> with open('myfile.xml', 'wb') as file
>   ElementTree(rootNode).write(file)
>
>
> Josh Welker
>
>
> -----Original Message-----
> From: Joshua Welker [mailto:[log in to unmask]]
> Sent: Monday, June 16, 2014 2:32 PM
> To: Code for Libraries
> Subject: RE: [CODE4LIB] Excel to XML (for a Drupal Feeds import)
>
> This should be quite doable in most programming languages with
> out-of-the-box tools and no tricky parsing code. The gist is to save in
> Excel as a delimited text file (tab is a good choice), then have your script
> ingest the document and turn it into an array, and then turn the array into
> XML. In Python, it could be something like the code below (not tested but
> the principles should be sound):
>
> import 'csv'
> from elementtree.ElementTree import Element, SubElement
>
> #create a list
> mylist = []
>
> #open your delimited file with a csv reader with open('myfile.txt', 'rb') as
> textfile:
>    reader = csv.reader( textfile, delimiter='\t', quotechat='"')     #this
> assumes your file is tab-delimited (\t)
>
>    #loop through rows in your file and save each row as a key/value pair
> (dictionary)
>    for row in textfile:
>      fields = {
>        'field1': row[0]
>        'field2': row[1]
>        'field3': row[2]
>        'field4': row[3]
>      }
>
>      #append this row to our master list
>      mylist.append( fields )
>
>
> #create an xml root node
> rootNode = Element("XmlRoot")
>
> #loop through our list of "rows" from the text file and create xml nodes for
> row in mylist:
>    rowNode = Element("record")
>
>    #loop through all the fields on this "row" and turn them into xml nodes
>    for fieldName, fieldValue in row:
>      fieldNode = Element(fieldName)
>      fieldNode.text = fieldValue
>
>      #append each field node to the parent row node
>      rowNode.append(fieldNode)
>
>    #append each row node to the document root node
>    rootNode.append(rowNode)
>
> #now save the whole thing as an xml file with open('myfile.xml', 'wb') as
> file
>   ElementTree(rootNode).write(file)
>
>
>
> Josh Welker
>
> -----Original Message-----
> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Kyle
> Banerjee
> Sent: Monday, June 16, 2014 1:04 PM
> To: [log in to unmask]
> Subject: Re: [CODE4LIB] Excel to XML (for a Drupal Feeds import)
>
> I'd just do this the old fashioned way. Awk is great for problems like this.
> For example, if your file is tab delimited, the following should work
>
> awk '{FS="\t"}{if ($2 != "") question = $2;}{print $1,question,$3}''
> yourfile
>
> In the example above, I just print the fields but you could easily encase
> them in tags.
>
> kyle
>
>
> On Mon, Jun 16, 2014 at 9:29 AM, Ryan Engel <[log in to unmask]> wrote:
>
>> Thanks for the responses, on the list and off, so far.
>>
>> As I'm sure is true for so many of us, my interest in learning more
>> about how to solve this type of problem is balanced against my need to
>> just get the project done so I can move on to other things.  One of
>> the great things about this list is the ability to learn from the
>> collective experiences of colleagues.  For this project specifically,
>> even clues about better search terms is useful; as Chris Gray pointed
>> out, basic Google searches present too many hits.
>>
>> I did try following the "Create an XML data file and XML schema file
>> from worksheet data" instructions on the Microsoft site.  And it did
>> produce an XML document, but it wasn't able to transform this:
>> Row1    Question1    Q1Answer1
>> Row2                        Q1Answer2
>>
>> ...into something like this:
>> <row1>Row One Data</row1>
>> <question1>This is a question</question1> <answers> <q1answer1>Answer
>> 1</q1answer1> <q1answer2>Answer2</q1answer2> </answers
>>
>> Instead, I could get it to either convert every row into its own XML
>> entry, meaning that I had a lot of answers with no associated
>> questions, or I got an XML file that had 1 question with EVERY SINGLE
>> answer nested beneath it -- effectively all questions after the first
>> question were ignored.  Based on those results, I wasn't sure if there
>> is more tweaking I could do in Excel, or if there is some programmed
>> logic in Excel that can't be accounted for when associating a schema.
>>
>>
>> Another suggestion I received was to "fill" the question column so
>> that every row had a question listed.  I did consider this, but the
>> problem then is during the data import, I'd have to convince my CMS to
>> put all the answers back together based on the question, something I'm
>> sure Drupal COULD do, but I'm not sure how to do that either.
>>
>>
>> Finally, this project is a spreadsheet with 225,270 rows, so you can
>> imagine why I'd like a process that is reasonably trustworthy AND that
>> can run locally.
>>
>>
>> Anyway, any/all additional suggestions appreciated, even if they are
>> "try searching for "blah blah python parser", or "I made something
>> that solves a similar process, and you can download it from Git".
>>
>> Ryan
>> ___
>>
>> Ryan Engel
>> Web Stuff
>> UW-Madison
>>
>>> Dana Pearson <mailto:[log in to unmask]> June 13, 2014 at 7:14
>>> PM I don't use Excel but a client did who wanted to use XSL I had
>>> created ONIX to MARC to transform bibliographic metadata in Excel to
>>> XML. The built in Excel XML converter was not very helpful since
>>> empty cells were skipped so that it was impossible to use that
>>> result.
>>>
>>> There is an add on that allow you to map your data to XML elements by
>>> creating a schema which is pretty cool.
>>>
>>> http://bit.ly/1jpwtqM
>>>
>>> This might be helpful.
>>>
>>> regards,
>>> dana
>>>
>>>
>>>
>>>
>>>
>>> Terry Brady <mailto:[log in to unmask]> June 13, 2014 at 6:53 PM
>>> The current version of Excel offers a save as XML option.
>>>
>>> It will produce something like this. There is other wrapping
>>> metadata, but the table is pretty easy to parse.
>>>
>>> <Table ss:ExpandedColumnCount="3" ss:ExpandedRowCount="7"
>>> x:FullColumns="1"
>>> x:FullRows="1" ss:DefaultRowHeight="15"> <Row> <Cell
>>> ss:StyleID="s62"><Data ss:Type="String">row 1</Data></Cell>
>>> <Cell><Data ss:Type="String">question 1</Data></Cell> <Cell><Data
>>> ss:Type="String">answer 1</Data></Cell> </Row> <Row> <Cell
>>> ss:StyleID="s62"><Data ss:Type="String">row 2</Data></Cell> <Cell
>>> ss:Index="3"><Data ss:Type="String">answer 2</Data></Cell> </Row>
>>> <Row> <Cell ss:StyleID="s62"><Data ss:Type="String">row
>>> 3</Data></Cell> <Cell ss:Index="3"><Data ss:Type="String">answer
>>> 3</Data></Cell> </Row> <Row> <Cell ss:StyleID="s62"><Data
>>> ss:Type="String">row 4</Data></Cell> <Cell><Data
>>> ss:Type="String">question 2</Data></Cell> <Cell><Data
>>> ss:Type="String">answer 1</Data></Cell> </Row> <Row> <Cell
>>> ss:StyleID="s62"><Data ss:Type="String">row 5 </Data></Cell> <Cell
>>> ss:Index="3"><Data ss:Type="String">answer 2</Data></Cell> </Row>
>>> <Row> <Cell ss:StyleID="s62"><Data ss:Type="String">row
>>> 6</Data></Cell> <Cell><Data ss:Type="String">quest </Data></Cell>
>>> <Cell><Data ss:Type="String">answer 3</Data></Cell> </Row> <Row>
>>> <Cell ss:StyleID="s62"/> </Row> </Table>
>>>
>>>
>>>
>>>
>>>
>>> Ryan Engel <mailto:[log in to unmask]>
>>> June 13, 2014 at 4:28 PM
>>> Hello -
>>>
>>> I have an Excel spreadsheet that, for the purposes of an easy import
>>> into a Drupal site, I'd like to convert to XML.  I know people more
>>> knowledgeable than I could code up something in Python or Perl to
>>> convert a CSV version of the data to XML (and I have a colleague who
>>> offered to do just that for me), but I am looking for recommendations
>>> for something more immediately accessible.
>>>
>>> Here's an idea of how the spreadsheet is structured:
>>>
>>> Row1    Question1    Q1Answer1
>>> Row2                        Q1Answer2
>>> Row3                        Q1Answer3
>>> Row4    Question2    Q2Answer1
>>> Row5                        Q2Answer2
>>> Row6    Question3    Q3Answer1
>>> etc.
>>>
>>> How do other people approach this?  Import the data to an SQL
>>> database, write some clever queries, and then export that to XML?
>>> Work some wizardry in GoogleRefine/OpenRefine?  Are scripting
>>> languages really the best all around solution?  Excel's built in XML
>>> mapping function wasn't able to process the one-to-many relationship
>>> of questions to answers, though maybe I just don't know how to build
>>> the mapping structure correctly.
>>>
>>> In the interest immanent deadlines, I have handed the spreadsheet off
>>> to my Perl-writing colleague.  But as a professional growth
>>> opportunity, I'm interested in suggestions from Libraryland about
>>> ways others have approached this successfully.
>>>
>>> Thanks!
>>>
>>> Ryan Engel
>>> Web Stuff
>>> UW-Madison
>>>
>>
>> --
>>
>>
>>

-- 
Shaun Ellis
User Interface Developer, Digital Initiatives
Princeton University Library
609.258.1698