Print

Print


> On 28 Jul 2022, at 00:01, Fitchett, Deborah <[log in to unmask]> wrote:
> 
> Um, firstly that xml contains personal information including names, addresses and emails, so unless it's dummy data ... anyone know if there's a way we can remove it from the list archives at least?
> 
> As to the question:
> * Excel can import xml data (and once you can get it into Excel at all it should be possible to manipulate how you want) through Data > Get Data > From file > XML
> * But it's very unhappy with this particular xml file. Looking at it in Notepad++ you need to move the first XML declaration (currently in line 2) to line 1 (above <root>), and then delete every other XML declaration.
> * Then Excel will load it, but of course you need to help it expand out the data you want from all the nestings. When you get the option after loading it, click on Transform data and you'll get to Power Query Editor.
> 
> At that point it's beyond me but there are probably tutorials online or you could play around with it until it submits, depend on your preference! Or someone may know of a friendlier tool.
> 

Once you got a valid XML file (using the steps mentioned by Deborah above) another option is OpenRefine (instead of Excel). OpenRefine will import the XML without any additional steps and put it into a tabular format for you - the way it manages this is to spread a single record across multiple rows in a data grid to deal with the hierarchical nature of the original data (in this case there isn’t a huge amount of hierarchy, but there are repeated categories). OpenRefine has a Records mode (where it understands that several rows of data make up a single record) to deal with this exact scenario.

`Once you have the data in OpenRefine you can manipulate it as you need to (rename columns etc.) and then export to Excel or CSV (OpenRefine supports both) as necessary.

There are lots of other options for converting XML to CSV - there are online converters and you can write code or transformations as XSLT - but for me OpenRefine is a nice spot between power and simplicity for working with this kind of data

OpenRefine download page: https://openrefine.org/download.html <https://openrefine.org/download.html>
OpenRefine documentation: https://docs.openrefine.org/ <https://docs.openrefine.org/>
OpenRefine tutorial aimed at library workers: https://librarycarpentry.org/lc-open-refine/ <https://librarycarpentry.org/lc-open-refine/> 

Owen