On Oct 29, 2021, at 9:46 AM, Eric Lease Morgan <[log in to unmask]> wrote:
> I have a database with the following structure:
>
> create table bibliographics (
> identifier TEXT PRIMARY KEY,
> title TEXT
> );
>
> create table keywords (
> identifier TEXT,
> keyword TEXT
> );
>
> create table entities (
> identifier TEXT,
> entity TEXT
> );
Thank you for all the useful replies! After plenty o' reading and plenty o' experimentation, I've decided the use of DISTINCT is the quickest and most immediately practical way for me to remove duplicates, like this:
SELECT b.identifier,
b.title,
GROUP_CONCAT( DISTINCT( e.entity ) ) AS entities,
GROUP_CONCAT( DISTINCT( k.keyword ) ) AS keywords
FROM bibliographics AS b
LEFT JOIN entities AS e ON e.identifier = b.identifier
LEFT JOIN keywords AS k ON k.identifier = e.identifier
GROUP BY b.identifier;
The result is not beautiful but more than functional. For example, here are a few results from a few different records:
identifier = author-homer-gutenberg
title = The Works of Homer
entities = achilles,apollo,hector,ilius,jove,menelaus,minerva,son of atreus,troy,ulysses
keywords = achaeans,achilles,hector,jove,menelaus,telemachus,trojans,ulysses
identifier = title-AustenSense_1811-gutenberg
title = Sense and Sensibility by Jane Austen
entities = barton,cleveland,dashwood,dear ma'am,dear, dear elinor,delaford,devonshire,edward,jennings,john willoughby,london,lucy,marianne,marlborough,middletons,no, elinor,norland,now, edward,selfish,willoughby
keywords = dashwood,edward,elinor,jennings,lucy,marianne,palmer,willoughby
identifier = crawl-planet20210207-code4lib
title = A crawl of Planet Code4Lib (October 7, 2021)
entities = detroit,donald trump,function,google,leaflet,madoff,rails autoscale,tether,the readme project,topics,trump,twitter,u.s.,us,windsor
keywords = data,digital,google,university
While I know the use of a sub SELECT would produce more expressive results, I could never get it to work. Too complicated; the use of DISTINCT is more elegant.
Fun with SQL.
--
Eric Morgan
|