Print

Print


On Oct 29, 2021, at 9:46 AM, Eric Lease Morgan <[log in to unmask]> wrote:

> I have a database with the following structure:
> 
>  create table bibliographics (
>      identifier  TEXT PRIMARY KEY,
>      title       TEXT
>  );
> 
>  create table keywords (
>      identifier TEXT,
>      keyword    TEXT
>  );
> 
>  create table entities (
>      identifier TEXT,
>      entity     TEXT
>  );


Thank you for all the useful replies! After plenty o' reading and plenty o' experimentation, I've decided the use of DISTINCT is the quickest and most immediately practical way for me to remove duplicates, like this:

  SELECT b.identifier,
         b.title,
         GROUP_CONCAT( DISTINCT( e.entity  ) ) AS entities,
         GROUP_CONCAT( DISTINCT( k.keyword ) ) AS keywords
   FROM bibliographics AS b
   LEFT JOIN entities  AS e ON e.identifier = b.identifier
   LEFT JOIN keywords  AS k ON k.identifier = e.identifier
   GROUP BY b.identifier;


The result is not beautiful but more than functional. For example, here are a few results from a few different records:

  identifier = author-homer-gutenberg
       title = The Works of Homer
    entities = achilles,apollo,hector,ilius,jove,menelaus,minerva,son of atreus,troy,ulysses
    keywords = achaeans,achilles,hector,jove,menelaus,telemachus,trojans,ulysses

 identifier = title-AustenSense_1811-gutenberg
       title = Sense and Sensibility by Jane Austen
    entities = barton,cleveland,dashwood,dear ma'am,dear, dear elinor,delaford,devonshire,edward,jennings,john willoughby,london,lucy,marianne,marlborough,middletons,no, elinor,norland,now, edward,selfish,willoughby
    keywords = dashwood,edward,elinor,jennings,lucy,marianne,palmer,willoughby

 identifier = crawl-planet20210207-code4lib
       title = A crawl of Planet Code4Lib (October 7, 2021)
    entities = detroit,donald trump,function,google,leaflet,madoff,rails autoscale,tether,the readme project,topics,trump,twitter,u.s.,us,windsor
    keywords = data,digital,google,university


While I know the use of a sub SELECT would produce more expressive results, I could never get it to work. Too complicated; the use of DISTINCT is more elegant.

Fun with SQL.

--
Eric Morgan