Commons:Village pump/Proposals

From Wikimedia Commons, the free media repository

Shortcuts: COM:VP/P • COM:VPP

Welcome to the Village pump proposals section

This page is used for proposals relating to the operations, technical issues, and policies of Wikimedia Commons; it is distinguished from the main Village pump, which handles community-wide discussion of all kinds. The page may also be used to advertise significant discussions taking place elsewhere, such as on the talk page of a Commons policy. Recent sections with no replies for 30 days and sections tagged with {{Section resolved|1=--~~~~}} may be archived; for old discussions, see the archives; the latest archive is Commons:Village pump/Proposals/Archive/2022/11.

Please note
  • One of Wikimedia Commons’ basic principles is: "Only free content is allowed." Please do not ask why unfree material is not allowed on Wikimedia Commons or suggest that allowing it would be a good thing.
  • Have you read the FAQ?

 
SpBot archives all sections tagged with {{Section resolved|1=~~~~}} after 5 days and sections whose most recent comment is older than 30 days.

Change the name of POTY from "POTY [year the photos were promoted to FP]" to "POTY [current year]"

This year, last year, and some years in the past we have run Picture of the Year late in the year. This causes some confusion when we run, say, "POTY 2021" in late 2022. It sounds like we're talking about last year's event. The reason we name it that way is because the candidates are Featured Pictures promoted in 2021. I'd argue that it's less important for voters to know when the pictures went through the FPC process than to clearly communicate the name of the current event.

Proposal: Starting next year, the event which votes on FPs promoted in 2022 will be called "Picture of the Year 2023". I am not proposing to change this year's because it starts soon and this change would likely require some modifications to the scripts that generate it.

  • Support as proposer. -- Rhododendrites talk |  13:14, 27 October 2022 (UTC)
  • Question: Is there any reason not to start early in the year, maybe February? So it would be quite common to name it after the last year. --Stepro (talk) 13:20, 27 October 2022 (UTC)
  • That would be great, but for as long as we have had POTY, we have had trouble getting people to run it. The technical side is not something one can undertake with no knowledge, and there's no real documentation available, so there aren't many people who can run it. It would be nice to get a more sustainable system set up, but as ever, Commons could use more techfolk, and I see no reason to expect a change for next year. Ultimately, I think unless we can somehow bet on someone setting it up in January/February every year, we should change the name. -- Rhododendrites talk |  13:53, 27 October 2022 (UTC)
    "there's no real documentation available, so there aren't many people who can run it" - I think this is the problem. Instead of changing the name, it would be much more important and sustainable to document the process and find Wikipedians to join the organization of the vote. Stepro (talk) 16:32, 27 October 2022 (UTC)
  • Yes that would be better. Are you volunteering? Many of us have advocated for it (and/or other changes) for years. At some point you have to do your best with what actually exists. POTY is much more outward facing than our other process so we owe it to the broader community to be as clear as possible about what it is. -- Rhododendrites talk |  17:28, 27 October 2022 (UTC)
  • Oppose It is the picture of the year when it was promoted as FP. Having the current year in the name but voting on last year's pictures does not make sense. Many publications are published late in the following year with the year they are about in the title. --GPSLeo (talk) 16:26, 27 October 2022 (UTC)
  • ? What proportion of our hundreds of voters do you suspect understand that they're voting specifically on photos promoted in some internal process called featured pictures candidates in a specific year vs. just look at the pictures and vote? -- Rhododendrites talk |  17:28, 27 October 2022 (UTC)
    I think there are criteria to have the right to vote on the photos. So you will have some knowledge about Wikimedia projects when you are voting. And if we are only talking about the voting we could just make a banner without a year. The published results should definitely have the year as they have it now. --GPSLeo (talk) 17:54, 27 October 2022 (UTC)
  • Support - I get the current way and I don't ...., It would make much more sense to have it for the year it's being run instead of being a year behind .... but that being said if Rhododendrites is correct regarding lack of people behind the scenes then maybe POTY should be done away with completely. I enjoy participating in it but could I live without it? Absolutely. –Davey2010Talk 18:52, 27 October 2022 (UTC)
  • Support --Kritzolina (talk) 19:18, 27 October 2022 (UTC)
  • Oppose – There are other contests that take place in the year following the "season" they are deciding winners for, e.g. the Academy Awards, the Super Bowl. I'm used to POTY being the same. What's the possibility of running both the 2021 and 2022 contests early next year, assuming people can be found to run them? What level of technical competence is required to manage the contest? Dhtwiki (talk) 05:43, 29 October 2022 (UTC) (edited 04:01, 9 November 2022 (UTC))
    • The Super Bowl and Oscars in 2022 aren't called "Super Bowl 2021" and "Academy Award 2021". If you search anywhere for "Super Bowl 2022", though, you're taken to the Super Bowl that took place in 2022. If you search for "Oscars 2022", you're directed to the Oscars that took place in 2022. -- Rhododendrites talk |  12:41, 29 October 2022 (UTC)
      • Super Bowl 2022 decides the 2021 season; the Oscars awarded in 2022 are for films from 2021. The way that POTY is currently labeled seems more logical and shouldn't be confusing. Dhtwiki (talk) 00:08, 30 October 2022 (UTC)
NO.--RZuo (talk) 22:08, 4 November 2022 (UTC)
  • Support Like the Oscars, I agree that the format "POTY [year]" naturally implies that the competition was held that year, rather than some criteria of the pictures that appeared in that instance of the competition. So if we are to keep this formatting, I think it makes sense to go ahead with the above proposal. But I don't think this is optimal, because the important thing IMO is the year that the pictures were promoted, not the year when the competition is held (in the Oscars case, it seems the competition itself is sometimes more important than the films). So ideally, we'd be able to come up with something like "2006's Picture of the Year" or "Picture of the Year from 2006" which imply that the pictures are from 2006, rather than implying the competition ran in 2006. (Admittedly neither of these names are as snappy as "POTY 2006".) Renaming to more clearly align with the photo's promotion date might also allow us to do something like "2021-2022's Pictures of the Year" if we aren't able to run a competition annually. -M.nelson (talk) 12:11, 5 November 2022 (UTC)
    Changing my 'vote' from Unsure to Support as I support changes in general to resolve the confusion that comes from the current naming scheme. -M.nelson (talk) 19:33, 18 December 2022 (UTC)
  • Oppose I really want to have it much earlier in the year. It is already November, and it hasn't started yet. For me this is the problem, not the name. --Stepro (talk) 12:44, 5 November 2022 (UTC)
    • I think everyone wants this (and has wanted this for many years), but for now it seems safe to assume it will continue to be sometime in the second half of the year. -- Rhododendrites talk |  13:06, 9 November 2022 (UTC)
  • Support Given the current state of POTY, this makes sense to me. I think it's a good idea to call it POTY 2023 and just note that the pictures were promoted to FP in 2022 - people can figure that out. My hat is off to anyone who has worked to make POTY happen, it is a lot of work that unfortunately receives complaints at times or mostly is ignored. I can understand why it's hard to find people with the motivation and skillset. As with many things in life, it's easy to ask why POTY is so late, but much harder to take up the task and make it happen. Glennfcowan (talk) 04:29, 7 November 2022 (UTC)
  • I would tend to Oppose. -- Ikan Kekek (talk) 19:53, 8 November 2022 (UTC)
  • I think there is room for a compromise, like take FPs from November 2022 to October 2023 and then run the vote in November/December 2023. This way the majority of photos are nominated in the current year. I think this follows how most people online do their "best of the year" proclamations in December even though the year isn't technically over. To get on this schedule, we'd want to do the 2022 POTY in Feb/March, and then align to the new schedule. Legoktm (talk) 08:57, 15 November 2022 (UTC)
    • For example, here's the AP's top 2022 photos being announced at the beginning of December. Legoktm (talk) 16:28, 1 December 2022 (UTC)
    • The problem is that people procrastinate until the last reasonable second, not that it actually takes 11 months to run. If we try a new schedule, then eventually the Nov 2025 - Oct 2026 batch won't be run until Sept 2027. -- King of ♥ 01:42, 7 December 2022 (UTC)
      +1 Stepro (talk) 22:03, 18 December 2022 (UTC)

Adopt Commons:Depiction guidelines as a guideline

Proposal: Adopt Commons:Depiction guidelines as an approved guideline and have it listed on Commons:Policies and guidelines

Rationale. Whilst I know there are different positions and arguments on the topic, the absence of agreed general guidelines creates confusion.

  • There is a first, rather historical page where the general topic is introduced, Commons:Depicts, which includes some historical recommendations (apparently outdated given the discussions I could read). This page also includes good contextual information about what a depict is, how to add them, what it is useful for etc. In short, this page goes beyond guidelines.
  • There is a more recent page going into a lot of details, Commons:Structured data/Modeling/Depiction. Whilst the essence of it is good, the high level of detail, some unfinished or outdated examples etc. make it difficult to propose as a guideline page. It is also hard to find and maybe too complex a page for a newcomer or occasional contributor.
  • There is no easily found page that an occasional contributor could find to get the basics (for example, there is no mention of the topic on Commons:Policies and guidelines)
  • There is currently no page we can point a newcomer to for a basic understanding of the topic.
  • And... of course... perhaps due to historical reasons and wishful hopes about Wikidata, there is no clear agreement about adding depicts statements, resulting in editors reverting others, which is simply... a waste of everyone's time :)

Hence, the proposal to adopt a short and basic guidelines page, which is a curated version built on the two other pages and on discussions I could find on talk pages. Thanks for your attention

Anthere (talk) 11:30, 28 October 2022 (UTC)

  • I appreciate this effort. In general this looks good. I added a couple little bits that I think are uncontroversial. My only objection is that it includes contradictory language. It says "you should add multiple depicts (P180) statements, both general and specific", then two paragraphs down "it is recommended to add multiple depicts statements, both general and specific. However, this redundant tagging is disputed so please don't do it on a large scale." Do it, but it's disputed so don't do it much? From what I've seen, the only reason it's disputed is because of (a) misunderstanding about what depicts is for, (b) misunderstanding its similarity to categories, and/or (c) carry-over best practices from working with categories. Can we just get rid of that "As a consequence..." line altogether and see how it goes? -- Rhododendrites talk |  13:25, 28 October 2022 (UTC)
    Looks good to me. Anthere (talk) 20:42, 28 October 2022 (UTC)
    Hi, I agree with @Rhododendrites. I have edited these sections for more precision. Furthermore, I have added an example in the section on the differences between Structured Data and the Commons Category system. Beat Estermann (talk) 11:17, 2 November 2022 (UTC)
  • I think we should improve the original guideline draft and not create a new one, also for keeping the version history. --GPSLeo (talk) 15:59, 28 October 2022 (UTC)
  • But which original? As I mentioned earlier... the first page includes elements way beyond the guidelines and still includes historical recommendations. The second would be more like it, but is a work-in-progress that would require serious work before being ok. A V2? None of those two pages will disappear. I will add a note in the talk page to better credit. Anthere (talk) 20:42, 28 October 2022 (UTC)
  • We may want to add a paragraph on where to use depicts (P180) and where to use digital representation of (P6243), and - if applicable - how to use them in combination with each other. Beat Estermann (talk) 11:26, 2 November 2022 (UTC)

So... about Commons:Depiction guidelines

  • Support the current version of the proposal Anthere (talk) 06:53, 5 November 2022 (UTC)
  • Support - a very timely guideline that adds to greater understanding of expectations of depicts, etc. Islahaddow (talk) 09:15, 9 November 2022 (UTC)
  • Oppose - This is very different from what the current guideline, Commons:Depicts, says (emboldening in original): These generic "tags" should not currently be added if more specific depicts statements already exist. We should not be recommending to tag a picture of a bearded collie with "dog"; such use should be strongly deprecated. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:22, 9 November 2022 (UTC)
    • Yes that's the problem with the old version. Parts of it were written with an early misunderstanding of how this system actually works. It's not the category system. -- Rhododendrites talk |  12:30, 9 November 2022 (UTC)
      • I'm not under any misapprehension that "It's the category system", nor do I misunderstand how it works. However, I am aware of good practice in computing, and cataloguing, and not storing or repeating redundant information. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:39, 9 November 2022 (UTC)
        • Just to clarify: Commons:Depicts isn't the current guideline; it is one of two pages (the other being Commons:Structured data/Modeling/Depiction) which users refer to for guidance. The main purpose of this proposal is resolving the conflict between these pages so that there is a clear page to point e.g. completely new users to. /André Costa (WMSE) (talk) 10:12, 10 November 2022 (UTC)
          • That does not "clarify". Something written for people to "refer to for guidance" is by definition a guideline. If the aim is "resolving the conflict between these pages", then the solution is to open an RfC on the correct way to use the property, not to write yet another guideline. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:31, 10 November 2022 (UTC)
    I tend to agree with @Andy on this. However, we should make sure that there is a subclass/superclass relationship between "dog" and "bearded collie" in Wikidata, which was not the case until I made the following edit. If the edit remains unchallenged, I would suggest that we adapt the example in the guidelines accordingly. Beat Estermann (talk) 12:31, 14 November 2022 (UTC)
    I have updated the guidelines. The issue pointed out by @Andy Mabbett should be resolved now.
    It might be useful to keep a list of relationships expressed on Wikidata, which software programs making use of depicts statements are expected to resolve on their side. At this point, I guess we would mainly expect them to handle subclass / superclass relationships (failing to do so would lead to a situation where we would tend to clutter the metadata with quasi-meaningless depicts statements along every ontological tree), parent-child taxon relationships, "same as" relationships, and synonyms.
    Do we also expect them to handle <instance of> organisms known by a particular common name (Q55983715)? (this would be required to get from Rough Collie (Q38650) via dog (Q144) to Canis familiaris (Q20717272) or to Canis lupus familiaris (Q26972265)) - Personally, I think this is asking too much from software programs.
    -- Beat Estermann (talk) 15:44, 21 November 2022 (UTC)
  • Support per what I wrote above. -- Rhododendrites talk |  12:30, 9 November 2022 (UTC)
  • Oppose The topic seems to be too fresh and too unsettled. We have skilled editors disagreeing about what should or shouldn't be added? If I have a picture of Winston Churchill taken in 1942, I need to add depicts man, prime minister, and WC? Let the structured data people get consensus about what should be done and then come back. Glrx (talk) 19:23, 9 November 2022 (UTC)
    • @Glrx: There is already consensus across the Structured Data on Commons people. There are just some Commons regulars who think it's inefficient and should work more like the category system. It does not work that way. The debate is really between using structured data on Commons like it's supposed to be used, having a whole bunch of different standards/expectations and winding up with a mess, or not using it at all. As far as I'm concerned, the first one is the only one that makes sense, and all of this "it's inefficient" business can always be redirected to lobby developers to build out some form of hierarchical depicts framework for the future. That does not exist now, and there are no plans for it to exist in the future (at least last I heard). It's been three years, and 99% of the confusion around this is centered on the "but I want it to be hierarchical" argument, which was codified by a handful of Commons users working with incomplete information on that initial Depicts page + inertia. -- Rhododendrites talk |  19:33, 9 November 2022 (UTC)
      But why should we use depicts (P180) for the general statements? Why not use two different properties? main subject (P921) also exists or we can create a new "tag" property. --GPSLeo (talk) 19:42, 9 November 2022 (UTC)
      Please don't dismiss our valid objections by mis-describing us in this fashion. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:55, 9 November 2022 (UTC)
      +1 --Beat Estermann (talk) 13:05, 14 November 2022 (UTC)
    I think it is important that we settle on a version of the guidelines that reflects the current state of discussion / rough consensus within the community - be it just for the very pragmatic need of having a basis for telling (new) users how to approach the issue in practice. In my opinion, having one common, admittedly imperfect version of the guidelines out in the open (e.g. by linking to them directly from help pages, the ISA Tool, etc.) is better than having a couple of contradictory versions tucked away somewhere in the depths of the wiki.
    Regarding your concrete example: If the man on the image is known to be "Winston Churchill", there is no need to add "man"; this can rather easily be inferred from the information on Wikidata. Maybe we should come up with a list of common cases where we assume that the tools making use of "depicts" statements should be able to make the inference on their side.
    The "prime minister" part of your example is more tricky: I think there is currently agreement that we should not directly add occupation (or role) in depicts statements, but we are lacking clear instructions (at least in our current version of the guidelines) how exactly to express this type of relationship: <depicts> "Winston Churchill" <in the role of> "prime minister" (please correct me if I am wrong and a solution has already been found to deal with this issue). On second thought, the same issue would also apply to the Lassie example, "Lassie" being a role played by a specific dog. - In my view, the way forward is to accept some level of fuzziness for now and to further refine our practice (and along with it the guidelines) as we move on.
    --Beat Estermann (talk) 13:01, 14 November 2022 (UTC)
    @Beat Estermann "Lassie" was a role played by a number of dogs, starting with "Pal" (who took the stage name of "Lassie") and continuing with descendants of that dog. See en:Lassie and en:Pal (dog).   -- 🇺🇦Jeff G. please ping or talk to me🇺🇦 13:15, 14 November 2022 (UTC)
    There is "how it currently works, according to those who developed and run the structured data on commons team" and there is "how some Commons users think it should work". The guidelines should reflect how it does work, not how some people think it should/could/might in the future. The "issue" you're referring to is a desire for it to work differently. I don't disagree with the idea that it would be great to better integrate other properties, but whatever guidelines we have should reflect the current situation, and the people who implemented structured data on Commons say depicts for Lassie should include "dog". Anyone who wants it to change should work with the structured data on Commons folks, lobby for changes, and then, when they're ready, we update the guidelines. Last I heard, there were no plans to make depicts "inherit" higher level items (like dog from collie). Yes, it would also be nice to get some sophisticated AI to automatically detect the subject, too, but we shouldn't put "the AI will take care of it; don't add any categories" into a guideline. -- Rhododendrites talk |  16:38, 21 November 2022 (UTC)
    "how it currently works, according to those who developed and run the structured data on commons team" I dispute that "how it currently works" is accurately captured by the proposal; and the views of "the structured data on commons team" are of no special authority in deciding policy; it is the Commons community as a whole which is sovereign here. You do not speak for "the structured data on Commons folks". "dog from collie" is already inherited by SDC, since it relies on Wikidata, where that relationship is encoded. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:59, 21 November 2022 (UTC)
    No, I am not connected to the SDC folks and obviously don't speak for them. I'm relaying what I've learned from them across a number of conversations. But maybe I've missed something -- you're saying that a search for "dog" using depicts on Commons will return pictures of collies even if the word "dog" isn't mentioned in depicts, categories, description, or filename? If that's true then I've seriously misunderstood something, or perhaps that inheritance just isn't as consistent as we're used to? What am I missing? -- Rhododendrites talk |  19:30, 21 November 2022 (UTC)
    I'm not saying anything of the kind. If I were, you would read it in my post. However, your question - and your post below - implies an apparent fault with search, and if that is the case, search needs fixing; the solution is not to resort to crude keyword stuffing, and thereby irredeemably breaking SDC. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:20, 22 November 2022 (UTC)
    The comments above reinforce the impression that the topic is not yet settled, so an official guideline is inappropriate. I do not buy the argument about describing how it works today. The comment suggests that the project does not have a clear idea of what it is trying to accomplish. I do not want the community told to do something today and then told they should have done something else tomorrow. Glrx (talk) 19:10, 21 November 2022 (UTC)
    I think that's reasonable. I also think it's reasonable to want it to work differently. My hope is that one day we can get a full suite of well-integrated properties that are easy for people to add here. However, depicts is not something that's being workshopped for implementation in the future -- it's already here, and used by an awful lot of people every day. We should have clear guidance about how to use the tool that exists today without assuming that major changes will be made that allow it to work differently. This precise discussion has been going on for years now. Right now, unless I've missed something, a search for "dog" does not return collies based on depicts. If we don't want collies to show up in searches for dogs reliably, then tell people not to tag with dog. If we want collies to come up when someone searches for dogs, we should have them tag it as dog. Then, if it should work otherwise, start the work to make it work otherwise and update the guidance accordingly. If anyone is certain that development will someday come, then I suppose we can say that searching for dog will someday be able to return pictures of collies, so let's not bother adding "dog" now even if it means limited functionality in the short-term -- I would get that. I just don't support that because who knows when/if that will actually change. I'll make this my last comment here for a while btw. Don't intend to bludgeon. -- Rhododendrites talk |  19:26, 21 November 2022 (UTC)
  • Support Having two opposing sets of guidelines prevents new users from being able to use Depicts at all. While it would be great to have search use the hierarchy encoded on Wikidata, the fact is that this isn't done and is also not likely to be done (thanks El Grafo for the link to the mailing list). Having broader depicts tags is still useful for multilingual search as the broader concepts are a) more likely to have many translations, b) more likely to be a term users actually search for. I'm personally not too worried about people tagging it too broadly if you make it clear in the guideline that you tag as broadly as you believe it would be useful to search. /Lokal_Profil 09:52, 1 December 2022 (UTC) (not with my WMSE hat on)
  • Support I don't think it is a technically complex task to distinguish between general and specific OR to detect whether a more specific tag is already set. If we try to enforce this then it would be a problem when users are importing images from source databases and would try to import the metadata alas. From this point of view it would be much easier just to scrap the limitation. (I have coded "more specific category detection" code for imports a couple of times, and based on these this feels like a problem-filled rabbit hole which is too complex to follow for most of the editors.) --Zache (talk) 04:46, 7 December 2022 (UTC)
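
Editorial sketch: the "is a more specific tag already set?" check discussed in this thread can be illustrated roughly as below. This is only a toy in Python, not how any Commons tool works. It assumes a hypothetical in-memory `SUBCLASS_OF` mapping rather than live Wikidata data; the IDs Q38650 (Rough Collie) and Q144 (dog) are the ones quoted in this discussion, the edge between them is assumed for illustration, and `Q_MAMMAL` is a made-up placeholder.

```python
# Toy subclass-of (P279) edges: item -> set of direct parent classes.
# Assumption: these edges are illustrative only, not a dump of Wikidata.
# "Q_MAMMAL" is a placeholder, not a real Wikidata ID.
SUBCLASS_OF = {
    "Q38650": {"Q144"},      # Rough Collie -> dog (assumed edge)
    "Q144": {"Q_MAMMAL"},    # dog -> placeholder broader class
}


def ancestors(item, graph):
    """Return every class reachable from item by following parent edges."""
    seen = set()
    stack = [item]
    while stack:
        current = stack.pop()
        for parent in graph.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def redundant_tags(depicts, graph):
    """Return the tags that are ancestors of another tag on the same file."""
    tags = set(depicts)
    implied = set()
    for tag in tags:
        implied |= ancestors(tag, graph)
    return tags & implied
```

With this toy data, `redundant_tags(["Q38650", "Q144"], SUBCLASS_OF)` flags only `Q144`. The result is only as good as the hierarchy it is given, which is the point several commenters here dispute.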

Depiction guidelines discussion: Comment from WMF Structured Data Team

  • Comment: Do we have any comment or knowledge from the mediasearch team on how they are using the data / how they would like to use it? Currently, it seems that we want to add more general and more specific level tags as it makes querying easier (maybe?). It also makes adding the P180 values easier (based on for example source databases) as you don't need to handle things like detecting whether a more specific tag already exists, which is kind of a complex thing to detect automatically. However, it would be interesting to know if the mediasearch team which is using the data for multilingual search has an opinion that more overlapping tags are better than fewer specific. -- Preceding unsigned comment added by Zache (talk • contribs) 02:25, 29 November 2022 (UTC)
  • Question: is there any discussion anywhere of why searching down the hierarchy is difficult, and is being postponed to the indefinite future? It's not like the hierarchy changes all the time. It seems to me that it would be pretty easy to trace from an item back to the root every time the item's instance of (P31) or subclass of (P279) value is changed (that is, take a moderate hit at "write" time, in database terms), and then this data could be used in the search ("read" time) without incurring any large read-time costs. - Jmabel ! talk 16:28, 29 November 2022 (UTC)
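
Editorial sketch: the write-time idea in the comment above can be illustrated as follows. This is a minimal Python toy under stated assumptions, not a description of MediaSearch or Wikidata internals. The class `AncestorIndex` and its methods are hypothetical names; the sketch rebuilds the whole index on every edit for simplicity, where a real system would presumably update only the affected items.

```python
from collections import defaultdict


class AncestorIndex:
    """Hypothetical index that precomputes each item's ancestors on write."""

    def __init__(self):
        self.parents = defaultdict(set)  # item -> direct parents
        self.closure = {}                # item -> all ancestors

    def set_parents(self, item, parents):
        """Write path: store the new parents, then recompute ancestor sets."""
        self.parents[item] = set(parents)
        self._rebuild()

    def _rebuild(self):
        # Simplification: rebuild every closure after each edit.
        items = set(self.parents)
        for direct in list(self.parents.values()):
            items |= direct
        self.closure = {item: self._walk(item) for item in items}

    def _walk(self, item):
        seen, stack = set(), [item]
        while stack:
            for parent in self.parents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def matches(self, query, tag):
        """Read path: one set lookup, no graph traversal at search time."""
        return tag == query or query in self.closure.get(tag, set())
```

For example, after `idx.set_parents("poodle", {"dog"})`, the call `idx.matches("dog", "poodle")` is true while `idx.matches("poodle", "dog")` is false. The sketch addresses only the cost question raised above; it says nothing about how clean the underlying hierarchy is.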
    @RIsler (WMF) and Keegan (WMF) made some statements in that direction here: Commons_talk:Structured_data/Get_involved/Feedback_requests/Computer-aided_tagging_designs#generic_vs_specific_tags. I think it would be really helpful if the people working on MediaSearch would elaborate a bit on this:
    • Can we expect a search for "dog" to find files tagged only as dalmatians at some point in the future?
      • If no: why?
      • If yes: when?
    Or in other words: is it worth holding out for something that might work at some point in the future if that means that things are less useful than they could be right now? El Grafo (talk) 08:20, 1 December 2022 (UTC)
    Related thread on the mailing list (Jheald, 2018): Wikidata considered unable to support hierarchical search in Structured Data for Commons. TL;DR: "Wikidata is too chaotic to reliably deliver the info that would be needed for this." Ouch. Don't know if that has improved since 2018. El Grafo (talk) 08:37, 1 December 2022 (UTC)
    When thinking of the many issues that can still be found in the ontological structure at Wikidata, I can relate to that. It would be interesting to know, though, whether an attempt has been made to formalize the requirements with regard to the ontological structure at Wikidata from the point of view of hierarchical search. In this case, its evolution could be tracked, and we would at least know whether things are getting better or worse over time. Once we know that, we could devise and implement strategies to make it better. --Beat Estermann (talk) 12:03, 1 December 2022 (UTC)
    My team (Structured Data) worked on SDoC, and I personally spent quite a bit of time trying to figure out how to search down the hierarchy during the project - actually specifically using the technique you describe @Jmabel. We gave up on it because it added lots of noise to the search index, and very often failed to add what we wanted - see here for more details. The underlying problem is, as @Jheald says in that email thread, that the Wikidata ontology is too chaotic.
    More recently we tried to use the WDQS to search down the hierarchy at search time. It was more successful, but being based on SPARQL was too slow for use in production, and even if it had been quick we were nervous about putting too much pressure on the query service.
    In mediasearch on Commons we're prioritising the user getting good results ("precision" in search-speak), rather than making sure that a particular image is returned ("recall" in search-speak). Currently if a user searches for 'poodle' then we return images:
    So I guess a question that needs answering here is - are users getting sufficiently good search results without having to drill down through the hierarchy? And if they are then how important is that a particular image of a poodle is returned when someone searches for 'dog'?
    If you guys think it's sufficiently important to take action on then I guess my team has more investigation to do (though I should point out that we are jam-packed with work at least until the end of the current US fiscal year, and wouldn't be able to do anything on this before then) CParle (WMF) (talk) 12:43, 2 December 2022 (UTC)
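    For readers unfamiliar with what "searching down the hierarchy" via the query service involves, here is a minimal sketch of the kind of SPARQL query meant, built as a string in Python. It is not the Structured Data team's actual code. The function name `build_subclass_query` and the `limit` parameter are made up for illustration. The `wdt:`/`wd:` prefixes and the `P279*` property path are standard on the Wikidata Query Service, and dog (Q144) is the item mentioned elsewhere in this thread. Nothing is sent over the network.

```python
def build_subclass_query(root_qid: str, limit: int = 100) -> str:
    """Build a SPARQL query for items reachable from root_qid via 'subclass of'.

    The path wdt:P279* follows 'subclass of' (P279) zero or more times,
    so the result set includes the root item itself.
    """
    # Basic sanity check so arbitrary text cannot end up in the query.
    if not (root_qid.startswith("Q") and root_qid[1:].isdigit()):
        raise ValueError(f"not a QID: {root_qid!r}")
    return (
        "SELECT ?item WHERE {\n"
        f"  ?item wdt:P279* wd:{root_qid} .\n"
        "}\n"
        f"LIMIT {limit}"
    )


# Example: everything that is (transitively) a subclass of dog (Q144).
query = build_subclass_query("Q144")
print(query)
```

    A transitive path like this has to be evaluated over the whole subclass graph for every search, which is consistent with the speed concern described above.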
    @CParle (WMF) Thank you for the response, very much appreciated! To answer your last question first, yes, I think this is very important to figure out, for reasons you may not be aware of. Finding pictures tagged as Poodle when searching for "dog" was one of the major selling points of SDC when the project was started - or at least it was perceived like that by many, including myself. Some have given up on that, but some are still hoping and some seem convinced that it's the only way forward. If we want to move forward with SDC, we (the Commons community) need to figure out if it will be sufficient to tag poodles as such or if we also need to tag them as dogs. Personally, I'm fine either way. But I think I'm not the only one thinking along the lines of meh, I'll start tagging my uploads once we've figured out how to actually do that. This is a major blocker for SDC acceptance in the community, imho.
    Some semi-random thoughts to keep the discussion going:
    • Looking at your list of problems in the Wikidata ontology at the top of phab:T199119, it seems like many of them have been resolved meanwhile. To me it seems like a lot of these problems are due to errors in Wikidata - things that are fixable once someone is aware of them. That way, a search on Commons returning unexpected results may well be the trigger that's needed to fix something on Wikidata. We see that happen in other circumstances all the time.
    • The poodle/dog example may not be a good one. Searching for "dog" will give you plenty of results, so you won't particularly miss a couple of poodle photos that the search didn't find because it didn't know that poodles are dogs too. But for less conventional topics, you may need to dig deeper, and you may need the search function to do that for you. If I don't find enough good images when searching for "dog", I might try searching deeper for poodles and huskies because I know they are kinds of dogs. But I have no clue about locomotives, so I'd really appreciate the search finding images of Southern Pacific 4449 when I'm searching for steam engine. (Still a bad example, because we have lots of images of steam engines.)
    • Maybe it doesn't need to be an always-on feature. Call it a "comb the desert" mode and false positives are much less of a problem immediately.
    El Grafo (talk) 10:16, 5 December 2022 (UTC)
    Trying to build a picture in my head here of the user problem we're trying to solve ...
    • From an uploader's point of view I understand that I'd like to be able to tag my image with Southern Pacific 4449 (Q7570267) and have it show up in a search for "steam engine". I'm not sure that's a practical expectation in a collection of millions of images though - we already have thousands of images of steam engines, and even if you tag your image with "steam engine" there's no guarantee it'll show up anywhere near the top of the result set
    • From a searcher's point of view I'm likely to be searching for a good match for my search term rather than a particular image. Ontology-descending search would definitely be useful in this case if we had a node in the ontology with few results, but where nodes lower down had many results. I can't think of a concrete example though, and I'm not sure whether this is a common scenario. If it is I'd love some examples
    • If it's a more strategic thing we're trying to achieve - like to highlight wikidata ontology issues - then maybe we could create some sort of search keyword to trigger a "comb the desert" approach like you describe based on descending down instance of (P31) or subclass of (P279) links. This would have the advantage of not degrading regular search. deepcat search does something similar for categories already, so this might be something that the WMF could explore
    CParle (WMF) (talk) 14:24, 5 December 2022 (UTC)
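    To make the opt-in "comb the desert" idea from the last bullet concrete, here is a rough sketch, not a description of how MediaSearch or deepcat are implemented. It expands a search term into its descendants only when a deep mode is requested, with a depth limit to keep noise down. The edge table is toy data with labels standing in for QIDs, and the names `descendants`, `expand_query`, `deep` and `max_depth` are hypothetical.

```python
from collections import deque

# Toy "subclass of / instance of" edges, child -> parents.
# Illustrative data only; real Wikidata edges are far messier.
SUBCLASS_OF = {
    "poodle": ["dog"],
    "husky": ["dog"],
    "toy poodle": ["poodle"],
    "dog": ["animal"],
}


def descendants(root, edges, max_depth=3):
    """Breadth-first walk down the hierarchy, at most max_depth levels."""
    # Invert child -> parents into parent -> children.
    children = {}
    for child, parents in edges.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)

    found = []
    seen = {root}  # guards against cycles in a messy ontology
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for child in sorted(children.get(node, [])):
            if child not in seen:
                seen.add(child)
                found.append(child)
                queue.append((child, depth + 1))
    return found


def expand_query(term, edges, deep=False, max_depth=3):
    """Regular search uses the term alone; deep mode adds its descendants."""
    if not deep:
        return [term]
    return [term] + descendants(term, edges, max_depth)


print(expand_query("dog", SUBCLASS_OF))             # regular search
print(expand_query("dog", SUBCLASS_OF, deep=True))  # opt-in deep search
```

    Because the expansion only happens when `deep=True`, regular search results stay unchanged, which is the advantage named in the bullet above.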
    Going in order of your points:
    • Uploaders feeling that their contributions are being valued is good, but not always possible. Making uploaders feel like their work is being disregarded, however, is a bad thing. I don't need my images to turn up at the top of the results. But if I upload a picture of a doughnut and I cannot find it at all when searching for "doughnut" just because I was trying to be a good uploader and tagged it more precisely as a chocolate doughnut, that can be quite frustrating. Especially when I took the time to set up proper lighting in a studio etc. - but the search function favors the blurry phone snap of a Berliner someone uploaded with a description like "German doughnuts have no holes lol" (that last bit obviously being irrelevant to the greater question here).
    • It is difficult to come up with real-world examples, partially because only a tiny fraction of files already has depicts statements at all at the moment. But as pointed out below, a simple search for "car" gives you a bit of an idea. This will happen anywhere where experts divide things into smaller classes than the general public would search for, especially in biological taxonomy and technology. Looking at the Category tree might give an indication for that. And to make things worse, it seems like the good pictures with a clearly identified subject get the precise meta data, while the crappy ones stay in the more general categories because even the experts cannot figure out what exactly they show. Compare e.g. Category:Photographic lenses, where all the high-quality pictures of modern lenses are hidden away in the sub-categories. Look at how a search for "airliner" already favors crap like this, because people who know their stuff would never bother to use the word "airliner" anywhere in their file description, file name, or categorization.
    • Yes, deepcat is what I had in mind when proposing this.
    El Grafo (talk) 11:47, 7 December 2022 (UTC)
    @CParle (WMF): Thanks for posting here. Someone else suggested, if I understand correctly, that the issue of general vs. specific depicts statements could at least in part be solved by using other properties like main subject (P921). Do you have thoughts on that (and how it would work with media search)? — Rhododendrites talk |  14:23, 5 December 2022 (UTC)
    Hi @Rhododendrites! We could certainly add main subject (P921) to the properties that we search, but that's just adding another search signal to the ones we already use, and doesn't necessarily add weight to either side of the debate
    The overarching questions to me, from a search perspective, are:
    1. Does the current search provide good results for a search term? If not then what specifically are the problems?
    2. If there was a policy of only having specific depicts statements, might that degrade the current search? If it might then we ought to plan an experiment to see how much it'd degrade, and figure out whether any degradation would be mitigable via an ontology-descending search
    CParle (WMF) (talk) 15:14, 5 December 2022 (UTC)
    Just very quickly, more later: assume we would enforce a policy that required specific "depicts" statements only. That would mean that, ideally, every picture of an automobile would have a depicts statement about its specific type such as Honda Civic (Q216747) only. Actually, given the Commons Community's collective OCD, something like Honda Civic 11th generation (Q107002471) is probably more likely. With how the search currently works, that would mean that for anyone searching for a picture of some kind of car, any "depicts" statement would by design be out of the equation entirely. The search might still give reasonable results based on the other criteria, but to be honest, results for "car" are pretty bad right now.
    We already have this problem in the category system: The more us Commons curators geek out about precisely categorizing things, the more difficult it gets for the average Joe to actually find something (See for example the Category:Gliders tree). Or to view it from the other direction: Media with sloppily coarse meta data is easier to find. SDC/depicts currently has the opposite problem of there barely being any files actually having "depicts" statements at all. But if SDC eventually takes off, we need to be prepared for the pendulum swinging into the opposite direction. El Grafo (talk) 10:27, 7 December 2022 (UTC)
    It could be a workable solution that the search would use wikidata P180 QID values from categories in the search index. ( Example File:Poodle_(4320409945).jpg -> Category:Poodles -> Category:Poodles (Q55330462) -> category's main topic (P301) -> poodle (Q38904) -> instance of (P31) dog breed (Q39367), subclass of (P279) dog (Q144) )
    This proxying would need hand-tuning, so it would work best if there were a Lua module on Commons that could be edited by the community and would output a QID list in a format suitable for the search index. As a task, it would be something like the current template:wikidata infobox. Based on a quick test, it would be doable. (example: Module:P180fromCategory) So the biggest question is whether adding a module for this to every page in the file namespace would generate too much server load. If it does not, then this would be a usable way to add more information to the search index, which would especially improve multilingual search. -- Zache (talk) 11:46, 7 December 2022 (UTC)
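    The chain in the example above (file → category → category item → category's main topic (P301) → instance of (P31) / subclass of (P279)) can be sketched as follows. This is an illustration in Python over hard-coded toy tables, not the Lua in Module:P180fromCategory, whose actual logic is not shown in this thread. The function name `depicts_candidates` and the `excluded` rule set are hypothetical; the QIDs are the ones from the example.

```python
# Toy lookup tables mirroring the poodle example; a real implementation
# would read these from Commons and Wikidata instead.
FILE_CATEGORIES = {"File:Poodle_(4320409945).jpg": ["Category:Poodles"]}
CATEGORY_ITEM = {"Category:Poodles": "Q55330462"}  # Category:Poodles item
MAIN_TOPIC = {"Q55330462": "Q38904"}               # P301 -> poodle
PARENTS = {"Q38904": ["Q39367", "Q144"]}           # P31 dog breed, P279 dog


def depicts_candidates(file_title, excluded=frozenset()):
    """Derive candidate P180 QIDs for a file from its categories.

    `excluded` stands in for community-editable hand-tuning rules:
    QIDs listed there are dropped to reduce noise in the index.
    """
    result = []
    for category in FILE_CATEGORIES.get(file_title, []):
        topic = MAIN_TOPIC.get(CATEGORY_ITEM.get(category))
        if topic is None:
            continue  # category has no linked item or no main topic
        # The topic itself plus one level of its P31/P279 parents.
        for qid in [topic, *PARENTS.get(topic, [])]:
            if qid not in excluded and qid not in result:
                result.append(qid)
    return result


title = "File:Poodle_(4320409945).jpg"
print(depicts_candidates(title))
# Hypothetical tuning rule: a photo of a poodle does not depict "dog breed".
print(depicts_candidates(title, excluded={"Q39367"}))
```

    Keeping the exclusion rules as plain data is what would let editors tune them on-wiki rather than through code changes.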
    The difficulty is that ascending the ontology tends to add lots of noise to the search index - this will be true whether we use Lua or MediaWiki to do it. TBH if we're going to have another look at this I think it'd be better to do it inside MediaWiki itself (more scalable) CParle (WMF) (talk) 11:34, 9 December 2022 (UTC)
    To crowdsource finding and fixing problems in the rules that derive P180 values (i.e. reducing the noise), the current P180 values derived from categories need to be visible, and the rules that generate them should be editable by wiki editors. Because of this, it would not scale in terms of volunteer labor if the editing had to be done through git and MediaWiki update cycles. However, I am sure that from a technical point of view it would be more efficient if the code were directly in MediaWiki, so it is a trade-off between how it will scale from the code point of view and how it will scale from a crowdsourcing perspective. -- Zache (talk) 18:01, 9 December 2022 (UTC)
    Hmm actually that's a fair point. We won't be working on this within the next 6 months, but if we are after that having editor-editable rules is definitely something we ought to look into CParle (WMF) (talk) 10:37, 13 December 2022 (UTC)
    Ok so my takeaway from this and your other comments above is - without ontology-descending search we're incentivising users to use less precise depicts (P180) values, because their images are more likely to be found that way.
    Makes sense. It doesn't follow that ontology-descending search will be practical to implement, but it's certainly a good argument in favour of it. Like I said my own team is flat-out until July 2023, but I'll raise it as a possible project for after then
    (just fyi there are ~10M images out of ~90M total files on commons with 'depicts' statements atm) CParle (WMF) (talk) 11:49, 9 December 2022 (UTC)

Protect all nudity and sexuality related files from IP edits

Files in the field of nude people and files related to human sexuality are much more often vandalized than other files. In many cases the vandalism consists of harassing language directed at the people depicted in the photo. If this kind of vandalism is not reverted immediately, it is a serious personality rights problem. To solve this problem I would propose that we protect all files under the following categories from IP edits:

Another, less invasive solution would be not to protect all files up front, but to protect a file indefinitely once it has been vandalized. --GPSLeo (talk) 15:50, 28 October 2022 (UTC)

A good idea to protect those categories so that volunteers can spend their time better than checking and repairing vandalism by anonymous users. Wouter (talk) 09:35, 5 November 2022 (UTC)
@GPSLeo I am somewhat biased towards these kinds of categories, and have never observed such vandalism. Was it quickly revdel’d in each case? Or have I just not been looking at enough histories? Brianjd (talk) 14:22, 5 November 2022 (UTC)
Oppose: "in many cases" is not a convincing argument. I could only support such a motion if the OP could show that, say, 80% of IP edits to such images were vandalism. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:04, 9 November 2022 (UTC)
Oppose per above. Also because it depends on a hopelessly undefined category. Nude or partially nude people is in Nudity or partial nudity, whose long-running CfD is a mess. I noted there that even bare feet qualify as nudity, at least in one user’s opinion!
The category problems just keep going. Just to pick some random examples, Human sexuality contains Books about human sexuality and Lewinsky scandal. I expect there are many more subcategories that should not be protected in the way proposed here. Brianjd (talk) 10:42, 9 November 2022 (UTC)
Weak support: Sexuality-related media could definitely be a magnet for some very bad edits and should probably be held to higher scrutiny than your average content, but this could also just be a rule of thumb rather than a straight-up rule. —Justin (koavf)TCM 10:47, 9 November 2022 (UTC)
@Koavf How does this work as a rule of thumb? Protection is already handled by administrators, who are elected by the community for their good judgement; they don’t need this rule. Brianjd (talk) 11:43, 9 November 2022 (UTC)
And as I wrote, it wouldn't be a rule. A rule of thumb is general good advice, so the way it would be enacted is that admins would keep their eyes on these categories and media and pay special attention to them. I don't even understand what you don't understand about what I wrote. —Justin (koavf)TCM 11:45, 9 November 2022 (UTC)
@Koavf I’m obviously using rule as an abbreviation for rule of thumb. I’m saying that admins already have good judgement and don’t need this rule of thumb on top of that.
Also, admins don’t have time to pay special attention to these categories; they barely have enough time to do things that actually require admin rights. So what are you actually proposing? Brianjd (talk) 11:53, 9 November 2022 (UTC)
I am proposing what I just wrote. I don't know how to reword it. —Justin (koavf)TCM 12:00, 9 November 2022 (UTC)
Oppose: Can you provide evidence of widespread vandalism that would justify the protection of such a broad topic? I have seen some, but should we also remove the ability for them to list a file for deletion? I think that it would be of no benefit. Bidgee (talk) 12:11, 9 November 2022 (UTC)
  • Question, wouldn't this also make it impossible for IP users to nominate abusive files of nudity for deletion? Not everyone wants to make an account to combat vandalism and a lot of users who edit with an IP are good faith users. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 12:45, 17 November 2022 (UTC)
    @Donald Trung Yes, it would make it impossible for unregistered users to nominate files for deletion. Ironically, some of those deletion nominations are the most abusive thing I see on this category of files (and legitimate deletion nominations tend to come from registered users). Even so, Bidgee above also raised this issue, and opposed this proposal on that basis. Brianjd (talk) 12:56, 17 November 2022 (UTC)
  • Comment: Like Bidgee above, I think such a proposal should be based on facts, on actual "evidence of widespread vandalism" in the subject area. That is, if it can be shown with actual data that there is significantly more vandalism and related issues there than on Commons in general, I would support the proposal, otherwise oppose. Gestumblindi (talk) 00:56, 7 December 2022 (UTC)

Automatically remove "Category:Files with bad file names..." after rename?

would it be a good idea to let MediaWiki:Gadget-AjaxQuickDelete.js remove "Category:Files with bad file names..." after a file is renamed? because the assumption is that 99% chance the filemover would have renamed it to something not bad anymore. (i'm tired of removing the cat manually afterwards.)

if the answer is yes, the code around line 1356 should be edited. RZuo (talk) 11:00, 26 November 2022 (UTC)
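For anyone wondering what the change amounts to, here is a rough sketch of the text transformation: strip one named category link from a page's wikitext after the move. It is written in Python purely for illustration. The gadget itself is JavaScript, this is not its code, and nothing here reflects what is actually at line 1356. The function name `remove_category` is hypothetical, and the category name in the example is a placeholder, since the real name is abbreviated above.

```python
import re


def remove_category(wikitext: str, category_name: str) -> str:
    """Remove [[Category:<category_name>]] links, with or without a sort key.

    Other categories and the rest of the page are left untouched.
    """
    pattern = re.compile(
        r"\[\[\s*[Cc]ategory\s*:\s*"   # opening brackets and namespace
        + re.escape(category_name)      # the exact category name
        + r"\s*(\|[^\]]*)?\]\]\n?"      # optional sort key, closing, newline
    )
    return pattern.sub("", wikitext)


# Placeholder category name, not the real maintenance category.
page = (
    "== Summary ==\n"
    "A photo.\n"
    "[[Category:Placeholder bad names]]\n"
    "[[Category:Trains]]\n"
)
print(remove_category(page, "Placeholder bad names"))
```

In practice the gadget would only want to do this when the rename actually succeeded, so a failed move does not silently drop the maintenance category.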

Requests for comment: 2022 overhaul of categories by period

Commons:Requests for comment/2022 overhaul of categories by period.--RZuo (talk) 07:42, 5 December 2022 (UTC)

Enable ProveIt or Citoid on Commons

References are often used here:

However citoid and Help:Gadget-ProveIt aren't enabled here, even if you manually activate the VisualEditor in your Beta settings here. So:

  • Many people don't use the templates and instead write the sources "manually":
    • The format is inconsistent across pages.
    • It's often in the uploader's language instead of a standard international format that anyone can read. Good luck reading references in Chinese, for instance: File:南宋疆域图(简).png.
    • There are bare URLs, often pointing to "dead" websites. Using a template would enable bots to automatically archive these URLs, as on Wikipedia.
  • Many people copy/paste the equivalent templates from Wikipedia; however, these templates aren't identical. For instance "first1" isn't displayed here (see Template_talk:Cite_book#Extra_authors), so the authors are often hidden on Commons ("first" or "author" should be used instead on Commons).

So I suggest enabling ProveIt (remove the "hidden" parameter, see Help_talk:Gadget-ProveIt#Enabled_here?) and/or citoid to allow people to automatically generate citations based on a URL, doi, or ISBN and provide better references to readers. What do you think? A455bcd9 (talk) 12:24, 13 December 2022 (UTC)

Support for enabling both (i.e. formal Citoid support in the VisualEditor via site configuration, and adding ProveIt to the gadgets). -- Zache (talk) 12:36, 13 December 2022 (UTC)
Support since I've seen both tools do much good in other wikis. Sophivorus (talk) 13:16, 13 December 2022 (UTC)
@Sophivorus: If we enable these tools here they will not be enabled by default: right? Only editors who manually enable them in their preferences will be able to use them? A455bcd9 (talk) 13:23, 13 December 2022 (UTC)
@A455bcd9 Regarding ProveIt, yes. Regarding Citoid, I'm not sure but I think it'll be enabled for all users. Sophivorus (talk) 13:32, 13 December 2022 (UTC)
@Sophivorus OK for ProveIt. Regarding Citoid, as the Visual Editor isn't enabled by default on Commons (it has to be enabled in the Beta features), I think that users won't have access to Citoid by default, BUT all users who have already enabled the Visual Editor on Commons will then automatically see the Citoid gadget. Am I correct? A455bcd9 (talk) 14:14, 13 December 2022 (UTC)
@A455bcd9 Probably, yes. Sophivorus (talk) 14:17, 13 December 2022 (UTC)

Notifying everyone of an RfC

Hi, just notifying everyone of an RfC happening here. It's discussing files that were poorly imported from other wikis (often just cut and pasted) that don't preserve the history. --Matr1x-101 {user page - talk with me :) - contribs!} 12:43, 19 December 2022 (UTC)