I think 4-7 averages would be enough for Spain (NW, NE, SW, SE, Basque and Canarian + one general average).
Italy and Greece require more, but they should still be kept to a reasonable number (10-15 each, maybe? Others will know better).
Spain should have 6-7 regional averages at least. Yes, Spanish regional averages are similar to one another compared with those of Greece and Italy. Take all the K13 averages and put them into Vahaduo Custom PCA: Italy has an enormous range, far larger than Greece/Greeks, the largest in Europe as a matter of fact and perhaps even in the world for a single nation. As such, it would be best to keep the averages for all Italian administrative regions (except those 4 sub-regional ones mentioned before).
Davidski's Spanish avgs are more or less based on the autonomous communities
https://www.red2000.com/spain/images/r-map-en.gif
Yeah, pretty sure quite a few of them are basically identical. I know a country like Portugal doesn't even require regional averages, they are so homogeneous :D
Catalans drift towards the French, Basques have their own drift (as do people living near the Basque lands) and Canarians have strong North African input.
But other than that there aren't huge genetic differences in Iberia; they differ more along the W-E cline than the N-S one, as far as I know.
Both Moldavia and Wallachia were principalities. Muntenia, Oltenia, South and North Moldavia are just way too detailed a breakdown, even if they exist in reality. The people in these two principalities were either Moldavians or Wallachians, so just two averages for them would be plenty. In my opinion Dobruja can be added to Wallachia, since it was populated by Wallachians after uniting with Romania. Regarding Transylvania, you can pretty much add both Maramures and Crisana to it, since they are too small to exist separately and are culturally connected with Transylvania. In my opinion even Banat can be added to one Transylvanian average, since it is also way too small to exist separately and doesn't differ much genetically from Transylvania. 3 main regions for Romanians is better than having so many small neighboring regions that are very similar to one another.
@Lucas, I've renamed the Russians, it's better when they start with the word Russian. Erzya_Mordovian instead of Erzya is more understandable. At least people can google Mordovia and see it on the map.
Code:
Russian_Kargopol,24.92,48.36,7.52,5.54,1.50,0.04,2.49,0.23,6.67,1.70,0.23,0.53,0.26
Russian_Kostroma,26.19,48.54,6.21,5.35,3.19,0.39,1.05,0.69,6.37,1.16,0.52,0.19,0.16
Russian_Pinega,24.72,51.94,5.05,2.82,0.39,0.58,1.58,0.24,9.96,1.96,0.27,0.31,0.23
Russian_Smolensk,28.16,48.47,8.74,5.82,3.83,1.69,1.38,0.17,0.85,0.43,0.38,0.06,0.01
Russian_Southwest,25.91,47.21,8.25,6.80,6.26,0.55,1.00,0.22,2.14,0.81,0.43,0.19,0.23
Erzya_Mordovian,21.08,50.52,6.40,7.60,2.62,0.32,1.78,0.49,7.11,1.50,0.00,0.25,0.33
No, it's really not, too low, look at Vahaduo PCA. I think that for Romania we should also have 6-7 regional averages, something along the lines of the NUTS 2 statistical regions of Romania? Ion, do you agree, or could you at least try making them if we don't already have them?
What you posted are Nomenclature of Territorial Units for Statistics regions, made in the 2000s for EU funds. What I'm speaking about are historical considerations, and what I said is based on the historical but also the modern-day genetic reality of Romanians. For example, there are some genetic differences between Northern and Southern Transdanubia and among Alföld Hungarians, but I still opted for only one average for each, because why should I divide Hungary more when it's not that important? One average for each historic region is more than enough, no matter if there are some differences between the south and north, or east and west, of that region.
There are differences and all of them exist for a reason.
Moldavia_North is mixed with Ukrainians, especially those from Bukovina.
Moldavia_South is intermediary between Moldavia_North and Wallachia. Probably one of the "purest" regions in the country, as it didn't have significant historic communities.
Muntenia is mixed with Bulgarians.
Dobruja is a recent colony, populated by people from all over the country and then also some Turks and Tatars mixed in.
Oltenia had a Hungarian presence, which is also noticeable genetically.
Banat had a strong migration from Oltenia.
Crisana is noticeably mixed with Hungarians.
Maramures is like Moldavia_North, as in mixed with Ukrainians, but not as significantly.
And Transylvania is similar to Crisana, but also received significant migration from Wallachia and Moldavia in communist times.
I am not consolidating them, as it would compromise accuracy for no reason at all. For a country of Romania's size and diversity, 10 averages is fair. Better to cut the Ethiopian regions or some of the exotic populations from Southeast Asia and Africa, which are not meaningful to most users on this forum.
Your comparison between Hungary and Romania isn't reasonable. Romania has double the population and territory of Hungary, and the Romanian regional averages are also further apart from one another than the Hungarian ones, with the exception of Hungarian Transylvania, which is a Hungarian outlier with an expected inclination toward Moldavia and Romania.
Well, in the end, 10 averages for Romania are perhaps fine. I don't understand what the Ethiopian case has to do with the Romanian averages. It's unrelated, as each country/nation is its own case; it's not like we have a limited number of total averages. And we shouldn't cut any Ethiopian regions, because people worldwide are using K13 Vahaduo and Ethiopia has a population of more than 100 million. That's discriminatory...
Of course there won't be pure populations in any region, since there was always mixing going on. All I'm saying is that for international purposes, and to have more concise samples, it's better not to nit-pick all these differences within the regions to perfection, and rather to use broader regional averages. I even gave Hungary as an example, where I could also easily nit-pick, but how is it helpful for an outsider to get North Alföld, South Transdanubia and so on, rather than the main historic regions, as in Alföld or Transdanubia, which are more meaningful and easier to recognize?
I just checked how these sub-regions compare to their broader regions, and you can see that both Crisana and Maramures are very close to the Transylvania average, while the distances between Muntenia, Oltenia and Dobruja are even smaller, and they fit the broader Wallachian average very well. The only slightly bigger differences I could notice are between Banat and the rest of Transylvania, and between the two Moldavian regions, so maybe these could remain separate, but one broad Wallachia and one broad Transylvania average for Romanians is enough.
Distance to: Romania_Maramures
2.79032256 Romania_Crisana
3.07353217 Romania_Transylvania
Distance to: Romania_Crisana
1.76031247 Romania_Transylvania
2.79032256 Romania_Maramures
Distance to: Romania_Muntenia
0.64575537 Romania_Wallachia
0.77511290 Romania_Dobruja
2.18970318 Romania_Oltenia
Distance to: Romania_Oltenia
1.55319027 Romania_Wallachia
1.77628826 Romania_Dobruja
2.18970318 Romania_Muntenia
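For anyone who wants to reproduce this kind of "Distance to:" list offline, here is a minimal Python sketch. It assumes the distance is the plain Euclidean distance between coordinate rows, which is my understanding of how Vahaduo works but isn't confirmed anywhere in this thread. The Romanian coordinates aren't posted here, so it uses three of the Russian rows from earlier instead; `parse_row`, `distance` and `distance_list` are just illustrative names.

```python
import math

def parse_row(line):
    """Split 'Name,v1,v2,...' into (name, [floats])."""
    name, *values = line.strip().split(",")
    return name, [float(v) for v in values]

def distance(a, b):
    """Euclidean distance between two equal-length coordinate lists (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("coordinate lists differ in length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_list(target, references):
    """Return (distance, name) pairs for every reference, closest first."""
    return sorted((distance(target, coords), name) for name, coords in references.items())

# Rows copied from the Russian averages posted earlier in the thread.
ROWS = """
Russian_Kargopol,24.92,48.36,7.52,5.54,1.50,0.04,2.49,0.23,6.67,1.70,0.23,0.53,0.26
Russian_Kostroma,26.19,48.54,6.21,5.35,3.19,0.39,1.05,0.69,6.37,1.16,0.52,0.19,0.16
Russian_Pinega,24.72,51.94,5.05,2.82,0.39,0.58,1.58,0.24,9.96,1.96,0.27,0.31,0.23
"""

averages = dict(parse_row(line) for line in ROWS.strip().splitlines())
target = averages.pop("Russian_Kargopol")

print("Distance to: Russian_Kargopol")
for dist, name in distance_list(target, averages):
    print(f"{dist:.8f} {name}")
```

The same loop works for any set of rows pasted in the `Name,v1,v2,...` format, as long as every row has the same number of components.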
A larger population doesn't mean there should be more regional averages, especially when the sub-regions within a region are very close to one another. It's just unnecessary. On another note, Transylvanian Hungarians aren't closer to Romanians and Moldavians, but actually to Székelys, Csángós, Croatians and Hungarians.
Distance to: Hungarian_Transylvania
1.16619038 Székely
2.06661559 Csángó
3.74759923 Croat_West
4.67794827 Croat_South
4.93330518 Croat
4.94560411 Hungarian
4.96957745 Hungarian_Alföld
5.43561404 Serb_north
5.44066172 Romania_Moldavia_North
5.47866772 Hungarian_Transdanubia+Budapest
5.71539150 Serb_central
5.91459212 Serb
6.02299759 Bosniak
6.13950324 Romania_Maramures
6.49513664 Romania_Crisana
6.54796151 Croat_East
6.61689504 Moldova_Centre
6.64836822 Croat_North
6.85382375 Moldova_average
7.28390005 Romania_Moldavia_South
7.38275694 Slovenian
7.75199974 Romania_Transylvania
7.83529195 Hungarian_Northern
7.85563492 Moldova_North
I advocate for these Italian averages getting tossed, many of which are subregional averages that aren't necessary IMO (also, do we really need to divide a subregion like Emilia in two?):
Code:
IT_Insubria,32.58,12.51,24.96,7.90,18.79,2.07,0.05,0.27,0.12,0.14,0.21,0.22,0.12
IT_Orobia,33.02,11.21,25.86,6.82,19.58,2.31,0.24,0.13,0.25,0.21,0.05,0.23,0.03
IT_Emilia_ovest,30.28,10.80,24.92,8.49,21.36,2.54,0.31,0.45,0.00,0.32,0.40,0.03,0.04
IT_Emilia_est,28.39,11.31,23.83,8.27,22.84,4.00,0.19,0.30,0.09,0.21,0.32,0.14,0.03
IT_Sannio,18.49,7.95,23.25,13.57,29.86,4.99,0.39,0.03,0.22,0.55,0.15,0.33,0.16
IT_Puglia,18.81,9.37,21.86,14.54,28.77,4.95,0.28,0.05,0.17,0.26,0.44,0.35,0.09
IT_Lucania,20.54,8.01,20.72,14.50,28.73,5.50,0.37,0.08,0.09,0.03,0.46,0.59,0.33
IT_Calabria_citra,16.45,6.16,22.01,14.95,32.59,5.52,0.52,0.23,0.04,0.18,0.6,0.41,0.28
IT_Salento,19.24,6.97,22.95,14.88,30.37,4.23,0.00,0.27,0.03,0.52,0.29,0.16,0.06
IT_Calabria_ultra,16.11,5.87,21.72,14.90,32.65,6.51,0.24,0.34,0.00,0.04,0.17,1.16,0.21
Also, the IT_Lucania sample is identical to IT_Basilicata (it's the same region but using an archaic name for it), and we don't need the Puglia sample since we already have one.
Romani should be deleted, Balkan_Gypsy is better. Moldova_Roma is also weak, only two samples. Ion Basescul's Romania_Muntenia_Roma is more solid
Code:
Romania_Muntenia_Roma,10.87,10.98,13.02,16.12,19.24,2.66,23.01,1.43,0.72,0.21,1.21,0.39,0.14
Couldn't you make averages for Russia like west, central, north etc.? Of these, SW Russian is okay; the others are each based on one city, which is pretty dumb. I know you have hundreds of East Slavic results.
I also dislike Ukrainian_Lviv average for example, broader west Ukrainian average would be better.
Honestly I don't feel like doing that. That'd take a lot of effort. On Dodecad we have a few good Russian references, I mostly use D K12b for Russians, it's also easier to have a database in only one format. I started with Dodecad back in 2018, by now many kits might be deleted, so I only have their K12b results.
Lukasz can add a few more (he added some academic stuff to Dodecad). I'd add Ryazan, Tver and Pskov personally.
Cities are not a problem, they can be understood as proxies. Smolensk = one of the Westernmost Russian oblasts. Lviv = proxy for Western Ukraine/Galicia. Kostroma, Kargopol = proxy for Northern Russian. Etc.
Peterski has Ukrainian_Kiev and Belarusian_Minsk, pretty solid academic data set. Both are capitals and iconic places/oblasts. I've only seen K15 which I don't use much. But he won't add shit to an LM project. I've been begging him to give me that data for months and to no avail.
I can confirm, it's quite a pain in the ass to manually run all the kits once again on a different calculator, then add them to a sheet in order, then calculate them. When you have hundreds, such a task takes way too many hours to even estimate, more like days. That's why it's taking so long for me to advance with my K15 averages for Hungarians, especially since I usually don't have much time for this.
Also, a western Polish average is really needed, as I wrote a few pages back. Western Poles drift towards Sorbs, Czechs and East Germans; it's a large country and they deserve an average. As we know, Peterski is from there and surely has tons of kits. I know he tested his family and relatives for a start and they are all deeply local.
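The "calculate them" step at least can be scripted once the per-kit rows are in a sheet. A rough Python sketch, assuming the sheet is exported as plain `Name,v1,v2,...` rows with no header; `average_rows` is a made-up helper name. No per-kit results are posted in this thread, so the five Russian regional rows from earlier stand in as input purely to show the mechanics (averaging averages like this is not a substitute for averaging the actual kits).

```python
import csv
import io
from statistics import fmean

def average_rows(csv_text, label):
    """Average each numeric column of 'Name,v1,v2,...' rows into one labelled row."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    values = [[float(v) for v in row[1:]] for row in rows]
    # Every row must carry the same number of components (13 for K13).
    if len({len(v) for v in values}) != 1:
        raise ValueError("rows must all have the same number of components")
    return [label] + [round(fmean(column), 2) for column in zip(*values)]

# Stand-in input: regional rows copied from earlier in the thread, not real kits.
STAND_IN = """
Russian_Kargopol,24.92,48.36,7.52,5.54,1.50,0.04,2.49,0.23,6.67,1.70,0.23,0.53,0.26
Russian_Kostroma,26.19,48.54,6.21,5.35,3.19,0.39,1.05,0.69,6.37,1.16,0.52,0.19,0.16
Russian_Pinega,24.72,51.94,5.05,2.82,0.39,0.58,1.58,0.24,9.96,1.96,0.27,0.31,0.23
Russian_Smolensk,28.16,48.47,8.74,5.82,3.83,1.69,1.38,0.17,0.85,0.43,0.38,0.06,0.01
Russian_Southwest,25.91,47.21,8.25,6.80,6.26,0.55,1.00,0.22,2.14,0.81,0.43,0.19,0.23
"""

result = average_rows(STAND_IN, "Russian_standin")
print(",".join([result[0]] + [f"{v:.2f}" for v in result[1:]]))
```

The output line is in the same comma-separated format as the rows posted in this thread, so it can be pasted straight into a source list.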
Well, they are proxy but that feels a little wrong to put "North" when it's supposed to be Kargopol only, which is still a specific place even if technically not too different from neighboring ones.
I didn't know Kargopol myself before I learned it from Gedmatch :)
To be fair, Peterski did add Masurian K13 average to Vahaduo. He gave it to me and I sent it to Lucas. Please rename it to Polish_Masurian, that way it's more clear what it is. :)
As was stated before, first we should look at population size, country area & shape, the establishment of regions and interregional differences, and then the genetic proximity of the regional averages. It's useful to have more regional averages for the Top 10 largest European nations because it gives shape to the PCA as a whole. It is also good to have fewer regional averages for countries with a lower (2) or intermediate (4) population size and other factors (like Hungary). Look at the current Vahaduo Custom PCA, at how the Romanian averages follow the pattern between the Bulgarian, Moldavian and Yugoslav averages, and how nicely they shape the PCA. Are there 2-3 too many Romanian averages? As said before, probably, but imagine how it would look if we only had 3 Romanian averages. That wouldn't serve international users or usefulness.
I didn't say that; what I said is that Transylvanian Hungarians are inclined away from the Hungarian national average toward Moldavians and Romanians. Just look at the PCA: they are in between the Hungarian and Moldavian national averages, as well as between the Hungarian national average and the northern Romanian regional averages.
You want to lower the number of Romanian averages to a minimum, while Ion wants to keep their number to a maximum. I'm proposing an intermediate solution. All happy.
I am for a more concise database. I find it more useful than seeing, one after the other in the distance calculator, regions that are basically telling me the same thing and are related to one another. The same type of significant reduction should be applied to Spain, Greece and Italy without a doubt. It's just better optics and a better use of space. But this is just my personal suggestion.
Is it possible to create a Siberian Russian average?
And I have a question: why is the Bulgaria_Northeastern average more southern than Bulgarian_Southeastern, Bulgarian_Southcentral and all the other Bulgarian averages?
It isn't, unless you guys are not using my averages.
Name,Number of Samples,N_Atlantic,Baltic,West_Med,West_Asian,East_Med,Red_Sea,South_Asian,East_Asian,Siberian,Amerindian,Oceanian,NE_African,Sub-Saharan,NE+NW Euro,Med Euro,Caucasus,SW Asia,Siberia,India
Bulgaria_Northeastern,9,22.15,22.37,17.46,13.43,19.44,2.56,0.46,0.32,0.49,0.24,0.29,0.11,0.10,44.52,36.90,13.43,2.67,1.06,0.46
Bulgaria_Northcentral,15,21.30,24.45,17.41,11.88,19.86,2.33,0.65,0.28,0.99,0.37,0.28,0.07,0.14,45.75,37.27,11.88,2.40,1.63,0.65
Bulgaria_Northwestern,27,22.04,26.19,17.58,10.15,19.05,2.31,0.40,0.52,0.70,0.51,0.38,0.07,0.09,48.24,36.63,10.15,2.38,1.73,0.40
Bulgaria_Southeastern,12,20.13,21.81,18.44,13.87,20.44,2.92,0.43,0.30,0.48,0.44,0.47,0.11,0.07,41.95,38.88,13.87,3.03,1.21,0.43
Bulgaria_Southcentral,33,21.27,23.54,18.13,11.65,20.17,2.46,0.60,0.36,0.90,0.35,0.32,0.21,0.04,44.81,38.31,11.65,2.67,1.60,0.60
Bulgaria_Southwestern,31,22.81,24.21,17.86,11.56,18.51,2.23,0.59,0.54,0.85,0.39,0.33,0.07,0.06,47.02,36.37,11.56,2.30,1.78,0.59
Bulgaria_average,127,21.74,24.31,17.91,11.52,19.38,2.40,0.53,0.44,0.80,0.41,0.36,0.12,0.07,46.04,37.29,11.52,2.52,1.65,0.53
Why are we even having this conversation? The whole point of the updated spreadsheet is to have regional averages, sure not in the range of 15-20 as in the past, but up to 10 depending on the size of the country is pretty reasonable.
Otherwise, we might as well delete it and return to the original spreadsheet with 1-3 averages, depending on the country.
I think that some of you are arguing because you are bored.
I'm expressing my opinion, which was from the beginning, that broader regions and national (cross-border where it's the case) averages are more interesting, than small region averages. At the end it's not me who decides, but I can make suggestions when I see it fit and substantiated.
For Dodecad K12b:
Code:
Iran_Arab,18.52,0.29,2.61,0.21,6.69,3.63,4.18,3.35,23.35,0.45,34.63,1.87
To you maybe it is; to others, it's not. It depends on what you want to do, and on whether you need to edit the source for distance, multi and PCA.
In other words, again, we should have a general and well-detailed database source on Vahaduo which would be an intermediate between the concise minimum and sub-regional maximum.
@Lucas, can you create a sticky thread that will be easily accessible for others with spreadsheets of a minimum number, which is roughly 1 national average + 1-3 regional averages depending on the country, and of a maximum number, which is all these regional and sub-regional averages we are speaking about? All happy.
Fine by me, but Northcentral and Southcentral should be merged to form Bulgaria_Central (look at the PCA), so 5 regional averages instead of 6 on Vahaduo, and all 6 on the maximum spreadsheet?
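If Northcentral and Southcentral do get merged, the merged row should probably be weighted by sample count (15 and 33 in the table above) rather than being a plain midpoint. A small Python sketch, assuming the posted regional rows are simple arithmetic means of their samples; `weighted_merge` is an illustrative name, and because the inputs are already rounded to two decimals the result can differ slightly from an average recomputed from the raw kits.

```python
def weighted_merge(groups):
    """Pool several averages, weighting each by its sample count.

    groups: list of (n_samples, [component values]) tuples.
    Returns (total_samples, [pooled component values]).
    """
    total = sum(n for n, _ in groups)
    length = len(groups[0][1])
    if any(len(values) != length for _, values in groups):
        raise ValueError("all groups must have the same number of components")
    pooled = [sum(n * values[i] for n, values in groups) / total
              for i in range(length)]
    return total, pooled

# K13 components and sample counts copied from the Bulgarian table above.
northcentral = (15, [21.30, 24.45, 17.41, 11.88, 19.86, 2.33, 0.65,
                     0.28, 0.99, 0.37, 0.28, 0.07, 0.14])
southcentral = (33, [21.27, 23.54, 18.13, 11.65, 20.17, 2.46, 0.60,
                     0.36, 0.90, 0.35, 0.32, 0.21, 0.04])

n_central, central = weighted_merge([northcentral, southcentral])
print(",".join(["Bulgaria_Central"] + [f"{v:.2f}" for v in central]))
print("samples:", n_central)
```

Weighting matters here because Southcentral has more than twice as many samples as Northcentral, so the merged row sits closer to Southcentral than a simple midpoint would.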