Our own tiering system: Empirical Tiers - Page 4


Thread: Our own tiering system: Empirical Tiers

  1. #46
    Techno Pussy Fawkes.'s Avatar
    Join Date
    Dec 2010
    Gender
    Uncertain
    Location
    Scotland
    Posts
    1,393
    Trophies

    Default Re: Our own tiering system: Empirical Tiers

    shouldn't there be an NFE tier?

  2. #47
    Out of the sky Vaexa's Avatar
    Join Date
    Nov 2009
    Gender
    Male
    Location
    Europe's drain hole
    Posts
    10,289
    Blog Entries
    135

    Default Re: Our own tiering system: Empirical Tiers

    If you guys have some sort of preliminary tier list along with the Pokémon in them, I could try implementing them on the beta PO server. Just letting y'all know.
    Everything is possible in the game of life.

  3. #48
    is obsessed with Noivern! Zekurom's Avatar
    Join Date
    May 2010
    Gender
    Male
    Posts
    5,658
    Blog Entries
    108

    Default Re: Our own tiering system: Empirical Tiers

    Quote Originally Posted by Fawkes. View Post
    shouldn't there be an NFE tier?
    NFE is more of an orthogonal classification than a ranking.

    For that matter, some Pokémon might be in OU even if they're not fully evolved. We can't pass up that chance.
    The word "quadragonal" is the only word with "dragon" in it where "dragon" is not a root word. That makes it awesome.

  4. #49
    ._. Synthesis's Avatar
    Join Date
    Aug 2009
    Gender
    Male
    Location
    Ireland
    Posts
    8,119
    Blog Entries
    226

    Default Re: Our own tiering system: Empirical Tiers

    Quote Originally Posted by Gi-gi-gi-giaru! View Post
    NFE is more of an orthogonal classification than a ranking.

    For that matter, some Pokémon might be in OU even if they're not fully evolved. We can't pass up that chance.
    Yay, evo stone Chansey and Dusclops.

  5. #50
    Lighting Things on Fire Sarcastically Insane's Avatar
    Join Date
    May 2009
    Gender
    Male
    Location
    EST time zone
    Posts
    1,728
    Blog Entries
    268

    Default Re: Our own tiering system: Empirical Tiers

    Also, Hippopotas and Snover are probably going to be used in tiers where Hippowdon and Abomasnow are banned. Same for DW Vulpix, possibly.
    Currently writing Hoenn Wars, a Travelsverse fic that needs no prior Travelsverse knowledge to understand. Chapter Seven, Sacrifice, is up.



    That's interesting; I might have a look when I have the time. Thanks!

    EDIT: Oh my god, these are too many links! Very specific...
    ^Someone's first impression of TVTropes.

  6. #51
    Registered User The Outrage's Avatar
    Join Date
    May 2007
    Gender
    Male
    Posts
    12,171
    Blog Entries
    951

    Default Re: Our own tiering system: Empirical Tiers

    Quote Originally Posted by Gi-gi-gi-giaru! View Post
    For that matter, some Pokémon might be in OU even if they're not fully evolved. We can't pass up that chance.
    There's definitely going to be more NFE usage with the pre-evolution stone, particularly for defensively oriented Pokémon where the pre-evolution may actually wind up having a better defensive spread.

    I know some people have an NFE tier, but even amongst the NFE some Pokémon are clearly better than the rest.

    Since we're splitting the tiers based solely on usage, if an NFE gets used enough then it should appear in the system.

    However, there may be NFEs that would likely never get used (like Pichu). So for simplicity's sake, I think that for the most part NFEs shouldn't appear in the tiers unless their stat spread is significantly different from their evolution's (e.g. Scyther/Porygon2 --> Scizor/Porygon-Z); otherwise we'd end up with something like 100 Pokémon in the lowest usage tier. But if things like Chansey get used enough due to the pre-evolution stone, they should appear.
    Quote Originally Posted by Illustrious View Post
    If you guys have some sort of preliminary tier list along with the Pokémon in them, I could try implementing them on the beta PO server. Just letting y'all know.
    Well, if it's based on usage, I was thinking that the PO server could keep stats on which Pokémon are used the most in the early weeks of BW (Shoddy did that, so I'm wondering if PO can too), and the tier listing will form out of that. Basically, we encourage a free-for-all.
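A minimal sketch of the usage logging described here, assuming the server can hand over each battle's team as a list of species names. The function name `usage_rates` and the sample teams are made up for illustration; nothing here reflects how PO or Shoddy actually store their stats.

```python
from collections import Counter

def usage_rates(teams):
    """Return the fraction of logged teams each Pokémon appears on."""
    counts = Counter()
    for team in teams:
        # Count each species once per team, even if it is listed twice.
        counts.update(set(team))
    total = len(teams)
    return {name: n / total for name, n in counts.items()}

# Hypothetical sample of logged teams (shortened for illustration).
teams = [
    ["Scizor", "Blissey", "Skarmory"],
    ["Scizor", "Gengar", "Snorlax"],
    ["Chansey", "Gengar", "Skarmory"],
    ["Scizor", "Chansey", "Alakazam"],
]

rates = usage_rates(teams)
# Print from most used to least used, ties broken alphabetically.
for name, rate in sorted(rates.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{name}: {rate:.0%}")
```

A table like this, collected over the first few weeks, would be the raw input for whatever cut-off rule ends up being chosen.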

    EDIT: Just a general question, the way we talk it seems like there's a consensus on the tier system proposed by Gi-gi-gi-giaru (except on the exact number of tiers) so are we going with that? Just posting to clarify that I do second that idea.

    EDIT 2: As for forming the initial tiers, would we be able to make a probability distribution graph from the Pokémon's usage and decide the cut-off points from that? I mean it would let us see which Pokémon's usage is above or below average.
    Last edited by The Outrage; 7th February 2011 at 07:25 PM.

  7. #52
    is obsessed with Noivern! Zekurom's Avatar
    Join Date
    May 2010
    Gender
    Male
    Posts
    5,658
    Blog Entries
    108

    Default Re: Our own tiering system: Empirical Tiers

    Quote Originally Posted by Outrage View Post
    EDIT: Just a general question, the way we talk it seems like there's a consensus on the tier system proposed by Gi-gi-gi-giaru (except on the exact number of tiers) so are we going with that? Just posting to clarify that I do second that idea.
    I noticed that too, but never got a chance to comment on it

    EDIT 2: As for forming the initial tiers, would we be able to make a probability distribution graph from the Pokémon's usage and decide the cut-off points from that? I mean it would let us see which Pokémon's usage is above or below average.
    We could do something like that, yes. The cutoff points for 2 and 1/2 are pretty arbitrary. I'd need to see actual relative representation statistics before I can actually decide on anything.

    Also, a probability distribution graph? You mean like a histogram? I don't see how that comes into the picture.
    Last edited by Zekurom; 7th February 2011 at 07:54 PM.
    The word "quadragonal" is the only word with "dragon" in it where "dragon" is not a root word. That makes it awesome.

  8. #53
    Registered User The Outrage's Avatar
    Join Date
    May 2007
    Gender
    Male
    Posts
    12,171
    Blog Entries
    951

    Default Re: Our own tiering system: Empirical Tiers

    Quote Originally Posted by Gi-gi-gi-giaru! View Post
    I noticed that too, but never got a chance to comment on it



    We could do something like that, yes. The cutoff points for 2 and 1/2 are pretty arbitrary. I'd need to see actual relative representation statistics before I can actually decide on anything.

    Also, a probability distribution graph? You mean like a histogram? I don't see how that comes into the picture.
    Well, like a normal distribution, and I just mentioned it because it's easier for me to visualize it on a graph. So basically, I was thinking we do what you said and start off with one tier (well, technically none) that would be +/- 0.5 SD from the mean score, with every further 0.5 SD from the mean being another tier. So -0.5 to 0.5 is the middle tier, 0.5 to 1.0 is an upper tier, and -0.5 to -1.0 is a lower tier. But yeah, these are basically just arbitrary numbers right now since we don't have any data.
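To make the banding concrete, here is a rough sketch of it in Python, assuming "mean score" means the plain average of per-Pokémon usage. The function name `sd_tiers`, the tier numbering (0 for the middle band, positive for upper, negative for lower) and the sample percentages are all invented for illustration.

```python
import math
from statistics import mean, pstdev

def sd_tiers(usage, band=0.5):
    """Assign a tier offset to each Pokémon from its z-score.

    0 is the middle tier (within +/- `band` SD of the mean); each further
    `band` SD above or below the mean moves one tier up or down.
    """
    values = list(usage.values())
    mu = mean(values)
    sigma = pstdev(values)
    tiers = {}
    for name, u in usage.items():
        # If every Pokémon has identical usage, everything stays in the middle.
        z = (u - mu) / sigma if sigma else 0.0
        if abs(z) < band:
            tiers[name] = 0
        else:
            steps = math.floor((abs(z) - band) / band) + 1
            tiers[name] = steps if z > 0 else -steps
    return tiers

# Hypothetical usage percentages, chosen only to keep the arithmetic simple.
usage = {"Pichu": 5, "Delibird": 10, "Chansey": 15, "Gengar": 20, "Scizor": 50}
tiers = sd_tiers(usage)
for name in usage:
    print(name, tiers[name])
```

With a lopsided sample like this one, a single heavily used Pokémon drags the mean up and lands several bands above everything else, while nothing falls more than one band below the middle.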

  9. #54
    is obsessed with Noivern! Zekurom's Avatar
    Join Date
    May 2010
    Gender
    Male
    Posts
    5,658
    Blog Entries
    108

    Default Re: Our own tiering system: Empirical Tiers

    Quote Originally Posted by Outrage View Post
    Well, like a normal distribution, and I just mentioned it because it's easier for me to visualize it on a graph. So basically, I was thinking we do what you said and start off with one tier (well, technically none) that would be +/- 0.5 SD from the mean score, with every further 0.5 SD from the mean being another tier. So -0.5 to 0.5 is the middle tier, 0.5 to 1.0 is an upper tier, and -0.5 to -1.0 is a lower tier. But yeah, these are basically just arbitrary numbers right now since we don't have any data.
    I'm sure the histogram for r.r. within a tier would look like anything but a normal distribution, really...

    Do you mean you want the tiers to show a normal distribution? That would at least make sense. But what do you mean by the "mean score"? If you're talking mean relative representation for all Pokémon, that graph is going to be horribly skewed toward the upper side.

    *edit* Okay, I just realized what you're talking about - you want to form a priori tiers based on existing data.

    Uh... might not be the best idea. I'm pretty sure it's not going to follow the normal distribution that you want. But we can try compiling existing usage data and seeing what we get.

    Although it might be better than just putting everything in the middle tier at first and letting everybody run wild with it.
    Last edited by Zekurom; 7th February 2011 at 10:30 PM.
    The word "quadragonal" is the only word with "dragon" in it where "dragon" is not a root word. That makes it awesome.

  10. #55
    now's the time to shine coolking503's Avatar
    Join Date
    Jul 2009
    Gender
    Male
    Location
    India
    Posts
    1,124
    Blog Entries
    4

    Default Re: Our own tiering system: Empirical Tiers

    Quote Originally Posted by evkl View Post
    Coolking, how do people create a better, more balanced metagame than raw usage statistics? As I said in my initial post, my goal is to have no Poke represented on more than 20% of teams. But at the same time just drawing lines in the sand with "smart people" choosing what stays and what goes is basically what Smogon has done for, oh forever. I don't, frankly, trust the people to get it right. The game is complex and there are lots of people who do weird things and there are even more unintended consequences.

    As for your nightmare scenario, I don't see a problem with a metagame without Scizor or something. If Stall gets super popular, there are going to be a handful of Pokemon that stall super-well. They'll get banned, and then Pokemon that can power through the remaining, not-so-good stall teams will be utilized. And we will restore balance to the Force.
    Is a metagame with nothing being used over 20% balanced? Yes, of course. But will that ever happen? Probably not. Why not? Because of what you just said. Stall gets super good, and Pokémon like Blissey and Skarmory go Uber. Now what is going to counter Gengar and Alakazam? Well, either every team carries a Snorlax or Cresselia or loses to one of them, considering a decent player can get rid of most of their other counters like Scizor easily (oh wait, Scizor is banned). Eventually we will end up banning so many things that aren't broken, but are still strong enough to be used.

    Again, let me ask you, do you think Scizor was broken in gen 4? Did anyone think Scizor was broken in gen 4? It had plenty of counters, and was an anti-metagame Pokémon. So what if it was used on 30% of all teams? Does that make it Uber? I don't think so. I think most competitive battlers would agree that Scizor isn't overpowering OU. So then why make it Uber?

    Here is the funny thing about this. I think we are actually gunning for the same thing but in different ways. I suggested we make our "OU" metagame consist of "Ubers". In gen 4, Scizor had enough usage in the Uber tier to be an "uber" via my classification of >5% usage. As such it would be banned from UU. We would only have 27 (28 if you count Manaphy sitting at 4.99%) OU Pokémon, and the rest would be UU, at least to start. (Keep in mind Arceus wasn't allowed at the time of these stats.) Salamence would be UU, but Metagross and Kingdra would be OU. Now say Salamence rips apart a metagame without Scizor. We have a suspect test and it gets banned through a supermajority, and is placed in OU. Say Kyogre is broken due to the auto Drizzle it gives. We simply vote on it and if people think it is broken it is banned. In your system it takes two months, which isn't a long time, but ends up in the same situation.

    People are the ones playing in the metagame. Not computers. We are trying to make a metagame which is fun, not mathematically ideal.

    Read the underlined part of your quote. You don't trust people to get it right? Who is playing the game? So what if "people" don't get it right? People are playing the game. Also, it's ironic because "people" are the ones who make up the usage stats. Yes, this is slightly contradictory, but I agree "there are lots of people who do weird things and there are even more unintended consequences." And this is another problem. What if these weird things mess up the usage stats and Magikarp gets banned? That is a perfect example of what that quote implies. Weird people and unintended consequences. In my scenario, weird people won't be able to meet the criteria they need to make a decision. Unintended consequences won't happen because this is a system of checks and balances. Your system has none of that. Mine has usage stats and people, which means you cannot abuse one of the two and get unintended consequences. Yours is easily manipulated, especially considering the number of people who are likely to play. I'm on the server right now. In fact, I'm the only person on the server right now. If I wanted to I could get an alt on the server and battle myself a hundred times, and all of a sudden Magikarp, Feebas, Weedle, Caterpie, Starly, and Tyrogue are banned (note that if this actually happens I may be the inspiration but I didn't actually do this). My system won't have that happen.

    I agree, the game is complex, but who is likely to understand the complexity? The average person with the average rating of under 1000, or a person with a rating of above 1300 (or whatever the req is)?

    Yes, Smogon having Smart People choose what happens is what they have been doing. They really managed to screw things up with the council, and the other similar things that happened with Latias and the late stage 3 tests. However, there is a reason Barack Obama is president and not Joe the Plumber. Obama is much more likely to make the right decision than your average Joe. That's why we voted for him or any other political candidate, and not your average plumber or whatever.

    Do you guys get the idea? Sticking to usage stats will never work. You need people to complement the system, which the current one doesn't have. Checks and balances are needed. The system needs major reformatting. Consider it seriously.
    If we are having a battle and I disconnect, it is 100% unintentional; my power goes off often and each time my modem resets.

    Gone until Aug, 10 or something for a trip w/out net access. C u after that. If I suddenly dissapear after that you can bet I got bad grades on my IGCSE's, since results are coming out the week afterwards.

  11. #56
    is obsessed with Noivern! Zekurom's Avatar
    Join Date
    May 2010
    Gender
    Male
    Posts
    5,658
    Blog Entries
    108

    Default Re: Our own tiering system: Empirical Tiers

    Wait, so what's your opinion on my promotion/demotion system again, coolking? Because you said that "one of the methods that would work" is the 7-tier method, which I assume meant the one I proposed. But it still relies completely on usage stats.

    Perhaps we can put a weight factor into mine so that players with a higher ladder rating count more in the usage stats. But then the calculations become immensely more difficult. (Primarily, what the weight curve should be.)
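For what it's worth, the arithmetic of a weighted count is not the hard part; the open question is the weight curve. A sketch, assuming each logged team comes with its owner's ladder rating. The linear curve `rating / 1000` is a pure placeholder and not something proposed in this thread, and the names `weighted_usage` and `entries` are invented.

```python
from collections import defaultdict

def weighted_usage(entries, weight=lambda rating: rating / 1000):
    """Usage share where each team counts in proportion to weight(rating).

    `entries` is a list of (ladder_rating, team) pairs. The default weight
    curve is an arbitrary placeholder; choosing it is the real design problem.
    """
    totals = defaultdict(float)
    weight_sum = 0.0
    for rating, team in entries:
        w = weight(rating)
        weight_sum += w
        for name in set(team):  # one count per species per team
            totals[name] += w
    if weight_sum == 0:
        return {}
    return {name: t / weight_sum for name, t in totals.items()}

# Hypothetical ladder entries (teams shortened for illustration).
entries = [
    (1500, ["Delibird", "Scizor"]),
    (1000, ["Scizor", "Blissey"]),
    (500, ["Blissey", "Gengar"]),
]

weighted = weighted_usage(entries)
unweighted = weighted_usage(entries, weight=lambda rating: 1.0)
for name in sorted(weighted):
    print(f"{name}: weighted {weighted[name]:.2f}, unweighted {unweighted[name]:.2f}")
```

Even with this mild curve, whatever the top-rated player runs gains share relative to a flat count, which is the trade-off any weight curve has to manage.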
    Last edited by Zekurom; 7th February 2011 at 10:46 PM.
    The word "quadragonal" is the only word with "dragon" in it where "dragon" is not a root word. That makes it awesome.

  12. #57
    Can I get an encore? evkl's Avatar Vice-Webmaster
    Join Date
    Dec 2002
    Gender
    Male
    Posts
    9,629
    Blog Entries
    68

    Default Re: Our own tiering system: Empirical Tiers

    I don't think we should include a weight factor. The formulae, whatever we choose, should be simple and easy to understand.

    But more to the point, I fail to see how we won't keep everything under 20% by fiat. If it's the rule, it's the rule, and we'll ban stuff until we get to that distribution. From any tier, in any system. And we can quibble about whether it should be 20% or 22% or 25% or 15%, but to me it seems quite easy to put a line in the sand, say "we will [ban/promote up] a Pokemon that gets over X% activity in a month," and just do it. I really think you're taking an entirely too narrow view of the metagame if you think it will constantly centralize around one strategy and one set of Pokemon.
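The rule itself is about as simple as a formula gets. A minimal sketch, assuming monthly usage is available as a fraction of teams per Pokémon; `over_limit` and the sample figures are hypothetical.

```python
def over_limit(monthly_usage, limit=0.20):
    """Return the Pokémon whose monthly usage is strictly over the limit."""
    return sorted(name for name, u in monthly_usage.items() if u > limit)

# Hypothetical monthly usage, as a fraction of teams.
monthly_usage = {"Scizor": 0.30, "Heatran": 0.21, "Blissey": 0.20, "Gengar": 0.12}

flagged = over_limit(monthly_usage)
print(flagged)
```

Changing the line in the sand to 15% or 25% is a one-argument change (`limit=0.15`), which is the whole appeal of automating it.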
    "And we're not gods, we're just hacks."

    blog | twitter | Bulbagraphic

  13. #58
    is obsessed with Noivern! Zekurom's Avatar
    Join Date
    May 2010
    Gender
    Male
    Posts
    5,658
    Blog Entries
    108

    Default Re: Our own tiering system: Empirical Tiers

    Quote Originally Posted by evkl View Post
    I don't think we should include a weight factor. The formulae, whatever we choose, should be simple and easy to understand.

    But more to the point, I fail to see how we won't keep everything under 20% by fiat. If it's the rule, it's the rule, and we'll ban stuff until we get to that distribution. From any tier, in any system. And we can quibble about whether it should be 20% or 22% or 25% or 15%, but to me it seems quite easy to put a line in the sand, say "we will [ban/promote up] a Pokemon that gets over X% activity in a month," and just do it. I really think you're taking an entirely too narrow view of the metagame if you think it will constantly centralize around one strategy and one set of Pokemon.
    I think that what coolking is saying is that drawing a line in the sand and declaring that as the standard is easy to do, but very hard to justify.

    What is the basis of your saying that "a Pokémon that's used in 1 in 5 teams is being used too much"? Is there any mathematical reason? Because this tier game is all about mathematics. You have to get the system designed correctly in order for it to work fairly, and that means designing an appropriate algorithm, not just applying an arbitrary limit and leaving it at that.

    My system depends on comparisons to the average. It's mathematically provable for my system that the more evenly spread out the representation of each Pokémon in a tier, the smaller the total of all the relative representations. But it does not say that 20% usage (a r.r. of 25%) is the limit at which we'll bump up a Pokémon to the next tier. Granted, my proposal of "twice the fair share" is pretty arbitrary too, but at least it depends on actual data - mainly, the average expected representation. Meanwhile, the limit of 20% will grow weaker and weaker as the number of Pokémon increases - because, after all, there's diversity in numbers.
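A quick numeric check of both claims. The r.r. formula is not restated on this page, so the sketch assumes r.r. = usage / (1 - usage), which matches the one data point given (20% usage, r.r. of 25%). It also assumes six distinct Pokémon per team, so a tier of N Pokémon has a fair-share usage of 6/N. All function names and sample spreads are invented.

```python
TEAM_SIZE = 6  # assumes six distinct species per team

def relative_representation(usage):
    """Assumed definition: teams using the Pokémon per team not using it."""
    return usage / (1 - usage)

def total_rr(usages):
    return sum(relative_representation(u) for u in usages)

def fair_share(tier_size):
    """Usage every Pokémon would have if a tier were used perfectly evenly."""
    return TEAM_SIZE / tier_size

# Two hypothetical 30-Pokémon tiers with the same total usage (6 slots per team).
even = [0.2] * 30
uneven = [0.4] * 10 + [0.1] * 20

print("r.r. at 20% usage:", relative_representation(0.20))
print("total r.r., even spread:  ", round(total_rr(even), 3))
print("total r.r., uneven spread:", round(total_rr(uneven), 3))

# A fixed 20% limit compared with "twice the fair share" as the tier grows.
for n in (30, 60, 100):
    print(n, "Pokémon: fair share", round(fair_share(n), 3),
          "| twice fair share", round(2 * fair_share(n), 3))
```

Under these assumptions the even spread gives the smaller total, and a fixed 20% sits exactly at the fair share in a 30-Pokémon tier but at more than three times the fair share in a 100-Pokémon tier, while a cutoff tied to the average scales with tier size.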

    That being said, it means that you can keep everything under 20% by fiat, but it won't be a good system.
    Last edited by Zekurom; 8th February 2011 at 12:17 AM.
    The word "quadragonal" is the only word with "dragon" in it where "dragon" is not a root word. That makes it awesome.

  14. #59
    Can I get an encore? evkl's Avatar Vice-Webmaster
    Join Date
    Dec 2002
    Gender
    Male
    Posts
    9,629
    Blog Entries
    68

    Default Re: Our own tiering system: Empirical Tiers

    The point I was trying to make is that at some point it's going to be a line in the sand, but I'd rather have the line in the sand be automated (20% usage in OU, RR of 25% in a tier, whatever) than have people picking and choosing what stays and what goes to create artificial "balance".
    "And we're not gods, we're just hacks."

    blog | twitter | Bulbagraphic

  15. #60
    now's the time to shine coolking503's Avatar
    Join Date
    Jul 2009
    Gender
    Male
    Location
    India
    Posts
    1,124
    Blog Entries
    4

    Default Re: Our own tiering system: Empirical Tiers

    Quote Originally Posted by Gi-gi-gi-giaru! View Post
    Wait, so what's your opinion on my promotion/demotion system again, coolking? Because you said that "one of the methods that would work" is the 7-tier method, which I assume meant the one I proposed. But it still relies completely on usage stats.

    Perhaps we can put a weight factor into mine so that players with a higher ladder rating count more in the usage stats. But then the calculations become immensely more difficult. (Primarily, what the weight curve should be.)
    I remember the weight factor being talked about on Smogon too, and while I think it's a good idea, I'm not sure how plausible it is. That being said, it would make the banning thing more of a formula where if a Pokémon ends up with too high of a rating it is banned. This is a bit too complex and I honestly think it should be simpler than this. Another problem is, say I get a 1500 rating and the next highest rating is 1300. Needless to say I must have battled a lot. Also, because I battled over 500 times AND my team stayed the same AND my rating is so high, whatever Pokémon I use will be nearly automatically deemed Uber. Even if I use a Delibird and win with the other 5 Pokémon, the Delibird would still be getting into OU at least.

    I think the 7 tier system itself is fine, in that right now 5 tiers isn't really enough. I would think we need a BL2 for NU, a tier under NU, and even a BL3 for that, considering the number of Pokémon we have now. However, I don't think we should theorymon Pokémon into tiers. I think the tiers should be based on usage, and we should work our way down.

    Quote Originally Posted by evkl View Post
    But more to the point, I fail to see how we won't keep everything under 20% by fiat. If it's the rule, it's the rule, and we'll ban stuff until we get to that distribution. From any tier, in any system. And we can quibble about whether it should be 20% or 22% or 25% or 15%, but to me it seems quite easy to put a line in the sand, say "we will [ban/promote up] a Pokemon that gets over X% activity in a month," and just do it. I really think you're taking an entirely too narrow view of the metagame if you think it will constantly centralize around one strategy and one set of Pokemon.
    I'm not debating where the line should be. I'm saying that if in some crazy case 40% of the people are using Doryuuzu, but people think that metagame is ideal, fair enough, why not play it? This is unlikely to happen though.

    You seem to think balance=fun. Is that true? Here is a question for you:

    What is the definition of an ideal metagame?

    Before we lay a foundation like we are doing now, we have to ask and answer this question and move towards this.

    I would say an ideal metagame is one which is completely skill based, where the person who plays better and has the better team will always win. A metagame where no single Pokémon lacks counters or ways of prevention, or requires more than one Pokémon to stop consistently. There should also be multiple Pokémon capable of stopping any such Pokémon, if there is one. Luck should not play a part, or at least not a large part, and skill should be much more important. Also, I would want a metagame which has as many viable Pokémon as possible.

    Quote Originally Posted by Gi-gi-gi-giaru! View Post
    I think that what coolking is saying is that drawing a line in the sand and declaring that as the standard is easy to do, but very hard to justify.
    Not exactly, but well put.

    Quote Originally Posted by Gi-gi-gi-giaru! View Post
    My system depends on comparisons to the average. It's mathematically provable for my system that the more evenly spread out the representation of each Pokémon in a tier, the smaller the total of all the relative representations. But it does not say that 20% usage (a r.r. of 25%) is the limit at which we'll bump up a Pokémon to the next tier. Granted, my proposal of "twice the fair share" is pretty arbitrary too, but at least it depends on actual data - mainly, the average expected representation. Meanwhile, the limit of 20% will grow weaker and weaker as the number of Pokémon increases - because, after all, there's diversity in numbers.
    I'm going to have to re-look at your proposal since I must have missed the "twice the fair share" part.

    Quote Originally Posted by evkl View Post
    The point I was trying to make is that at some point it's going to be a line in the sand, but I'd rather have the line in the sand be automated (20% usage in OU, RR of 25% in a tier, whatever) than have people picking and choosing what stays and what goes to create artificial "balance".
    Evkl, I have a few questions for you:

    1. Would you say that the current gen4 OU metagame is balanced?
    2. Would you say that the current gen4 OU metagame is fun?
    3. Do you think Scizor and Heatran are broken in gen4 OU and should be Uber?
    4. (slightly unrelated) Do you think Wobbuffet and Wynaut are broken and should be made Uber regardless of BST?

    My answers would be yes, yes, no, and maybe, but we should test it. The only change I may want to see in gen4 is Manaphy being added in, and even then I'm not sure.

    In other words, I think people picking made a pretty good balance. Artificial, yes, but the reason I ask these questions is that according to the system we will be following, the Uber tier will likely end up looking like this:

    Mewtwo
    Mew
    Ho-oh
    Lugia
    Latias
    Latios
    Kyogre
    Groudon
    Rayquaza
    Deoxys (all forms)
    Dialga
    Palkia
    Giratina
    Giratina-O
    Darkrai
    Arceus
    Manaphy
    Shaymin-S
    Salamence
    Wobbuffet
    Garchomp
    Heatran
    Tyranitar
    Scizor

    Are the last three Pokémon really Uber? Does being used more mean a Pokémon is Uber? What overpowers a metagame?

    According to you, it is anything used on more than 1/5 (or some other fraction) of teams. I disagree. In fact, I like Smogon's portrait of an Uber, or rather, the way they got to it. Why is Rayquaza an Uber, despite the fact that it has counters in OU? This one Smogon definitely got right. So why change it?
    If we are having a battle and I disconnect, it is 100% unintentional; my power goes off often and each time my modem resets.

    Gone until Aug, 10 or something for a trip w/out net access. C u after that. If I suddenly dissapear after that you can bet I got bad grades on my IGCSE's, since results are coming out the week afterwards.
